Locating tables in scanned documents for reconstructing and republishing

Jahan, M.A.C. Akmal; Ragel, Roshan G.

Please use this identifier to cite or link to this item: http://ir.lib.seu.ac.lk/handle/123456789/3126

Full metadata record

DC Field	Value	Language
dc.contributor.author	Jahan, M.A.C. Akmal	-
dc.contributor.author	Ragel, Roshan G.	-
dc.date.accessioned	2018-09-11T04:40:05Z	-
dc.date.available	2018-09-11T04:40:05Z	-
dc.date.issued	2014-12-22	-
dc.identifier.citation	7th International Conference on "Information and Automation for Sustainability". 22nd-24th Dec, 2014. Colombo, Sri Lanka.	en_US
dc.identifier.issn	2151-1802	-
dc.identifier.uri	http://ir.lib.seu.ac.lk/handle/123456789/3126	-
dc.identifier.uri	https://doi.org/10.1109/ICIAFS.2014.7069552	-
dc.description.abstract	The pool of knowledge available to mankind depends on its learning resources, which range from ancient printed documents to present-day electronic material. Rapidly converting the material held in traditional libraries to digital form requires a significant amount of work if the electronic documents are to keep the same format and look as their printed counterparts. Most printed documents contain not only characters and their formatting but also associated non-text objects such as tables, charts and graphical objects. It is challenging to detect these objects and to preserve the format of their contents while reproducing them. To address this issue, we propose an algorithm that uses local thresholds for word space and line height to locate and extract all categories of tables from scanned document images. From experiments performed on 298 documents, we conclude that our algorithm has an overall accuracy of about 75% in detecting tables in scanned document images. Since the algorithm does not depend completely on rule lines, it can detect all categories of tables in a range of scanned documents with different font types, styles and sizes, and extract their formatting features. Moreover, the algorithm can be applied to locate tables in multi-column layouts with a small modification to the layout analysis. Treating tables together with their existing formatting features will greatly help in reproducing printed documents for reprinting and updating purposes.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IEEE	en_US
dc.subject	OCR-optical character recognition	en_US
dc.subject	Table detection	en_US
dc.subject	Format preservation	en_US
dc.title	Locating tables in scanned documents for reconstructing and republishing	en_US
dc.type	Article	en_US
Appears in Collections:	Research Articles

