Title: A multi-plane approach for text segmentation of complex document images
Authors: Chen, Yen-Lin
Wu, Bing-Fei
Institute of Electrical and Control Engineering
Keywords: Document image processing; Text extraction; Image segmentation; Multilevel thresholding; Region segmentation; Complex document images
Publication date: 1-Jul-2009
Abstract: This study presents a new method, the multi-plane segmentation approach, for segmenting and extracting textual objects from real-life complex document images. The approach first decomposes the document image into distinct object planes in order to extract and separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. This decomposition consists of two stages: localized histogram multilevel thresholding, followed by multi-plane region matching and assembling. A text extraction procedure is then applied on the resultant planes to detect and extract textual objects with different characteristics in the respective planes. The proposed approach processes document images regionally and adaptively according to their respective local features. Hence detailed characteristics of the extracted textual objects, particularly small characters with thin strokes, as well as gradational illuminations of characters, can be well preserved. Moreover, this regional processing allows background objects with uneven, gradational, and sharp variations in contrast, illumination, and texture to be handled easily and well. Experimental results on real-life complex document images demonstrate that the proposed approach is effective in extracting textual objects with various illuminations, sizes, and font styles from various types of complex document images. (C) 2008 Elsevier Ltd. All rights reserved.
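The idea of thresholding each region from its own local histogram can be illustrated with a minimal sketch. This is not the authors' algorithm: it substitutes plain two-level Otsu thresholding per fixed-size block for the paper's localized multilevel thresholding, and it omits the region matching and assembling stage entirely. The function names `otsu_threshold` and `local_dark_mask`, the block size, and the toy image are all assumptions made for illustration.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        score = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def local_dark_mask(image, block=4):
    """Threshold each block with its own Otsu level (hypothetical helper).

    image is a list of rows of integer gray levels in 0..255.
    Returns (mask, thresholds): mask marks the dark class of each block with 1,
    thresholds maps each block's (top, left) corner to its local threshold.
    """
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    thresholds = {}
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            values = [image[y][x] for y in rows for x in cols]
            if min(values) == max(values):
                continue  # uniform block: nothing to separate, mask stays 0
            t = otsu_threshold(values)
            thresholds[(top, left)] = t
            for y in rows:
                for x in cols:
                    mask[y][x] = 1 if image[y][x] <= t else 0
    return mask, thresholds


# Toy image with uneven illumination (assumed data): dark strokes (10) on a
# gray background (100) on the left, lighter strokes (150) on a bright
# background (240) on the right.
left_rows = [[100, 10, 10, 100] for _ in range(4)]
right_rows = [[240, 150, 150, 240] for _ in range(4)]
image = [l + r for l, r in zip(left_rows, right_rows)]

mask, thresholds = local_dark_mask(image, block=4)
```

In this toy image the strokes on the right (150) are brighter than the background on the left (100), so no single global threshold isolates both sets of strokes, while the per-block thresholds do. Real document images would need the multilevel thresholding and the plane assembling described in the abstract.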
URI: http://dx.doi.org/10.1016/j.patcog.2008.10.032
http://hdl.handle.net/11536/7018
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2008.10.032
Journal: PATTERN RECOGNITION
Volume: 42
Issue: 7
Start page: 1419
End page: 1444
Appears in collections: Journal Articles


Files in this item:

  1. 000265365500020.pdf
