Title: | 對手填文件影像之內容特徵作抽取,分類與應用 Extraction, Classification, and Utilization of Content Features in Written Document Images |
Authors: | 陳宥光 Yow-Kuang Chen; 蔡文祥 Dr. Wen-Hsiang Tsai; Institute of Computer Science and Engineering |
Keywords: | Document Image; Extraction; Classification; Utilization; Area Thresholding; Grouping; Features |
Issue Date: | 1998 |
Abstract: | In this study, we propose a system for the extraction, classification, and utilization of content features in written (hand-filled) document images. The system consists of three major stages. The first stage extracts the content features of the document image; the second classifies the image components produced by the first stage; the third utilizes the classification results through reconstruction and storage. The first stage has two main steps: content enhancement and component extraction. In content enhancement, we propose several masks to restore strokes broken by the removal of the form's frame lines. We also propose two noise reduction methods: one removes noise left by frame-line removal, and the other removes noise introduced by scanning. In component extraction, we use region growing to find connected components and apply several proposed merging and splitting methods so that the content components are extracted completely. We then propose five classification methods that classify components as preprinted, handwritten, or graphic, and that recognize printed colons. Finally, we reconstruct the form from the classified results and store it. In this way, layout enhancement, digitization, and compression of the document are achieved together. Experimental results show that the proposed approaches are feasible. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT870394011 http://hdl.handle.net/11536/64149 |
Appears in Collections: | Thesis |
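
The abstract names connected-component extraction and noise reduction but gives no algorithmic detail. The following is only a minimal sketch of the general idea, not the thesis's method: it assumes a binarized image stored as a list of 0/1 rows, uses 8-connectivity, and treats any component below a hypothetical pixel-count threshold (`min_pixels`) as noise. The function names and the threshold are illustrative assumptions.

```python
from collections import deque


def connected_components(image):
    """Return a list of components, each a list of (row, col) foreground pixels.

    `image` is a list of equal-length rows holding 0 (background) or
    1 (foreground). Pixels are grouped with 8-connectivity.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Grow a region from this unvisited foreground pixel (BFS flood fill).
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a component, inclusive."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def drop_small_components(components, min_pixels):
    """Keep components with at least `min_pixels` pixels (assumed noise rule)."""
    return [p for p in components if len(p) >= min_pixels]


# Toy image: a 2x2 block, one isolated pixel (noise), and a 1x3 stroke.
image = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
]
kept = drop_small_components(connected_components(image), min_pixels=2)
print([bounding_box(p) for p in kept])  # [(0, 0, 1, 1), (3, 2, 3, 4)]
```

A size-based filter like this is only one plausible reading of the "Area Thresholding" keyword; the merging, splitting, and five classification methods mentioned in the abstract are not reproduced here.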