Title: 以影像分析技術作表格辨認與重建
Recognition and Reconstruction of Data Forms by Image Analysis Techniques
Authors: 曾煥恩
Tseng, Huan-En
蔡文祥
Wen-Hsiang Tsai
Institute of Computer Science and Engineering
Keywords: Data Form; Recognition; Reconstruction
Issue Date: 1997
Abstract: A form processing system with two major functions is developed. The first is the analysis and understanding of blank data forms; the second is the processing of filled-in data forms, which comprises recovery of blank forms from filled-in forms and reconstruction of filled-in forms. For a blank form, the frame lines, printed characters, and fill-in fields are extracted, the relationships among fields are detected, and the results are saved in a form database. Based on these results, a form-filling interface is also provided so that users can fill in forms on a computer. For a filled-in form, the first step is form recognition, that is, determining whether a blank version of the form already exists in the form database. In this study, form recognition is based not on the layout (frame-line) structure of a form but on its significant horizontal and vertical lines, together with its width and height. A modified Hough transform method is proposed that quickly detects significant lines in forms disturbed by character noise. If recognition finds no corresponding blank form in the database, the filled-in form is converted into a blank form, which is then analyzed and understood. Otherwise, the blank form from the database is registered to the filled-in form to extract the handwritten characters in the filled-in fields, and an interface is provided for the user to type in these characters. Finally, the filled-in form is reconstructed and stored in the database. In this way, layout enhancement, digitization, and compression of filled-in forms are achieved at the same time. Form recognition is the core technique of the system. Experimental results show that the proposed system and methods are feasible and practical.
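The recognition step described in the abstract (matching a filled-in form to a blank form by its significant horizontal and vertical lines plus its width and height) can be illustrated with a minimal sketch. This is not the thesis's modified Hough transform: as a simple stand-in, it treats a row or column as a significant line when its longest unbroken ink run covers a large fraction of the form. The function names, the `min_frac` and `tol` parameters, and the binary list-of-lists image format are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed format)
Signature = Tuple[int, int, Tuple[int, ...], Tuple[int, ...]]

def longest_run(values) -> int:
    """Length of the longest unbroken run of ink pixels."""
    best = cur = 0
    for v in values:
        cur = cur + 1 if v else 0
        best = max(best, cur)
    return best

def significant_lines(img: Image, min_frac: float = 0.6):
    """Rows/columns whose longest ink run spans at least min_frac of the form.

    Short runs (handwriting, printed characters) fall below the threshold,
    so they do not count as lines. Stand-in for a Hough-style detector.
    """
    height, width = len(img), len(img[0])
    rows = [y for y in range(height)
            if longest_run(img[y]) >= min_frac * width]
    cols = [x for x in range(width)
            if longest_run(img[y][x] for y in range(height)) >= min_frac * height]
    return rows, cols

def signature(img: Image, min_frac: float = 0.6) -> Signature:
    """Form descriptor: width, height, and significant line positions."""
    rows, cols = significant_lines(img, min_frac)
    return (len(img[0]), len(img), tuple(rows), tuple(cols))

def same_form(a: Signature, b: Signature, tol: int = 2) -> bool:
    """Two forms match if size and line positions agree within tol pixels."""
    (wa, ha, ra, ca), (wb, hb, rb, cb) = a, b
    if abs(wa - wb) > tol or abs(ha - hb) > tol:
        return False
    if len(ra) != len(rb) or len(ca) != len(cb):
        return False
    return (all(abs(p - q) <= tol for p, q in zip(ra, rb))
            and all(abs(p - q) <= tol for p, q in zip(ca, cb)))

def make_form(height: int, width: int, h_lines, v_lines) -> Image:
    """Hypothetical helper: draw a synthetic form with full-length lines."""
    img = [[0] * width for _ in range(height)]
    for y in h_lines:
        img[y] = [1] * width
    for x in v_lines:
        for row in img:
            row[x] = 1
    return img
```

In this sketch, a database lookup would amount to computing `signature` for the input form and comparing it with the stored signature of each blank form using `same_form`. A real scanner image would also need skew correction and merging of adjacent rows that belong to one thick line, which the sketch omits.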
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT860394087
http://hdl.handle.net/11536/62921
Appears in Collections: Thesis