Title: 表格公文處理系統之設計
Design of a Form Document Processing System
Authors: 賴青志
Lai, Ching-Zhi
李錫堅
Lee, Hsi-Jian
Institute of Computer Science and Engineering
Keywords: Form Document; Document Analysis; Character Recognition; Character Segmentation; Form Type Recognition; Document Understanding
Issue Date: 1997
Abstract: In this thesis, we present a more efficient method for processing form documents. In the learning phase, each input form is analyzed to obtain physical and logical information, from which a document template is constructed. In the recognition phase, the system extracts the fields of an unknown form document, uses them to determine the document type, and then retrieves the known, correct field information from that type's template. We also present a character segmentation method that combines the connected-component-based method and the projection-profile-based method to obtain better segmentation results. To raise the character recognition rate, we build a dictionary of frequently used words from the test documents and use it to correct some character recognition errors. In our experiments, the document type recognition rate is 100%, the character segmentation rate is 96.3%, and the character recognition rate is 96.6%. These results demonstrate the effectiveness of the proposed system.
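Note: The abstract names projection-profile-based segmentation as one of the two techniques it combines but does not describe how the thesis implements or merges them. The sketch below is only a generic illustration of the projection-profile idea on a binary text-line image, not the author's method. The function names, the list-of-lists image representation, and the `min_gap` parameter are assumptions made for this example.

```python
from typing import List, Tuple


def column_profile(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binary text-line image."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


def segment_by_projection(image: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split a text line into column ranges (start, end), end exclusive.

    A range is closed once at least `min_gap` consecutive blank columns
    follow it. `min_gap` is a hypothetical tuning parameter for this sketch.
    """
    profile = column_profile(image)
    segments: List[Tuple[int, int]] = []
    start = None  # first column of the range currently being built
    gap = 0       # number of blank columns seen since the last ink column
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Close the range just after the last ink column.
                segments.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # The line ended while a range was still open.
        segments.append((start, len(profile) - gap))
    return segments


# Tiny two-row example: ink at columns 0-1, 3, and 6.
sample = [
    [1, 1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 0],
]
narrow = segment_by_projection(sample, min_gap=1)
wide = segment_by_projection(sample, min_gap=2)
```

With `min_gap=1` every blank column separates two ranges, while `min_gap=2` keeps pieces that are only one blank column apart in the same range. A larger gap threshold is one simple way to avoid splitting a character whose strokes are not connected, at the cost of sometimes merging adjacent characters.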
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT860392028
http://hdl.handle.net/11536/62758
Appears in Collections: Thesis