Title: Design of a Form Document Processing System
Authors: 賴青志
Lai, Ching-Zhi
李錫堅
Hsi-Jian Lee
Institute of Computer Science and Engineering
Keywords: Form Document; Document Analysis; Character Recognition; Character Segmentation; Form Type Recognition; Document Understanding
Issue Date: 1997
Abstract: In this thesis, we present a system that processes form
documents more efficiently. In the learning phase, the input
document is analyzed to obtain physical and logical information,
from which the document template is constructed. In the
recognition phase, the system extracts the fields of an unknown
document, uses them to determine the document type, and then
obtains the known field information from the template of that
type. We also present a character segmentation method that
combines the connected-component-based and projection-profile-based
methods to obtain better segmentation results for the characters
in form documents. To increase the character recognition rate, we
build a dictionary of frequently used words collected from the
test documents and use it to correct some of the character
recognition errors. In our experiments, the accuracy of document
type determination is 100%, the character segmentation rate is
96.3%, and the character recognition rate is 96.6%. These results
demonstrate the effectiveness of the proposed system.
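The abstract names the two segmentation techniques but does not say how they are combined. The following is only a minimal Python sketch of one plausible combination, not the thesis's actual algorithm. It assumes a single binarized text line given as strings with `#` for ink, merges connected components whose column ranges overlap (so that stacked strokes stay in one character), and then cuts any span wider than a hypothetical `max_width` at the interior column where the vertical projection profile is lowest. The function names, the `max_width` parameter, and the sample image are all illustrative assumptions.

```python
from collections import deque

def component_spans(img):
    """Column span (x0, x1) of every 4-connected ink blob, sorted by x0."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    spans = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != "#" or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            while queue:  # breadth-first flood fill of one blob
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == "#" and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            spans.append((x0, x1))
    return sorted(spans)

def merge_overlapping(spans):
    """Merge sorted spans that share columns (strokes of one character)."""
    merged = []
    for x0, x1 in spans:
        if merged and x0 <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], x1))
        else:
            merged.append((x0, x1))
    return merged

def split_wide(img, span, max_width):
    """Cut an over-wide span at the interior column with the least ink."""
    x0, x1 = span
    width = x1 - x0 + 1
    if width <= max_width or width < 3:
        return [span]
    # Vertical projection profile: ink pixels per column inside the span.
    profile = [sum(row[x] == "#" for row in img) for x in range(x0, x1 + 1)]
    cut = min(range(1, width - 1), key=lambda i: profile[i])
    cut_x = x0 + cut  # assumption: the cut column goes to the right piece
    return (split_wide(img, (x0, cut_x - 1), max_width)
            + split_wide(img, (cut_x, x1), max_width))

def segment(img, max_width):
    """Character column spans for one text line."""
    pieces = []
    for span in merge_overlapping(component_spans(img)):
        pieces.extend(split_wide(img, span, max_width))
    return pieces

# Made-up line: one character built from two stacked strokes, followed by
# two characters that touch through a thin bridge in the bottom row.
LINE = [
    "###..###.###..",
    ".....#.#.#.#..",
    "###..#######..",
]
print(segment(LINE, max_width=4))  # [(0, 2), (5, 7), (8, 11)]
```

In this sketch the component step keeps the two stacked strokes together, while the projection step separates the two touching characters that a pure connected-component pass would report as a single blob.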
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT860392028
http://hdl.handle.net/11536/62758
Appears in Collections: Thesis