Title: 公文表格之手寫中文字切割
Chinese Handwritten Character Segmentation in Form Documents
Authors: 吳志宏
Wu, Chih-Hung
李錫堅
Hsi-Jian Lee
Institute of Computer Science and Engineering
Keywords: Handwritten Chinese Character Segmentation; Noise Removal; Projection; Character Segmentation Revision
Issue Date: 1996
Abstract: This thesis presents a method for segmenting handwritten Chinese characters in form documents with known structure. The input document image is first preprocessed: binarization using the method proposed by Niblack, noise removal, and extraction of the content of each field as a text line. Each text-line image is then divided into subimages by projection profile analysis. The resulting projection blocks are classified into four types: "mark", "half-word", "one-word", and "two-word". Large projection blocks are split into two or more blocks whose heights are close to the average character height of the text line, and small projection blocks are merged according to rules based on common handwriting habits. To reduce errors caused by the ambiguity between Chinese numeric characters and components of other Chinese characters, each "half-word" block is sent to a statistical Chinese character recognition module, and the recognition result determines whether the block is merged with other projection blocks. The system also lets users correct the segmentation results manually. The test document images contain 1319 Chinese characters in total. The correct segmentation rate is 91.76% without OCR and rises to 92.34% with the help of the OCR system.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT850392012
http://hdl.handle.net/11536/61759
Appears in Collections: Thesis
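
The projection-based steps summarized in the abstract (finding projection blocks in a text line, classifying them into the four types, and splitting oversized blocks) can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes a vertically written text line stored as a binarized list of rows (1 = ink), and the function names, the ratio thresholds 0.3 / 0.75 / 1.5, and the equal-height splitting rule are all hypothetical choices made for illustration only. The record does not state the actual thresholds or split criteria.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start_row, end_row), end exclusive


def projection_profile(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binarized text-line image."""
    return [sum(row) for row in image]


def find_blocks(profile: List[int], min_ink: int = 1) -> List[Block]:
    """Return maximal runs of consecutive rows whose ink count is >= min_ink."""
    blocks: List[Block] = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y                      # a new projection block begins
        elif count < min_ink and start is not None:
            blocks.append((start, y))      # a gap closes the current block
            start = None
    if start is not None:
        blocks.append((start, len(profile)))
    return blocks


def classify_block(block: Block, avg_height: float) -> str:
    """Label a block by its height relative to the average character height.

    The thresholds below are assumptions, not values from the thesis.
    """
    ratio = (block[1] - block[0]) / avg_height
    if ratio < 0.3:
        return "mark"
    if ratio < 0.75:
        return "half-word"
    if ratio < 1.5:
        return "one-word"
    return "two-word"


def split_block(block: Block, avg_height: float) -> List[Block]:
    """Split a large block into pieces of roughly average character height.

    Equal-height splitting is a simplification chosen for this sketch.
    """
    start, end = block
    height = end - start
    parts = max(1, round(height / avg_height))
    bounds = [start + (height * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


# Toy text line: a 10-row block, a 4-row block, and a 21-row block.
image = ([[0, 0]] + [[1, 1]] * 10 + [[0, 0]] * 2 + [[1, 0]] * 4
         + [[0, 0]] * 2 + [[1, 1]] * 21 + [[0, 0]])
blocks = find_blocks(projection_profile(image))
labels = [classify_block(b, avg_height=10) for b in blocks]
```

In this toy line the three blocks are labeled "one-word", "half-word", and "two-word". Following the abstract, a "half-word" block would then go to a recognizer to decide whether it is a standalone character (such as a Chinese numeral) or a component to be merged with a neighbor. That step is omitted here because it depends on the OCR module.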