Chinese Handwritten Character Segmentation in Form Documents

Full metadata record

DC Field	Value	Language
dc.contributor.author	吳志宏	en_US
dc.contributor.author	Wu, Chih-Hung	en_US
dc.contributor.author	李錫堅	en_US
dc.contributor.author	Lee, Hsi-Jian	en_US
dc.date.accessioned	2014-12-12T02:17:13Z	-
dc.date.available	2014-12-12T02:17:13Z	-
dc.date.issued	1996	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#NT850392012	en_US
dc.identifier.uri	http://hdl.handle.net/11536/61759	-
dc.description.abstract	This thesis presents a method for segmenting handwritten Chinese characters in form documents with a known structure. The input form image is first preprocessed: it is binarized with the method proposed by Niblack, noise is removed, and the content of each field (text line) is extracted. A projection profile analysis of each field then divides the whole text line into individual blocks. These projection blocks are classified into four types: "mark", "half-word", "one-word", and "two-word". Large projection blocks are split into two or more blocks whose heights are close to the average character height of the text line, and small projection blocks are merged according to rules based on common handwriting habits. To reduce errors caused by ambiguity between Chinese numeral characters and components of other Chinese characters, an OCR system is incorporated into the segmentation process: each "half-word" block is passed to a statistical character recognition module, and the recognition result determines whether the block should be merged with other projection blocks. The system also lets users correct the segmentation results manually online. The test samples contain 1319 Chinese characters in total. The correct segmentation rate is 91.76% without the OCR system and increases to 92.34% with it.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	Handwritten Chinese Character Segmentation	en_US
dc.subject	Noise Removal	en_US
dc.subject	Projection	en_US
dc.subject	Character Segmentation Revision	en_US
dc.title	公文表格之手寫中文字切割	zh_TW
dc.title	Chinese Handwritten Character Segmentation in Form Documents	en_US
dc.type	Thesis	en_US
dc.contributor.department	Institute of Computer Science and Engineering	en_US
Appears in Collections:	Thesis
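The abstract outlines a pipeline of Niblack binarization, projection profile analysis, four-way block classification, splitting of large blocks, and merging of small blocks. The Python sketch below illustrates those steps under stated assumptions; it is not the thesis's implementation. All function names are hypothetical. The text line is assumed to be vertical (so the profile counts ink pixels per row), the average character height avg_h is supplied by the caller, and k = -0.2 is used as a commonly cited Niblack setting for dark ink on a light background. The classification ratios (0.3, 0.75, 1.5), the cut-search slack (avg_h / 4), and the merge tolerance (1.2 × avg_h) are illustrative guesses. Noise removal is omitted, and the OCR check on "half-word" blocks is replaced by a caller-supplied predicate.

```python
from statistics import mean, pstdev

# Hypothetical sketch: names, thresholds, and tolerances are illustrative assumptions.


def niblack_binarize(gray, window=15, k=-0.2):
    """Return a 0/1 image (1 = ink) using Niblack's local threshold
    T = mean + k * std over a window x window neighbourhood clipped at borders."""
    rows, cols, r = len(gray), len(gray[0]), window // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            patch = [gray[j][i]
                     for j in range(max(0, y - r), min(rows, y + r + 1))
                     for i in range(max(0, x - r), min(cols, x + r + 1))]
            out[y][x] = 1 if gray[y][x] < mean(patch) + k * pstdev(patch) else 0
    return out


def projection_profile(binary):
    """Ink pixels per row, i.e. the profile along a vertical text line."""
    return [sum(row) for row in binary]


def find_blocks(profile):
    """Maximal runs of rows with non-zero projection as (start, end), end exclusive."""
    blocks, start = [], None
    for y, v in enumerate(profile):
        if v and start is None:
            start = y
        elif not v and start is not None:
            blocks.append((start, y))
            start = None
    if start is not None:
        blocks.append((start, len(profile)))
    return blocks


def classify(block, avg_h):
    """Map block height / avg_h to one of the four types (assumed ratio cut-offs)."""
    ratio = (block[1] - block[0]) / avg_h
    if ratio < 0.3:
        return "mark"
    if ratio < 0.75:
        return "half-word"
    if ratio < 1.5:
        return "one-word"
    return "two-word"


def split_block(block, profile, avg_h):
    """Cut a tall block into about height / avg_h pieces; each cut goes to the
    lowest-projection row within avg_h / 4 of its evenly spaced ideal position."""
    start, end = block
    n = max(1, round((end - start) / avg_h))
    step, slack = (end - start) / n, max(1, int(avg_h // 4))
    cuts = [start]
    for i in range(1, n):
        ideal = int(start + i * step)
        lo, hi = max(cuts[-1] + 1, ideal - slack), min(end - 1, ideal + slack)
        if lo <= hi:
            cuts.append(min(range(lo, hi + 1), key=lambda y: profile[y]))
    cuts.append(end)
    return list(zip(cuts, cuts[1:]))


def merge_small(blocks, avg_h, keeps_alone=lambda b: False):
    """Greedily join neighbours when either is a "mark" or "half-word" block and the
    joined span is at most 1.2 * avg_h. keeps_alone stands in for the OCR check:
    returning True (e.g. a block read as a numeral) prevents merging that block."""
    merged = []
    for b in blocks:
        if merged:
            prev = merged[-1]
            small = {classify(prev, avg_h), classify(b, avg_h)} & {"mark", "half-word"}
            fits = b[1] - prev[0] <= 1.2 * avg_h
            if small and fits and not keeps_alone(prev) and not keeps_alone(b):
                merged[-1] = (prev[0], b[1])
                continue
        merged.append(b)
    return merged


def segment_line(binary, avg_h, keeps_alone=lambda b: False):
    """Projection blocks -> split "two-word" blocks -> merge small blocks."""
    profile = projection_profile(binary)
    blocks = []
    for b in find_blocks(profile):
        if classify(b, avg_h) == "two-word":
            blocks.extend(split_block(b, profile, avg_h))
        else:
            blocks.append(b)
    return merge_small(blocks, avg_h, keeps_alone)


# Synthetic vertical line, 4 pixels wide, average character height 10 rows.
ink = set(range(0, 10)) | set(range(12, 16)) | set(range(17, 22)) | set(range(25, 45))
line = [[1, 1, 1, 1] if y in ink else [0, 0, 0, 0] for y in range(45)]
line[35] = [1, 0, 0, 0]  # thin stroke joining two touching characters
print(segment_line(line, avg_h=10))  # -> [(0, 10), (12, 22), (25, 35), (35, 45)]
```

In this example the two half-word blocks at rows 12 to 21 are merged into one character, and the 20-row block is split at its thinnest row. Cutting at the lowest-projection row near each evenly spaced position, rather than exactly at that position, is one simple way to let touching characters separate at a thin connecting stroke.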