Character Recognition of English Alphabets and Numerals

Full metadata record

DC Field	Value	Language
dc.contributor.author	鄭泰銘	en_US
dc.contributor.author	CHENG, TAI-MING	en_US
dc.contributor.author	李錫堅	en_US
dc.contributor.author	Hsi-Jian Lee	en_US
dc.date.accessioned	2014-12-12T02:20:20Z	-
dc.date.available	2014-12-12T02:20:20Z	-
dc.date.issued	1998	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#NT870392064	en_US
dc.identifier.uri	http://hdl.handle.net/11536/64087	-
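dc.description.abstract	In some application domains, such as business-card recognition, a line of text must be recognized without information about the whole document. In this thesis, we present a method that correctly recognizes any independent text line. The method consists of three main parts: pre-processing, a character recognition kernel, and post-processing. First, the binarized image of a single text line is de-skewed so that the skew angle is kept within 0.3 degrees, and the line is then checked for italic text. After all connected components are extracted and suitably merged and denoised, they are shifted vertically according to the previously detected skew angle and then smoothed. The extracted components are recognized by a kernel program with a "dual-kernel" architecture: one of the two kernels performs recognition, depending on whether the text is italic or roman. Components with poor recognition results are tentatively segmented, because such a component may contain not one character but several touching characters. Segmentation uses a branch-and-bound depth-first search of a search tree. Finally, the vertical position and character height of each component are used to check the recognition results; once impossible characters are ruled out, the correct character can be promoted to first place. In addition, we propose a method for determining space characters. For characters whose upper-case and lower-case shapes are identical, the case can also be determined from vertical position and character height. We cut 646 single text lines from 107 English business cards as test samples. The accuracy of de-skewing was 99.23%, the accuracy of italic detection was 100%, and 93.18% of touching characters were correctly segmented. The kernel recognition rates for roman and italic text reached 99.07% and 98.53%, respectively.	zh_TW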
dc.description.abstract	In this thesis, we design a procedure for recognizing single text lines. In certain applications, single text lines must be recognized without any whole-document information. The procedure consists of three parts: pre-processing, a character recognition kernel, and post-processing. In the first phase, the skew angle and italicness of the binarized image of a single text line are detected. After all connected components have been extracted and properly merged or deleted, the vertical positions of the components are shifted and the images are smoothed. The components are then recognized and, if necessary, segmented, using a dual-kernel architecture in which the kernel is chosen according to whether the text line is italic or roman. Touching characters are segmented using a branch-and-bound tree traversal. Finally, vertical position information is used to post-process the recognition results: impossible candidates are rejected, and the correct class is promoted to the first candidate. An approach to determining space characters using the profile is introduced. Characters that have the same shape in upper and lower case are distinguished according to their heights. In our experiments, we tested 646 text lines cut from English business name cards. The accuracy of skew-angle detection was 99.23%. The accuracy of italicness detection was 100%. 93.18% of touching characters were correctly segmented. The character recognition rates for correctly segmented or non-touching roman and italic characters were 99.07% and 98.53%, respectively.	en_US
dc.language.iso	en_US	en_US
dc.subject	文字辨識	zh_TW
dc.subject	文件分析	zh_TW
dc.subject	英文字母	zh_TW
dc.subject	統計式圖形辨識	zh_TW
dc.subject	雙核心架構	zh_TW
dc.subject	水平校正	zh_TW
dc.subject	連字切割	zh_TW
dc.subject	斜體字偵測	zh_TW
dc.subject	Character Recognition	en_US
dc.subject	Document Analysis	en_US
dc.subject	English Alphabets	en_US
dc.subject	Statistical Pattern Recognition	en_US
dc.subject	Dual-kernel Architecture	en_US
dc.subject	De-skewing	en_US
dc.subject	Touching Character Segmentation	en_US
dc.subject	Detection of Italic Text Lines	en_US
dc.title	英文字母與數字之辨識	zh_TW
dc.title	Character Recognition of English Alphabets and Numerals	en_US
dc.type	Thesis	en_US
dc.contributor.department	資訊科學與工程研究所	zh_TW
Appears in Collections:	Thesis
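
The abstract describes segmenting touching characters with a branch-and-bound depth-first tree traversal. The sketch below illustrates that general technique only; it is not the thesis's implementation. The function name segment_touching, the list of candidate cut columns, the max_width limit, and the per-segment cost callback (standing in for a recognition-kernel distance, assumed to be non-negative) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]


def segment_touching(
    cuts: Sequence[int],
    cost: Callable[[int, int], float],
    max_width: int,
) -> Tuple[float, List[Segment]]:
    """Split a touching component at candidate cut columns.

    cuts: sorted candidate cut columns, including the component's
          left and right edges (hypothetical input format).
    cost: non-negative recognition cost of the segment [start, end);
          lower means a more plausible single character (assumption).
    max_width: widest segment accepted as a single character.
    Returns (best total cost, list of segments); the cost is infinite
    and the list empty if no segmentation satisfies max_width.
    """
    best_cost = float("inf")
    best_path: List[Segment] = []
    last = len(cuts) - 1

    def dfs(i: int, acc: float, path: List[Segment]) -> None:
        nonlocal best_cost, best_path
        # Bound: costs are non-negative, so a partial sum that already
        # matches or exceeds the best complete solution cannot improve it.
        if acc >= best_cost:
            return
        if i == last:
            # Reached the right edge: record a new best segmentation.
            best_cost, best_path = acc, list(path)
            return
        # Branch: try every admissible next cut, depth-first.
        for j in range(i + 1, last + 1):
            if cuts[j] - cuts[i] > max_width:
                break  # cuts are sorted, so later cuts are wider still
            path.append((cuts[i], cuts[j]))
            dfs(j, acc + cost(cuts[i], cuts[j]), path)
            path.pop()

    dfs(0, 0.0, [])
    return best_cost, best_path
```

The pruning test is valid only because segment costs are assumed to be non-negative; with a score that can decrease as segments are added, the bound would have to be replaced by an admissible estimate of the remaining cost.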