Design of a Chinese Business Card Understanding System (中文名片辨識系統之設計)

Full metadata record

DC Field	Value	Language
dc.contributor.author	李盛弘	en_US
dc.contributor.author	Lee, Shan-Hung	en_US
dc.contributor.author	李錫堅	en_US
dc.contributor.author	Lee Hsi-Jian	en_US
dc.date.accessioned	2014-12-12T02:18:37Z	-
dc.date.available	2014-12-12T02:18:37Z	-
dc.date.issued	1997	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#NT860392029	en_US
dc.identifier.uri	http://hdl.handle.net/11536/62759	-
dc.description.abstract	摘要: 本論文提出一套中文名片辨識系統. 在第一階段, 我們對於輸入的名片影像先做一些前處理的工作. 這些工作包括二值化, 相連元件抽取, 區域二值化, 以及污點去除. 然後我們依序抽取欄位, 在欄位抽取之後, 我們根據字元的寬高比(width-to-height ratio)和字元間距來作字元切割. 在橫式名片中, 我們將字元分成兩類: 半字(half-character)和全字(full-character). 我們將連續的兩個或三個半字根據其大小和間距來合併. 在直式的名片中, 我們將字元依據形狀分成六類, 再依據事先定義的規則合併連續的字元. 在字元抽取出來之後, 我們將字元送到以統計式為基礎的多字型字元辨識模組去處理. 這個辨識模組包含多字型中文字元辨識器, 單一字型中文字元辨識器, 以及英文數字及標點符號辨識器. 中文辨識模組採用分群(clustering)來縮短辨識所需的時間. 最後, 我們再根據名欄位的特性和關鍵字元來判別欄位. 使用者可利用滑鼠來修正字元切割和選擇正確的候選字. 然後使用者可將辨識結果存入資料庫, 使用者可將相同屬性的名片存放於同一目錄並且可作查詢的動作. 我們測試了33張橫式名片和29張直式名片. 橫式名片的字元抽出率(extraction rate)和正確率(accuracy rate)分別為95.79%及93.44%. 直式名片則分別為96.10%及93.95%. 多字型中文字的辨識率為91.68%. 橫式名片和直式名片的欄位判斷正確率分別為81.54%和86.62%.
Abstract: In this thesis, we design an automatic understanding system for Chinese business cards. In the first step, we perform preprocessing operations on card images. These operations include binarization, connected-component extraction, local thresholding, and noise deletion. Then we group each item line by line. After item grouping, we perform character segmentation according to width-to-height ratio and gaps. In horizontal segmentation, we classify the character components in each group into two categories: half-characters and full-characters. We merge two or three consecutive half-characters according to their sizes and the gaps among them. In vertical segmentation, we classify the character components into six categories according to their shapes. We merge consecutive character components that satisfy the criteria we define on the components. After character extraction, we send the characters to the statistical multi-font recognition module. The recognition module includes a mixed-font Chinese recognizer, a specific-font recognizer, and a recognizer for alphanumeric characters and punctuation marks. The Chinese character recognizer applies a clustering operation first to shorten the recognition time. Finally, we identify item attributes according to characteristics and key characters.
A user can edit segmentation results using the mouse or select correct character candidates from a menu activated by clicking the right mouse button. The user can save recognition results in the database and can manage the database easily by creating a directory to hold cards with similar attributes and by querying information in the database. In our experiments, we test 33 horizontal cards and 29 vertical ones. The character extraction rate and accuracy rate of our system are 95.79% and 93.44% for horizontal cards and 96.10% and 93.95% for vertical ones, respectively. The character recognition rate is 91.68% for multi-font character recognition with clustering. The accuracy rates for item identification of horizontal cards and vertical cards are 91.54% and 86.62%, respectively.	zh_TW
dc.language.iso	zh_TW	en_US
dc.subject	字元切割	zh_TW
dc.subject	多字型字元辨識	zh_TW
dc.subject	資料庫	zh_TW
dc.subject	Segmentation	en_US
dc.subject	Multi-font character recognition	en_US
dc.subject	Database	en_US
dc.title	中文名片辨識系統之設計	zh_TW
dc.title	Design of a Chinese Business Card Understanding System	en_US
dc.type	Thesis	en_US
dc.contributor.department	資訊科學與工程研究所	zh_TW
Appears in Collections:	Thesis
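
The abstract describes merging two or three consecutive half-characters on horizontal cards according to their size and gaps. The following is only a minimal sketch of that idea, not the thesis's actual procedure: the `Box` type, the function names, and every threshold (`half_ratio`, `max_gap_ratio`, `max_width_ratio`) are hypothetical values chosen for illustration, since the record does not state the real criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Bounding box of one connected component (hypothetical type)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height

    @property
    def right(self) -> int:
        return self.x + self.w


def is_half(box: Box, half_ratio: float = 0.7) -> bool:
    # Assumption: a component is a "half-character" when its
    # width-to-height ratio falls below an illustrative threshold.
    return box.w / box.h < half_ratio


def union(boxes: List[Box]) -> Box:
    # Smallest box that encloses every box in the group.
    x0 = min(b.x for b in boxes)
    y0 = min(b.y for b in boxes)
    x1 = max(b.right for b in boxes)
    y1 = max(b.y + b.h for b in boxes)
    return Box(x0, y0, x1 - x0, y1 - y0)


def merge_half_characters(
    boxes: List[Box],
    line_height: int,
    max_gap_ratio: float = 0.2,    # assumed limit on the gap between parts
    max_width_ratio: float = 1.2,  # assumed limit on the merged width
) -> List[Box]:
    """Greedily merge runs of 3, then 2, adjacent half-characters."""
    boxes = sorted(boxes, key=lambda b: b.x)
    max_gap = max_gap_ratio * line_height
    max_width = max_width_ratio * line_height
    merged: List[Box] = []
    i = 0
    while i < len(boxes):
        step = 1
        candidate = boxes[i]
        for n in (3, 2):  # prefer the longer run when it fits
            group = boxes[i:i + n]
            if len(group) < n or not all(is_half(b) for b in group):
                continue
            gaps_ok = all(
                b.x - a.right <= max_gap for a, b in zip(group, group[1:])
            )
            joined = union(group)
            if gaps_ok and joined.w <= max_width:
                candidate, step = joined, n
                break
        merged.append(candidate)
        i += step
    return merged


line = [
    Box(0, 0, 9, 20), Box(11, 0, 9, 20),  # two halves close together
    Box(30, 0, 20, 20),                   # a full character
    Box(60, 0, 8, 20),                    # an isolated half (e.g. a digit)
    Box(80, 0, 20, 20),                   # a full character
]
result = merge_half_characters(line, line_height=20)
```

In this sketch the first two components are joined into one box, while the isolated narrow component is left alone because it has no half-character neighbour. A real system would tune the thresholds on sample cards, and vertical cards would need the six shape categories mentioned in the abstract rather than this two-way split.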