Full metadata record
DC Field | Value | Language
dc.contributor.author | 李盛弘 | en_US
dc.contributor.author | Lee, Shan-Hung | en_US
dc.contributor.author | 李錫堅 | en_US
dc.contributor.author | Lee Hsi-Jian | en_US
dc.date.accessioned | 2014-12-12T02:18:37Z | -
dc.date.available | 2014-12-12T02:18:37Z | -
dc.date.issued | 1997 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#NT860392029 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/62759 | -
dc.description.abstract | Abstract (Chinese version, translated): This thesis proposes a recognition system for Chinese business cards. In the first stage, the input card image is preprocessed. The preprocessing steps are binarization, connected-component extraction, local binarization, and stain removal. Items are then extracted in order. After item extraction, characters are segmented according to their width-to-height ratio and the gaps between them. For horizontal cards, characters are divided into two classes, half-characters and full-characters, and two or three consecutive half-characters are merged according to their sizes and gaps. For vertical cards, characters are divided into six classes by shape, and consecutive characters are merged according to predefined rules.
After character extraction, the characters are sent to a statistics-based multi-font character recognition module. This module comprises a multi-font Chinese character recognizer, a single-font Chinese character recognizer, and a recognizer for English letters, digits, and punctuation marks. The Chinese recognition module uses clustering to shorten recognition time. Finally, items are identified from the characteristics of business-card fields and from key characters.
The user can use the mouse to correct the character segmentation and to select the correct candidate character. The user can then store the recognition results in a database, keep cards with the same attributes in the same directory, and run queries.
We tested 33 horizontal cards and 29 vertical cards. For horizontal cards, the character extraction rate and accuracy rate were 95.79% and 93.44%; for vertical cards, they were 96.10% and 93.95%. The multi-font Chinese character recognition rate was 91.68%. The item identification accuracy was 81.54% for horizontal cards and 86.62% for vertical cards.
Abstract (English version): In this thesis, we design an automatic understanding system for Chinese business cards. In the first step, we perform preprocessing operations on the card images. These operations include binarization, connected-component extraction, local thresholding, and noise deletion. Then we group the items line by line. After item grouping, we perform character segmentation according to width-to-height ratio and gaps. In horizontal segmentation, we classify the character components in each group into two categories: half-characters and full-characters. We merge two or three consecutive half-characters according to their sizes and the gaps among them. In vertical segmentation, we classify the character components into six categories according to their shape. We merge consecutive character components that satisfy the criteria we define on the components.
After character extraction, we send the characters to the statistical multi-font recognition module. The recognition module includes a mixed-font Chinese recognizer, a specific-font recognizer, and a recognizer for alphanumeric characters and punctuation marks. The Chinese character recognizer applies a clustering operation first to shorten the recognition time. Finally, we identify item attributes according to item characteristics and key characters.
A user can edit the segmentation results using the mouse or select the correct character candidate from a menu activated by clicking the right mouse button. The user can save recognition results in the database, manage the database by creating a directory to hold cards with similar attributes, and query information in the database.
In our experiments, we tested 33 horizontal cards and 29 vertical ones. The character extraction rate and accuracy rate of our system are 95.79% and 93.44% for horizontal cards and 96.10% and 93.95% for vertical ones, respectively. The character recognition rate is 91.68% for multi-font character recognition with clustering. The accuracy rates for item identification of horizontal cards and vertical cards are 91.54% and 86.62%, respectively. | zh_TW
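dc.language.iso | zh_TW | en_US
dc.subject | 字元切割 (character segmentation) | zh_TW
dc.subject | 多字型字元辨識 (multi-font character recognition) | zh_TW
dc.subject | 資料庫 (database) | zh_TW
dc.subject | Segmentation | en_US
dc.subject | Multi-font character recognition | en_US
dc.subject | Database | en_US
dc.title | 中文名片辨識系統之設計 | zh_TW
dc.title | Design of a Chinese Business Card Understanding System | en_US
dc.type | Thesis | en_US
dc.contributor.department | 資訊科學與工程研究所 (Institute of Computer Science and Engineering) | zh_TW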
Appears in Collections: Thesis
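The abstract states that, on horizontal cards, two or three consecutive half-characters are merged according to their sizes and gaps. The record does not give the actual rules or thresholds, so the following is only a minimal sketch of that idea under stated assumptions: the `Box` type, the function names, and every threshold (`ratio_threshold`, `max_gap`, `max_width_factor`) are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Bounding box of one connected component on a horizontal text line."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height

    @property
    def right(self) -> int:
        return self.x + self.w


def is_half(box: Box, line_height: int, ratio_threshold: float) -> bool:
    """Assumed rule: a component is a half-character when it is narrow
    relative to the height of its text line."""
    return box.w / line_height < ratio_threshold


def union(a: Box, b: Box) -> Box:
    """Smallest box that covers both a and b."""
    x, y = min(a.x, b.x), min(a.y, b.y)
    right = max(a.right, b.right)
    bottom = max(a.y + a.h, b.y + b.h)
    return Box(x, y, right - x, bottom - y)


def merge_half_characters(
    boxes: List[Box],
    line_height: int,
    ratio_threshold: float = 0.7,   # hypothetical half/full cut-off
    max_gap: int = 4,               # hypothetical largest gap inside one character
    max_width_factor: float = 1.2,  # hypothetical cap on merged width
    max_parts: int = 3,             # the abstract mentions two or three half-characters
) -> List[Box]:
    """Greedily merge runs of up to max_parts adjacent half-characters."""
    boxes = sorted(boxes, key=lambda b: b.x)
    merged: List[Box] = []
    i = 0
    while i < len(boxes):
        current = boxes[i]
        parts = 1
        j = i + 1
        if is_half(current, line_height, ratio_threshold):
            while (
                j < len(boxes)
                and parts < max_parts
                and is_half(boxes[j], line_height, ratio_threshold)
                and boxes[j].x - current.right <= max_gap
                and union(current, boxes[j]).w <= line_height * max_width_factor
            ):
                current = union(current, boxes[j])
                parts += 1
                j += 1
        merged.append(current)
        i = j
    return merged
```

For example, with `line_height=20`, two 9-pixel-wide components separated by a 2-pixel gap are merged into one character box, while a 20-pixel-wide component is left as a full-character. A real system would tune the thresholds per card, since font size varies between items.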