Full metadata record
DC Field | Value | Language
dc.contributor.author | 陳嘉亨 | en_US
dc.contributor.author | Chia-Heng Chen | en_US
dc.contributor.author | 蔡文祥 | en_US
dc.contributor.author | Wen-Hsiang Tsai | en_US
dc.date.accessioned | 2014-12-12T02:27:51Z | -
dc.date.available | 2014-12-12T02:27:51Z | -
dc.date.issued | 2001 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#NT900394070 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/68597 | -
dc.description.abstract | 利用影像分析及文字辨識的技巧,我們提出一個可以自動建構電子書的方法。文字辨識的主要工作,是希望能辨識多種類的文字。此方法不需要使用文字影像資料來學習,而是直接使用系統字型來當做參考文字。在我們的方法中,有四個階段:文字型別的分類、文字辨識、書頁版面的分析,以及電子書的建構展示。在文字型別的分類階段,我們處理四種文字型別,第一種型別是標題中的中文字,而其餘三種型別則為文章中的中文字、英數文字和標點符號。我們利用決策樹提出一個對文字型別作分類的方法。在文字辨識的階段中,首先我們提出一個不需學習參考資料而直接使用系統字型資料的方法。接著,針對文章中的中文字,我們提出一個利用決策樹及樣板相配來辨識印刷中文字的方法。而針對標題中的中文字、文章中的英數字和標點符號,我們也提出一個主要是利用樣板相配的辨識方法。在這些方法中,成對的影像組成成分廣泛地被利用來協助文字辨識的工作。在書頁版面的分析階段中,我們利用矩量保持二值化及區塊生長技術,從影像內容中取得所有的連接小塊。針對書頁影像中不同的組成成分,我們使用不同的壓縮技術來壓縮它們,以改善整體的壓縮率。良好的實驗結果,顯示了我們所提出方法的可行性。 | zh_TW
dc.description.abstract | Based on image analysis and character recognition techniques, a system for automatically digitizing a printed book into a digital version is proposed. The main work is character recognition, in which characters of multiple types can be recognized. No character image data are needed for learning; the system fonts are used directly as the reference characters. The proposed system has four phases: character type classification, character recognition, page layout analysis, and digital book construction and display. In the character type classification phase, four types of characters are dealt with: Chinese characters in titles, and Chinese, alphanumeric, and punctuation characters in texts. A decision-tree method for classifying these character types is proposed. In the character recognition phase, a method that uses system font data directly, without learning from reference data, is proposed first. Next, for printed Chinese characters in texts, a recognition method based on decision trees and template matching is proposed. For the remaining types of characters, namely Chinese characters in titles and alphanumeric and punctuation characters in texts, a recognition method based mainly on template matching is also proposed. In these methods, pairs of image components are used extensively to help the recognition work. In the page layout analysis phase, all connected components are segmented out of the image contents using moment-preserving thresholding and region-growing techniques. Different compression techniques are then applied to the different components of the page images to improve the overall compression ratio of the resulting digital book. Good experimental results show the feasibility of the proposed methods. | en_US
dc.language.iso | en_US | en_US
dc.subject | 多種類文字辨識 | zh_TW
dc.subject | 文字型別分類 | zh_TW
dc.subject | 文字辨識 | zh_TW
dc.subject | 系統字型資料 | zh_TW
dc.subject | 決策樹 | zh_TW
dc.subject | 樣板相配 | zh_TW
dc.subject | 電子書 | zh_TW
dc.subject | multi-type character recognition | en_US
dc.subject | character type classification | en_US
dc.subject | optical character recognition | en_US
dc.subject | system font data | en_US
dc.subject | decision tree | en_US
dc.subject | template matching | en_US
dc.subject | digital book | en_US
dc.title | 利用決策樹方法及直接使用系統字型資料作多種類文字辨識及電子書自動建構 | zh_TW
dc.title | Multi-Class Character Recognition by Decision-Tree Approaches and Direct Use of System Font Data for Automatic Digital Book Construction | en_US
dc.type | Thesis | en_US
dc.contributor.department | 資訊科學與工程研究所 | zh_TW
Appears in Collections: Theses