中文雜誌內對中英文字與圖混合之切字

Full metadata record

DC Field	Value	Language
dc.contributor.author	鄭紹余	en_US
dc.contributor.author	Shau-Yu Cheng	en_US
dc.contributor.author	李錫堅	en_US
dc.contributor.author	Hsi-Jian Lee	en_US
dc.date.accessioned	2014-12-12T02:20:17Z	-
dc.date.available	2014-12-12T02:20:17Z	-
dc.date.issued	1998	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#NT870392024	en_US
dc.identifier.uri	http://hdl.handle.net/11536/64044	-
dc.description.abstract	一般的文件處理系統包含兩個部份：文字切割與文字辨識。在本論文提出了一個有效率之文字切割系統。這個系統含有兩個模組：文件分析與文字切割。在文件分析部份，我們先進行縮圖與抽取連通元件(Connected-Components)，接著將連通元件分為圖形或文字元件。在抽取出文件上之文字元件後，我們將文字元件合併成文字區塊，並檢查圖元件內是否有文字元件。若有，則抽取出來並合併至文字區塊中。最後，對所有的文字區塊切割出一行行之文字。當區塊的文字行被切開後，針對每個文字區塊，我們先檢查區塊中是否有首字放大情形。若有，則抽取之。最後，我們針對每個文字行執行文字切割以切出中文、英文與數字。在我們的實驗中，文字切割的正確率約98.9%，對於一份內含1158個字的文件所需時間為5秒。由此證明了我們系統的效率。	zh_TW
dc.description.abstract	A general document processing system usually includes two major modules: a character segmentation module and a character recognition module. In this thesis, we present an automatic system that segments characters efficiently. Our character segmentation system contains two modules: document layout analysis and character segmentation. In the document layout analysis module, we first perform image reduction and connected-component extraction. In the component classification procedure, the connected components are classified as image components or text components. In the block segmentation procedure, we merge the text components into text blocks. We then check whether any image component contains text components; if so, we extract them and merge them into the text blocks. Next, we perform text line segmentation to segment all text lines in the text blocks. After all text lines have been segmented, we detect and extract initial caps (enlarged first characters) if they exist in the text blocks. Finally, in the character segmentation module, we segment each text line into Chinese characters, English letters, and numerals. In our experiments, the character segmentation rate of our system is about 98.9%, and the processing time is about 5 seconds per page with 1158 characters. These results demonstrate the effectiveness of our proposed system.	en_US
dc.language.iso	en_US	en_US
dc.subject	切字	zh_TW
dc.subject	character segmentation	en_US
dc.title	中文雜誌內對中英文字與圖混合之切字	zh_TW
dc.title	Character Segmentation in Chinese Magazines with Mixed Alphabets, Numerals and Figures	en_US
dc.type	Thesis	en_US
dc.contributor.department	資訊科學與工程研究所	zh_TW
Appears in Collections:	Thesis
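
The abstract names connected-component extraction, text line segmentation, and character segmentation as stages but does not say how each is implemented. The following is a minimal Python sketch of those three stages under stated assumptions, not the thesis's actual method: the page is assumed to be an already binarized grid (1 = ink, 0 = background), components are found with a plain 8-connected flood fill, and lines and characters are split at empty gaps in projection profiles, a common textbook technique substituted here for illustration. All function names (`connected_components`, `runs`, `segment_lines`, `segment_chars`) are hypothetical.

```python
from collections import deque


def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                # Breadth-first flood fill starting from an unvisited ink pixel.
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def runs(profile):
    """Return half-open (start, end) index ranges where the profile is non-zero."""
    out, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out


def segment_lines(img):
    """Split a text block into lines at rows that contain no ink (assumed approach)."""
    return runs([sum(row) for row in img])


def segment_chars(img, top, bottom):
    """Split one text line (rows top..bottom-1) at columns that contain no ink."""
    width = len(img[0])
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile)


# Tiny illustrative page: two text lines separated by one blank row.
page = [
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
]
for top, bottom in segment_lines(page):
    print((top, bottom), segment_chars(page, top, bottom))
```

Splitting purely at empty gaps would break a Chinese character whose strokes are separated by white space and would merge touching letters, so a system of the kind described in the abstract presumably needs additional merging and splitting rules; those are not covered by this sketch.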