Full metadata record
DC Field Value Language
dc.contributor.author陳貴青en_US
dc.contributor.authorGuey-Ching Chenen_US
dc.contributor.author林錫寬en_US
dc.contributor.authorShir-Kuan Linen_US
dc.date.accessioned2014-12-12T02:26:26Z-
dc.date.available2014-12-12T02:26:26Z-
dc.date.issued2000en_US
dc.identifier.urihttp://140.113.39.130/cdrfb3/record/nctu/#NT890591016en_US
dc.identifier.urihttp://hdl.handle.net/11536/67782-
dc.description.abstract  圖文分割是文件分析中一個相當重要的過程,好的圖文分割演算法可以使一份數位文件中的圖形與文字分隔結果更加正確,或者是能節省下更多的時間。而依欲分割的數位文件色彩特性,可以區分成單色文件及彩色文件圖文分割兩大類,一般來說由於彩色文件擁有不確定的背景、圖形及文字色彩,有的甚至有圖文的交疊作用,因此在處理上將比單色文件的圖文分割來的複雜。
本論文利用七個處理程序,針對彩色數位文件做圖文分割,對圖文交疊及字串傾斜的情況亦可有效處理,此七個步驟依序為:1.以色彩群聚(Color Clustering)將相近的色彩聚成數個標準類別;2.以邊緣檢測的結果標示區塊;3.以區域成長法則補償小區塊;4.以聚類分析做色彩分類;5.以Run Length Smoothing做區塊結合;6.利用過濾條件擷取文字字;7.以投影法校正傾斜的文字字串。
本論文將以 Borland C++ Builder 程式語言建立圖文分割演算法及使用者操作介面,配合個人電腦及掃瞄器來取得欲處理的彩色數位文件,以這些彩色數位文件來驗證本處理方法的可行性,最後則將以市售OCR軟體對處理前後的結果做比較與討論。
zh_TW
dc.description.abstractSegmenting pictures and text is an important phase of document analysis; a good algorithm can make the segmentation result more accurate or reduce processing time. Depending on the color characteristics of the digital document, the task falls into two types: monochrome document segmentation and color document segmentation. In color documents, the colors of the components (text, pictures, background) are not known in advance, and text strings are sometimes embedded in color images. For these reasons, separating text from color documents is considerably more difficult than separating it from monochrome documents.
We present a text segmentation scheme that processes digital color documents in seven phases. The scheme also handles complicated documents, for example those in which text is embedded in color images or text strings are skewed. The seven phases are: 1. color clustering: group similar image colors into several standard color classes; 2. edge detection and block labeling: label blocks using the result of edge detection; 3. region growing: apply a region-growing rule to compensate for small blocks; 4. color classification: classify the blocks by color using cluster analysis; 5. run-length smoothing: merge nearby blocks; 6. filtering: extract the text blocks using filter conditions; 7. projection profile: correct skewed text strings.
We use Borland C++ Builder to implement the algorithm and the user interface, and acquire the color digital documents with a personal computer and a scanner. These documents are used to verify the feasibility of the method. Finally, we run commercial OCR software on the documents before and after processing, and compare and discuss the results.
en_US
dc.language.isozh_TWen_US
dc.subject圖文分割zh_TW
dc.subject圖文分離zh_TW
dc.subject文件分析zh_TW
dc.subject字串擷取zh_TW
dc.subject文字分割zh_TW
dc.subjectsegmentationen_US
dc.subjectdocument analysisen_US
dc.subjecttext extractionen_US
dc.subjecttext segmentationen_US
dc.title彩色混合模式封面文字字串之分割zh_TW
dc.titleText string segmentation from colored mixed-mode coversen_US
dc.typeThesisen_US
dc.contributor.department電控工程研究所zh_TW
Appears in Collections: Thesis