Title: 中文報紙影像中之文章分析與文字切割
Article Analysis and Title Character Segmentation in Chinese Newspaper
Authors: 尹守綸
Shou-Lun Yin
李錫堅
Hsi-Jian Lee
Institute of Computer Science and Engineering
Keywords: newspaper; character segmentation; article analysis; reading order; picture and text extraction
Issue Date: 1999
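Abstract: A typical document processing system consists of two main parts: character segmentation and character recognition. This thesis proposes an effective character segmentation system. The system contains two modules: article analysis and title character segmentation. In the article analysis module, we first reduce the image and extract connected components. We then classify the large connected components as pictures, tables, bar charts, or article blocks, and the small connected components as body-text blocks or title blocks. Next, we extract the articles in the newspaper. In the character segmentation module, we first extract the bi-lines (two lines of small characters embedded in a title) and then segment the title characters. After that, we separate touching characters and segment the numerals attached to measure words. In our experiments, the character segmentation accuracy is about 98.9% and the block extraction accuracy is about 97%, which demonstrates the effectiveness of our system.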
In this thesis, we present an automatic system that efficiently segments title characters in newspapers. The character segmentation system contains two modules: article analysis and character segmentation. In the article analysis module, we first perform image reduction and connected-component extraction. The large connected components are then classified as picture blocks, table blocks, graph blocks, or frame blocks, and the small components are classified as text components or title components. After the large blocks are classified, we merge all text components into text blocks and all title components into title blocks. Each article in the newspaper is then extracted by performing six relation tests. In the character segmentation module, we first extract bi-lines from the title blocks. We then segment Chinese characters, English letters, and numerals in the title lines. Touching characters are separated according to the average character size. In our experiments, the character segmentation rate is about 98.9% and the correct block classification rate is about 97%, which shows the effectiveness of the proposed system.
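The abstract names three steps that can be illustrated in a few lines: connected-component extraction, a size-based split into large and small components, and separation of touching characters by average size. The following is a minimal sketch under stated assumptions, not the thesis's implementation. The function names, the 8-connectivity choice, the `large_side` threshold, and the equal-width cutting rule are all hypothetical; the abstract does not give the actual features, thresholds, or the six relation tests.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    foreground regions in a binary image given as a list of rows of 0/1."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def split_by_size(boxes, large_side=4):
    """Partition boxes into (large, small) by their longer side in pixels.
    The threshold is an illustrative assumption, not a value from the thesis."""
    large, small = [], []
    for box in boxes:
        top, left, bottom, right = box
        side = max(bottom - top + 1, right - left + 1)
        (large if side >= large_side else small).append(box)
    return large, small


def split_touching(left, right, avg_width):
    """Cut the horizontal span [left, right] into pieces of roughly avg_width.
    Equal-width cuts are an assumed stand-in for the thesis's actual rule."""
    width = right - left + 1
    count = max(1, round(width / avg_width))
    cuts = [left + (width * i) // count for i in range(count + 1)]
    return [(cuts[i], cuts[i + 1] - 1) for i in range(count)]
```

In a real pipeline the large components would go on to a picture/table/graph/frame classifier and the small ones would be merged into text and title blocks; this sketch stops at the size split because the abstract does not describe those later stages in detail.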
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT880392058
http://hdl.handle.net/11536/65458
Appears in Collections: Thesis