Title: A Bottom-Up Approach to Color Image Document Analysis and Rearrangement
Authors: 詹前裕 (Chian-Yu Chan); 蔡文祥 (Wen-Hsiang Tsai)
Institute of Computer Science and Engineering
Keywords: document analysis; document rearrangement; color document; block extraction; character segmentation
Publication date: 1998
Abstract: This thesis proposes a bottom-up approach to color image document analysis and rearrangement. First, basic blocks are extracted with an edge detection algorithm followed by a region-growing algorithm. This has two advantages: it avoids the noise and distortion that printing and scanning introduce into color images, and it is faster than color-quantization techniques. Next, a newly proposed block classification algorithm uses structural and statistical features to label each extracted block as text or graphics, and blocks of the same type are merged into larger text areas and graphic areas. While text blocks are being merged, the text orientation and printing direction of each text area are determined, and character size is used for a simple analysis of the text areas. For rearrangement, the text lines of each article area are located from these results, and each line is binarized and segmented into characters using its horizontal and vertical projections. The reading order of the characters in the text area is then determined, and the article is rearranged into a new document. Experimental results show that the proposed approach is feasible and practical.
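The projection-based character segmentation mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a block that is already binarized (1 = ink, 0 = background), horizontally printed, and with at least one blank row between lines and one blank column between characters. All function names are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed input: binarized block, 1 = ink, 0 = background


def horizontal_projection(img: Image) -> List[int]:
    """Ink count per row."""
    return [sum(row) for row in img]


def vertical_projection(img: Image) -> List[int]:
    """Ink count per column."""
    return [sum(col) for col in zip(*img)]


def ink_runs(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Half-open (start, end) index ranges where the profile is >= min_ink."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of ink begins
        elif value < min_ink and start is not None:
            runs.append((start, i))        # a gap closes the current run
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the end of the profile
    return runs


def segment_characters(img: Image) -> List[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) boxes, line by line, left to right."""
    boxes = []
    # Rows with ink form text lines (assumes horizontal printing).
    for top, bottom in ink_runs(horizontal_projection(img)):
        line = img[top:bottom]
        # Within one line, columns with ink form characters.
        for left, right in ink_runs(vertical_projection(line)):
            boxes.append((top, bottom, left, right))
    return boxes
```

For a vertically printed block the two projections would swap roles. The boxes come out in row-major order, which corresponds to the reading order only under the horizontal-printing assumption stated above. Characters whose strokes are separated by a blank column would be split by this simple gap rule, so a practical system would likely need a merging step based on character size.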
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT870394008
http://hdl.handle.net/11536/64145
Appears in Collections: Thesis