Title: A Study on Background Determination and Object Extraction and Classification for Color Document Analysis
Authors: Chun-Ming Tsai (蔡俊明)
Hsi-Jian Lee (李錫堅)
Institute of Computer Science and Engineering
Keywords: Color Document Analysis;Document Analysis;Background Determination;Object Extraction;Object Classification;Luminance;Saturation
Issue Date: 2001
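Abstract: In this dissertation, we propose methods for several problems encountered in automatic color document processing systems. First, optical character recognition (OCR) engines are trained on large sets of character samples, and these samples are usually single black-and-white character images. Second, extracting text from a large number of color documents is time consuming. Finally, for layout and design reasons, images and their captions in color documents are often printed at a skew. The methods proposed to address these problems are: determining the background color of a color document image, extracting the objects in the image, and segmenting skewed objects in the document.
Documents come in many types. A traditional binarization method can handle only a few types of grayscale documents; no single method handles many types. Moreover, the background of a color document is usually uniform, while the foreground text is often printed in many colors. When foreground text colors are close to or mixed with the background color, traditional binarization gives unsatisfactory results. Likewise, when the background color dominates the document and the foreground occupies a small portion with several colors, traditional binarization gives unsatisfactory results. This dissertation therefore first proposes a decision-tree-based background determination method to determine the background color of a color document image. We first extract four statistical features from the luminance component of the image. Using these four features and a decision tree, we decide whether to determine the background color in the luminance component, in the saturation component, or in both. First, when the background and foreground of the document image have few colors that are close to each other, we use saturation to separate background from foreground. Second, when the foreground is significant, we use luminance. Third, when the background is significant and concentrated within a certain range, we also use luminance. Fourth, when the image has very few low-luminance (<60) components, we use saturation. In all other cases, we use both luminance and saturation. We experimented on many types of documents, including newspapers, calligraphy, invoices, business cards, advertisements, and magazines. Under shape and connected component measures, the proposed method performs better than other global thresholding methods and local adaptive thresholding methods; in processing speed, it is faster than the local adaptive methods and about the same as the global thresholding methods. In addition, we fed the results of the proposed method and of the other methods to a commercial OCR system for character segmentation and recognition; the proposed method achieved a higher correct recognition rate than the other binarization methods.
In general, processing large numbers of color documents with traditional document analysis methods remains inefficient. This dissertation proposes a more efficient and effective document analysis method that extracts foreground objects from color documents and classifies them as noise, horizontal lines, vertical lines, text characters, text lines, text blocks, strips, photos, and skew objects. The method comprises background color determination, top-down object extraction, and a parameter-free classification method. Foreground objects are extracted with our object boundary detection algorithm, which detects the boundaries of foreground objects; each extracted object is represented by its four border lines. The extracted objects form a geometric tree structure containing terminal and non-terminal nodes. For the terminal nodes, we compute size histograms and use them to classify the terminal nodes as noise, horizontal lines, vertical lines, text characters, photos, and skew objects. Using the property of the tree that a parent node is composed of its child nodes, together with the widths and heights of the child nodes, we label the non-terminal nodes as text lines, text blocks, and strip objects. We experimented on many types of color documents, including newspapers, invoices, business cards, advertisements, and magazines, and compared our results with two commercial document analysis packages; the experiments show that the proposed top-down color document analysis system is effective. We also compared the running complexity of our boundary detection algorithm with the projection-profile and connected-component methods; the experiments show that our algorithm is more efficient. In general, its time complexity is roughly the size of the background of the document image, so the more foreground objects a color document contains, the faster it is processed.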
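The classification of terminal (primitive) objects from their size histograms can be illustrated with a small sketch. The abstract does not give the actual decision rules, so everything below is an assumption made for illustration: the function names `dominant_size` and `classify_primitives` are hypothetical, the histogram peak is taken as the typical character size, and the fixed ratios (4 and 2) are placeholders, whereas the thesis describes its own method as parameter-free. Skew objects are not handled here.

```python
from collections import Counter


def dominant_size(sizes):
    """Return the most frequent value, i.e. the peak of the size histogram."""
    return Counter(sizes).most_common(1)[0][0]


def classify_primitives(boxes):
    """Label each (width, height) box relative to the dominant character size.

    The ratios used below are illustrative placeholders, not values
    taken from the thesis.
    """
    char_w = dominant_size([w for w, _ in boxes])
    char_h = dominant_size([h for _, h in boxes])
    labels = []
    for w, h in boxes:
        if w * 4 < char_w and h * 4 < char_h:
            labels.append("noise")            # far smaller than a character
        elif h * 4 < char_h and w > 2 * char_w:
            labels.append("horizontal line")  # thin and wide
        elif w * 4 < char_w and h > 2 * char_h:
            labels.append("vertical line")    # thin and tall
        elif w > 4 * char_w and h > 4 * char_h:
            labels.append("photo")            # far larger than a character
        else:
            labels.append("text character")
    return labels
```

The point of the sketch is only that the reference size comes from the document itself (the histogram peak) rather than from a fixed pixel threshold.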
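In general, document image skew is caused either by carelessness when scanning the document or by layout and design decisions that print some images and captions at a skew. The former problem has been discussed in many papers, while the latter has received little attention. This dissertation therefore also proposes an effective direct skew segmentation method for printed skew. The method builds on the background color determination and foreground object extraction methods described above. After the foreground objects are extracted, they are classified by the terminal-object classification method; if a terminal object is classified as a skew object, the proposed direct skew segmentation method is activated. It has four steps: skew object identification, skew type discrimination, skew angle estimation for a single object, and skew segmentation for multiple objects. The method relies mainly on the description of object boundaries and on the arrangement of objects. Object boundaries can be described by top-border and bottom-border lines or by left-border and right-border lines. The arrangement property is used because objects that are not skewed are mostly arranged horizontally or vertically, and after being skewed they are still arranged along a fixed angle. With these properties, skew objects can be identified, classified, have their angles estimated, and be segmented. In our experiments, magazine articles with skewed printing were scanned into image files and processed with the proposed method; the experiments show that the proposed skew segmentation method is effective.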
In this dissertation, we propose methods to solve the following problems in an automatic color document processing system. First, character recognition engines are usually trained on character samples extracted from images. Second, extracting characters from color document images is time consuming. Last, some images and captions are printed at a skew for layout design reasons. The methods proposed to address these problems include binarization by determining background colors, object extraction, and skew object segmentation in color document images. Several properties of color document images may yield unsatisfactory binarization results. For instance, when foreground colors are close to or mixed with the background color, or when the foreground contains varying colors with very few pixels, it is difficult to binarize these document images. This dissertation proposes a decision-tree-based method to determine the background color of color document images. Our method first extracts four statistical features from the luminance of a color document image. These four features and the decision tree are then used to decide whether luminance, saturation, or both are used to determine the background of the image. If the document image colors are concentrated within a limited range, saturation is employed. If the image foreground colors are significant, luminance is adopted. If the image background colors are concentrated within a limited range, luminance is also applied. If the total number of pixels with low luminance (less than 60) is limited, saturation is applied; otherwise, both luminance and saturation are employed.
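The order of the decision-tree rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not name the four statistical features or say how the first three conditions are tested, so `LuminanceStats`, its fields, `select_channels`, and the `low_ratio_limit` default are all hypothetical. Only the low-luminance cutoff of 60 comes from the abstract.

```python
from dataclasses import dataclass

LOW_LUMINANCE = 60  # the only threshold stated in the abstract


@dataclass
class LuminanceStats:
    """Hypothetical stand-ins for the four luminance features."""
    colors_concentrated: bool      # image colors lie within a limited range
    foreground_significant: bool   # foreground colors are significant
    background_concentrated: bool  # background colors lie within a limited range
    low_luminance_ratio: float     # fraction of pixels with luminance < 60


def low_luminance_ratio(luminance):
    """Fraction of luminance values below LOW_LUMINANCE."""
    values = list(luminance)
    if not values:
        return 0.0
    return sum(1 for v in values if v < LOW_LUMINANCE) / len(values)


def select_channels(stats, low_ratio_limit=0.05):
    """Apply the rules in the order given in the abstract.

    low_ratio_limit is an assumed value for "limited"; the abstract
    does not state one.
    """
    if stats.colors_concentrated:
        return ("saturation",)
    if stats.foreground_significant:
        return ("luminance",)
    if stats.background_concentrated:
        return ("luminance",)
    if stats.low_luminance_ratio < low_ratio_limit:
        return ("saturation",)
    return ("luminance", "saturation")
```

For example, `select_channels(LuminanceStats(False, False, False, 0.5))` falls through every rule and selects both channels.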
Experimental results on many kinds of color document images, including newspapers, calligraphy documents, uniform invoices, business cards, advertisements, and magazines, show that the proposed background color determination method generates better results than other available methods in shape and connected component measurements. In processing time, our method is faster than local adaptive thresholding methods and similar to global thresholding methods. The background color determination method also obtains higher character recognition accuracy than other comparable methods. This dissertation also proposes an efficient and effective document analysis system to analyze the layout of color document images that contain large photos. Our system includes background color determination, top-down object extraction, and parameter-free object classification methods. The top-down object extraction uses the foreground object boundary detection algorithm to detect objects and extract them to form a geometrical tree structure. The tree structure contains primitive and non-terminal objects. We propose a parameter-free classification method based on the width and height histograms of the primitive objects to classify them into noise, horizontal lines, vertical lines, text characters, photos, and skew objects. We also propose a bottom-up class assignment method to label the non-terminal objects as text lines, text blocks, and strip objects. The foreground object boundary detection algorithm consists of boundary detection by horizontal scan and by vertical scan. Experimental results on many kinds of color document images, taken from newspapers, uniform invoices, business cards, and magazines, show that our method is more effective than two commercial systems. The time complexity is about O(NB), where NB is the total number of background pixels. That is, the more foreground pixels there are, the less computation time is needed.
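One way to see why a boundary scan can cost about as much as the number of background pixels is the sketch below. It is an assumption-laden illustration, not the thesis's algorithm: the function name `horizontal_scan_borders`, the binary-mask input, and the `visited` counter are hypothetical. Each row is walked inward from both ends until the first foreground pixel is met, so only background pixels outside the object are stepped over.

```python
def horizontal_scan_borders(mask):
    """Find per-row left and right border columns of the foreground.

    mask is a list of rows, each a list of 0 (background) or 1 (foreground).
    Returns (left, right, visited), where left[i] and right[i] are the border
    columns of row i (None for an empty row) and visited is the number of
    background pixels stepped over during the scan.
    """
    left, right = [], []
    visited = 0
    for row in mask:
        width = len(row)
        l = 0
        while l < width and row[l] == 0:  # walk in from the left edge
            l += 1
            visited += 1
        if l == width:                    # the whole row is background
            left.append(None)
            right.append(None)
            continue
        r = width - 1
        while row[r] == 0:                # walk in from the right edge
            r -= 1
            visited += 1
        left.append(l)
        right.append(r)
    return left, right, visited
```

In this sketch, pixels between the left and right borders are never examined, so a page with more foreground leaves fewer pixels to scan. A vertical scan would do the same column by column to obtain top and bottom borders.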
This dissertation also proposes an efficient and effective document analysis system to analyze the layout of color document images that contain internal skew objects. Our system consists of background color determination, top-down object extraction, parameter-free object classification, and skew object segmentation. During the primitive object classification step, if a primitive object is classified as a skew object, a skew object segmentation procedure is activated. The procedure includes four steps: skew object identification, skew type discrimination, skew estimation for a single skew object, and skew object segmentation for multiple skew objects. If the skew object is a photo, its bounding lines are neither vertical nor horizontal. If the skew objects are text lines, such as a caption, the centroids of the bounding rectangles of the characters in a text line lie approximately on a straight line. By detecting the angle of this straight line, we can determine the skew angle of the text lines. If the skew object is an aggregation of a photo and a caption, the multiple-object segmentation is activated. Our method is based on the description of object boundaries and the arrangement of the objects. The object boundary is described by the left-border and right-border lines or by the top-border and bottom-border lines, which are obtained from the foreground object boundary detection algorithm. The arrangement property is that objects that are not skewed are arranged vertically or horizontally, while skewed objects are arranged at a certain angle. Experimental results on color document images with internal skew objects, taken from color magazines, show that our skew object segmentation method is effective.
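The idea of reading a text line's skew from the centroids of its character bounding rectangles can be sketched as below. The abstract does not say how the straight line is detected; a least-squares fit is used here purely as an assumption, and the function name `skew_angle_degrees` and the `(left, top, right, bottom)` box layout are hypothetical. The sketch assumes at least two boxes and a line that is closer to horizontal than to vertical.

```python
import math


def skew_angle_degrees(boxes):
    """Estimate the skew angle of a text line from character boxes.

    boxes is a list of (left, top, right, bottom) rectangles. A least-squares
    line is fitted through the box centroids and its angle is returned in
    degrees. In image coordinates (y grows downward), a positive angle means
    the line descends to the right.
    """
    xs = [(left + right) / 2 for left, _, right, _ in boxes]
    ys = [(top + bottom) / 2 for _, top, _, bottom in boxes]
    n = len(boxes)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is sxy / sxx; atan2 avoids dividing when sxx is zero.
    return math.degrees(math.atan2(sxy, sxx))
```

A vertically set text line would need the roles of x and y swapped, since all centroids then share nearly the same x coordinate.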
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT900392111
http://hdl.handle.net/11536/68519
Appears in Collections: Thesis