Title: | 利用文件影像分析及分類壓縮技術建構數位圖書館電子書之研究 Book Content Digitization and Display for Digital Libraries by Document Image Analysis and Compression-By-Classification Techniques |
Authors: | 陳世豪 Shih-hao Chen 蔡文祥 Wen-Hsiang Tsai Institute of Computer Science and Engineering |
Keywords: | Document Image Analysis;Compression-By-Classification;Digital Libraries;Electronic Book |
Issue Date: | 1999 |
Abstract: | This study develops an offline system that automatically digitizes and displays book content. First, multiple book pages are scanned into a computer through the scanner's automatic document feeder (ADF). A bottom-up color document analysis method then segments each page image and classifies the resulting regions into text blocks and picture blocks. To reduce the storage space required for book content, we propose a compression-by-classification approach. In this approach, a decision tree first classifies picture blocks into types based on their properties, and a full-color hierarchical moment-preserving method performs color reduction. After the page contents are classified, each image block is compressed by an algorithm suited to its attributes. Color reduction is applied again to remove the distortion that printing and scanning introduce into color pictures and to keep only the few most important colors, which improves compression. In addition, because different pages of a book often share common parts, we propose a repeated-block detection method that finds identical content across pages and further raises the overall compression ratio. Finally, so that users can read the e-book clearly, we enhance the page contents and provide an easy-to-use interface for displaying and reading them. Experimental results show that the proposed system and methods are feasible. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT880394040 http://hdl.handle.net/11536/65536 |
Appears in Collections: | Thesis |