Title: 以SVM與詮釋資料設計書籍分類系統
A Book Classification System Using SVM and Meta-information
Authors: 林昕潔 (Hsin-Chieh Lin)
柯皓仁 (Hao-Ren Ke)
楊維邦 (Wei-Pang Yang)
Institute of Computer Science and Engineering
Keywords: Automatic Book Classification; Online Bookstore; Library; Book Classification; SVM; Meta-Information; Support Vector Machine
Issue Date: 2005
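Abstract: This study proposes a design method for an automatic book classification system that saves the labor needed to classify large batches of books and improves classification efficiency. Because the system is learning-based, it can be applied to different classification schemes, which makes book categories more flexible and better aligned with information trends and user needs. The study builds book classification on document classification, draws on experts' experience to select category features, and adds the books' meta-information to improve classification performance. Book information is divided into two parts, description data (Description) and meta-information (Meta-information): the description data comprises the book title, the book synopsis, and the author biography, while the meta-information comprises author and publisher information. The proposed method has three main steps: 1) preprocess the description data, filter out category-representative features with a feature selection formula, let experts add or remove features and weight the features they specify, convert the description data into vector representations based on the selected features, and produce a classification model with a Support Vector Machines (SVM) classifier; 2) statistically analyze the book meta-information to uncover information that helps book classification, and classify books using this information alone; 3) merge the SVM and meta-information classification results by linear combination to complete the classification. The experiments use book information from the Books.com.tw (博客來) online bookstore and are run with 9-fold cross validation, reporting both Accuracy and F-measure to give an objective overall comparison. The results show that selecting features with expert knowledge and weighting them appropriately improves the SVM classification results by about 5%, and that fusing the SVM results with the classification information implicit in the meta-information improves them by about another 5%, for an overall accuracy of 95%.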
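Steps 2 and 3 above can be illustrated with a minimal Python sketch. The abstract does not say how the meta-information is scored or how the combination is weighted, so everything specific here is an assumption made for illustration: the meta-information score is the relative frequency of each class per publisher, the SVM output is taken as already-normalized per-class scores, and the names `meta_scores`, `combine`, and `alpha` are hypothetical.

```python
from collections import Counter, defaultdict


def meta_scores(train, classes):
    """Build a scorer from (publisher, label) pairs.

    Assumption: the meta-information score of a class is its relative
    frequency among the training books of the same publisher.
    """
    counts = defaultdict(Counter)
    for publisher, label in train:
        counts[publisher][label] += 1

    def score(publisher):
        seen = counts.get(publisher)
        if not seen:
            # Unseen publisher: fall back to a uniform distribution.
            return {c: 1.0 / len(classes) for c in classes}
        total = sum(seen.values())
        return {c: seen[c] / total for c in classes}

    return score


def combine(svm, meta, alpha=0.7):
    """Per-class linear combination: alpha * svm + (1 - alpha) * meta."""
    return {c: alpha * svm[c] + (1 - alpha) * meta[c] for c in svm}


classes = ["computing", "literature"]
train = [
    ("PubA", "computing"),
    ("PubA", "computing"),
    ("PubA", "literature"),
    ("PubB", "literature"),
]
score = meta_scores(train, classes)

# Hypothetical normalized SVM scores for one book published by PubA.
svm = {"computing": 0.45, "literature": 0.55}
fused = combine(svm, score("PubA"), alpha=0.5)
predicted = max(fused, key=fused.get)
```

In this toy case the SVM scores alone slightly favor "literature", while the publisher statistics shift the fused decision to "computing". In practice the weight `alpha` would be tuned on held-out data.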
This thesis proposes an automatic book classification system that reduces the labor and time needed to classify a batch of books. Because it relies on machine learning, the proposed system can be applied to various classification schemes, making cataloging in libraries and online bookstores more efficient. The system treats book classification as document classification and uses experts' knowledge in feature selection. The core classification algorithm is Support Vector Machines (SVM), whose results are integrated with the books' meta-information to improve classification accuracy. First, the data for each book are divided into description (book title and prospectus) and meta-information (author and publisher). The description of each book is preprocessed, and features are selected according to term frequencies and the log likelihood ratio. Next, experts refine the selected features and assign weights to the features they choose. After feature selection, the book descriptions are transformed into vectors, on which the SVM classifier is trained and then used for classification. Separately, the meta-information is statistically analyzed to extract hidden information useful for book classification. The final step linearly combines the SVM classification with the meta-information classification to obtain the final result. To demonstrate the feasibility of the method, we use book data from 'books.com.tw', run experiments with 9-fold cross validation, and evaluate accuracy and F-measure. The experimental results show that adding experts' knowledge improves the SVM results by about 5%, and that combining SVM with meta-information yields an additional improvement of about 5%. The overall accuracy reaches 95%.
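The abstract names the log likelihood ratio as a feature selection criterion but does not give the formula. As a rough sketch, the Python below assumes Dunning's commonly used G² statistic on a 2x2 term/class contingency table of document counts. The helper names `llr` and `term_llr` and the toy corpus are hypothetical, not taken from the thesis.

```python
import math


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: documents in the class that contain the term
    k12: documents in the class that lack the term
    k21: documents outside the class that contain the term
    k22: documents outside the class that lack the term
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for i, row in enumerate(((k11, k12), (k21, k22))):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing
                expected = rows[i] * cols[j] / total
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def term_llr(docs, term, target):
    """Score one term against one class; docs is a list of (token_set, label)."""
    k11 = sum(1 for tokens, label in docs if label == target and term in tokens)
    k12 = sum(1 for tokens, label in docs if label == target and term not in tokens)
    k21 = sum(1 for tokens, label in docs if label != target and term in tokens)
    k22 = sum(1 for tokens, label in docs if label != target and term not in tokens)
    return llr(k11, k12, k21, k22)


# Toy corpus of tokenized descriptions (hypothetical data).
docs = [
    ({"python", "code"}, "computing"),
    ({"python", "data"}, "computing"),
    ({"novel", "love"}, "literature"),
    ({"novel", "code"}, "literature"),
]
ranked = sorted(
    ["python", "code", "novel"],
    key=lambda t: term_llr(docs, t, "computing"),
    reverse=True,
)
```

The statistic is two-sided, so a term concentrated outside the target class also scores high. To keep only positive indicators, one would also compare `k11` with its expected count.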
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009323593
http://hdl.handle.net/11536/79122
Appears in Collections: Thesis


Files in This Item:

  1. 359304.pdf
