Automatic Classification of Text Documents by Using Latent Semantic Indexing

Full metadata record

DC Field	Value	Language
dc.contributor.author	汪若文	en_US
dc.contributor.author	Juo-Wen Wang	en_US
dc.contributor.author	劉敦仁	en_US
dc.contributor.author	Duen-Ren Liu	en_US
dc.date.accessioned	2014-12-12T02:42:25Z	-
dc.date.available	2014-12-12T02:42:25Z	-
dc.date.issued	2003	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT008864515	en_US
dc.identifier.uri	http://hdl.handle.net/11536/75112	-
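dc.description.abstract	In the field of information retrieval, searching and browsing are two very important tasks. Search offers users a fast way to find the data they need. However, retrieval methods that rely excessively on literal word matching cannot effectively handle synonymy and polysemy. Users also cannot always formulate good search conditions, so they may fail to find the data they actually need. A good information service should therefore offer, in addition to search, a browsing service built on a good classification mechanism, which is an important and complementary function. Good document classification is a fundamental prerequisite for such a browsing service. Document classification proceeds in two steps: first, the documents are represented in a suitable mathematical form; second, a suitable classification algorithm classifies them automatically. Document classification is a conceptual task, and the traditional vector space representation of documents cannot escape direct dependence on words. Latent semantic indexing (LSI) aims to uncover the semantic concepts latent in documents. Since semantic concepts are precisely the key to document classification, applying this technique to classification is expected to work well. This study uses LSI to represent documents and combines it with two classification algorithms, the centroid vector method and k-NN, to classify documents automatically, and it examines the feasibility and effectiveness of this approach. As a baseline, the vector space model is combined with the same two algorithms, and the classification effectiveness of the two representations is compared. This study addresses the single-category classification problem. The results show that representing documents with LSI and pairing it with a suitable classification algorithm yields stable classification results, so applying LSI to automatic document classification is feasible. However, in this study the classification accuracy of LSI was lower than that of the vector space model with both the centroid vector method and k-NN. Whether LSI is better suited to multi-category classification, or whether combining LSI with other classification algorithms yields better results, remains a subject for further research.	zh_TW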
dc.description.abstract	Searching and browsing are both important tasks in information retrieval. Search provides a way to find information rapidly, but relying on literal word matching makes it hard to deal with synonymy and polysemy. Moreover, users sometimes cannot formulate suitable queries and therefore fail to find the information they really need. To provide good information services, both search and browsing through a good classification mechanism are very important. Classifying documents involves two steps: first, representing documents in a suitable mathematical form; second, classifying them automatically with a suitable classification algorithm. Classification is a task of conceptualization, and representing documents with the conventional vector space model cannot avoid relying explicitly on words. Latent semantic indexing (LSI) was developed to uncover the latent semantic concepts of documents, which may make it suitable for document classification. This thesis studies the feasibility and effectiveness of classifying text documents using LSI as the document representation, with both the centroid vector method and k-NN as the classification algorithms. The results are compared with those obtained using the vector space model. This study addresses single-category classification. The results show that automatic classification of text documents using LSI together with suitable classification algorithms is feasible. However, the classification accuracy achieved with LSI is lower than that achieved with the vector space model. The effect of applying LSI to multi-category classification, and of combining LSI with other classification algorithms, requires further study.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	自動分類 (automatic classification)	zh_TW
dc.subject	資訊擷取 (information retrieval)	zh_TW
dc.subject	潛在語意索引 (latent semantic indexing)	zh_TW
dc.subject	向量空間法 (vector space model)	zh_TW
dc.subject	automatic classification	en_US
dc.subject	information retrieval	en_US
dc.subject	latent semantic indexing	en_US
dc.subject	vector space model	en_US
dc.title	運用潛在語意索引的自動化文件分類	zh_TW
dc.title	Automatic Classification of Text Documents by Using Latent Semantic Indexing	en_US
dc.type	Thesis	en_US
dc.contributor.department	管理學院資訊管理學程 (Information Management Program, College of Management)	zh_TW
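The abstracts describe a two-step pipeline. First, each document is represented with either the vector space model or LSI. Second, it is assigned a single category by the centroid vector method or k-NN. The following is a minimal, illustrative sketch of that pipeline in pure Python, not the thesis's implementation.

Several details are assumptions, because this record does not state them:

- Term weights are tf-idf with idf = log(N/df).
- The rank-k truncated SVD is obtained from the eigendecomposition of AᵀA using cyclic Jacobi rotations, which is practical only for small corpora.
- New documents are folded into the concept space as x̂ = S_k⁻¹U_kᵀx.
- Similarity is cosine.
- k-NN uses unweighted majority voting.

All function names, the toy corpus, and the parameter values (LSI rank k=2, three neighbours) are hypothetical.

```python
import math
from collections import Counter

def build_vectorizer(train_docs):
    """Vector space model: map a token list to tf-idf weights over the training vocabulary."""
    vocab = sorted({t for d in train_docs for t in d})
    df = Counter(t for d in train_docs for t in set(d))
    idf = {t: math.log(len(train_docs) / df[t]) for t in vocab}  # assumed weighting scheme

    def vectorize(tokens):
        tf = Counter(tokens)
        return [tf[t] * idf[t] for t in vocab]  # terms unseen in training are dropped

    return vectorize

def jacobi_eigen(m, sweeps=100, tol=1e-18):
    """Eigendecomposition of a small symmetric matrix by cyclic Jacobi rotations."""
    n = len(m)
    a = [row[:] for row in m]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- J^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    return [a[i][i] for i in range(n)], v

def lsi_fit(a, k):
    """Rank-k truncated SVD of the term-document matrix a (terms x docs) via eig(A^T A)."""
    n_terms, n_docs = len(a), len(a[0])
    ata = [[sum(a[t][i] * a[t][j] for t in range(n_terms)) for j in range(n_docs)]
           for i in range(n_docs)]
    vals, vecs = jacobi_eigen(ata)
    basis = []  # (sigma_i, u_i) pairs, largest singular value first
    for i in sorted(range(n_docs), key=vals.__getitem__, reverse=True)[:k]:
        sigma = math.sqrt(max(vals[i], 0.0))
        if sigma < 1e-12:  # requested k exceeds the rank of a
            break
        # u_i = A v_i / sigma_i
        u = [sum(a[t][j] * vecs[j][i] for j in range(n_docs)) / sigma for t in range(n_terms)]
        basis.append((sigma, u))
    return basis

def lsi_fold_in(basis, x):
    """Project a term vector into the k-dim concept space: x_hat = S_k^-1 U_k^T x."""
    return [sum(ui * xi for ui, xi in zip(u, x)) / sigma for sigma, u in basis]

def cosine(x, y):
    nx, ny = math.sqrt(sum(e * e for e in x)), math.sqrt(sum(e * e for e in y))
    return sum(p * q for p, q in zip(x, y)) / (nx * ny) if nx and ny else 0.0

def centroid_classify(train, labels, x):
    """Centroid vector method: pick the class whose mean vector is most cosine-similar to x."""
    groups = {}
    for vec, lab in zip(train, labels):
        groups.setdefault(lab, []).append(vec)
    centroids = {lab: [sum(col) / len(vs) for col in zip(*vs)] for lab, vs in groups.items()}
    return max(centroids, key=lambda lab: cosine(centroids[lab], x))

def knn_classify(train, labels, x, k=3):
    """k-NN: majority vote among the k most cosine-similar training documents."""
    ranked = sorted(zip(train, labels), key=lambda p: cosine(p[0], x), reverse=True)
    return Counter(lab for _, lab in ranked[:k]).most_common(1)[0][0]

# Hypothetical toy corpus (not the thesis's data set): two topics, three documents each.
train_docs = [["stock", "market", "price"], ["market", "price", "bank"],
              ["bank", "loan", "price"], ["game", "team", "score"],
              ["team", "score", "coach"], ["coach", "win", "score"]]
labels = ["finance"] * 3 + ["sports"] * 3
vectorize = build_vectorizer(train_docs)
vsm_train = [vectorize(d) for d in train_docs]
basis = lsi_fit([list(row) for row in zip(*vsm_train)], k=2)  # terms x docs matrix
lsi_train = [lsi_fold_in(basis, x) for x in vsm_train]
query = vectorize(["price", "bank", "market"])
print(centroid_classify(vsm_train, labels, query))                      # VSM + centroid: finance
print(knn_classify(lsi_train, labels, lsi_fold_in(basis, query), k=3))  # LSI + k-NN: finance
```

Folding the training documents back in recovers the columns of V_k, so training and unseen documents live in the same concept space. Computing the SVD through AᵀA squares the condition number of A. A real system would instead call a dedicated (sparse) SVD routine.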
Appears in Collections:	Theses

Files in This Item:

451501.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a web browser to view the full text.