Full metadata record
DC Field | Value | Language
dc.contributor.author | 汪若文 | en_US
dc.contributor.author | Juo-Wen Wang | en_US
dc.contributor.author | 劉敦仁 | en_US
dc.contributor.author | Duen-Ren Liu | en_US
dc.date.accessioned | 2014-12-12T02:42:25Z | -
dc.date.available | 2014-12-12T02:42:25Z | -
dc.date.issued | 2003 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT008864515 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/75112 | -
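dc.description.abstract | In information retrieval, searching and browsing are two very important tasks. Although search gives users a fast way to find the data they need, retrieval that relies too heavily on word matching cannot effectively handle synonymy and polysemy; moreover, users cannot always formulate good queries, so they may fail to find the data they actually need. To provide good information services, therefore, offering browsing through a good classification mechanism, in addition to search, is an important and complementary function, and good document classification is a basic and essential prerequisite for such a browsing service. Document classification involves two steps: first, represent each document in a suitable mathematical form; second, classify the documents automatically with a suitable classification algorithm. Document classification is a conceptual task, and the traditional vector space representation has difficulty escaping its direct dependence on words. Latent semantic indexing (LSI) aims to uncover the semantic concepts latent in documents; since semantic concepts are exactly what document classification hinges on, applying this technique to classification should be reasonably effective. This study uses LSI to represent documents, combined with two classification algorithms, the centroid vector method and k-NN, to classify documents automatically, and examines the feasibility and effectiveness of this approach. As a baseline, the vector space model is combined with the same two algorithms, and the classification results of the two representations are compared. The study addresses the single-category classification problem. The results show that representing documents with LSI and applying a suitable classification algorithm yields stable classification results, so applying LSI to automatic document classification is feasible. In this study, however, LSI-based classification was less accurate than the vector space model with both the centroid vector method and k-NN. Whether LSI is better suited to multi-category classification, or whether combining LSI with other classification algorithms would give better results, remains for further research. | zh_TW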
dc.description.abstract | Searching and browsing are both important tasks in information retrieval. Search provides a way to find information rapidly, but relying on word matching makes it hard to deal with synonymy and polysemy. In addition, users sometimes cannot formulate suitable queries and therefore cannot find the information they really need. To provide good information services, browsing through a good classification mechanism is as important as search. Classifying documents involves two steps: first, represent the documents in a suitable mathematical form; second, classify them automatically with a suitable classification algorithm. Classification is a task of conceptualization, and representing documents in the conventional vector space model cannot avoid relying explicitly on words. Latent semantic indexing (LSI) was developed to uncover the latent semantic concepts of documents, which may make it suitable for document classification. This thesis studies the feasibility and effectiveness of classifying text documents using LSI as the document representation, with both the centroid vector method and k-NN as the classification algorithms. The results are compared with those of the vector space model. This study addresses single-category classification. The results show that automatic classification of text documents using LSI together with suitable classification algorithms is feasible, but its classification accuracy is lower than that of the vector space model. The effect of applying LSI to multi-category classification and the effect of combining LSI with other classification algorithms require further study. | en_US
dc.language.iso | zh_TW | en_US
dc.subject | 自動分類 | zh_TW
dc.subject | 資訊擷取 | zh_TW
dc.subject | 潛在語意索引 | zh_TW
dc.subject | 向量空間法 | zh_TW
dc.subject | automatic classification | en_US
dc.subject | information retrieval | en_US
dc.subject | latent semantic indexing | en_US
dc.subject | vector space model | en_US
dc.title | 運用潛在語意索引的自動化文件分類 | zh_TW
dc.title | Automatic Classification of Text Documents by Using Latent Semantic Indexing | en_US
dc.type | Thesis | en_US
dc.contributor.department | 管理學院資訊管理學程 (College of Management, Information Management Program) | zh_TW
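The abstract describes a two-step pipeline: represent each document mathematically (vector space model or LSI), then assign a single category with the centroid vector method or k-NN. The Python sketch below illustrates that pipeline under stated assumptions; it is not the thesis's implementation. Assumed, not taken from the record: whitespace tokenization, a smoothed tf-idf weighting, cosine similarity, projecting documents into the LSI space with U_k^T from a truncated SVD (NumPy's `numpy.linalg.svd`), summing neighbor similarities per class in k-NN, the hypothetical helper names (`term_counts`, `smooth_idf`, `lsi_basis`, `centroid_predict`, `knn_predict`, `run_demo`), the parameters `k_lsi=2` and `k_nn=3`, and the invented six-document toy corpus.

```python
import numpy as np

# Minimal sketch (assumptions, not the thesis's code): whitespace tokens,
# smoothed tf-idf, cosine similarity, and a toy corpus invented for illustration.


def term_counts(docs, vocab):
    """Raw term frequencies as a terms x documents matrix."""
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            if tok in index:  # tokens outside the training vocabulary are ignored
                counts[index[tok], j] += 1.0
    return counts


def smooth_idf(counts):
    """One common smoothed idf variant, ln((1 + N) / (1 + df)) + 1."""
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)
    return np.log((1.0 + n_docs) / (1.0 + df)) + 1.0


def lsi_basis(A, k):
    """First k left singular vectors U_k of the weighted term-document matrix A."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]


def normalize_rows(X):
    """Scale rows to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def centroid_predict(train_X, train_y, test_X):
    """Centroid vector method: pick the class whose mean unit vector is closest."""
    y = np.asarray(train_y)
    classes = sorted(set(train_y))
    centroids = np.vstack(
        [normalize_rows(train_X[y == c]).mean(axis=0) for c in classes]
    )
    sims = normalize_rows(test_X) @ normalize_rows(centroids).T
    return [classes[i] for i in sims.argmax(axis=1)]


def knn_predict(train_X, train_y, test_X, k=3):
    """k-NN: sum the cosine similarities of the k nearest training docs per class."""
    sims = normalize_rows(test_X) @ normalize_rows(train_X).T
    preds = []
    for row in sims:
        scores = {}
        for i in np.argsort(-row)[:k]:
            scores[train_y[i]] = scores.get(train_y[i], 0.0) + row[i]
        preds.append(max(scores, key=scores.get))
    return preds


# Hypothetical toy corpus (invented; not data from the thesis).
TRAIN_DOCS = [
    "stock market price trade", "market trade bank price",
    "bank loan interest market", "football match goal team",
    "team player goal score", "match score player football",
]
TRAIN_Y = ["finance"] * 3 + ["sport"] * 3
TEST_DOCS = ["price of stock and bank interest", "goal by the team player"]


def run_demo(k_lsi=2, k_nn=3):
    """Classify TEST_DOCS with both representations and both classifiers."""
    vocab = sorted({t for d in TRAIN_DOCS for t in d.split()})
    counts = term_counts(TRAIN_DOCS, vocab)
    idf = smooth_idf(counts)
    A = counts * idf[:, None]                         # VSM: tf-idf, terms x docs
    Q = term_counts(TEST_DOCS, vocab) * idf[:, None]  # test docs reuse training idf
    Uk = lsi_basis(A, k_lsi)
    reps = {
        "VSM": (A.T, Q.T),                    # rows = documents
        "LSI": ((Uk.T @ A).T, (Uk.T @ Q).T),  # k_lsi-dimensional concept space
    }
    return {
        name: (centroid_predict(tr, TRAIN_Y, te), knn_predict(tr, TRAIN_Y, te, k_nn))
        for name, (tr, te) in reps.items()
    }


if __name__ == "__main__":
    for name, (cent, knn) in run_demo().items():
        print(f"{name}: centroid={cent}  k-NN={knn}")
```

Running the script prints the centroid and k-NN predictions for both representations on the toy corpus. Projecting every document with U_k^T keeps training and test documents in one coordinate system (for the training columns, U_k^T A equals Σ_k V_k^T). On a real collection the number of LSI dimensions and the k of k-NN would need tuning, and this toy example says nothing about the accuracy results reported in the thesis.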
Appears in Collections: Theses


Files in This Item:

  1. 451501.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.