Title: | Goal-Oriented SOM for Document Clustering |
Authors: | Pei-Yuan Hsieh (謝佩原); Dr. Wei-Pang Yang (楊維邦); Dr. Hao-Ren Ke (柯皓仁); Institute of Computer Science and Engineering |
Keywords: | Self-Organizing Map; Goal-Oriented; Latent Semantic Analysis; Document Clustering |
Issue Date: | 2003 |
Abstract: | This thesis proposes a Goal-Oriented Self-Organizing Map (GOSOM) that clusters documents according to the user's goals. GOSOM extends the Self-Organizing Map (SOM) model by letting the user specify which kinds of clusters the result should contain. The specified goals are analyzed with Latent Semantic Analysis (LSA) to determine their relationships to the input vectors. GOSOM then strengthens the relevant features of the input vectors so that the similarity computed during clustering steers the result toward the user's goals. In addition, a weighted majority voting method labels the resulting clusters from the user's point of view, and a user relevance feedback mechanism is provided to improve the clustering result. A Goal-Oriented Document Clustering system (GODOC) was implemented to verify that GOSOM outperforms the conventional SOM. Experimental results show that, compared with the conventional SOM, GOSOM improves accuracy by 21.67% on average and recall by 28.47%. |
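The abstract names weighted majority voting as the cluster-labeling step but does not give its details. The following is a minimal, generic sketch of weighted majority voting, not the thesis's actual algorithm. It assumes each document in a cluster contributes a hypothetical `(label, weight)` pair, where the weight stands in for however strongly that document relates to a user goal. The function name `weighted_majority_label` and the sample data are illustrative only.

```python
from collections import defaultdict


def weighted_majority_label(votes):
    """Return the label whose votes carry the largest total weight.

    `votes` is an iterable of (label, weight) pairs, one per document in a
    cluster. How labels and weights are derived is an assumption of this
    sketch; the abstract does not specify it.
    """
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    if not totals:
        return None  # an empty cluster gets no label
    # On a tie, max() keeps the label that was seen first.
    return max(totals, key=totals.get)


# Hypothetical cluster: "finance" has more documents, but "sports" has
# more total weight, so the weighted vote picks "sports".
cluster_votes = [("finance", 0.2), ("finance", 0.3), ("sports", 0.9)]
print(weighted_majority_label(cluster_votes))
```

Unlike a plain majority vote, which would label this cluster "finance" by document count, the weighted variant lets a single strongly goal-related document outweigh several weakly related ones.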
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009123531 http://hdl.handle.net/11536/52857 |
Appears in Collections: | Thesis |