Title: Goal-Oriented SOM for Document Clustering
Author: Pei-Yuan Hsieh (谢佩原)
Dr. Wei-Pang Yang (杨维邦)
Dr. Hao-Ren Ke (柯皓仁)
Institute of Computer Science and Engineering
Keywords: Self-Organizing Map; Goal-Oriented; Latent Semantic Analysis; Document Clustering
Issue Date: 2003
Abstract: This thesis proposes a Goal-Oriented Self-Organizing Map (GOSOM) that clusters documents according to the user's goals. GOSOM is based on the Self-Organizing Map (SOM) model and lets the user specify which categories the clustering result should contain. The specified goals are analyzed with Latent Semantic Analysis (LSA) to determine their relationships to the input vectors. GOSOM then enhances the features of the input vectors so that, when similarity is calculated during clustering, the result is guided toward the user's goals. In addition, a weighted majority voting algorithm labels the clustering result with respect to the specified goals, and a user relevance feedback mechanism is provided to improve the clustering result. A system called the Goal-Oriented Document Clustering system (GODOC) was implemented to verify that GOSOM is superior to the conventional SOM. Experimental results show that, compared with the conventional SOM, GOSOM improves accuracy by 21.67% on average and recall by 28.47%.
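The abstract mentions weighted majority voting for labeling clusters but does not give the weighting scheme. As a rough illustration of the general technique only, and not the thesis's actual algorithm, the sketch below assumes each document in a cluster casts a vote for one goal label with a numeric weight (for example, a similarity score). The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def weighted_majority_label(votes):
    """Return the label with the largest total weight, or None if there are no votes.

    votes: iterable of (label, weight) pairs, one per document in a cluster.
    The weights are an assumption for illustration, e.g. a similarity score.
    """
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    if not totals:
        return None
    # Iterate labels in sorted order so ties resolve to the smallest label name.
    return max(sorted(totals), key=totals.get)


# Hypothetical cluster: two weak "finance" votes lose to one strong "sports" vote.
cluster_votes = [("sports", 0.9), ("finance", 0.3), ("finance", 0.3)]
print(weighted_majority_label(cluster_votes))  # sports
```

In this example a plain (unweighted) majority vote would pick "finance", which is the kind of difference a weighted scheme is meant to capture.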
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009123531
http://hdl.handle.net/11536/52857
Appears in Collections: Thesis


Files in This Item:

  1. 353101.pdf
  2. 353102.pdf
