Title: | A Level-wise Clustering Algorithm on Structured Documents (應用在結構化文件之階層式文件分群法) |
Authors: | Ying-Chieh Lei (雷穎傑); Shian-Shyong Tseng (曾憲雄); Institute of Computer Science and Engineering |
Keywords: | document clustering; structured document; clustering; document classification |
Issue Date: | 2002 |
Abstract: | Document clustering applies clustering techniques to document management: similar documents are grouped together so that large collections of electronic documents can be managed and searched efficiently. However, most existing document clustering algorithms ignore the hierarchical structure of a document, so their results do not fully reflect the documents' characteristics. To address this, we represent each document as a tree and propose a level-wise clustering algorithm. The algorithm exploits the level property of the tree and clusters the content level by level, using a concept generation operation. To store the clustering results and to search for similar clusters efficiently, we propose a multistage graph. Based on this graph, we provide three search strategies to meet the needs of different users. Finally, our experimental results show that the similarity search is efficient and that its accuracy is acceptable. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT910394025 http://hdl.handle.net/11536/70197 |
Appears in Collections: | Thesis |
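
The abstract describes representing each document as a tree and clustering its content level by level. As a rough illustration of that general idea only, the Python sketch below groups documents separately at each tree depth. Everything in it is an assumption made for illustration: the `(text, children)` tuple representation, the bag-of-words vectors, the cosine threshold, and the greedy single-pass grouping are stand-ins, not the thesis's concept generation operation, and the multistage graph and search strategies are not modelled.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def level_texts(node, depth=0, out=None):
    """Collect the text found at each depth of a (text, children) tree."""
    if out is None:
        out = {}
    text, children = node
    out.setdefault(depth, []).append(text)
    for child in children:
        level_texts(child, depth + 1, out)
    return out


def level_vectors(doc):
    """One bag-of-words vector per tree depth (hypothetical representation)."""
    return {
        depth: Counter(" ".join(texts).lower().split())
        for depth, texts in level_texts(doc).items()
    }


def cluster_level(vectors, threshold):
    """Greedy single-pass grouping: a document joins the first cluster whose
    first member is at least `threshold` similar, else it starts a new one."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vectors[members[0]], vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def level_wise_clusters(docs, threshold=0.5):
    """Cluster document indices independently at every tree depth."""
    per_doc = [level_vectors(d) for d in docs]
    depths = sorted({depth for lv in per_doc for depth in lv})
    return {
        depth: cluster_level([lv.get(depth, Counter()) for lv in per_doc], threshold)
        for depth in depths
    }
```

Calling `level_wise_clusters(docs)` returns a dictionary that maps each depth to a list of clusters of document indices, so two documents can share a cluster at the title level while falling into different clusters at deeper levels.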