Title: An integration of WordNet and fuzzy association rule mining for multi-label document clustering
Authors: Chen, Chun-Ling
Tseng, Frank S. C.
Liang, Tyne
Department of Computer Science
Keywords: Fuzzy association rule mining; Text mining; Document clustering; WordNet; Frequent itemsets
Date Published: 1-Nov-2010
Abstract: With the rapid growth of text documents, document clustering has become one of the main techniques for organizing large numbers of documents into a small number of meaningful clusters. However, document clustering still faces several challenges, such as high dimensionality, scalability, accuracy, meaningful cluster labels, overlapping clusters, and extracting semantics from texts. To improve the quality of document clustering results, we propose an effective Fuzzy-based Multi-label Document Clustering (FMDC) approach that integrates fuzzy association rule mining with an existing ontology, WordNet, to alleviate these problems. In our approach, key terms are extracted from the document set, and the initial representation of all documents is further enriched with WordNet hypernyms in order to exploit the semantic relations between terms. Then, a fuzzy association rule mining algorithm for texts is employed to discover a set of highly related fuzzy frequent itemsets, whose key terms serve as the labels of the candidate clusters. Finally, each document can be dispatched into more than one target cluster by referring to these candidate clusters, and highly similar target clusters are then merged. We conducted experiments to evaluate the performance on the Classic, Re0, R8, and WebKB datasets. The experimental results show that our approach achieves higher accuracy than influential document clustering methods. Therefore, our approach not only provides more general and meaningful labels for documents but also effectively generates overlapping clusters. (C) 2010 Elsevier B.V. All rights reserved.
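The pipeline described in the abstract (hypernym enrichment, fuzzy frequent itemset mining, multi-label dispatching) can be illustrated with a minimal Python sketch. It is not the FMDC implementation. The toy table `HYPERNYMS` stands in for WordNet. The membership function (a term's frequency divided by the document's peak frequency), the min t-norm fuzzy support, the Apriori-style search, and the thresholds `min_sup` and `min_deg` are all assumptions chosen for illustration. The final step of merging highly similar clusters is omitted.

```python
from itertools import combinations

# Hypothetical toy hypernym table standing in for WordNet lookups.
HYPERNYMS = {"dog": "animal", "cat": "animal", "car": "vehicle", "truck": "vehicle"}

def enrich(tf):
    """Add each term's count to its (toy) hypernym, generalizing the document."""
    out = dict(tf)
    for term, count in tf.items():
        h = HYPERNYMS.get(term)
        if h is not None:
            out[h] = out.get(h, 0) + count
    return out

def memberships(tf):
    """Assumed membership function: term frequency / document's peak frequency."""
    peak = max(tf.values())
    return {t: c / peak for t, c in tf.items()}

def degree(doc, itemset):
    """Degree to which a document contains an itemset (min t-norm)."""
    return min(doc.get(t, 0.0) for t in itemset)

def fuzzy_support(itemset, docs):
    """Fuzzy support: average containment degree over all documents."""
    return sum(degree(d, itemset) for d in docs) / len(docs)

def fuzzy_frequent_itemsets(docs, min_sup):
    """Level-wise (Apriori-style) search; min-based support is anti-monotone."""
    terms = sorted({t for d in docs for t in d})
    level = [frozenset([t]) for t in terms if fuzzy_support([t], docs) >= min_sup]
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = fuzzy_support(s, docs)
        k += 1
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        level = [c for c in candidates if fuzzy_support(c, docs) >= min_sup]
    return frequent

def dispatch(docs, clusters, min_deg):
    """Assign each document to every cluster label it matches, so clusters may overlap."""
    assign = {label: [] for label in clusters}
    for i, d in enumerate(docs):
        for label in clusters:
            if degree(d, label) >= min_deg:
                assign[label].append(i)
    return assign

# Usage on three tiny term-frequency vectors.
raw = [{"dog": 3, "cat": 1}, {"cat": 2, "truck": 2}, {"car": 4, "truck": 1}]
docs = [memberships(enrich(tf)) for tf in raw]
frequent = fuzzy_frequent_itemsets(docs, min_sup=0.5)
assign = dispatch(docs, frequent, min_deg=0.5)
for label, members in assign.items():
    print(sorted(label), members)
```

In this toy run only the hypernyms `animal` and `vehicle` survive as frequent itemsets, which mirrors the abstract's point that hypernyms yield more general cluster labels. Document 1 mentions both a cat and a truck, so it lands in both clusters, which is the overlapping, multi-label behavior the approach targets.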
URI: http://dx.doi.org/10.1016/j.datak.2010.08.003
http://hdl.handle.net/11536/150143
ISSN: 0169-023X
DOI: 10.1016/j.datak.2010.08.003
Journal: DATA & KNOWLEDGE ENGINEERING
Volume: 69
Start Page: 1208
End Page: 1226
Appears in Collections: Journal Articles