Title: | An integration of WordNet and fuzzy association rule mining for multi-label document clustering |
Authors: | Chen, Chun-Ling; Tseng, Frank S. C.; Liang, Tyne; Department of Computer Science |
Keywords: | Fuzzy association rule mining;Text mining;Document clustering;WordNet;Frequent itemsets |
Issue Date: | 1-Nov-2010 |
Abstract: | With the rapid growth of text documents, document clustering has become one of the main techniques for organizing large numbers of documents into a small number of meaningful clusters. However, several challenges remain for document clustering, such as high dimensionality, scalability, accuracy, meaningful cluster labels, overlapping clusters, and extracting semantics from texts. To improve the quality of document clustering results, we propose an effective Fuzzy-based Multi-label Document Clustering (FMDC) approach that integrates fuzzy association rule mining with an existing ontology, WordNet, to alleviate these problems. In our approach, key terms are extracted from the document set, and the initial representation of all documents is further enriched with WordNet hypernyms in order to exploit the semantic relations between terms. Then, a fuzzy association rule mining algorithm for texts is employed to discover a set of highly related fuzzy frequent itemsets, which contain key terms to be regarded as the labels of the candidate clusters. Finally, each document is dispatched into more than one target cluster by referring to these candidate clusters, and then the highly similar target clusters are merged. We conducted experiments to evaluate the performance on the Classic, Re0, R8, and WebKB datasets. The experimental results show that our approach outperforms the influential document clustering methods with higher accuracy. Therefore, our approach not only provides more general and meaningful labels for documents, but also effectively generates overlapping clusters. © 2010 Elsevier B.V. All rights reserved. |
URI: | http://dx.doi.org/10.1016/j.datak.2010.08.003 http://hdl.handle.net/11536/31957 |
ISSN: | 0169-023X |
DOI: | 10.1016/j.datak.2010.08.003 |
Journal: | DATA & KNOWLEDGE ENGINEERING |
Volume: | 69 |
Issue: | 11 |
Start page: | 1208 |
End page: | 1226 |
Appears in Collections: | Articles |
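The pipeline outlined in the abstract (hypernym enrichment, fuzzy frequent itemsets as candidate cluster labels, multi-label assignment) can be illustrated with a minimal sketch. This is not the authors' FMDC implementation: the `TOY_HYPERNYMS` table is a hypothetical stand-in for WordNet lookups, the ramp-shaped membership function, the `min`-based fuzzy support, and all thresholds are assumptions chosen for illustration, and the final cluster-merging step is omitted.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for WordNet hypernym lookups (illustrative only).
TOY_HYPERNYMS = {"dog": "animal", "cat": "animal",
                 "car": "vehicle", "truck": "vehicle"}


def enrich(tokens):
    """Count terms, then add each term's count to its hypernym (toy lookup)."""
    counts = Counter(tokens)
    for term, n in list(counts.items()):
        parent = TOY_HYPERNYMS.get(term)
        if parent:
            counts[parent] += n
    return counts


def high_membership(freq, lo=1, hi=3):
    """Assumed ramp membership in the fuzzy set 'high frequency':
    0 at or below lo, 1 at or above hi, linear in between."""
    if freq <= lo:
        return 0.0
    if freq >= hi:
        return 1.0
    return (freq - lo) / (hi - lo)


def itemset_membership(itemset, doc):
    """Degree to which a document supports an itemset: the minimum
    membership among the itemset's terms (a missing term counts as 0)."""
    return min(high_membership(doc[t]) for t in itemset)


def fuzzy_support(itemset, docs):
    """Average itemset membership over all documents."""
    return sum(itemset_membership(itemset, d) for d in docs) / len(docs)


def candidate_labels(docs, min_support=0.3, max_size=2):
    """Brute-force search for fuzzy frequent itemsets up to max_size terms;
    each surviving itemset acts as a candidate cluster label."""
    vocab = sorted({t for d in docs for t in d})
    labels = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(vocab, size):
            support = fuzzy_support(itemset, docs)
            if support >= min_support:
                labels[itemset] = support
    return labels


def assign(docs, labels, threshold=0.5):
    """Multi-label assignment: a document joins every candidate cluster
    whose label it supports with membership >= threshold."""
    return [[label for label in labels
             if itemset_membership(label, d) >= threshold]
            for d in docs]
```

With the toy inputs `["dog", "cat", "dog"]`, `["car", "truck", "car", "dog"]`, and `["cat", "cat", "truck", "truck"]`, no single surface term is frequent enough on its own, but the hypernyms `animal` and `vehicle` are, and the third document lands in both clusters. That is the kind of more general label and overlapping assignment the abstract describes, here under the sketch's own assumed thresholds.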