Title: | Efficiently mining uncertain high-utility itemsets |
Authors: | Lin, Jerry Chun-Wei; Gan, Wensheng; Fournier-Viger, Philippe; Hong, Tzung-Pei; Tseng, Vincent S.; Department of Computer Science |
Keywords: | Large-scale dataset; Data mining; Uncertainty; High-utility itemset; Pruning strategies |
Issue date: | 1-Jun-2017 |
Abstract: | Data mining consists of deriving implicit, potentially meaningful and useful knowledge from databases, such as information about the most profitable items. High-utility itemset mining (HUIM) has thus emerged as an important research topic in data mining. However, most HUIM algorithms can only handle precise data, although big data collected in real-life applications using experimental measurements or noisy sensors is often uncertain. In this paper, an algorithm named Mining Uncertain High-Utility Itemsets (MUHUI) is proposed to efficiently discover potential high-utility itemsets (PHUIs) in uncertain data. Based on the probability-utility-list (PU-list) structure, the MUHUI algorithm mines PHUIs directly without generating candidates, and it avoids constructing PU-lists for numerous unpromising itemsets by applying several efficient pruning strategies, which greatly improves its performance. Extensive experiments conducted on both real-life and synthetic datasets show that the proposed algorithm significantly outperforms the state-of-the-art PHUI-List algorithm in terms of efficiency and scalability, and that it scales well when mining PHUIs in large-scale uncertain datasets. |
URI: | http://dx.doi.org/10.1007/s00500-016-2159-1 http://hdl.handle.net/11536/145533 |
ISSN: | 1432-7643 |
DOI: | 10.1007/s00500-016-2159-1 |
Journal: | SOFT COMPUTING |
Volume: | 21 |
Start page: | 2801 |
End page: | 2820 |
Appears in collections: | Journal articles |
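
The abstract names potential high-utility itemsets (PHUIs) without defining them. The sketch below illustrates one plausible reading and is not the MUHUI algorithm or its PU-list structure: it assumes a tuple-uncertainty model in which every transaction carries a single existence probability, and it treats an itemset as a PHUI when both its total utility and its summed probability reach user-chosen fractions of the database totals. The toy data, the names `transactions`, `unit_profit`, and `mine_phuis`, and both thresholds are hypothetical. The search is a brute-force enumeration, which is exactly the cost that pruning strategies are meant to avoid.

```python
from itertools import combinations

# Hypothetical toy database (an assumption for illustration only):
# each transaction is ({item: purchased quantity}, existence probability).
transactions = [
    ({"a": 2, "b": 1}, 0.9),
    ({"a": 1, "c": 3}, 0.6),
    ({"a": 1, "b": 2, "c": 1}, 0.8),
    ({"b": 1, "c": 1}, 0.5),
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 3, "c": 1}


def transaction_utility(items):
    """Utility of a whole transaction: sum of quantity * unit profit."""
    return sum(qty * unit_profit[item] for item, qty in items.items())


def mine_phuis(transactions, min_utility_ratio, min_probability_ratio):
    """Brute-force search under the assumed PHUI definition.

    An itemset is kept when its utility is at least
    min_utility_ratio * (total database utility) and the summed
    probability of the transactions containing it is at least
    min_probability_ratio * (number of transactions).
    """
    total_utility = sum(transaction_utility(items) for items, _ in transactions)
    min_utility = min_utility_ratio * total_utility
    min_probability = min_probability_ratio * len(transactions)

    all_items = sorted({item for items, _ in transactions for item in items})
    result = {}
    for size in range(1, len(all_items) + 1):
        for itemset in combinations(all_items, size):
            utility = 0
            probability = 0.0
            for items, prob in transactions:
                if all(item in items for item in itemset):
                    utility += sum(items[i] * unit_profit[i] for i in itemset)
                    probability += prob
            if utility >= min_utility and probability >= min_probability:
                result[itemset] = (utility, probability)
    return result


phuis = mine_phuis(transactions, min_utility_ratio=0.25, min_probability_ratio=0.3)
for itemset, (utility, probability) in phuis.items():
    print(itemset, utility, round(probability, 2))
```

On this toy data the single item `c` is rejected for low utility, yet `("a", "c")` and `("b", "c")` are accepted. Utility is not downward closed, so a miner cannot prune on raw utility alone and needs an upper bound instead.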