Title: Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits
Authors: Li, Hua-Fu
Huang, Hsin-Yun
Lee, Suh-Yin
Department of Computer Science
Keywords: Data mining; Data streams; Utility mining; High-utility itemsets; Utility itemset with positive item profits; Utility itemset with negative item profits
Publication Date: 1-Sep-2011
Abstract: Mining utility itemsets from data streams is one of the most interesting research issues in data mining and knowledge discovery. In this paper, two efficient sliding window-based algorithms, MHUI-BIT (Mining High-Utility Itemsets based on BITvector) and MHUI-TID (Mining High-Utility Itemsets based on TIDlist), are proposed for mining high-utility itemsets from data streams. Based on the sliding window-based framework of the proposed approaches, two effective representations of item information, Bitvector and TIDlist, and a lexicographical tree-based summary data structure, LexTree-2HTU, are developed to improve the efficiency of discovering high-utility itemsets with positive profits from data streams. Experimental results show that the proposed algorithms outperform existing approaches for discovering high-utility itemsets from data streams over sliding windows. In addition, we propose adapted versions of MHUI-BIT and MHUI-TID to handle the case of mining utility itemsets with negative item profits. Experiments show that these variants are efficient approaches for mining high-utility itemsets with negative item profits over transaction-sensitive sliding windows of data streams.
URI: http://dx.doi.org/10.1007/s10115-010-0330-z
http://hdl.handle.net/11536/19881
ISSN: 0219-1377
DOI: 10.1007/s10115-010-0330-z
Journal: KNOWLEDGE AND INFORMATION SYSTEMS
Volume: 28
Issue: 3
Start Page: 495
End Page: 522
Appears in Collections: Articles


Files in This Item:

  1. 000294229000002.pdf
