EFIM: a fast and memory efficient algorithm for high-utility itemset mining

doi:10.1007/s10115-016-0986-0

Full metadata record

DC Field	Value	Language
dc.contributor.author	Zida, Souleymane	en_US
dc.contributor.author	Fournier-Viger, Philippe	en_US
dc.contributor.author	Lin, Jerry Chun-Wei	en_US
dc.contributor.author	Wu, Cheng-Wei	en_US
dc.contributor.author	Tseng, Vincent S.	en_US
dc.date.accessioned	2018-08-21T05:53:57Z	-
dc.date.available	2018-08-21T05:53:57Z	-
dc.date.issued	2017-05-01	en_US
dc.identifier.issn	0219-1377	en_US
dc.identifier.uri	http://dx.doi.org/10.1007/s10115-016-0986-0	en_US
dc.identifier.uri	http://hdl.handle.net/11536/145377	-
dc.description.abstract	In recent years, high-utility itemset mining has emerged as an important data mining task. However, it remains computationally expensive in terms of both runtime and memory consumption. It is thus an important challenge to design more efficient algorithms for this task. In this paper, we address this issue by proposing a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to more efficiently discover high-utility itemsets. EFIM relies on two new upper bounds named revised sub-tree utility and local utility to more effectively prune the search space. It also introduces a novel array-based utility counting technique named Fast Utility Counting to calculate these upper bounds in linear time and space. Moreover, to reduce the cost of database scans, EFIM proposes efficient database projection and transaction merging techniques named High-utility Database Projection (HDP) and High-utility Transaction Merging (HTM), also performed in linear time. An extensive experimental study on various datasets shows that EFIM is in general two to three orders of magnitude faster than the state-of-the-art algorithms d2HUP, HUI-Miner, HUP-Miner, FHM and UP-Growth+ on dense datasets and performs quite well on sparse datasets. Moreover, a key advantage of EFIM is its low memory consumption.	en_US
dc.language.iso	en_US	en_US
dc.subject	Pattern mining	en_US
dc.subject	Itemset mining, High-utility mining	en_US
dc.subject	Fast Utility Counting, High-utility database merging and projection	en_US
dc.title	EFIM: a fast and memory efficient algorithm for high-utility itemset mining	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1007/s10115-016-0986-0	en_US
dc.identifier.journal	KNOWLEDGE AND INFORMATION SYSTEMS	en_US
dc.citation.volume	51	en_US
dc.citation.spage	595	en_US
dc.citation.epage	625	en_US
dc.contributor.department	資訊工程學系	zh_TW
dc.contributor.department	Department of Computer Science	en_US
dc.identifier.wosnumber	WOS:000399408200009	en_US
Appears in Collections:	Articles