Fast discovery of sequential patterns through memory indexing and database partitioning

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lin, MY	en_US
dc.contributor.author	Lee, SY	en_US
dc.date.accessioned	2014-12-08T15:36:58Z	-
dc.date.available	2014-12-08T15:36:58Z	-
dc.date.issued	2005-01-01	en_US
dc.identifier.issn	1016-2364	en_US
dc.identifier.uri	http://hdl.handle.net/11536/25377	-
dc.description.abstract	Sequential pattern mining is challenging because discovering temporal patterns across numerous sequences is highly complex. Current mining approaches require either frequent database scanning or the generation of several intermediate databases. As databases may fit into the ever-increasing main memory, efficient memory-based discovery of sequential patterns is becoming possible. In this paper, we propose a memory indexing approach for fast sequential pattern mining, named MEMISP. During the whole process, MEMISP scans the sequence database only once to read data sequences into memory. The find-then-index technique is applied recursively: it finds the items that constitute a frequent sequence and constructs a compact index set which indicates the set of data sequences for further exploration. As a result of effective index advancing, fewer and shorter data sequences need to be processed in MEMISP as the discovered patterns get longer. Moreover, the maximum size of the total memory required by MEMISP can be estimated, and it is independent of the minimum support threshold. Experimental results indicate that MEMISP outperforms both GSP and PrefixSpan (general version) without the need for either candidate generation or database projection. When the database is too large to fit into memory in a batch, we partition the database, mine patterns in each partition, and validate the true patterns in a second pass of database scanning. Experiments performed on extra-large databases demonstrate the good performance and scalability of MEMISP, even with very low minimum support. Therefore, MEMISP can efficiently mine sequence databases of any size, for any minimum support value.	en_US
dc.language.iso	en_US	en_US
dc.subject	data mining	en_US
dc.subject	sequential patterns	en_US
dc.subject	memory indexing	en_US
dc.subject	find-then-index	en_US
dc.subject	database partitioning	en_US
dc.title	Fast discovery of sequential patterns through memory indexing and database partitioning	en_US
dc.type	Article	en_US
dc.identifier.journal	JOURNAL OF INFORMATION SCIENCE AND ENGINEERING	en_US
dc.citation.volume	21	en_US
dc.citation.issue	1	en_US
dc.citation.spage	109	en_US
dc.citation.epage	128	en_US
dc.contributor.department	資訊工程學系	zh_TW
dc.contributor.department	Department of Computer Science	en_US
dc.identifier.wosnumber	WOS:000226824900006	-
dc.citation.woscount	21	-
Appears in Collections:	Articles
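
The two ideas summarized in the abstract, index advancing in memory and partition-then-validate, can be illustrated with a minimal Python sketch. This is not the authors' MEMISP implementation: it assumes each data sequence is a plain list of single items (no itemsets), the function names `mine_in_memory`, `contains`, and `mine_partitioned` are hypothetical, and the minimum support is passed as a `Fraction` so that threshold arithmetic stays exact.

```python
from collections import defaultdict
from fractions import Fraction
from math import ceil


def mine_in_memory(sequences, min_count):
    """Return {pattern tuple: support} for patterns found in >= min_count sequences.

    Assumption: a simplified find-then-index loop. An index entry is a
    (sequence id, position) pair, where position is the point at which the
    search for the next item resumes. Sequences are never copied or projected.
    """
    patterns = {}

    def grow(prefix, index):
        # Find: for every item after the indexed position, record the
        # earliest position per sequence (so each sequence counts once).
        first_pos = defaultdict(dict)  # item -> {sequence id: earliest position}
        for sid, start in index:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                first_pos[seq[pos]].setdefault(sid, pos)
        # Index: for each frequent item, advance the indices past the match
        # and recurse on the (fewer, shorter) remaining suffixes.
        for item, hits in first_pos.items():
            if len(hits) >= min_count:
                pattern = prefix + (item,)
                patterns[pattern] = len(hits)
                grow(pattern, [(sid, pos + 1) for sid, pos in hits.items()])

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return patterns


def contains(seq, pattern):
    """True if pattern occurs in seq as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(item in it for item in pattern)


def mine_partitioned(sequences, min_support, partition_size):
    """Two-pass mining: mine each partition, then validate candidates globally.

    min_support is a Fraction in (0, 1]. A pattern that is frequent in the
    whole database must be frequent in at least one partition, so the union
    of the local results is a complete candidate set.
    """
    candidates = set()
    for lo in range(0, len(sequences), partition_size):
        part = sequences[lo:lo + partition_size]
        local_min = max(1, ceil(min_support * len(part)))
        candidates.update(mine_in_memory(part, local_min))
    # Second pass: count each candidate against every data sequence.
    global_min = max(1, ceil(min_support * len(sequences)))
    result = {}
    for pattern in candidates:
        count = sum(contains(seq, pattern) for seq in sequences)
        if count >= global_min:
            result[pattern] = count
    return result
```

Calling `mine_partitioned(db, Fraction(1, 2), 2)` on a list of item lists `db` returns the same patterns as `mine_in_memory` applied to the whole list with the matching absolute threshold. The partitioned variant only needs one partition in memory at a time during the first pass.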

Files in This Item:

000226824900006.pdf
