標題: | Fast discovery of sequential patterns through memory indexing and database partitioning |
作者: | Lin, MY Lee, SY 資訊工程學系 Department of Computer Science |
關鍵字: | data mining;sequential patterns;memory indexing;find-then-index;database partitioning |
公開日期: | 1-一月-2005 |
摘要: | Sequential pattern mining is a challenging issue because of the high complexity of temporal pattern discovering from numerous sequences. Current mining approaches either require frequent database scanning or the generation of several intermediate databases. As databases may fit into the ever-increasing main memory, efficient memory-based discovery of sequential patterns is becoming possible. In this paper, we propose a memory indexing approach for fast sequential pattern mining, named MEMISP. During the whole process, MEMISP scans the sequence database only once to read data sequences into memory. The find-then-index technique is recursively used to find the items that constitute a frequent sequence and constructs a compact index set which indicates the set of data sequences for further exploration. As a result of effective index advancing, fewer and shorter data sequences need to be processed in MEMISP as the discovered patterns get longer. Moreover, we can estimate the maximum size of the total memory required, which is independent of the minimum support threshold, in MEMISP. Experimental results indicate that MEMISP outperforms both GSP and PrefixSpan (general version) without the need for either candidate generation or database projection. When the database is too large to fit into memory in a batch, we partition the database, mine patterns in each partition, and validate the true patterns in the second pass of database scanning. Experiments performed on extra-large databases demonstrate the good performance and scalability of MEMISP, even with very low minimum support. Therefore, MEMISP can efficiently mine sequence databases of any size, for any minimum support values. |
URI: | http://hdl.handle.net/11536/25377 |
ISSN: | 1016-2364 |
期刊: | JOURNAL OF INFORMATION SCIENCE AND ENGINEERING |
Volume: | 21 |
Issue: | 1 |
起始頁: | 109 |
結束頁: | 128 |
顯示於類別: | 期刊論文 |