Title: Fast discovery of sequential patterns through memory indexing and database partitioning
Authors: Lin, MY
Lee, SY
Department of Computer Science
Keywords: data mining;sequential patterns;memory indexing;find-then-index;database partitioning
Issue Date: 1-Jan-2005
Abstract: Sequential pattern mining is challenging because discovering temporal patterns from numerous sequences is highly complex. Current mining approaches require either frequent database scanning or the generation of several intermediate databases. As databases may fit into the ever-increasing main memory, efficient memory-based discovery of sequential patterns is becoming possible. In this paper, we propose a memory indexing approach for fast sequential pattern mining, named MEMISP. During the whole process, MEMISP scans the sequence database only once to read data sequences into memory. The find-then-index technique is applied recursively to find the items that constitute a frequent sequence and to construct a compact index set that indicates the data sequences for further exploration. As a result of effective index advancing, fewer and shorter data sequences need to be processed in MEMISP as the discovered patterns get longer. Moreover, the maximum size of the total memory required by MEMISP can be estimated, and it is independent of the minimum support threshold. Experimental results indicate that MEMISP outperforms both GSP and PrefixSpan (general version) without the need for either candidate generation or database projection. When the database is too large to fit into memory in one batch, we partition the database, mine patterns in each partition, and validate the true patterns in a second pass of database scanning. Experiments performed on extra-large databases demonstrate the good performance and scalability of MEMISP, even with very low minimum support. Therefore, MEMISP can efficiently mine sequence databases of any size, for any minimum support value.
URI: http://hdl.handle.net/11536/25377
ISSN: 1016-2364
Journal: JOURNAL OF INFORMATION SCIENCE AND ENGINEERING
Volume: 21
Issue: 1
Start page: 109
End page: 128
Appears in Collections: Journal Articles


Files in This Item:

  1. 000226824900006.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.
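
The abstract describes two ideas: recursively finding frequent items and indexing the sequences that contain them, and partitioning a database that does not fit in memory, then validating the candidates in a second pass. The Python sketch below illustrates both under simplifying assumptions and is not the authors' MEMISP implementation. It assumes each data sequence is a plain sequence of single items (no itemset elements), and the names `mine`, `mine_partitioned`, and `is_subsequence` are hypothetical.

```python
import math
from collections import defaultdict


def mine(sequences, min_count):
    """Return {pattern tuple: support count} for in-memory sequences."""
    patterns = {}

    def extend(prefix, index):
        # index holds (sequence id, position) pairs; position marks where
        # the still-unmatched remainder of that sequence begins.
        # Find step: record, per item, the first position at which it
        # occurs in each indexed remainder (one count per sequence).
        first_pos = defaultdict(dict)
        for sid, start in index:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                first_pos[seq[pos]].setdefault(sid, pos)
        # Index step: every frequent item gives a longer pattern and a
        # smaller index whose pointers are advanced past the match.
        for item in sorted(first_pos):
            hits = first_pos[item]
            if len(hits) >= min_count:
                pattern = prefix + (item,)
                patterns[pattern] = len(hits)
                extend(pattern, [(sid, pos + 1) for sid, pos in hits.items()])

    extend((), [(sid, 0) for sid in range(len(sequences))])
    return patterns


def is_subsequence(pattern, seq):
    """True if the items of pattern appear in seq in the same order."""
    it = iter(seq)
    return all(item in it for item in pattern)


def mine_partitioned(sequences, min_ratio, num_parts):
    """Mine each partition separately, then validate candidates globally."""
    if not sequences:
        return {}
    size = math.ceil(len(sequences) / num_parts)
    candidates = set()
    # First pass: a globally frequent pattern must be frequent in at least
    # one partition, so the union of local results is a complete candidate set.
    for start in range(0, len(sequences), size):
        part = sequences[start:start + size]
        local_min = max(1, math.ceil(min_ratio * len(part)))
        candidates.update(mine(part, local_min))
    # Second pass: count every candidate against the whole database.
    global_min = math.ceil(min_ratio * len(sequences))
    result = {}
    for cand in candidates:
        count = sum(is_subsequence(cand, seq) for seq in sequences)
        if count >= global_min:
            result[cand] = count
    return result
```

For example, with `db = ["abc", "acb", "ab", "bc"]`, `mine(db, 2)` reports the patterns a, b, c, ab, ac, and bc, and `mine_partitioned(db, 0.5, 2)` returns the same result after its validation pass. The sketch keeps only positions into the original sequences and never copies them, which is the spirit of the index advancing mentioned in the abstract.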