Full metadata record
DC Field | Value | Language
dc.contributor.author | Lin, MY | en_US
dc.contributor.author | Lee, SY | en_US
dc.date.accessioned | 2014-12-08T15:36:58Z | -
dc.date.available | 2014-12-08T15:36:58Z | -
dc.date.issued | 2005-01-01 | en_US
dc.identifier.issn | 1016-2364 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/25377 | -
dc.description.abstract | Sequential pattern mining is a challenging issue because of the high complexity of temporal pattern discovering from numerous sequences. Current mining approaches either require frequent database scanning or the generation of several intermediate databases. As databases may fit into the ever-increasing main memory, efficient memory-based discovery of sequential patterns is becoming possible. In this paper, we propose a memory indexing approach for fast sequential pattern mining, named MEMISP. During the whole process, MEMISP scans the sequence database only once to read data sequences into memory. The find-then-index technique is recursively used to find the items that constitute a frequent sequence and constructs a compact index set which indicates the set of data sequences for further exploration. As a result of effective index advancing, fewer and shorter data sequences need to be processed in MEMISP as the discovered patterns get longer. Moreover, we can estimate the maximum size of the total memory required, which is independent of the minimum support threshold, in MEMISP. Experimental results indicate that MEMISP outperforms both GSP and PrefixSpan (general version) without the need for either candidate generation or database projection. When the database is too large to fit into memory in a batch, we partition the database, mine patterns in each partition, and validate the true patterns in the second pass of database scanning. Experiments performed on extra-large databases demonstrate the good performance and scalability of MEMISP, even with very low minimum support. Therefore, MEMISP can efficiently mine sequence databases of any size, for any minimum support values. | en_US
dc.language.iso | en_US | en_US
dc.subject | data mining | en_US
dc.subject | sequential patterns | en_US
dc.subject | memory indexing | en_US
dc.subject | find-then-index | en_US
dc.subject | database partitioning | en_US
dc.title | Fast discovery of sequential patterns through memory indexing and database partitioning | en_US
dc.type | Article | en_US
dc.identifier.journal | JOURNAL OF INFORMATION SCIENCE AND ENGINEERING | en_US
dc.citation.volume | 21 | en_US
dc.citation.issue | 1 | en_US
dc.citation.spage | 109 | en_US
dc.citation.epage | 128 | en_US
dc.contributor.department | 資訊工程學系 | zh_TW
dc.contributor.department | Department of Computer Science | en_US
dc.identifier.wosnumber | WOS:000226824900006 | -
dc.citation.woscount | 21 | -
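
The find-then-index recursion and the partition-then-validate pass described in the abstract can be illustrated with a small sketch. This is not the authors' MEMISP implementation. It assumes each data sequence is a plain list of single items (no itemsets), and the function names mine_in_memory, is_subsequence, and mine_partitioned are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from fractions import Fraction


def mine_in_memory(sequences, min_count):
    """Return {pattern tuple: support count} for sequences held in memory.

    Simplified sketch: every sequence is a list of single, sortable items.
    """
    patterns = {}

    def grow(prefix, index):
        # index holds (sequence id, start of the not-yet-matched suffix).
        # "Find" step: count each item at most once per indexed suffix.
        counts = Counter()
        for sid, pos in index:
            counts.update(set(sequences[sid][pos:]))
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            pattern = prefix + (item,)
            patterns[pattern] = count
            # "Index" step: advance each pointer past the first match of
            # item, so longer patterns see fewer and shorter suffixes.
            new_index = []
            for sid, pos in index:
                seq = sequences[sid]
                for j in range(pos, len(seq)):
                    if seq[j] == item:
                        new_index.append((sid, j + 1))
                        break
            grow(pattern, new_index)

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return patterns


def is_subsequence(pattern, seq):
    """True if the items of pattern occur in seq in the same order."""
    it = iter(seq)
    return all(item in it for item in pattern)


def mine_partitioned(sequences, min_support, partition_size):
    """Mine each partition, then validate candidates in a second pass.

    min_support is a relative threshold in (0, 1]; a Fraction is assumed
    here so that the thresholds are computed without float rounding.
    """
    candidates = set()
    for start in range(0, len(sequences), partition_size):
        part = sequences[start:start + partition_size]
        local_min = max(1, math.ceil(min_support * len(part)))
        candidates.update(mine_in_memory(part, local_min))
    # Second pass: count every candidate against the whole database.
    global_min = math.ceil(min_support * len(sequences))
    result = {}
    for pattern in candidates:
        count = sum(is_subsequence(pattern, seq) for seq in sequences)
        if count >= global_min:
            result[pattern] = count
    return result


if __name__ == "__main__":
    db = [list("abc"), list("acb"), list("ab"), list("bc")]
    print(mine_partitioned(db, Fraction(1, 2), partition_size=2))
```

The second pass is needed because a pattern that is frequent in one partition may be infrequent overall. Conversely, any pattern that meets the relative threshold over the whole database must meet it in at least one partition, so the union of local results is a complete candidate set under these assumptions.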
Appears in Collections: Articles


Files in This Item:

  1. 000226824900006.pdf
