Title: | DSM-PLW: Single-pass mining of path traversal patterns over streaming Web click-sequences |
Authors: | Li, Hua-Fu; Lee, Suh-Yin; Shan, Man-Kwan; Department of Computer Science |
Keywords: | web click-sequence streams;path traversal patterns;single-pass algorithm |
Issue Date: | 14-Jul-2006 |
Abstract: | Mining Web click streams is an important data mining problem with broad applications. However, it is also a difficult problem since the streaming data possess some interesting characteristics, such as unknown or unbounded length, possibly a very fast arrival rate, inability to backtrack over previously arrived click-sequences, and a lack of system control over the order in which the data arrive. In this paper, we propose a projection-based, single-pass algorithm, called DSM-PLW (Data Stream Mining for Path traversal patterns in a Landmark Window), for online incremental mining of path traversal patterns over a continuous stream of maximal forward references generated at a rapid rate. According to the algorithm, each maximal forward reference of the stream is projected into a set of reference-suffix maximal forward references, and these reference-suffix maximal forward references are inserted into a new in-memory summary data structure, called SP-forest (Summary Path traversal pattern forest), which is an extended prefix tree-based data structure for storing essential information about frequent reference sequences of the stream so far. The set of all maximal reference sequences is determined from the SP-forest by a depth-first-search mechanism, called MRS-mining (Maximal Reference Sequence mining). Theoretical analysis and experimental studies show that the proposed algorithm has gently growing memory requirements and makes only one pass over the streaming data. (c) 2005 Elsevier B.V. All rights reserved. |
URI: | http://dx.doi.org/10.1016/j.comnet.2005.10.018 http://hdl.handle.net/11536/12029 |
ISSN: | 1389-1286 |
DOI: | 10.1016/j.comnet.2005.10.018 |
Journal: | COMPUTER NETWORKS |
Volume: | 50 |
Issue: | 10 |
Start Page: | 1474 |
End Page: | 1487 |
Appears in Collections: | Journal Articles |
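
The abstract describes three steps: projecting each maximal forward reference into its suffixes, inserting those suffixes into a prefix-tree summary, and reading maximal sequences off the tree by depth-first search. The following is a minimal Python sketch of that general idea only, not the paper's DSM-PLW implementation. The names `project`, `SuffixForest`, and `maximal_sequences` are hypothetical, the support threshold is assumed to be an absolute count, and any pruning or extra bookkeeping the real SP-forest performs is omitted because the abstract does not describe it.

```python
class Node:
    """One page in the prefix tree, with the number of times its path was seen."""
    def __init__(self):
        self.count = 0
        self.children = {}


def project(reference):
    """Return every suffix of a reference as a tuple of pages."""
    pages = tuple(reference)
    return [pages[i:] for i in range(len(pages))]


def _contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


class SuffixForest:
    """Hypothetical stand-in for a prefix-tree summary of the stream so far."""
    def __init__(self):
        self.root = Node()

    def insert(self, reference):
        # Single pass: each reference is read once and then discarded.
        # Assumes pages do not repeat inside one forward reference.
        for suffix in project(reference):
            node = self.root
            for page in suffix:
                node = node.children.setdefault(page, Node())
                node.count += 1

    def maximal_sequences(self, min_count):
        """Depth-first search for frequent paths that no frequent path contains."""
        candidates = []

        def dfs(node, path):
            frequent = [(p, c) for p, c in node.children.items()
                        if c.count >= min_count]
            if not frequent and path:
                candidates.append(path)  # cannot be extended to the right
            for page, child in frequent:
                dfs(child, path + (page,))

        dfs(self.root, ())
        # Drop candidates that sit inside a longer frequent candidate.
        return sorted(s for s in candidates
                      if not any(s != t and _contains(t, s) for t in candidates))


forest = SuffixForest()
for ref in ["abcd", "abc", "bcd", "abe"]:
    forest.insert(ref)
print(forest.maximal_sequences(min_count=2))
```

In this toy stream each letter stands for one page. Because every suffix is inserted, the count stored along a path equals the number of references that contain that path as a contiguous run, which is what lets a single depth-first walk recover the frequent sequences without revisiting the stream.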