標題: DSM-PLW: Single-pass mining of path traversal patterns over streaming Web click-sequences
作者: Li, Hua-Fu
Lee, Suh-Yin
Shan, Man-Kwan
資訊工程學系
Department of Computer Science
關鍵字: web click-sequence streams;path traversal patterns;single-pass algorithm
公開日期: 14-Jul-2006
摘要: Mining Web click streams is an important data mining problem with broad applications. However, it is also a difficult problem since the streaming data possess some interesting characteristics, such as unknown or unbounded length, possibly a very fast arrival rate, inability to backtrack over previously arrived click-sequences, and a lack of system control over the order in which the data arrive. In this paper, we propose a projection-based, single-pass algorithm, called DSM-PLW (Data Stream Mining for Path traversal patterns in a Landmark Window), for online incremental mining of path traversal patterns over a continuous stream of maximal forward references generated at a rapid rate. According to the algorithm, each maximal forward reference of the stream is projected into a set of reference-suffix maximal forward references, and these reference-suffix maximal forward references are inserted into a new in-memory summary data structure, called SP-forest (Summary Path traversal pattern forest), which is an extended prefix tree-based data structure for storing essential information about frequent reference sequences of the stream so far. The set of all maximal reference sequences is determined from the SP-forest by a depth-first-search mechanism, called MRS-mining (Maximal Reference Sequence mining). Theoretical analysis and experimental studies show that the proposed algorithm has gently growing memory requirements and makes only one pass over the streaming data. (c) 2005 Elsevier B.V. All rights reserved.
URI: http://dx.doi.org/10.1016/j.comnet.2005.10.018
http://hdl.handle.net/11536/12029
ISSN: 1389-1286
DOI: 10.1016/j.comnet.2005.10.018
期刊: COMPUTER NETWORKS
Volume: 50
Issue: 10
起始頁: 1474
結束頁: 1487
Appears in Collections:Articles


Files in This Item:

  1. 000237993700005.pdf

If it is a zip file, please download the file and unzip it, then open index.html in a browser to view the full text content.