Title: 多領域同時性循序序列模式探勘
Multi-domain Simultaneous Sequential Pattern Mining
Authors: 胡星垣
Hsing-Yuan Hu
彭文志
Wen-Chih Peng
Institute of Computer Science and Engineering
Keywords: Data mining; distributed data mining; sequential pattern mining
Issue Date: 2005
Abstract: Sequential pattern mining is an important research topic in data mining that has attracted significant research effort recently. The problem is to discover, among a set of sequences, the frequent sequences whose occurrence counts are greater than or equal to a user-specified threshold, min_support. Most previous sequential pattern mining methods explore patterns in only one domain, such as purchasing behavior, Web browsing, or movement patterns. In reality, sequential patterns may exist in multiple sequence databases, and when sequential patterns from different sequence databases occur at the same time, they form a multi-domain simultaneous sequential pattern. Compared with traditional single-domain sequential patterns, multi-domain simultaneous sequential patterns reflect a user's behavior more completely, which makes mining them important. In this paper, we propose a pattern-propagation-based algorithm, PropagatedMine, for efficient mining of multi-domain simultaneous sequential patterns. By propagating patterns together with their occurrence times from a starting domain to the other domains, the proposed approach significantly reduces the mining space and therefore the cost of mining. Because the cost of running PropagatedMine depends heavily on the propagation order, we further develop a method to determine an optimized propagation order. A comprehensive performance study shows that PropagatedMine mines multi-domain sequential patterns efficiently, that an optimized propagation order improves performance further, and that the order chosen by our method performs very close to the optimal order obtained by selecting the minimum-cost order among all possible propagation orders.
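The support definition and the idea of reusing occurrence times across domains can be illustrated with a minimal Python sketch. It is not the thesis's PropagatedMine algorithm. It assumes a simplified model in which each user has one time-aligned sequence per domain (index t is the same time slot in every domain) and in which the two patterns have equal length. The function names `occurrences`, `support`, and `simultaneous_support`, and the toy data, are hypothetical.

```python
from typing import List, Sequence, Tuple


def occurrences(sequence: Sequence[str], pattern: Sequence[str]) -> List[Tuple[int, ...]]:
    """Return every tuple of time indices at which `pattern` occurs in
    `sequence` as a (not necessarily contiguous) subsequence."""
    results: List[Tuple[int, ...]] = []

    def extend(start: int, matched: List[int]) -> None:
        if len(matched) == len(pattern):
            results.append(tuple(matched))
            return
        wanted = pattern[len(matched)]
        for t in range(start, len(sequence)):
            if sequence[t] == wanted:
                extend(t + 1, matched + [t])

    extend(0, [])
    return results


def support(database: Sequence[Sequence[str]], pattern: Sequence[str]) -> int:
    """Number of sequences in `database` that contain `pattern`."""
    return sum(1 for seq in database if occurrences(seq, pattern))


def simultaneous_support(
    db_a: Sequence[Sequence[str]],
    db_b: Sequence[Sequence[str]],
    pattern_a: Sequence[str],
    pattern_b: Sequence[str],
) -> int:
    """Count users for whom pattern_a (domain A) and pattern_b (domain B)
    occur at the same time indices. Assumption: equal-length patterns and
    time-aligned sequences; the thesis may define simultaneity differently."""
    count = 0
    for seq_a, seq_b in zip(db_a, db_b):
        # Carry the time indices found in the starting domain over to the
        # second domain, so only those positions need to be checked there.
        for times in occurrences(seq_a, pattern_a):
            if all(t < len(seq_b) and seq_b[t] == item
                   for t, item in zip(times, pattern_b)):
                count += 1
                break
    return count


# Toy data: one row per user, one column per time slot (hypothetical).
purchases = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
locations = [["x", "y", "z"], ["x", "z", "y"], ["y", "x", "x"]]

min_support = 2
single = support(purchases, ["a", "c"])
joint = simultaneous_support(purchases, locations, ["a", "c"], ["x", "z"])
print(single >= min_support, joint >= min_support)
```

In this sketch the second domain is only inspected at time positions already matched in the first domain, which is one simple way to picture how carrying occurrence times forward can shrink the search space.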
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009217621
http://hdl.handle.net/11536/74268
Appears in Collections: Thesis


Files in This Item:

  1. 762101.pdf
