標題: 有效的高利潤序列探勘
Mining High Utility Sequential Pattern Efficiently
作者: 楊宗樺
Yang, Zong-Hua
黃俊龍
Huang, Jiun-Long
資訊科學與工程研究所
關鍵字: 資料探勘;序列資料探勘;Utility mining;Sequence pattern mining
公開日期: 2011
摘要: 找尋高利潤的序列是近年來重要的議題之一,不同於傳統在資料庫內找尋高出現次數的序列,我們傾向尋找的序列可以獲取較高的利潤。這樣的序列可以幫助我們得知哪些序列可以幫助我們在市場交易上了解哪些序列是可以獲得較高的利潤而非重視交易量。找尋高利潤的序列被稱為高利潤序列探勘.近年來,大部分的研究都專注於在交易型資料庫上找尋高利潤組合。此篇論文,我們專注在序列型資料庫內找尋高利潤序列。 我們提了新的探勘框架UPrefixSpan。UPrefixSpan不像大部分的方法需要額外的資料庫掃描。使用UPrefixSpan找尋高利潤序列可以減少確認候選者序列是否為高利潤序列的時間。另外,我們也提了兩個策略使候選序列的數量減少,此處稱為DUPES和DUI策略。最後我們提出了CLexTree在記憶體內維護高利潤序列。在我們的實驗,DUEPS和DUI策略可以減少大量的候選序列,另外UPrefixSpan在執行時間上也有不錯的優勢。使用UPrefixSpan搭配DUEPS和DUI策略是目前我們所知最具擴展性的演算法在高利潤探勘上。
The discovery of high profitable pattern is the one of important issue recently. Instead of mining the pattern which occurs frequently in database, it can be found the pattern which has high profits. In the marketing analysis, the pattern which can be obtained great sale is more practical than the sale volume sometimes. The task of mining high profitable pattern is also called high utility pattern mining. In most recent studies, they focus on mining high utility pattern in transaction database. However, in this paper, we propose a novel mining framework UPrefixSpan to discover the high utility pattern in sequence database. Unlike the most of existing study, UPrefixSpan does not require additional database scanning to identify the candidate of high utility patterns. Moreover, we propose two discarding strategies, DUEPS and DUI strategies, to tight the estimated utility of mined patterns. Finally, we devise the novel tree structure CLexTree to maintain the candidates compactly in memory. In our experimental result, our discarding strategies make the mining algorithm can running effectively, and UPrefixSpan outperform the state-of-the-art algorithm substantially in term of running time and the number of potential high utility sequences. UPrefixSpan which mines with DUEPS and DUI strategies is the most scalable algorithm in our study.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079855562
http://hdl.handle.net/11536/48296
Appears in Collections:Thesis