標題: | 探勘時間間隔循序特徵樣式之相關研究 A Study on Time Interval-based Sequential Patterns Mining |
作者: | 陳以錚 Chen, Yi-Cheng 李素瑛 彭文志 Lee, Suh-Yin Peng, Wen-Chih 資訊科學與工程研究所 |
關鍵字: | 循序特徵樣式;封閉循序特徵樣式;漸增式樣式探勘;時間間隔特徵樣式;表示方法;sequential pattern;closed sequential pattern;incremental pattern mining;temporal pattern mining;representation |
公開日期: | 2011 |
摘要: | 循序特徵樣式因其廣泛的實用性,在資料探勘的領域中一直扮演著非常重要的角色,藉著探勘出存在的時間順序對許多相關實務有很大的幫助。但現今的演算法大都只考慮針對時間點的循序樣式探勘,並無時間的持續概念,處理時間間隔的相關演算法與應用領域一直為學者專家所忽略。本論文專注於探討時間間隔循序樣式之相關問題,研究如何設計正確的表示方法與有效率的探勘演算法,並討論所產生之相關技術。表示方法(representation)是在處理時間間隔序列時,最基礎的問題。對以時間間隔為基礎的循序樣式而言,單純依照發生時間的所排序出的前後關係,並無法表示出一個完整的時間間隔循序樣式。本論文改進目前幾種表示法的缺點,在有效利用儲存空間的前提之下,提出了兩種表示法:同時片段表示法(coincidence representation)與端點表示法(endpoint representation),能有效表達樣式中間隔彼此的關係,並且避免產生混淆問題(ambiguous problem)。以片段表示法為基礎,我們設計了一個能在大型資料庫中,有效率探勘時間間隔特徵樣式的演算法(CTMiner)。本論文也考慮了相關變化型樣式,設計出了以端點表示法為基礎,探勘封閉式時間間隔特徵樣式的演算法(CEMiner)與漸增式探勘時間間隔特徵樣式的演算法(Inc_CTMiner)。從合成與真實測資的實驗結果可得知,我們所提出的三種演算法,都能有效率且正確地找出所有的相關時間間隔特徵樣式,並且只需少量的記憶體使用量。本論文也將三種演算法實際應用於現實生活的資料上,找出有用的時間間隔特徵樣式,以證明演算法的實用性。 Sequential pattern mining is a key issue in data mining. However, most of the previous studies are mainly focused on time point-based event data. Little attention has been paid to mining patterns from time interval-based event data, where each event persists for a time interval. Since intervals may overlap, the relation between any two intervals is intrinsically complex. In this dissertation, we propose two new representations, coincidence representation and endpoint representation to simplify the processing of complex relations among event intervals. Then, three efficient algorithms, CTMiner, CEMiner and Inc_CTMiner, are developed to discover several types of temporal patterns from interval-based data. Based on coincidence representation, an efficient algorithm, CTMiner is developed to discover frequent temporal patterns from interval-based data. The algorithm employs two pruning techniques to reduce the search space effectively. The mining of closed sequential patterns has attracted researchers for its capability of using compact results to preserve the same expressive power as conventional mining. In this dissertation, a novel algorithm, CEMiner, is developed to discover closed temporal patterns based on endpoint representation. Algorithm CEMiner also utilizes some optimization technique to reduce the search space in processing. In several real-life applications, sequence databases generally update incrementally with time. A number of discovered sequential patterns may be invalidated, and a number of new patterns may be introduced by the evolution on the database. We proposed an efficient algorithm, Inc_CTMiner to incrementally mine temporal patterns in interval database. Moreover, the algorithm employs some optimization techniques to reduce the search space effectively. The experimental studies indicate that all proposed algorithms are efficient and scalable and outperforms the state-of-the-art algorithms. The improvement of proposed pruning strategies also has been discussed. Furthermore, we also apply our algorithms on real data to show the efficiency and validate the practicability of interval-base temporal mining. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079555852 http://hdl.handle.net/11536/41431 |
顯示於類別: | 畢業論文 |