標題: 時間區間事件之有效率的循序樣式探勘
An efficient interval-based sequential pattern mining
作者: 姜季強
Jiang, Ji-Chiang
李素瑛
Lee, Suh-Yin
資訊科學與工程研究所
關鍵字: 以時間區間為基礎的事件之探勘;同時發生的事件片段;時間樣式;Interval-based event mining;coincidence-slice;temporal patterns
公開日期: 2009
摘要: 現在已經發展的循序樣式探勘演算法皆假設事件的發生是在時間點上。然而,在現實生活上發生的事件通常是持續一段時間的,稱之為“以時間區間為基礎的事件”。但是由於時間區間事件間複雜的關係,造成了在設計有效率以時間區間事件為基礎之循序樣式探勘演算法上的困難。因此,我們提出了“同時發生的事件片段”的概念來解決時間區間事件間複雜關係的問題。首先根據時間區間的事件間“同時發生”的部份將時間區間事件切割成互斥的更小事件片段,即“同時發生”的一段時間區間內可能有許多事件片段,而原本的事件序列可表示成我們所提出新的事件序列表示方式:以“同時發生”時間排列的有序序列,稱之為“同時發生事件片段序列表示法”。因此,我們考慮事件片段間的相互關係變地相當簡單,即前後、同時。我們提出一個演算法 CTMiner 基於“同時發生事件片段序列表示法”來表示事件序列並利用知名的循序樣式探勘演算法 PrefixSpan 的概念來找出頻繁的時間區間事件循序樣式,並能完全避免產生候選樣式。最後,為了能理解頻繁的“同時發生事件片段序列”樣式的意義,我們利用關係序列來呈現此頻繁樣式中時間區間事件間所有的關係。並且,我們還根據“同時發生的事件片段”的特性,設計了一些策略來提升CTMiner演算法的效率。在實際的圖書館借閱資料和合成資料的實驗結果皆表現出此演算法的效率和適應性。
Existing sequential pattern mining algorithms assume that events occur instantaneously. However, events in real world applications usually have durations which are called interval-based events. But complex relationship among event intervals causes difficulty in designing an efficient interval-based event mining algorithm. Therefore, the concept of “coincidence-slice” is proposed to solve the problem caused by the complex relationship among event intervals. The event intervals are incised to disjoint smaller “event slices” according to the coincidences among event intervals, that is, several event slices may occur in the same time period called “coincidence”. Therefore, an original event sequence can be represented as a list of ordered “coincidences” which contains event slices. This new representation proposed is called “coincidence sequence representation”. We transform the problem of complex relationship among event interval to consider the simple relationship among event slices. The proposed interval-based sequential pattern mining algorithm called CTMiner is based on the “coincidence sequence representation”. The CTMier also uses the concept of well-known sequential pattern mining algorithm PrefixSpan to find temporal patterns without candidate generation. Finally, to comprehend the frequent temporal pattern represented by “coincidence sequence representation”, we discover and use relation list to present all the relationships in a pattern. We also implement some pruning strategies to improve the performance of CTMiner by considering the characteristics of the “Coincidence-slice”. Experiments on both synthetic datasets and real dataset of library lending indicate the efficiency and scalability of the proposed algorithm.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079655631
http://hdl.handle.net/11536/43439
Appears in Collections:Thesis


Files in This Item:

  1. 563101.pdf
  2. 563102.pdf

If it is a zip file, please download the file and unzip it, then open index.html in a browser to view the full text content.