Title: Efficient Mining of High Utility Itemsets on Data Streams (在資料串流環境探勘高實用性項目集之研究)
Authors: Hsin Yun Huang (黃心韻)
Suh Yin Lee (李素瑛)
Institute of Computer Science and Engineering
Keywords: data stream; sliding window; high utility itemsets; Bitvector
Publication Date: 2006
Abstract: Many applications, such as sensor networks, stock market analysis, and online transactions, produce data in the form of streams, so mining meaningful patterns from data streams is an important problem. The constraints of the data stream environment make the mining task more complex. Utility mining, which has emerged in recent years, finds the patterns that are profitable or otherwise of interest to users. In mining high utility itemsets, the unit utility of each item and the quantity of each item in a transaction can take arbitrary values, which adds to the complexity of the problem and means that many methods developed for frequent itemset mining no longer apply. In this thesis, we propose MHUI_TransSW and MHUI_TimeSW, which mine high utility itemsets on a data stream under two types of sliding window. Our methods record item information, i.e. the TIDlist or Bitvector of each 1-itemset, and build a lexicographical tree, improving on the efficiency of the THUI-Mine algorithm. Experimental results show that our approach mines high utility itemsets from data streams efficiently in terms of both execution time and memory usage.
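As an illustration of the idea summarized in the abstract, the following Python sketch keeps one bit vector per item over a transaction-sensitive sliding window and uses it to compute the utility of an itemset (the sum of quantity times unit profit over the window's transactions that contain every item of the set). It is a minimal sketch under stated assumptions, not the MHUI_TransSW algorithm itself: it omits the lexicographical tree and any candidate pruning, and the class name `UtilityWindow`, its method names, and the profit values are hypothetical.

```python
from collections import deque


class UtilityWindow:
    """Hypothetical sliding window over the most recent `size` transactions."""

    def __init__(self, size, unit_profit):
        self.size = size
        self.unit_profit = unit_profit  # item -> profit per unit (assumed values)
        self.transactions = deque()     # each transaction: {item: quantity}
        self.bitvectors = {}            # item -> int whose bit k marks window slot k

    def append(self, transaction):
        if len(self.transactions) == self.size:
            self.transactions.popleft()
            # Slide the window: a right shift discards bit 0 (the oldest slot).
            # Items that no longer occur in the window are dropped.
            self.bitvectors = {
                item: bits >> 1
                for item, bits in self.bitvectors.items()
                if bits >> 1
            }
        slot = len(self.transactions)
        self.transactions.append(transaction)
        for item in transaction:
            self.bitvectors[item] = self.bitvectors.get(item, 0) | (1 << slot)

    def utility(self, itemset):
        # AND the item bit vectors to find the slots that contain every item.
        bits = (1 << len(self.transactions)) - 1
        for item in itemset:
            bits &= self.bitvectors.get(item, 0)
        total = 0
        for slot, transaction in enumerate(self.transactions):
            if (bits >> slot) & 1:
                total += sum(transaction[i] * self.unit_profit[i] for i in itemset)
        return total

    def is_high_utility(self, itemset, min_utility):
        return self.utility(itemset) >= min_utility


# Usage with made-up profits and quantities.
window = UtilityWindow(size=3, unit_profit={"a": 3, "b": 10, "c": 1})
window.append({"a": 2, "b": 1})
window.append({"a": 1, "c": 4})
window.append({"a": 1, "b": 2, "c": 1})
print(window.utility({"a", "b"}))   # 39
window.append({"b": 1, "c": 2})     # the oldest transaction leaves the window
print(window.utility({"a", "b"}))   # 23
```

Because the bit vectors are updated with a single shift when the window slides, the per-item information never has to be rebuilt from the stream. Python integers stand in for fixed-width bit vectors here purely for brevity.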
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009455503
http://hdl.handle.net/11536/82031
Appears in Collections: Thesis


Files in This Item:

  1. 550301.pdf
