Title: 於資料串流上基於動態網格的分群演算法
Clustering Evolving Data Stream Based on Dynamic Grids
Authors: 王偉任
Wang, Wei-Jeng
李素瑛
Lee, Suh-Yin
Institute of Computer Science and Engineering
Keywords: clustering; data stream; stream; density grid
Publication Date: 2011
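Abstract: In recent years, with the development of information science and advances in related equipment, data streams have become a common data type. How to cluster unbounded, dynamic data streams and extract meaningful data characteristics has attracted considerable attention. Although a fair amount of research has been published on this topic, most methods require suitable parameter settings at the start. Unlike ordinary static data, however, the characteristics and cluster information of a data stream are dynamic and unstable, so choosing the initial parameters is difficult. Processing a data stream is a continuous procedure, and different parameter settings may be needed at different times; methods with fixed parameters often fail to reflect and handle changes in the data characteristics correctly. This thesis proposes a novel algorithm, DGBC (Dynamic Grid-Based Clustering), for clustering data streams. During processing, the method automatically adjusts the required parameters to match the latest data and cluster characteristics. Experimental results on both synthetic and real data show that DGBC not only runs faster but also produces higher-quality clustering results, and is less sensitive to the initial parameters.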
Clustering multi-dimensional data streams is a difficult and important problem. The goal is to cluster the objects in the stream continuously in order to discover and monitor evolving, up-to-date events. Density-grid-based clustering algorithms are fast, can discover arbitrarily shaped clusters, and can deal with noise. However, grid-based algorithms are easily influenced by the sizes and borders of the grids. We propose a Dynamic Grid-Based Clustering algorithm for high-dimensional data streams. When new data arrives, the grid structure is dynamically updated. The dynamic grid structure adjusts its range and boundaries on each dimension over time to produce effective clustering results with low memory usage. We used both synthetic and real data sets for experiments, and the experimental results show that the proposed algorithm has superior quality and efficiency, can find clusters of arbitrary shapes, and can accurately recognize the evolving behavior of real-time data streams.
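This record does not give the details of the proposed algorithm, so the following is only a rough sketch of the general idea of a density grid whose per-dimension range follows the data. It is not the thesis's DGBC method. The class name `DynamicGridSketch` and the parameters `cells_per_dim`, `window`, and `dense_threshold` are hypothetical, and the sliding window of recent points is an assumption made to keep the example short.

```python
from collections import Counter, deque


class DynamicGridSketch:
    """Toy density grid; NOT the thesis's DGBC algorithm (all names are hypothetical)."""

    def __init__(self, cells_per_dim=4, window=200, dense_threshold=3):
        self.cells_per_dim = cells_per_dim      # grid cells along each dimension
        self.dense_threshold = dense_threshold  # min points for a cell to count as dense
        self.window = deque(maxlen=window)      # assumption: keep only recent points

    def insert(self, point):
        # Old points fall out of the window, so the grid range can shrink or shift.
        self.window.append(tuple(point))

    def _bounds(self):
        # Recompute the range on each dimension from the points currently held.
        dims = len(self.window[0])
        lows = [min(p[d] for p in self.window) for d in range(dims)]
        highs = [max(p[d] for p in self.window) for d in range(dims)]
        return lows, highs

    def _cell(self, point, lows, highs):
        idx = []
        for x, lo, hi in zip(point, lows, highs):
            width = (hi - lo) / self.cells_per_dim or 1.0  # avoid zero width
            idx.append(min(int((x - lo) / width), self.cells_per_dim - 1))
        return tuple(idx)

    def clusters(self):
        """Return a list of sets of dense cells, merged when they share a face."""
        if not self.window:
            return []
        lows, highs = self._bounds()
        counts = Counter(self._cell(p, lows, highs) for p in self.window)
        dense = {c for c, n in counts.items() if n >= self.dense_threshold}
        groups, seen = [], set()
        for start in sorted(dense):
            if start in seen:
                continue
            seen.add(start)
            group, stack = set(), [start]
            while stack:  # depth-first search over neighbouring dense cells
                cell = stack.pop()
                group.add(cell)
                for d in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            groups.append(group)
        return groups
```

Feeding two well-separated groups of points through `insert` and then calling `clusters()` yields one group of dense cells per blob. Because the cell borders are recomputed from the current window, a single far-away point can move the borders and split a blob across several sparse cells, which is the border sensitivity the abstract mentions.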
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079955519
http://hdl.handle.net/11536/50434
Appears in Collections: Thesis


Files in This Item:

  1. 551901.pdf
