Title: | 基於證據累加的叢集整合技術之強韌化與功能延伸(II) Robustification and Functionality Extension of Evidence-Accumulation-Based Cluster Ensembles (II) |
Author: | 王才沛 Wang Tsaipei, Department of Computer Science, National Chiao Tung University |
Keywords: | cluster ensemble; evidence accumulation; consensus clustering; co-association matrix |
Issue Date: | 2010 |
Abstract: | Clustering is a process that groups unlabeled data points into clusters. A large variety of clustering methods exist, but none generates good results for all types of data and cluster characteristics. Cluster ensembles are a recent trend: multiple clustering results are generated from the same data set, and the individual results are then combined into a consensus partition that is more stable and more representative of the actual data distribution. The benefits of cluster ensembles have been increasingly recognized in recent years, and a growing number of applications have appeared in various fields.
This project is a continuation of a current NSC project (No. 98-2221-E-009-146-; Title: Robustification and Functionality Extension of Evidence-Accumulation-Based Cluster Ensembles; Duration: 2009/8/1 to 2010/7/31). The overall goal is to take evidence-accumulation clustering, that is, cluster ensemble methods based on co-association matrices, as the starting point and to investigate methods that improve its robustness and extend its range of applications. This proposal has two main goals: (1) The main drawback of co-association matrices is their high computational complexity. We plan to exploit the information redundancy present in co-association matrices to develop methods that reduce computational complexity and thereby make the approach more practical. (2) We plan to apply cluster ensembles to clustering problems that identify clusters of particular shapes, so that the benefits of cluster ensembles can improve the results of algorithms that detect line-segment clusters, shell clusters, and principal curves. We expect the outcome of this project to contribute substantially to the development of cluster ensemble applications. |
Government Document No.: | NSC99-2221-E009-179 |
URI: | http://hdl.handle.net/11536/100218 https://www.grb.gov.tw/search/planDetail?id=2121543&docId=339661 |
Appears in Collections: | Research Plans |
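
For readers unfamiliar with the co-association approach named in the abstract, the following is a minimal illustrative sketch and not the project's own implementation. It assumes the base partitions are already available as label lists (in practice they might come from repeated runs of a clustering algorithm), and it assumes a simple consensus rule: link any two points that share a cluster in more than half of the partitions, then take connected components. The function names `co_association` and `consensus` and the 0.5 threshold are hypothetical choices made for this example.

```python
def co_association(partitions):
    """Return the n x n co-association matrix.

    Entry [i][j] is the fraction of base partitions in which
    points i and j were assigned to the same cluster.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus(C, threshold=0.5):
    """Assumed consensus rule: connected components of the graph that
    links points i and j whenever C[i][j] > threshold."""
    n = len(C)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already reached from an earlier point
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and C[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hand-written base partitions of four points (illustrative data only).
base_partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 2, 2],
]
C = co_association(base_partitions)
print(consensus(C))  # -> [0, 0, 1, 1]
```

The matrix has n × n entries and each base partition touches all of them, so the cost in this sketch grows quadratically with the number of data points. That is the computational burden the abstract identifies as the main drawback of co-association matrices.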