An Approximate Fisher Linear Discriminant Analysis for Clustering

Title:	An Approximate Fisher Linear Discriminant Analysis for Clustering
Authors:	Yang, Cheng-Gang (楊承綱); Jou, Chi-Cheng (周志成); Institute of Electrical and Control Engineering
Keywords:	clustering algorithm; principal component analysis; Fisher linear discriminant
Issue Date:	2010
Abstract:	As large amounts of data become ever easier to obtain, data clustering grows in importance. One difficulty in clustering is that each data record carries many statistics, called features, and the choice of features, or of combinations of them, strongly affects the clustering result. Principal component analysis (PCA) is a common feature extraction method, but extracting the components of maximum variance is not necessarily best for classification or clustering. This thesis aims to improve feature extraction by combining the Fisher linear discriminant (FLD), which extracts features well in classification applications, with traditional K-means clustering to form an approximate Fisher linear discriminant (AFD) algorithm. The K-means clustering result is first treated as the known class labels, and FLD is used to find the best features for those labels. The data are then re-clustered using these features, and FLD is applied again to obtain the best features for the new clustering result. This process repeats until convergence. The experiments use two data sets with three classes each, Iris and Wine, and clustering accuracy is measured against the true class labels. The results show that although the components of maximum variance retain the most information about the original data, they are not always helpful for clustering. Clustering on the key features extracted by the AFD algorithm outperforms PCA: with the same number of features, AFD yields better clustering results.
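The alternation described in the abstract (K-means labels, then FLD features, then K-means again in the new feature space, repeated until convergence) can be sketched roughly as follows. This is a minimal illustration and not the thesis's code: the function names `kmeans`, `fld_directions`, `same_partition` and `afd` are hypothetical, and the random initialization, the small ridge added to the within-cluster scatter, and the "partition unchanged" stopping rule are all assumptions made here, since the abstract does not specify them.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; returns integer labels of shape (n,)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def fld_directions(X, labels, n_features):
    """Top FLD directions for the given labels, as columns of a (d, n_features) matrix."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-cluster scatter
    Sb = np.zeros((d, d))  # between-cluster scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += 1e-8 * np.eye(d)  # assumption: tiny ridge so Sw stays invertible
    # Solve Sb w = lambda Sw w through the Cholesky factor Sw = L L^T.
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    evals, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)  # ascending eigenvalues
    top = evecs[:, ::-1][:, :n_features]                 # largest eigenvalues first
    return L_inv.T @ top

def same_partition(a, b):
    """True if two labelings group the points identically, ignoring label names."""
    return np.array_equal(a[:, None] == a[None, :], b[:, None] == b[None, :])

def afd(X, k, n_features, max_rounds=50, seed=0):
    """Alternate K-means and FLD until the partition stops changing."""
    X = np.asarray(X, dtype=float)
    labels = kmeans(X, k, seed=seed)  # initial clusters act as the "known" classes
    W = None
    for _ in range(max_rounds):
        W = fld_directions(X, labels, n_features)
        new_labels = kmeans(X @ W, k, seed=seed)  # re-cluster in the FLD feature space
        if same_partition(labels, new_labels):
            break
        labels = new_labels
    return labels, W
```

For three clusters, FLD yields at most two informative directions, so a call such as `afd(X, k=3, n_features=2)` would be a natural choice for data like Iris or Wine. The sketch does not reproduce the accuracy comparison against PCA reported in the thesis.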
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT079812606 http://hdl.handle.net/11536/46961
Appears in Collections:	Thesis

Files in This Item:

260601.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.