Title: 基於中國餐廳過程之貝氏探索分群法
Bayesian Exploratory Clustering Based on Chinese Restaurant Process
Authors: 林哲遠
Lin, Che-Yuan
李嘉晃
劉建良
Lee, Chia-Hoang
Liu, Chien-Liang
Institute of Computer Science and Engineering
Keywords: Chinese Restaurant Process; non-parametric model; clustering
Issue Date: 2015
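Abstract: As technology advances rapidly, the volume of data has grown explosively, and the data clustering problem has become increasingly important. Such large volumes of data can no longer be processed manually, so computers must be used to automate what human labor cannot accomplish, which is fast, economical, and saves manpower. This thesis proposes a Bayesian non-parametric unsupervised learning method. Bayesian non-parametric methods do not require the number of clusters to be fixed in advance; instead, the data determine the number of clusters during the clustering process. The concept of Chebyshev's inequality from statistics is used to adjust the parameters of the prior distribution and improve clustering performance. This thesis thus extends the Chinese Restaurant Process (CRP) with a statistically grounded computation that further improves the clustering results. In addition, for adjusting the number of clusters, this thesis proposes a method different from the original CRP: when deciding which cluster a data point belongs to, if none of the existing clusters suits the point, the point forms a cluster of its own. Finally, the experimental results show that the proposed method outperforms other unsupervised learning methods.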
In the big data era, data exploration is essential to data analytics, since it gives analysts insight into the data. Data clustering therefore plays an important role. This thesis proposes a Bayesian non-parametric unsupervised learning method in which the number of clusters does not need to be given before clustering. The proposed method lets the data speak for themselves, and the number of clusters is determined automatically from the observed data. We use the concept of Chebyshev's inequality to set the prior parameters and yield better clustering results. In addition, this work proposes a novel, entropy-based way to create a new cluster. The main difference between the proposed method and the Chinese Restaurant Process is that the creation of a new cluster is determined by the existing clusters rather than by a hyperparameter. The experimental results show that the proposed algorithm outperforms other unsupervised learning algorithms.
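For context, the baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the standard Chinese Restaurant Process prior only, in which a new cluster is opened with probability proportional to a concentration hyperparameter; it is not the thesis's entropy-based rule or its Chebyshev-based prior setting, which the abstract does not specify in detail. The names `crp_assignments`, `alpha`, and `seed` are illustrative assumptions.

```python
import random


def crp_assignments(n_customers, alpha, seed=0):
    """Sample cluster (table) assignments from the standard CRP prior.

    alpha is the concentration hyperparameter (assumed > 0); larger
    values make new clusters more likely.
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of points already in cluster k
    assignments = []   # assignments[i] = cluster index of point i
    for _ in range(n_customers):
        # Existing cluster k is chosen with weight counts[k];
        # a new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    assignments, counts = crp_assignments(n_customers=50, alpha=1.0)
    print("number of clusters:", len(counts))
    print("cluster sizes:", counts)
```

In this baseline the chance of opening a new cluster depends only on `alpha` and the number of points seen so far, which is the hyperparameter dependence the abstract says the proposed method replaces with a decision based on the existing clusters.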
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070256098
http://hdl.handle.net/11536/127074
Appears in Collections: Thesis