Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 李炫勳 | en_US |
dc.contributor.author | Li, Hsuan-Hsun | en_US |
dc.contributor.author | 李嘉晃 | en_US |
dc.contributor.author | Lee, Chia-Hoang | en_US |
dc.date.accessioned | 2014-12-12T01:59:23Z | - |
dc.date.available | 2014-12-12T01:59:23Z | - |
dc.date.issued | 2011 | en_US |
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT079955612 | en_US |
dc.identifier.uri | http://hdl.handle.net/11536/50520 | - |
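dc.description.abstract | Data on the web is vast and disorganized, and most of it is unlabeled and unstructured. Analyzing it is therefore highly complex and cannot be done by human effort alone, so machines are needed to classify or cluster the data and reorganize it into structured knowledge. Clustering and classification methods fall into two categories: supervised learning and unsupervised learning. Supervised methods need enough labeled data to train a classification model, and labeling data often consumes a great deal of labor and time. Unsupervised methods need no labeled data, but users often already have some background knowledge before clustering, and in principle that knowledge should be fed into the system so that it can cluster quickly and effectively. This thesis therefore adds a small amount of labeled data and uses this known information to achieve better clustering results while reducing the human effort required. The thesis proposes the Constrained-Nonnegative Matrix Factorization algorithm, a semi-supervised learning algorithm that uses a small amount of labeled data as constraints to improve overall clustering performance. It also designs a Constrained-Fuzzy Cmeans algorithm, which improves performance markedly when given only a small amount of label information. To keep Constrained-Nonnegative Matrix Factorization within the best convergence region, the thesis uses Constrained-Fuzzy Cmeans to find a better initial point and designs constraints from the labeled data to control overall clustering performance, which gives Constrained-Nonnegative Matrix Factorization outstanding performance. In experiments with this clustering framework that compare it against other semi-supervised methods, Constrained-Nonnegative Matrix Factorization shows stable and superior results. | zh_TW |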
dc.description.abstract | Semi-supervised clustering methods, which aim to cluster a data set under the guidance of some supervisory information, have become a topic of significant research. The supervisory information is usually used as constraints that bias clustering toward a good region of the search space. In this paper, we propose a semi-supervised algorithm, Constrained-Nonnegative Matrix Factorization, which uses a small amount of labeled data as constraints to cluster data. The proposed algorithm is a matrix factorization algorithm. Intuitively, a good initial point can speed up convergence and may lead to a better local optimum. We therefore devise an algorithm called Constrained-Fuzzy Cmeans to obtain the initial point. Because the evaluation function is key to assessing the solution computed by Constrained-Nonnegative Matrix Factorization, we also discuss how that solution is evaluated. Finally, we conduct experiments on several data sets, including CiteULike, Classic3, 20 Newsgroups, and Reuters, and compare the results with other semi-supervised learning algorithms. The experimental results indicate that the proposed method can effectively improve clustering performance. | en_US |
dc.language.iso | zh_TW | en_US |
dc.subject | 機器學習 | zh_TW |
dc.subject | 半監督式學習 | zh_TW |
dc.subject | machine learning | en_US |
dc.subject | semi-supervised learning | en_US |
dc.subject | non-negative matrix factorization | en_US |
dc.title | 基於Constrained-Nonnegative Matrix Factorization之半監督式分群法 | zh_TW |
dc.title | Clustering with Labeled and Unlabeled Data Based on Constrained-Nonnegative Matrix Factorization | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Institute of Computer Science and Engineering | zh_TW |
Appears in Collections: | Thesis |