Full metadata record
DC Field | Value | Language
dc.contributor.author | 李炫勳 | en_US
dc.contributor.author | Li, Hsuan-Hsun | en_US
dc.contributor.author | 李嘉晃 | en_US
dc.contributor.author | Lee, Chia-Hoang | en_US
dc.date.accessioned | 2014-12-12T01:59:23Z | -
dc.date.available | 2014-12-12T01:59:23Z | -
dc.date.issued | 2011 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT079955612 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/50520 | -
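dc.description.abstract | The data available on the web is vast and messy, and most of it is unlabeled and unstructured. Analyzing it is therefore highly complex and cannot be done by human effort alone, so machines are needed to classify or cluster the data and reorganize the information into structured knowledge. Classification and clustering methods fall into two categories: supervised learning and unsupervised learning. Supervised methods need enough labeled data to train a classification model, and labeling usually costs a great deal of manpower and time. Unsupervised methods need no labeled data, but users often have some background knowledge before clustering; in principle this knowledge should be fed into the system so that it can cluster quickly and effectively. This thesis therefore adds a small amount of labeled data and uses this known information to achieve better clustering while reducing the human effort required. This thesis proposes the Constrained-Nonnegative Matrix Factorization algorithm, a semi-supervised learning algorithm that uses a small amount of labeled data as constraints to improve overall clustering quality. The thesis also designs a Constrained-Fuzzy Cmeans algorithm, whose performance improves markedly when it is given only a small amount of label information. To confine Constrained-Nonnegative Matrix Factorization to the best region of convergence, the thesis uses Constrained-Fuzzy Cmeans to find a better initial point and designs constraints from the labeled data to control the overall clustering performance, so that Constrained-Nonnegative Matrix Factorization performs outstandingly. In experiments with this clustering framework, we compare against other semi-supervised methods, and Constrained-Nonnegative Matrix Factorization indeed shows stable and superior results. | zh_TW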
dc.description.abstract | Semi-supervised clustering methods, which aim to cluster a data set under the guidance of some supervisory information, have become a topic of significant research. The supervisory information is usually used as constraints that bias clustering toward a good region of the search space. In this paper, we propose a semi-supervised algorithm, Constrained-Nonnegative Matrix Factorization, which clusters data using a small amount of labeled data as constraints. The proposed algorithm is a matrix factorization algorithm. Intuitively, a good initial point can speed up convergence and may lead to a better local optimum. We therefore devise an algorithm called Constrained-Fuzzy Cmeans to obtain the initial point. Because the evaluation function is a key element in assessing the solution computed by Constrained-Nonnegative Matrix Factorization, we also discuss how that solution is evaluated. Finally, we conduct experiments on several data sets, including CiteULike, Classic3, 20 Newsgroups and Reuters, and compare the results with other semi-supervised learning algorithms. The experimental results indicate that the proposed method effectively improves clustering performance. | en_US
dc.language.iso | zh_TW | en_US
dc.subject | 機器學習 (machine learning) | zh_TW
dc.subject | 半監督式學習 (semi-supervised learning) | zh_TW
dc.subject | machine learning | en_US
dc.subject | semi-supervised learning | en_US
dc.subject | non-negative matrix factorization | en_US
dc.title | 基於Constrained-Nonnegative Matrix Factorization之半監督式分群法 | zh_TW
dc.title | Clustering with Labeled and Unlabeled Data Based on Constrained-Nonnegative Matrix Factorization | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Computer Science and Engineering | zh_TW
Appears in Collections: Thesis
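
The abstract describes clustering by nonnegative matrix factorization with a small set of labeled items acting as constraints. The record does not give the thesis's update rules, so the following is only a minimal sketch of the general idea, under stated assumptions: it uses the standard multiplicative updates for X ≈ WH, clamps the membership rows of labeled items to their known cluster, and seeds each cluster profile from the mean of its labeled rows (a simple stand-in for the Constrained-Fuzzy Cmeans initialization, which is not specified here). The function name `semi_supervised_nmf` and all parameters are hypothetical.

```python
import numpy as np


def semi_supervised_nmf(X, k, labels, n_iter=200, eps=1e-9, seed=0):
    """Cluster the rows of a nonnegative matrix X into k groups.

    labels[i] is a cluster id in [0, k) for labeled rows and -1 otherwise.
    Illustrative sketch only; NOT the algorithm from the thesis.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, m = X.shape
    rng = np.random.default_rng(seed)

    # Assumption: seed each cluster profile (row of H) with the mean of its
    # labeled rows; clusters without labeled rows keep a random start.
    H = rng.random((k, m)) + eps
    for c in range(k):
        members = X[labels == c]
        if len(members) > 0:
            H[c] = members.mean(axis=0) + eps

    # Start from uniform memberships.
    W = np.full((n, k), 1.0 / k)

    # One-hot membership rows for the labeled items (the constraints).
    idx = np.flatnonzero(labels >= 0)
    one_hot = np.zeros((len(idx), k))
    one_hot[np.arange(len(idx)), labels[idx]] = 1.0

    for _ in range(n_iter):
        # Multiplicative update for W, then re-impose the label constraints.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        W[idx] = one_hot
        # Multiplicative update for H given the constrained W.
        H *= (W.T @ X) / (W.T @ W @ H + eps)

    # Each row is assigned to the cluster with the largest membership weight.
    return W.argmax(axis=1), W, H
```

Calling it with a document-term count matrix, the number of clusters, and a label vector that is -1 for unlabeled documents returns one cluster id per document along with the two factors. Clamping is the bluntest way to use labels; penalty terms in the objective are a common softer alternative.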