Semi-Supervised Discriminant Clustering via Constrained-pLSA

Full metadata record

DC Field	Value	Language
dc.contributor.author	苟富昇	en_US
dc.contributor.author	Gou, Fu-Sheng	en_US
dc.contributor.author	李嘉晃	en_US
dc.contributor.author	Lee, Chia-Hoang	en_US
dc.date.accessioned	2014-12-12T01:59:54Z	-
dc.date.available	2014-12-12T01:59:54Z	-
dc.date.issued	2011	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT079957541	en_US
dc.identifier.uri	http://hdl.handle.net/11536/50607	-
dc.description.abstract	由於科技的進步，網路的發展，造成網路上的文件量迅速攀升，如何讓使用者快速和正確的得到所需的資訊，成為一項重要的研究議題。在網路上可以輕易取得許多未標記資料；然而監督式學習方法，需要給足夠標記的資料訓練模型，資料標記往往需要浪費大量人力以及時間；而非監督式學習方法雖然不需要標記資訊，但是往往使用者在分群之前已經有些背景知識，理論上這些知識應該加入系統，讓系統可以將分群導向正確方向，所以本論文加入少許標記資料，利用這些已知的資訊，來達到更好的效果，同時不用介入過多的人力來幫助標記資料。本論文提出了一個半監督式的學習法，同時兼具了降維與分群，本論文的方法透過Constrained-pLSA去取得每筆文件的群別機率歸屬值，再利用這個歸屬值去結合LDA﹙Linear Discriminant Analysis﹚，去尋找一個好的特徵空間，使其分群效果提升。本論文在實際的問題上，使用了CiteUlike、20Newsgroups及Reuters資料集做分析，使用本論文提出的方法，將高維度的資料集降到低維度，再來分群，最後實驗的結果顯示只需要少許的標記資料就可以讓本論文提出的方法有不錯的效果。	zh_TW
dc.description.abstract	Document classification is of great practical importance today, given the massive volume of text available online. Supervised learning is a popular technique for document classification, but it requires sufficient labeled data to train a classification model. Labeling must typically be done manually, which is time-consuming, whereas unlabeled data is relatively easy to collect. Although unsupervised learning methods do not need labeled data, users often have some background knowledge before clustering, and that knowledge should be incorporated into the algorithm to improve clustering accuracy. This thesis proposes a semi-supervised learning algorithm that performs dimension reduction and clustering simultaneously. It applies constrained-pLSA to obtain soft labels and then combines the soft labels with linear discriminant analysis to find a better feature space. Experiments on the CiteUlike, 20Newsgroups, and Reuters data sets indicate that the proposed method effectively improves clustering performance.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	機器學習	zh_TW
dc.subject	半監督式學習	zh_TW
dc.subject	資料降維	zh_TW
dc.subject	machine learning	en_US
dc.subject	semi-supervised learning	en_US
dc.subject	dimensional reduction	en_US
dc.title	基於Constrained-pLSA之半監督式判別分群	zh_TW
dc.title	Semi-Supervised Discriminant Clustering via Constrained-pLSA	en_US
dc.type	Thesis	en_US
dc.contributor.department	多媒體工程研究所	zh_TW
Appears in Collections:	Thesis