Full metadata record
DC Field | Value | Language
dc.contributor.author | 苟富昇 | en_US
dc.contributor.author | Gou, Fu-Sheng | en_US
dc.contributor.author | 李嘉晃 | en_US
dc.contributor.author | Lee, Chia-Hoang | en_US
dc.date.accessioned | 2014-12-12T01:59:54Z | -
dc.date.available | 2014-12-12T01:59:54Z | -
dc.date.issued | 2011 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT079957541 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/50607 | -
dc.description.abstract | 由於科技的進步,網路的發展,造成網路上的文件量迅速攀升,如何讓使用者快速和正確的得到所需的資訊,成為一項重要的研究議題。在網路上可以輕易取得許多未標記資料;然而監督式學習方法,需要給足夠標記的資料訓練模型,資料標記往往需要浪費大量人力以及時間;而非監督式學習方法雖然不需要標記資訊,但是往往使用者在分群之前已經有些背景知識,理論上這些知識應該加入系統,讓系統可以將分群導向正確方向,所以本論文加入少許標記資料,利用這些已知的資訊,來達到更好的效果,同時不用介入過多的人力來幫助標記資料。 本論文提出了一個半監督式的學習法,同時兼具了降維與分群,本論文的方法透過Constrained-pLSA去取得每筆文件的群別機率歸屬值,再利用這個歸屬值去結合LDA﹙Linear Discriminant Analysis﹚,去尋找一個好的特徵空間,使其分群效果提升。本論文在實際的問題上,使用了CiteUlike、20Newsgroups及Reuters資料集做分析,使用本論文提出的方法,將高維度的資料集降到低維度,再來分群,最後實驗的結果顯示只需要少許的標記資料就可以讓本論文提出的方法有不錯的效果。 | zh_TW
dc.description.abstract | Document classification is of great practical importance today, given the massive volume of text available online. Supervised learning is a popular technique for document classification, but it needs sufficient labeled data to train a model, and labeling typically must be done manually, which is time-consuming. Unlabeled data, by contrast, is relatively easy to collect. Unsupervised learning methods need no labeled data, yet users often have some background knowledge before clustering, and this knowledge should be incorporated into the algorithm to improve clustering accuracy. This thesis proposes a semi-supervised learning algorithm that performs dimension reduction and clustering simultaneously. It applies constrained-pLSA to obtain soft labels (each document's cluster membership probabilities) and then combines the soft labels with linear discriminant analysis to find a better feature space. Experiments on the CiteUlike, 20Newsgroups, and Reuters datasets indicate that the proposed method can effectively improve clustering performance with only a small amount of labeled data. | en_US
dc.language.iso | zh_TW | en_US
dc.subject | 機器學習 | zh_TW
dc.subject | 半監督式學習 | zh_TW
dc.subject | 資料降維 | zh_TW
dc.subject | machine learning | en_US
dc.subject | semi-supervised learning | en_US
dc.subject | dimensional reduction | en_US
dc.title | 基於Constrained-pLSA之半監督式判別分群 | zh_TW
dc.title | Semi-Supervised Discriminant Clustering via Constrained-pLSA | en_US
dc.type | Thesis | en_US
dc.contributor.department | 多媒體工程研究所 | zh_TW
Appears in Collections: Thesis