Title: Document Classification based on Semi-AdaBoost.MH with Universum Example
Original title: 基於Semi-AdaBoost.MH加上 Universum Example之文件分類法
Authors: Kuo, Tsung-Hsun (郭宗勳)
Lee, Chia-Hoang (李嘉晃)
Institute of Computer Science and Engineering
Keywords: machine learning; semi-supervised learning; AdaBoost.MH; Universum
Issue Date: 2011
Abstract: Semi-supervised learning has been highly successful in machine learning, but its classification performance can still degrade when very few labeled examples are available. Universum is a relatively new concept: a collection of examples that do not belong to any of the classes of interest. This thesis proposes a method that combines semi-supervised learning with Universum, using the prior knowledge that Universum provides to address this weakness of conventional semi-supervised learning.
The method is developed from a Boosting perspective. We use confidence to explain why Universum can aid classification, and this notion of confidence coincides with the margin concept in U-SVM. We also analyze which kinds of data harm classification performance when used as Universum. Experiments on three document corpora show that the fewer labeled examples are available, the greater the benefit of adding Universum. Provided the chosen Universum is not biased toward any one of the target classes, the proposed method outperforms both the original semi-supervised method and other semi-supervised methods that also use Universum.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079955606
http://hdl.handle.net/11536/50514
Appears in Collections: Thesis