Title: Learning from Incomplete Data using Support Vector Machines
Authors: Jer-Ming Hsu; Yuh-Jyh Hu
Institute of Computer Science and Engineering
Keywords: incomplete data; missing data; support vector machines; classification; learning algorithm; EM; mixture model
Date of Issue: 2001
Abstract: Real-world learning tasks often involve high-dimensional data sets with complex patterns of incomplete data. Conventional techniques simply fill in the missing values with means, which in practice often fails to yield satisfactory results. In recent years, support vector machines (SVMs) have shown excellent performance on various classification and regression problems, but they can currently handle only complete data. We therefore extend the SVM algorithm so that incomplete data can be incorporated into both the training and the prediction (recall) phases. Our approach requires the data density to weight the incomplete data; we model the density with a mixture model and estimate its parameters with the expectation-maximization (EM) algorithm. For Gaussian mixture models, a closed-form solution of SVMs with a Gaussian radial basis kernel is derived, giving a simple and fast computation. For other densities and kernel functions, the solution can still be evaluated by Monte Carlo integration. In our experiments, the proposed method obtains better results than unconditional mean imputation and conditional mean imputation.
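The abstract outlines two evaluation routes. For a Gaussian mixture with a Gaussian RBF kernel, the closed form presumably rests on the standard Gaussian integral: if a vector $u$ follows $\mathcal{N}(\mu, \Sigma)$, then

    E_u[ e^{-\gamma \|u - v\|^2} ] = |I + 2\gamma\Sigma|^{-1/2} \exp\big( -\gamma (\mu - v)^\top (I + 2\gamma\Sigma)^{-1} (\mu - v) \big),

and for a mixture the expectation is the corresponding responsibility-weighted sum over components. For other densities or kernels, the abstract falls back on Monte Carlo integration. The sketch below (Python with numpy and scikit-learn) is a minimal illustration of that general route, not the thesis implementation; the toy data, the helper predict_with_missing, and all parameter choices are assumptions made for the example. It fits a Gaussian mixture by EM, trains an ordinary RBF-kernel SVM on complete data, and classifies a point with a missing feature by averaging the decision function over completions drawn from the mixture conditioned on the observed feature.

    # Minimal sketch of Monte Carlo prediction with missing features
    # (illustrative only; not the thesis code).
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # Toy complete training set: two 2-D Gaussian blobs, one per class.
    X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(3.0, 1.0, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)

    gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)  # EM step
    svm = SVC(kernel="rbf", gamma=0.5).fit(X, y)                                          # ordinary SVM

    def predict_with_missing(x, obs_mask, n_samples=2000):
        # x: feature vector with missing entries set to 0 (their values are ignored);
        # obs_mask: boolean mask of observed features.
        obs = np.where(obs_mask)[0]
        mis = np.where(~obs_mask)[0]
        log_resp, cond = [], []
        for w, mu, S in zip(gmm.weights_, gmm.means_, gmm.covariances_):
            Soo = S[np.ix_(obs, obs)]
            Smo = S[np.ix_(mis, obs)]
            diff = x[obs] - mu[obs]
            # log N(x_obs; mu_obs, Soo): responsibility of this component given the observed part
            log_pdf = -0.5 * (diff @ np.linalg.solve(Soo, diff)
                              + np.linalg.slogdet(2 * np.pi * Soo)[1])
            log_resp.append(np.log(w) + log_pdf)
            # Gaussian conditional of the missing coordinates given the observed ones
            m_cond = mu[mis] + Smo @ np.linalg.solve(Soo, diff)
            S_cond = S[np.ix_(mis, mis)] - Smo @ np.linalg.solve(Soo, Smo.T)
            cond.append((m_cond, S_cond))
        log_resp = np.array(log_resp)
        resp = np.exp(log_resp - log_resp.max())
        resp /= resp.sum()

        # Sample completions from the conditional mixture and average the SVM
        # decision values (Monte Carlo integration over the missing coordinates).
        comps = rng.choice(len(resp), size=n_samples, p=resp)
        X_filled = np.tile(x.astype(float), (n_samples, 1))
        for k, (m_cond, S_cond) in enumerate(cond):
            rows = np.where(comps == k)[0]
            if rows.size:
                X_filled[np.ix_(rows, mis)] = rng.multivariate_normal(m_cond, S_cond, size=rows.size)
        return svm.decision_function(X_filled).mean()

    x_test = np.array([2.5, np.nan])          # second feature is missing
    score = predict_with_missing(np.nan_to_num(x_test), ~np.isnan(x_test))
    print("predicted class:", int(score > 0))

The sketch only shows the prediction side; per the abstract, the same density-weighted treatment of incomplete points would also enter the training phase.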
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT900394095
http://hdl.handle.net/11536/68624
Appears in Collections: Thesis