Title: LevelBoost: A Boosting Algorithm Using Different Loss Functions on Partitioned Data Sets
Authors: Yu-An Shih (施昱安); Dr. Chi-Cheng Jou (周志成)
Institute of Electrical and Control Engineering (電控工程研究所)
Keywords: boosting; different loss functions; partitioned data sets
Issue Date: 2002
Abstract: AdaBoost is a boosting algorithm that iteratively assigns weights to the data points and trains a base classifier on the weighted data. Because it uses an exponential loss function, it is particularly sensitive to noise: the weights assigned to noisy points grow exponentially, and the algorithm's performance degrades accordingly. In this research, we propose a boosting algorithm that applies different loss functions to different parts of the data; points that may be noisy, or that do not convey the main information of the data, receive less emphasis. In experiments and simulations on real data sets, the proposed algorithm outperforms AdaBoost: under some specifications, it attains a lower test error, or reaches the same test error with a shorter ensemble (fewer boosting rounds). The cost of this improvement is additional computation during training, because the data points must first be partitioned.
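The mechanism described above can be illustrated with a minimal sketch. The following Python fragment is an illustrative assumption, not the thesis's actual LevelBoost procedure: it contrasts AdaBoost's exponential weight update, under which a persistently misclassified point grows exponentially in weight, with a hypothetical bounded (logistic-style) update for points partitioned off as likely noisy. The names (update_weights, noisy_mask) and the specific damped loss are invented for illustration; the thesis's actual loss functions and partitioning rule are not given in this abstract.

import numpy as np

def update_weights(w, y, pred, alpha, noisy_mask):
    """One boosting round's weight update (illustrative sketch).

    w          -- current sample weights, a distribution over the data
    y, pred    -- true labels and base-classifier outputs, in {-1, +1}
    alpha      -- coefficient of this round's base classifier
    noisy_mask -- True for points partitioned off as likely noisy
    """
    margin = y * pred  # +1 on correctly classified points, -1 on mistakes
    # Standard AdaBoost (exponential loss): weights of misclassified
    # points are multiplied by e^alpha every round, so a persistently
    # misclassified noise point grows exponentially in weight.
    w_new = w * np.exp(-alpha * margin)
    # Hypothetical damped update for suspect points: the logistic-style
    # factor 1 / (1 + e^(alpha * margin)) is bounded above by 1, so
    # these weights can never blow up the way the exponential ones do.
    w_new[noisy_mask] = w[noisy_mask] / (1.0 + np.exp(alpha * margin[noisy_mask]))
    return w_new / w_new.sum()  # renormalize to a distribution

The bounded factor on the masked points mirrors the abstract's idea of giving likely-noisy points "less emphasis" while leaving the standard exponential update in place for the rest of the data.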
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT910591081
http://hdl.handle.net/11536/71057
Appears in Collections: Thesis