Title: Label Propagation Through Distance Transformation and Its Applications to Biomedical Classification Problems
Authors: Ting, Zih-Yun (丁子芸)
Hu, Yuh-Jyh (胡毓志)
Institute of Biomedical Engineering
Keywords: Semi-supervised learning; classification; (dis)similarity-based learning; distance transformation; label propagation
Publication Date: 2016
Abstract: Classification under supervised learning is an important topic in machine learning: by analyzing previously labeled data, a classifier learns to predict the classes of unseen data. Supervised learning has been applied in many domains, but it needs a sufficient amount of labeled training data to reach acceptable prediction performance. In practice, labeled data cost more time and labor to collect than unlabeled data, so we often have to learn from plentiful unlabeled data and only limited labeled data. Semi-supervised learning algorithms address this problem by using unlabeled data together with a small labeled set to build more accurate prediction models. Most of them, however, handle only vectorial data, whereas in many domains data are represented in proximity form, as pairwise distances or similarities. We propose a semi-supervised learning algorithm with a new label propagation mechanism, based on distance transformation and Bayes' theorem, that is applicable to both vectorial and (dis)similarity data. We compared the proposed method with other common semi-supervised learning algorithms, and the experimental results show that it outperforms the other label propagation methods on both (dis)similarity and vectorial data sets.
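For readers unfamiliar with label propagation on proximity data, the following is a minimal sketch of the general idea, not the algorithm proposed in the thesis. It assumes the classic graph-based scheme: a precomputed pairwise distance matrix is transformed into affinities with a Gaussian kernel, and class scores are then spread iteratively from labeled to unlabeled points while the known labels stay clamped. The function name `propagate_labels` and the parameters `sigma` and `n_iter` are hypothetical, and the Bayes' theorem step mentioned in the abstract is not reproduced here.

```python
import numpy as np

def propagate_labels(D, labels, sigma=1.0, n_iter=200):
    """Generic graph-based label propagation on a distance matrix.

    D      : (n, n) symmetric matrix of pairwise distances.
    labels : length-n integer array; class id for labeled points, -1 for unlabeled.
    sigma  : kernel width of the assumed Gaussian distance-to-affinity transform.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels[labels >= 0])

    # Transform distances into affinities (an assumed Gaussian kernel).
    W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Row-normalize so each row is a transition distribution over neighbors.
    P = W / W.sum(axis=1, keepdims=True)

    # One-hot score matrix for the labeled points.
    rows = np.flatnonzero(labels >= 0)
    cols = np.searchsorted(classes, labels[rows])
    Y = np.zeros((D.shape[0], classes.size))
    Y[rows, cols] = 1.0

    F = Y.copy()
    for _ in range(n_iter):
        F = P @ F          # spread scores along the affinity graph
        F[rows] = Y[rows]  # clamp the known labels
    return classes[F.argmax(axis=1)]

# Usage: two well-separated groups on a line, one labeled point per group.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = np.abs(x[:, None] - x[None, :])   # only pairwise distances are used
pred = propagate_labels(D, [0, -1, -1, 1, -1, -1])
```

Because the function only ever sees `D`, the same code applies to vectorial data (compute distances first) and to data that are available only as pairwise dissimilarities.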
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070256701
http://hdl.handle.net/11536/139941
Appears in Collections: Thesis