Title: Applying Distance Transformation to Pairwise Distance-based Active Learning
Authors: 王祥安
胡毓志
Wang, Hsiang-An
Hu, Yuh-Jyh
Institute of Computer Science and Engineering (資訊科學與工程研究所)
Keywords: classification; transductive; inductive; active learning; inquiry
Issue Date: 2016
Abstract: Classification is an important topic in supervised learning. Most current classification algorithms are designed to process vectorial data only. However, proximity (similarity or distance) data are also common in fields such as bioinformatics, medical care, and image analysis. To let classifiers handle different kinds of data more flexibly, a transductive learner based on the distances between data points, TransD, has been developed. Although TransD can process both vectorial and proximity data, it has an inherent limitation of transduction: it requires both the training data and the test data to be available at the same time to perform classification. In this study, we first extend TransD to an inductive version, which removes this constraint. The experimental results show that inductive TransD maintains predictive accuracy comparable to that of other major classifiers. In addition, as technology advances, raw data from various fields become easier to acquire, yet only a small portion of them have been curated and labeled so that supervised learning methods can use them. Motivated by this observation, we aim to develop a new active learning method based on TransD. By exploiting and exploring the available unlabeled data, we select appropriate unlabeled data and inquire about their real classes, which increases the size of the labeled data set. We design a Bayesian approach that estimates class probabilities and the importance of unlabeled data, so that inquiries target the key data points. The experimental results show that the proposed active learning method can effectively turn unlabeled data into labeled training data and improve the accuracy of supervised learning.
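The inquiry loop described in the abstract can be illustrated with a minimal sketch of pool-based active learning over a pairwise distance matrix. This is not the TransD algorithm or its Bayesian estimator, neither of which is specified in this record. As a stand-in, class probabilities come from distance-weighted votes of the labeled points, smoothed with a prior pseudo-count, and the point with the highest predictive entropy is queried next. The names `class_probs`, `entropy`, `active_learn`, `alpha`, and `scale` are hypothetical and chosen only for illustration.

```python
import math


def class_probs(dist, known, i, classes=(0, 1), alpha=1.0, scale=0.2):
    """Smoothed class probabilities for point i (stand-in estimator, not TransD)."""
    # alpha acts as a prior pseudo-count for every class (assumed value).
    votes = {c: alpha for c in classes}
    for j, label in known.items():
        # Labeled points that are closer to i contribute larger votes.
        votes[label] += math.exp(-dist[i][j] / scale)
    total = sum(votes.values())
    return {c: v / total for c, v in votes.items()}


def entropy(probs):
    """Shannon entropy of a class distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0.0)


def active_learn(dist, known, oracle, budget, **kw):
    """Query up to `budget` labels, always picking the most uncertain pool point."""
    known = dict(known)  # copy so the caller's dictionary is left unchanged
    queried = []
    for _ in range(budget):
        pool = [i for i in range(len(dist)) if i not in known]
        if not pool:
            break
        pick = max(pool, key=lambda i: entropy(class_probs(dist, known, i, **kw)))
        known[pick] = oracle(pick)  # the oracle returns the real class
        queried.append(pick)
    return known, queried


if __name__ == "__main__":
    # Toy data: 1-D points turned into a pairwise distance matrix.
    points = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1, 0.55]
    truth = [0, 0, 0, 1, 1, 1, 1]
    dist = [[abs(a - b) for b in points] for a in points]
    known, queried = active_learn(dist, {0: 0, 4: 1}, lambda i: truth[i], budget=2)
    print(queried, known)
```

Because the sketch only reads entries of `dist`, it works the same way whether the distances come from vectors, as in the toy data, or are given directly as proximity data.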
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070256020
http://hdl.handle.net/11536/138847
Appears in Collections: Thesis