Title: Big Active Learning
Authors: Huang, Er-Chen
Pao, Hsing-Kuo
Lee, Yuh-Jye
Department of Applied Mathematics
Keywords: active learning;high dimensionality;large-scale data;pool-based sampling
Issue Date: 1-Jan-2017
Abstract: Active learning is a common strategy for dealing with large-scale data under a limited labeling budget. In each iteration of active learning, a query is posed to an oracle, such as asking for the label of a given unlabeled data point. With this approach, we request labels only for the data that are essential and thus save the oracle's labeling effort. We focus on pool-based active learning, where a set of unlabeled data is selected for querying in each round. To apply pool-based active learning to massive high-dimensional data, especially when the unlabeled data set is much larger than the labeled set, we propose the APRAL and MLP strategies so that the computation for active learning can be dramatically reduced while keeping the model power more or less the same. In APRAL, we avoid unnecessary re-ranking of the data under a given unlabeled-data selection criterion. To further improve efficiency, MLP organizes the unlabeled data into a multi-layer pool based on a dimensionality reduction technique, so that the data whose labels are most valuable to know are more likely to be stored in the top layers. With the APRAL and MLP strategies combined, the active learning computation time is reduced by about 83% compared to traditional active learning, while the model effectiveness remains essentially unchanged.
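For readers unfamiliar with the setting, the sketch below illustrates generic pool-based active learning with uncertainty sampling over a crude two-layer pool built from a PCA projection. It is only an illustration of the general idea described in the abstract: the PCA-distance heuristic for assigning points to the top layer, the logistic-regression learner, and all parameter values are assumptions made for this example and are not the paper's APRAL/MLP procedures.

```python
"""Pool-based active learning sketch (illustration only, not the paper's method)."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Synthetic binary data: a small labeled seed set and a large unlabeled pool.
X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
labeled = np.concatenate(
    [rng.choice(np.where(y == c)[0], size=10, replace=False) for c in (0, 1)]
)
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

# Two-layer pool: project to a low dimension with PCA and place the points
# closest to the centroid of the projected labeled set in the top layer.
# (This layering rule is an assumption for the sketch, not the paper's MLP.)
Z = PCA(n_components=5, random_state=0).fit_transform(X)
dist = np.linalg.norm(Z[unlabeled] - Z[labeled].mean(axis=0), axis=1)
order = np.argsort(dist)
cut = len(unlabeled) // 10                  # top layer holds ~10% of the pool
top_layer, bottom_layer = unlabeled[order[:cut]], unlabeled[order[cut:]]

model = LogisticRegression(max_iter=1000)
for _ in range(30):                          # 30 query rounds
    model.fit(X[labeled], y[labeled])
    # Score only the top layer: least-confident point (P closest to 0.5).
    proba = model.predict_proba(X[top_layer])[:, 1]
    pick = top_layer[np.argmin(np.abs(proba - 0.5))]
    # "Query the oracle" (here we simply read the true label) and move the
    # point from the pool to the labeled set; promotion from the bottom
    # layer is omitted for brevity.
    labeled = np.append(labeled, pick)
    top_layer = top_layer[top_layer != pick]

print("labeled set size after querying:", len(labeled))
```

The point of scoring only the top layer is that each round touches a small fraction of the pool rather than re-ranking every unlabeled point, which is the kind of saving the abstract attributes to APRAL and MLP.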
URI: http://hdl.handle.net/11536/147187
Journal: 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)
Start page: 94
End page: 101
Appears in Collections: Conferences Paper