Title: | Knowledge acquisition through information granulation for imbalanced data |
Authors: | Su, CT; Chen, LS; Yih, YW; Department of Industrial Engineering and Management |
Keywords: | information granulation; fuzzy ART; granular computing; knowledge acquisition; imbalanced data |
Publication date: | 1-Oct-2006 |
Abstract: | When learning from imbalanced (skewed) data, in which almost all instances are labeled as one class while far fewer instances are labeled as the other class, traditional machine learning algorithms tend to achieve high accuracy on the majority class but poor predictive accuracy on the minority class. This paper proposes a novel method, the 'knowledge acquisition via information granulation' (KAIG) model, which not only removes some unnecessary details and provides better insight into the essence of the data but also effectively addresses the 'class imbalance' problem. In this model, the homogeneity index (H-index) and the undistinguishable ratio (U-ratio) are introduced to determine a suitable level of granularity. We also develop the concept of sub-attributes to describe granules and to handle the overlap among granules. Seven data sets from the UCI data bank, including one imbalanced diagnosis data set (pima-Indians-diabetes), are used to evaluate the effectiveness of the KAIG model. Using several performance indexes (overall accuracy, G-mean, and the Receiver Operating Characteristic (ROC) curve), the experimental results, compared with C4.5 and Support Vector Machine (SVM), demonstrate the superiority of our method. (c) 2005 Elsevier Ltd. All rights reserved. |
URI: | http://dx.doi.org/10.1016/j.eswa.2005.09.082 http://hdl.handle.net/11536/11716 |
ISSN: | 0957-4174 |
DOI: | 10.1016/j.eswa.2005.09.082 |
Journal: | EXPERT SYSTEMS WITH APPLICATIONS |
Volume: | 31 |
Issue: | 3 |
Start page: | 531 |
End page: | 541 |
Appears in collections: | Journal articles |
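The abstract names overall accuracy and G-mean as evaluation indexes without defining them. As a minimal sketch, assuming the common definition of G-mean as the square root of sensitivity (minority-class recall) times specificity (majority-class recall), the Python below shows why overall accuracy alone can be misleading on imbalanced data. The function names (`confusion_counts`, `overall_accuracy`, `g_mean`) and the 95:5 toy labels are hypothetical illustrations, not code or data from the paper. The sketch does not implement KAIG, the H-index, or the U-ratio.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Count (TP, FN, TN, FP), treating `positive` as the minority class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all instances classified correctly, regardless of class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Assumed definition (not stated in the record):
    G-mean = sqrt(sensitivity * specificity)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # minority-class recall
    specificity = tn / (tn + fp) if tn + fp else 0.0  # majority-class recall
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Hypothetical 95:5 data set: label 1 is the minority class.
    y_true = [1] * 5 + [0] * 95
    # A trivial classifier that always predicts the majority class.
    always_majority = [0] * 100
    print(overall_accuracy(y_true, always_majority))  # 0.95
    print(g_mean(y_true, always_majority))            # 0.0
    # A classifier that finds 4 of 5 minority cases at the cost of 10 false alarms.
    trade_off = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85
    print(overall_accuracy(y_true, trade_off))        # 0.89
    print(round(g_mean(y_true, trade_off), 3))        # 0.846
```

Under these assumptions, the trivial classifier scores higher on overall accuracy yet zero on G-mean, which is exactly the minority-class failure the abstract describes; G-mean rewards the second classifier for actually detecting minority instances.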