Title: | Using Experimental Design to Determine the Re-Sampling Strategy for Developing a Classification Model for Imbalanced Data |
Authors: | Tong, Lee-Ing; Chang, Yung-Chia; Lin, Shan-Hui; Department of Industrial Engineering and Management |
Keywords: | re-sampling strategy; imbalanced data; classifier; machine learning |
Issue Date: | 2009 |
Abstract: | Imbalanced data are common in real-world machine learning applications. In an imbalanced data set, the number of instances in at least one class is significantly greater or smaller than that in the other classes. Consequently, when a classification model is developed from imbalanced data, most classifiers are affected by the unequal number of instances in each class and fail to construct an accurate model. Balancing the sample sizes of the different classes with a re-sampling strategy is a common approach to improving the accuracy of a classification model for imbalanced data. Many studies have used a trial-and-error method to determine the appropriate sampling proportion for each class. This method may not classify imbalanced data effectively if the sampling strategies it tries do not include the optimal one. The conventional under-sampling and over-sampling strategies each determine only a single, specified sampling strategy. If the optimal sampling proportion for each class differs from the one fixed by the over-sampling or under-sampling approach, the classifier cannot develop an effective classification model either. This study proposes a procedure that determines the optimal re-sampling strategy using design of experiments (DOE). The proposed procedure can be used with any classifier. Finally, the classification model built from the training data obtained by the proposed procedure is verified to be more accurate than those obtained using the trial-and-error method, the over-sampling approach, or the under-sampling approach. |
URI: | http://hdl.handle.net/11536/13012 |
ISSN: | 1539-2023 |
Journal: | PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION AND MANAGEMENT SCIENCES |
Volume: | 8 |
Start Page: | 646 |
End Page: | 648 |
Appears in Collections: | Conferences Paper |
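
The abstract describes choosing per-class sampling proportions by a designed experiment rather than by trial and error. The sketch below illustrates that general idea only; it is not the authors' procedure, whose actual design and analysis are not given in this record. It assumes a two-factor full factorial design (fraction of the majority class kept, number of copies of the minority class), synthetic one-dimensional data, a simple threshold classifier, and the geometric mean of the two class recalls as the response. All function names, factor levels, and data parameters are hypothetical.

```python
import itertools
import random


def make_data(n_major, n_minor, rng):
    """Synthetic 1-D data (assumption): class 0 ~ N(0, 1), class 1 ~ N(2, 1)."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_major)]
    data += [(rng.gauss(2.0, 1.0), 1) for _ in range(n_minor)]
    return data


def resample(train, keep_major, copies_minor, rng):
    """Keep a fraction of the majority class and replicate the minority class."""
    major = [d for d in train if d[1] == 0]
    minor = [d for d in train if d[1] == 1]
    n_keep = max(1, int(round(keep_major * len(major))))  # keep_major in (0, 1]
    return rng.sample(major, n_keep) + minor * copies_minor


def fit_stump(train):
    """Return the threshold t (predict 1 when x > t) with the lowest training error."""
    xs = sorted(set(x for x, _ in train))
    best_t, best_err = None, None
    for t in [xs[0] - 1.0] + xs:  # first candidate predicts class 1 for every point
        err = sum((x > t) != bool(y) for x, y in train)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def g_mean(t, data):
    """Geometric mean of minority recall and majority recall."""
    pos = sum(1 for _, y in data if y == 1)
    neg = len(data) - pos
    tp = sum(1 for x, y in data if y == 1 and x > t)
    tn = sum(1 for x, y in data if y == 0 and x <= t)
    return ((tp / pos) * (tn / neg)) ** 0.5


def run_design(train, valid, keep_levels, copy_levels, seed=0):
    """Evaluate every combination of factor levels (full factorial design)."""
    results = {}
    for keep, copies in itertools.product(keep_levels, copy_levels):
        rng = random.Random(seed)  # same seed per run so runs are comparable
        threshold = fit_stump(resample(train, keep, copies, rng))
        results[(keep, copies)] = g_mean(threshold, valid)
    return results


rng = random.Random(42)
train = make_data(400, 20, rng)
valid = make_data(400, 20, rng)
results = run_design(train, valid, keep_levels=(0.1, 0.5, 1.0), copy_levels=(1, 5, 20))
best = max(results, key=results.get)
print("best (keep_major, copies_minor):", best, "G-mean:", round(results[best], 3))
```

The design point `(1.0, 1)` corresponds to no re-sampling, so comparing it with the best run gives a rough sense of how much the choice of proportions matters for this toy classifier. A real study would likely replicate each run and fit a response model to the factor levels instead of simply taking the maximum.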