Title: Using Experimental Design to Determine the Re-Sampling Strategy for Developing a Classification Model for Imbalanced Data
Authors: Tong, Lee-Ing
Chang, Yung-Chia
Lin, Shan-Hui
Department of Industrial Engineering and Management
Keywords: re-sampling strategy; imbalanced data; classifier; machine learning
Issue date: 2009
Abstract: Imbalanced data are common in real-world machine learning applications. In an imbalanced data set, the number of instances in at least one class is significantly greater or smaller than that in the other classes. Consequently, when a classification model is developed from imbalanced data, most classifiers are affected by the unequal number of instances in each class and fail to construct an accurate model. Balancing the sample sizes of the classes with a re-sampling strategy is a common approach to improving the accuracy of a classification model for imbalanced data. Many studies have used trial and error to determine the appropriate sampling proportion for each class. Trial and error may not classify imbalanced data effectively if the strategies it tries do not include the optimal one. Conventional under-sampling and over-sampling each yield only a single, fixed sampling strategy; if the optimal sampling proportion for each class differs from that fixed strategy, the classifier cannot develop an effective classification model either. This study proposes a procedure that uses design of experiments (DOE) to determine the optimal re-sampling strategy. The proposed procedure can be used with any classifier. Finally, the classification model built on the training data obtained from the proposed procedure is verified to be more accurate than models obtained using trial and error, over-sampling, or under-sampling.
URI: http://hdl.handle.net/11536/13012
ISSN: 1539-2023
Journal: PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION AND MANAGEMENT SCIENCES
Volume: 8
Start page: 646
End page: 648
Appears in collections: Conference papers
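
Illustrative sketch (not taken from the paper): the abstract describes choosing per-class sampling proportions with a designed experiment instead of ad hoc trials. The Python below shows one way such a search could look, under loud assumptions: a 3x3 full factorial design over two factors (majority-class and minority-class sampling fractions), a toy one-dimensional threshold classifier, synthetic Gaussian data, and the geometric mean of sensitivity and specificity as the response. The function names, factor levels, and metric are all hypothetical; the record does not state which design, classifier, or response the authors used, and a full DOE analysis would typically also fit a model to the responses, which this sketch omits.

```python
import itertools
import math
import random


def resample(rows, fraction, rng):
    """Return about fraction * len(rows) rows: a subset if fraction <= 1,
    otherwise the originals plus random duplicates (simple over-sampling)."""
    n = max(1, round(fraction * len(rows)))
    if n <= len(rows):
        return rng.sample(rows, n)
    return rows + rng.choices(rows, k=n - len(rows))


def fit_threshold(neg, pos):
    """Toy classifier (assumption): predict positive when x >= t,
    with t chosen to minimise the number of training errors."""
    best_t, best_err = None, None
    for t in sorted(neg + pos):
        err = sum(x >= t for x in neg) + sum(x < t for x in pos)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def g_mean(t, neg, pos):
    """Response variable (assumption): sqrt(sensitivity * specificity)."""
    sens = sum(x >= t for x in pos) / len(pos)
    spec = sum(x < t for x in neg) / len(neg)
    return math.sqrt(sens * spec)


def run_design(train_neg, train_pos, val_neg, val_pos,
               neg_levels, pos_levels, replicates=5, seed=0):
    """Evaluate every combination of factor levels (full factorial design),
    averaging the response over several replicated resamples."""
    rng = random.Random(seed)
    results = {}
    for f_neg, f_pos in itertools.product(neg_levels, pos_levels):
        scores = []
        for _ in range(replicates):
            t = fit_threshold(resample(train_neg, f_neg, rng),
                              resample(train_pos, f_pos, rng))
            scores.append(g_mean(t, val_neg, val_pos))
        results[(f_neg, f_pos)] = sum(scores) / replicates
    best = max(results, key=results.get)
    return best, results


def make_data(n_neg, n_pos, rng):
    """Synthetic imbalanced data: two overlapping Gaussian classes."""
    neg = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    pos = [rng.gauss(1.5, 1.0) for _ in range(n_pos)]
    return neg, pos


data_rng = random.Random(42)
train_neg, train_pos = make_data(300, 15, data_rng)
val_neg, val_pos = make_data(300, 15, data_rng)

best, results = run_design(train_neg, train_pos, val_neg, val_pos,
                           neg_levels=(0.25, 0.5, 1.0),
                           pos_levels=(1.0, 2.0, 4.0))
print("best (majority fraction, minority fraction):", best)
for point, score in sorted(results.items()):
    print(point, round(score, 3))
```

The design point (1.0, 1.0) corresponds to training on the data as is, so comparing its response with the other eight runs gives a rough sense of how much re-sampling helps on this synthetic example. Replacing the toy classifier with any other learner only requires swapping fit_threshold and g_mean.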