Using Experimental Design to Determine the Re-Sampling Strategy for Developing a Classification Model for Imbalanced Data

Full metadata record

DC Field	Value	Language
dc.contributor.author	Tong, Lee-Ing	en_US
dc.contributor.author	Chang, Yung-Chia	en_US
dc.contributor.author	Lin, Shan-Hui	en_US
dc.date.accessioned	2014-12-08T15:18:00Z	-
dc.date.available	2014-12-08T15:18:00Z	-
dc.date.issued	2009	en_US
dc.identifier.issn	1539-2023	en_US
dc.identifier.uri	http://hdl.handle.net/11536/13012	-
dc.description.abstract	Imbalanced data are common in real-world machine learning applications. In an imbalanced data set, the number of instances in at least one class is significantly greater or smaller than that in the other classes. Consequently, when a classification model is developed with imbalanced data, most classifiers are affected by the unequal number of instances in each class and therefore fail to construct an accurate classification model. Balancing the sample sizes of the different classes with a re-sampling strategy is a common approach to improving the accuracy of a classification model for imbalanced data. Many studies use a trial-and-error method to determine the appropriate sampling proportion for each class. The trial-and-error method may not classify imbalanced data effectively if the sampling strategies it tries do not include the optimal one. The conventional under-sampling or over-sampling strategy determines only one specific sampling strategy. If the optimal sampling proportion for each class does not match that specific strategy, the classifier cannot develop an effective classification model either. This study proposes a procedure to determine the optimal re-sampling strategy using design of experiments (D.O.E.). The proposed procedure can be used with any classifier. Finally, the classification model built on the training data obtained from the proposed procedure is verified to be more accurate than those obtained using the trial-and-error method, the over-sampling approach, or the under-sampling approach.	en_US
dc.language.iso	en_US	en_US
dc.subject	re-sampling strategy	en_US
dc.subject	imbalanced data	en_US
dc.subject	classifier	en_US
dc.subject	machine learning	en_US
dc.title	Using Experimental Design to Determine the Re-Sampling Strategy for Developing a Classification Model for Imbalanced Data	en_US
dc.type	Proceedings Paper	en_US
dc.identifier.journal	PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION AND MANAGEMENT SCIENCES	en_US
dc.citation.volume	8	en_US
dc.citation.spage	646	en_US
dc.citation.epage	648	en_US
dc.contributor.department	工業工程與管理學系	zh_TW
dc.contributor.department	Department of Industrial Engineering and Management	en_US
dc.identifier.wosnumber	WOS:000270433200122	-
Appears in Collections:	Conference Papers
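
The abstract describes choosing per-class sampling proportions with a designed experiment rather than by trial and error. The record does not give the paper's actual design, classifier, or data, so the following is only a minimal sketch under stated assumptions: a 3x3 full factorial design over hypothetical proportion levels, synthetic one-dimensional two-class data, a toy threshold classifier, and balanced accuracy as the response. All function names and level values are illustrative and not taken from the paper.

```python
import itertools
import random
import statistics


def make_data(n_major, n_minor, rng):
    """Synthetic (x, label) pairs; label 1 is the minority class (assumed data)."""
    major = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_major)]
    minor = [(rng.gauss(1.5, 1.0), 1) for _ in range(n_minor)]
    return major + minor


def resample_class(data, label, proportion, rng):
    """Keep round(proportion * class size) instances of one class."""
    rows = [row for row in data if row[1] == label]
    k = max(1, round(proportion * len(rows)))
    if k <= len(rows):
        return rng.sample(rows, k)  # under-sampling, without replacement
    # Over-sampling: keep every original row, add random duplicates.
    return rows + rng.choices(rows, k=k - len(rows))


def fit_stump(train):
    """Toy classifier: predict 1 when x > t, with t minimizing training errors."""
    best_t, best_err = None, None
    for t in sorted(x for x, _ in train):
        err = sum((x > t) != (y == 1) for x, y in train)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def balanced_accuracy(threshold, data):
    """Mean of the per-class recall, so both classes count equally."""
    tp = sum(1 for x, y in data if y == 1 and x > threshold)
    tn = sum(1 for x, y in data if y == 0 and x <= threshold)
    pos = sum(1 for _, y in data if y == 1)
    neg = len(data) - pos
    return 0.5 * (tp / pos + tn / neg)


def run_design(major_levels, minor_levels, replicates=3, seed=0):
    """Full factorial design: one averaged response per factor-level combination."""
    rng = random.Random(seed)
    train = make_data(300, 30, rng)
    valid = make_data(300, 30, rng)  # validation keeps the original imbalance
    results = {}
    for p_major, p_minor in itertools.product(major_levels, minor_levels):
        scores = []
        for _ in range(replicates):
            sample = (resample_class(train, 0, p_major, rng)
                      + resample_class(train, 1, p_minor, rng))
            scores.append(balanced_accuracy(fit_stump(sample), valid))
        results[(p_major, p_minor)] = statistics.mean(scores)
    return results


# Hypothetical factor levels: majority under-sampled, minority over-sampled.
results = run_design([0.25, 0.5, 1.0], [1.0, 2.0, 4.0])
best = max(results, key=results.get)
for setting, score in sorted(results.items()):
    print(setting, round(score, 3))
print("best (majority, minority) proportions:", best)
```

A fuller treatment would fit a model to the responses (for example main effects and interactions) instead of simply picking the best run, and would swap in the classifier of interest, since the abstract states that the procedure applies to any classifier.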