Improving the Prediction Accuracy of Classification Model for Multi-Class Imbalanced Data

Full metadata record

DC Field	Value	Language
dc.contributor.author	林子硯	en_US
dc.contributor.author	Lin, Tzu-Yen	en_US
dc.contributor.author	唐麗英	en_US
dc.contributor.author	洪瑞雲	en_US
dc.date.accessioned	2015-11-26T01:02:20Z	-
dc.date.available	2015-11-26T01:02:20Z	-
dc.date.issued	2015	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT070253328	en_US
dc.identifier.uri	http://hdl.handle.net/11536/127328	-
dc.description.abstract	在預測不同類別的資料時，一般的做法是從過去已知類別的資料中，依據各類別資料之特性建構分類模型，再藉此模型預測新資料的類別。然而在實際的類別資料中，通常某一類別的資料數量會顯著較另一類別的資料數量多，此型態資料稱為不平衡（imbalanced）資料。使用不平衡資料建構分類模型時，大部份的樣本會傾向被歸類到多數類別，而造成多數類別的分類準確率高、少數類別的分類準確率低，但整體的分類準確率卻又相當高之情形。相較於多數類別資料，少數類別的預測常是研究者有興趣的議題。無論整體準確率有多高，若無法正確分類出少數類別的資料，分類模型可能不具任何實用價值。因此，為提升少數類別資料的分類準確率，本研究利用實驗設計法（design of experiment，DOE）與反應曲面法（response surface methodology，RSM）先求得可提升少數類別資料的分類準確率之最適重新取樣比例，再使用自組性演算法（group method of data handling，GMDH）建構分類模型，並透過兩個實例來說明本研究提出的最適重新取樣方法確實可以有效提升少數類別的分類準確率。	zh_TW
dc.description.abstract	When classifying categorical data, the common approach is to construct a classification model from historical data whose classes are known and then use that model to classify new observations. Real-world categorical data are often imbalanced: most observations belong to the majority class and few belong to the minority class. A classification model built on imbalanced data tends to assign most observations to the majority class. Consequently, the overall prediction accuracy and the prediction accuracy for the majority class are high, while the prediction accuracy for the minority class is quite low. The minority class, however, is often the class of greater interest. No matter how high the overall classification accuracy is, if minority-class observations cannot be classified correctly, the classification model might have no practical use. Therefore, this study develops a method for improving the prediction accuracy of the minority class in imbalanced data: design of experiments (DOE) and response surface methodology (RSM) are first used to find the optimal re-sampling ratio, and the group method of data handling (GMDH) is then used to construct the classification model. Finally, two real cases are used to verify the effectiveness of the proposed procedure.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	不平衡資料	zh_TW
dc.subject	重新取樣	zh_TW
dc.subject	實驗設計	zh_TW
dc.subject	反應曲面法	zh_TW
dc.subject	自組性演算法	zh_TW
dc.subject	Imbalanced Data	en_US
dc.subject	Re-sampling	en_US
dc.subject	Design of Experiments	en_US
dc.subject	Response Surface Methodology	en_US
dc.subject	Group Method of Data Handling	en_US
dc.title	改善多類別不平衡資料之分類準確率	zh_TW
dc.title	Improving the Prediction Accuracy of Classification Model for Multi-Class Imbalanced Data	en_US
dc.type	Thesis	en_US
dc.contributor.department	工業工程與管理系所	zh_TW
Appears in Collections:	Thesis