Improving the Prediction Accuracy of Classification Model for Two Types of Imbalanced Data

Full metadata record

DC Field	Value	Language
dc.contributor.author	賴宗偉	zh_TW
dc.contributor.author	唐麗英	zh_TW
dc.contributor.author	洪瑞雲	zh_TW
dc.contributor.author	Lai, Tsung-Wei	en_US
dc.contributor.author	Tong, Lee-Ing	en_US
dc.contributor.author	Horng, Ruey-Yun	en_US
dc.date.accessioned	2018-01-24T07:40:12Z	-
dc.date.available	2018-01-24T07:40:12Z	-
dc.date.issued	2017	en_US
dc.identifier.uri	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070453346	en_US
dc.identifier.uri	http://hdl.handle.net/11536/141061	-
dc.description.abstract	許多領域常需要對資料建構分類模型(classification model)以預測未來之資料歸屬之群組，故提升分類模型之準確率是一個非常重要的議題。在現實世界中，各類別的資料數量通常不會相同，且常出現某一類別之資料量會明顯多於其他類別之資料量，此類資料稱為不平衡資料(imbalanced data)。針對不平衡資料建立分類模型時，由於各類別資料數量的差異，可能會發生分類模型的整體預測準確率相當高，且預測多數類別資料的準確率高，但預測少數類別資料的準確率卻相當低的情形。然而許多實際應用案例顯示，研究者常會對少數類別資料之預測準確性特別感興趣，故希望分類模型在預測少數類別時準確性要高。目前大多數資料是屬於兩類別型態之資料，因此，本研究針對兩類別不平衡資料在建構分類模型前，先利用實驗設計(Design of Experiment)與雙反應曲面法(Dual Response Surface Methodology)，找出最適之多數類別需要重新抽樣及少數類別需要增生之樣本數量，再用經過調整之樣本數來建立分類模型，以少數類別資料之分類準確率在研究者可以接受的情況下，最大化多數類別之資料之分類準確率。本研究最後利用KEEL資料庫中三個兩類別不平衡資料來說明本研究方法確實能有效改善兩類別資料分類模型中少數類別資料的準確率。	zh_TW
dc.description.abstract	In many fields, a classification model must be constructed to classify future observations, so ensuring the accuracy of the classification model is an important issue. In real-world data, the number of observations usually differs across classes, and one class may contain significantly more observations than the others. Such data are called imbalanced data. When a classification model is built on imbalanced data, the overall accuracy and the prediction accuracy for the majority class may be high while the prediction accuracy for the minority class is relatively low. In many practical cases, however, researchers are particularly interested in classifying minority-class observations accurately. Because most data are two-class data, this study uses Design of Experiments (DOE) and Dual Response Surface Methodology to find an optimal resampling strategy for the majority class and the minority class before the classification model is built. The optimal strategy is then applied to adjust the number of observations in the majority and minority classes, respectively, and the classification model is built on the adjusted sample so that majority-class accuracy is maximized while minority-class accuracy stays at a level acceptable to the researcher. In this way, the accuracy of classifying observations into the minority class can be improved significantly. Finally, three two-class imbalanced datasets from the KEEL-dataset repository are used to demonstrate the effectiveness of the proposed method.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	不平衡資料	zh_TW
dc.subject	重新取樣	zh_TW
dc.subject	實驗設計	zh_TW
dc.subject	雙反應曲面法	zh_TW
dc.subject	分類模型	zh_TW
dc.subject	兩類別資料	zh_TW
dc.subject	Imbalanced data	en_US
dc.subject	resampling	en_US
dc.subject	Design of Experiments	en_US
dc.subject	Dual Response Surface Methodologies	en_US
dc.subject	classification model	en_US
dc.subject	Two-class dataset	en_US
dc.title	改善兩類別不平衡資料之分類模型準確率	zh_TW
dc.title	Improving the Prediction Accuracy of Classification Model for Two Types of Imbalanced Data	en_US
dc.type	Thesis	en_US
dc.contributor.department	工業工程與管理系所	zh_TW
Appears in Collections:	Thesis
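
The resampling step described in the abstract (shrinking the majority class and enlarging the minority class to chosen sample sizes before a classifier is trained) might look roughly like the Python sketch below. It rests on several assumptions: the record does not say which resampling or sample-generation technique the thesis uses, so plain random undersampling and random oversampling with replacement stand in for it; the function name `resample_two_class`, its parameters, and the toy data are hypothetical; and the target sizes, which the thesis obtains from the DOE and dual response surface optimization, are simply passed in here.

```python
import random


def resample_two_class(majority, minority, n_majority, n_minority, seed=0):
    """Return (majority_sample, minority_sample) adjusted to the target sizes.

    Assumption: simple random resampling stands in for whatever
    resampling / sample-generation technique the thesis actually uses.
    """
    majority, minority = list(majority), list(minority)
    if not majority or not minority:
        raise ValueError("both classes must be non-empty")
    rng = random.Random(seed)

    def adjust(rows, target):
        if target <= len(rows):
            # Shrink: draw `target` rows without replacement.
            return rng.sample(rows, target)
        # Grow: keep every original row, then add draws with replacement.
        return rows + rng.choices(rows, k=target - len(rows))

    return adjust(majority, n_majority), adjust(minority, n_minority)


# Hypothetical toy data: 90 majority rows and 10 minority rows.
majority = [("maj", i) for i in range(90)]
minority = [("min", i) for i in range(10)]

# Hypothetical target sizes; in the thesis these would come from the
# DOE / dual response surface step, which is not reproduced here.
maj_s, min_s = resample_two_class(majority, minority, n_majority=40, n_minority=30)
```

A classifier would then be trained on `maj_s + min_s`, and majority-class and minority-class accuracy would be measured separately on held-out data, since overall accuracy alone hides poor minority-class performance.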