Full metadata record
DC Field | Value | Language
dc.contributor.author | 賴宗偉 | zh_TW
dc.contributor.author | 唐麗英 | zh_TW
dc.contributor.author | 洪瑞雲 | zh_TW
dc.contributor.author | Lai, Tsung-Wei | en_US
dc.contributor.author | Tong, Lee-Ing | en_US
dc.contributor.author | Horng, Ruey-Yun | en_US
dc.date.accessioned | 2018-01-24T07:40:12Z | -
dc.date.available | 2018-01-24T07:40:12Z | -
dc.date.issued | 2017 | en_US
dc.identifier.uri | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070453346 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/141061 | -
dc.description.abstract | 許多領域常需要對資料建構分類模型(classification model)以預測未來之資料歸屬之群組,故提升分類模型之準確率是一個非常重要的議題。在現實世界中,各類別的資料數量通常不會相同,且常出現某一類別之資料量會明顯多於其他類別之資料量,此類資料稱為不平衡資料(imbalanced data)。針對不平衡資料建立分類模型時,由於各類別資料數量的差異,可能會發生分類模型的整體預測準確率相當高,且預測多數類別資料的準確率高,但預測少數類別資料的準確率卻相當低的情形。然而許多實際應用案例顯示,研究者常會對少數類別資料之預測準確性特別感興趣,故希望分類模型在預測少數類時準確性要高。目前大多數資料是屬於兩類別型態之資料,因此,本研究針對兩類別不平衡資料在建構分類模型前,先利用實驗設計(Design of Experiment)與雙反應曲面法(Dual Response Surface Methodology),找出最適之多數類別需要重新抽樣及少數類別需要增生之樣本數量,再用經過調整之樣本數來建立分類模型,以少數類別資料之分類準確率在研究者可以接受的情況下,最大化多數類別資料之分類準確率。本研究最後利用KEEL資料庫中三個兩類別不平衡資料來說明本研究方法確實能有效改善兩類別資料分類模型中少數類別資料的準確率。 | zh_TW
dc.description.abstract | In many fields, a classification model must be constructed to classify future observations, so improving the accuracy of the classification model is an important issue. In real-world data, the number of observations usually differs across classes, and one class often contains significantly more observations than the others. Such data are called imbalanced data. When a classification model is built on imbalanced data, the overall prediction accuracy and the prediction accuracy for the majority class may be high while the prediction accuracy for the minority class is relatively low. In many practical cases, however, researchers are particularly interested in classifying minority-class observations accurately. Because most data are two-class data, this study uses Design of Experiments (D.O.E.) and the Dual Response Surface Methodology to find an optimal resampling strategy for the majority class and the minority class before the classification model is built. The optimal resampling strategy is then applied to adjust the number of observations in the majority and minority classes, respectively, so that the accuracy rate for the majority class is maximized while the accuracy rate for the minority class stays at a level acceptable to the researcher. Finally, three two-class imbalanced datasets from the KEEL-dataset repository are used to demonstrate that the proposed method effectively improves the accuracy rate of classifying observations into the minority class. | en_US
dc.language.iso | zh_TW | en_US
dc.subject | 不平衡資料 | zh_TW
dc.subject | 重新取樣 | zh_TW
dc.subject | 實驗設計 | zh_TW
dc.subject | 雙反應曲面法 | zh_TW
dc.subject | 分類模型 | zh_TW
dc.subject | 兩類別資料 | zh_TW
dc.subject | Imbalanced data | en_US
dc.subject | resampling | en_US
dc.subject | Design of Experiments | en_US
dc.subject | Dual Response Surface Methodologies | en_US
dc.subject | classification model | en_US
dc.subject | Two-class dataset | en_US
dc.title | 改善兩類別不平衡資料之分類模型準確率 | zh_TW
dc.title | Improving the Prediction Accuracy of Classification Model for Two Types of Imbalanced Data | en_US
dc.type | Thesis | en_US
dc.contributor.department | 工業工程與管理系所 | zh_TW
Appears in Collections: Thesis