Title: Borderline SMOTE adaptive boosted decision tree
Authors: Chen, Yih-Ming (陳奕名); Wang, Hsiu-Ying (王秀瑛)
Department: Institute of Statistics
Keywords: SMOTE; borderline SMOTE; SMOTE Boosting; imbalanced dataset
Issue Date: 2016
Abstract: Imbalanced data has long degraded the performance of classifiers, and many researchers have devoted effort to this area, producing a variety of solutions ranging from the simplest sampling methods, to cost-sensitive learning, to ensemble methods. These methods often suffer from information loss or model overfitting; the overfitting arises because repeatedly resampling a limited set of points constrains the classifier's decision boundary, which is why a large number of improved sampling algorithms have been derived from SMOTE. Sampling and cost-sensitive methods have also been embedded in the Adaptive Boosting framework to broaden the classifier's boundary. Because the SMOTE step embedded in early SMOTE Boosting does not account for noise points in the data, this thesis replaces the SMOTE step inside SMOTE Boosting with the more reliable Borderline SMOTE, so that newly synthesized samples respect the data distribution and exclude noise points. Finally, we implement the Borderline SMOTE Boosting method and compare it with SMOTE Boosting and Boosting.
The problem of learning from imbalanced data has been receiving growing attention. Since imbalanced data can degrade classifier performance, many researchers have worked in this domain and proposed many solutions, such as combining SMOTE (Synthetic Minority Over-sampling Technique) with a decision tree. In this study, we review existing methods including SMOTE, Borderline SMOTE, Adaptive Boosting, and SMOTE Boosting. To improve on these methods, we propose an approach, Borderline SMOTE Boosting, and compare it with the existing methods on three real data sets. The results show that the proposed method performs better.
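The core idea the abstract describes — oversampling only minority points that lie near the class boundary, while excluding noise — can be sketched as follows. This is a minimal NumPy illustration of the Borderline SMOTE principle, not the thesis implementation; the function name, parameters, and neighbor thresholds are assumptions chosen for clarity.

```python
import numpy as np

def borderline_smote(X_min, X_maj, k=5, n_new=10, rng=None):
    """Illustrative Borderline SMOTE sketch (not the thesis code).

    X_min: minority-class samples, X_maj: majority-class samples.
    Returns synthetic minority samples interpolated from 'DANGER'
    (borderline) points only; safe and noise points are skipped.
    """
    rng = np.random.default_rng(rng)
    X_all = np.vstack([X_min, X_maj])
    n_min = len(X_min)

    # 1. Mark borderline minority points: at least half (but not all)
    #    of their k nearest neighbors belong to the majority class.
    danger = []
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X_all - x, axis=1)
        nn = np.argsort(d)[1:k + 1]        # skip the point itself
        n_maj = np.sum(nn >= n_min)        # neighbors from majority class
        if k / 2 <= n_maj < k:             # DANGER: borderline, not noise
            danger.append(i)
    if not danger:
        return np.empty((0, X_min.shape[1]))

    # 2. Interpolate new samples between a DANGER point and one of its
    #    minority-class neighbors, as in the original SMOTE step.
    synth = []
    for _ in range(n_new):
        i = rng.choice(danger)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        j = rng.choice(np.argsort(d)[1:k + 1])
        gap = rng.random()                 # position along the segment
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)
```

In a SMOTE Boosting setting, a step like this would be invoked inside each boosting round to augment the minority class before the weak learner is fit; replacing plain SMOTE with this borderline variant is the substitution the thesis proposes.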
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070252622
http://hdl.handle.net/11536/141129
Appears in Collections: Thesis