Title: 應用增生少數合成技術建構信用風險評估模型
Constructing a Credit Risk Assessment Model Using Synthetic Minority Over-sampling Technique
Authors: 林宜憲
Yi-Hsien Lin
張永佳
Department of Industrial Engineering and Management
Keywords: risk assessment; class imbalance; Synthetic Minority Over-sampling Technique
Publication date: 2011
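Abstract: Interest charged on corporate loans is a main source of revenue for financial institutions, but borrowers may fail to repay. Financial institutions therefore commonly use credit risk assessment models to predict whether a prospective borrower will be able to repay a loan. Financial risk data, however, often exhibit class imbalance: the numbers of instances in each class are unequal, dividing the data into a major class and a minor class. If such data are used as training samples, the resulting model may achieve high overall accuracy and high major-class accuracy but very low minor-class accuracy. Although many risk assessment models have been proposed in the Chinese and international literature, most still rely on sampling to build the model. Sampling can compromise data integrity and make the model overly sensitive to the particular sample drawn, leading to inaccurate predictions. This study applies the Synthetic Minority Over-sampling Technique (SMOTE) to credit risk data to construct a credit risk assessment model. This approach preserves data integrity and addresses the misclassification caused by class-imbalanced data. It not only corrects the shortcomings of earlier risk assessment models applied to imbalanced data but also narrows the large gap between the model's classification accuracy on the major and minor classes, thereby improving the accuracy of the classification model. Finally, a comparison with classification models built using other methods in previous work confirms that the proposed approach effectively reduces the accuracy disparity between classes on imbalanced data, raising minor-class accuracy while maintaining overall accuracy.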
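The abstract names SMOTE without describing its mechanics. As a rough illustration only, the sketch below follows the interpolation scheme commonly associated with SMOTE: each synthetic minor-class sample is placed at a random point on the line segment between a minor-class sample and one of its k nearest minor-class neighbors. The neighbor count `k`, Euclidean distance, the seed, the helper names `smote` and `k_nearest`, and the toy borrower data are all assumptions for illustration; this is not the thesis's implementation, whose parameters are not stated in the record.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def k_nearest(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of points[i] (Euclidean), excluding i itself."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Point], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Hypothetical helper: create n_synthetic new minor-class points by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a minor-class sample
    and one of its k nearest minor-class neighbors.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minor-class samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other samples
    neighbors = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Point] = []
    for s in range(n_synthetic):
        i = s % len(minority)          # cycle through the minor-class samples
        j = rng.choice(neighbors[i])   # pick one of its k nearest neighbors at random
        gap = rng.random()             # uniform interpolation factor in [0, 1)
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy data: 8 repaying borrowers (major class) vs 3 defaulters
    # (minor class), each described by two made-up numeric features.
    major = [[0.1 * i, 1.0] for i in range(8)]
    minor = [[2.0, 2.0], [2.2, 2.1], [2.1, 2.4]]
    new_minor = smote(minor, n_synthetic=len(major) - len(minor), k=2)
    print(len(minor) + len(new_minor), "minor-class samples vs", len(major), "major-class samples")
```

Because the minor class is enlarged with new points rather than the major class being cut down, every original record stays in the training set, which is one way to read the abstract's data-integrity argument against sampling.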
The main source of revenue for financial institutions is the interest they charge their customers. Because not all customers repay their debts, financial institutions need to adopt risk assessment models to measure this credit risk. The class imbalance problem is common in financial risk data. Class imbalance refers to an uneven distribution of classes within the data: one class (the major class) significantly outnumbers the other (the minor class). A model trained on imbalanced data may achieve high accuracy on major-class instances yet have poor ability to identify minor-class instances. Most existing risk assessment models apply sampling to deal with class imbalance. However, sampling may compromise data integrity and make the model sensitive to the particular sample drawn, leading to inaccurate predictions. This study constructs a risk model using the Synthetic Minority Over-sampling Technique (SMOTE) to tackle the class imbalance problem. The proposed model not only preserves data integrity but also improves predictive ability on the minor class, thereby improving the accuracy of the model. Finally, the study compares its classification results with those of several sampling methods and a previous Granular Computing model. Comparing accuracy, AUC, and G-mean, we conclude that risk models built with SMOTE perform as well as or better than those built with sampling or the Granular Computing model.
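The comparison above relies on accuracy, AUC, and G-mean. The sketch below uses the standard textbook definitions of these metrics to show why accuracy alone can mislead on imbalanced data. It assumes label 1 marks the minor class (defaulters) and label 0 the major class; the function names and the 95/5 test set are hypothetical and are not taken from the thesis's evaluation.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Counts (TP, FN, TN, FP), treating label 1 as the minor (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fn, tn, fp = confusion(y_true, y_pred)
    return (tp + tn) / (tp + fn + tn + fp)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (minor-class recall) and specificity (major-class recall)."""
    tp, fn, tn, fp = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical test set: 95 repaying borrowers (0) and 5 defaulters (1).
    y_true = [0] * 95 + [1] * 5
    # A classifier that always predicts "repays" looks accurate but never finds a defaulter.
    always_repay = [0] * 100
    print(accuracy(y_true, always_repay), g_mean(y_true, always_repay))  # 0.95 0.0
```

The trivial classifier scores 95% accuracy yet has a G-mean of zero, which is the gap between overall and minor-class accuracy that the abstract describes; G-mean and AUC both penalize ignoring the minor class.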
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079933531
http://hdl.handle.net/11536/50095
Appears in Collections: Thesis