Title: 以自主性演算法為基礎之變數選擇法建構兩階段風險評估模型
Two-Stage Risk Assessment Model by GMDH-based Feature Selection
Author: 簡健宇
Chien, Chien-Yu
張永佳
李榮貴
Chang, Yung-Chia
Li, Rong-Kwei
Department of Industrial Engineering and Management
Keywords: risk assessment; GMDH; feature ranking; feature selection
Issue Date: 2010
Abstract: Interest charged on loans to enterprise customers is the main source of revenue for financial institutions, but some borrowers fail to repay, so institutions rely on risk assessment models to predict whether a borrower will be able to repay. Most risk assessment models in the literature use a single classifier. On financial risk data, which are typically imbalanced (many normal instances and few default instances), such models tend to classify one class accurately and the other poorly, which limits their practical use. Existing models also rarely select their input features, and feeding unfiltered features into a model can lead to overfitting. This study constructs a two-stage risk assessment model. The first stage builds the model with the Group Method of Data Handling (GMDH), reduces the accuracy gap between the two classes by choosing the classification cutoff point, and introduces a GMDH-based feature selection method: a feature ranking method first ranks all features by their contribution, and the features are then fed into GMDH in rank order to identify the subset best suited to the GMDH model. The second stage uses a C4.5 decision tree to find rules describing the instances misclassified in the first stage and corrects those predictions, further improving predictive ability. The method is applied to two credit risk data sets from the UCI Machine Learning Repository and to a real data set provided by a Taiwanese financial institution, and is compared with methods used in the literature. The results indicate that the proposed two-stage model performs as well as or better than those methods.
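The two first-stage ideas in the abstract, adding ranked features in order and choosing a cutoff to reduce the accuracy gap between classes, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names `select_top_ranked` and `balanced_cutoff` are hypothetical, the `evaluate` callback stands in for fitting and validating a GMDH model, and the specific criteria (keep the best-scoring prefix of the ranking; minimize the absolute gap between per-class accuracies) are assumptions, since the abstract does not state them. The sketch also assumes that a higher score means a more likely default (label 1).

```python
from typing import Callable, List, Sequence


def select_top_ranked(ranked: Sequence[str],
                      evaluate: Callable[[List[str]], float]) -> List[str]:
    """Try the top-1, top-2, ... ranked features and keep the best prefix.

    `evaluate` is a hypothetical stand-in for "fit a GMDH model on these
    features and return a validation score" (higher is better).
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset


def balanced_cutoff(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the cutoff whose accuracies on the two classes are closest.

    An instance is predicted as default (1) when its score >= cutoff.
    Minimizing the accuracy gap is an assumed criterion, not the thesis's.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_cut, best_gap = scores[0], float("inf")
    for cut in sorted(set(scores)):
        # Accuracy on defaults: defaults scored at or above the cutoff.
        hit_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        # Accuracy on normals: normals scored below the cutoff.
        hit_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        gap = abs(hit_pos / n_pos - hit_neg / n_neg)
        if gap < best_gap:
            best_cut, best_gap = cut, gap
    return best_cut
```

For example, with scores `[0.1, 0.3, 0.5, 0.7, 0.4, 0.8]` and labels `[0, 0, 0, 0, 1, 1]`, `balanced_cutoff` picks 0.5, where half of each class is classified correctly. A real application would likely weigh overall accuracy as well as the gap.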
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079833518
http://hdl.handle.net/11536/47864
Appears in Collections: Thesis