Title: Integrating Feature Selection and Classification Methods to Construct a Credit Rating Model
Incorporation of feature selection into classification methods to construct a model for credit scoring prediction
Authors: 曾建元
王志軒
Department of Industrial Engineering and Management
Keywords: data mining; feature selection; credit rating; classification prediction model
Issue date: 2011
Abstract: The construction of credit rating models has been studied extensively, and since the 2008 global financial crisis companies have placed even greater emphasis on such models, because they help select better, lower-risk investment targets. Previous studies, however, may have overlooked the key variables that datasets commonly contain. A small number of key variables can often generate classification rules that identify the class of each record, so building a classification prediction model on these variables can improve its performance; feature selection algorithms are used to find them. This thesis therefore proposes a two-stage method: feature selection first identifies the significant variables in the dataset, and a classification model is then built on them. The study uses three public datasets, Australian Credit Approval, German Credit Data, and Japanese Credit Approval, and follows the standard data mining modeling process. Model construction is driven by the characteristics of each dataset: variables are first pre-processed according to their type, a feature selection method suited to those data types is then applied to find the key variables, and finally the classification model is built on those variables for prediction. The aim is a more reasonable and faster way of building classification models that achieves high prediction accuracy and supports business decisions, reducing costs on the risk side and increasing revenue on the income side.
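The two-stage idea in the abstract (select key variables first, then classify on them) can be illustrated with a minimal sketch. The abstract does not name the specific selection or classification algorithms, so the Fisher-style score and the nearest-centroid classifier below are stand-ins chosen for brevity, not the thesis's methods. The function names and the toy data are hypothetical; the data is made up and is not drawn from the Australian, German, or Japanese credit datasets.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Squared gap between the two class means over the sum of class variances."""
    pos = [v for v, t in zip(values, labels) if t == 1]
    neg = [v for v, t in zip(values, labels) if t == 0]
    spread = pvariance(pos) + pvariance(neg)
    return (mean(pos) - mean(neg)) ** 2 / (spread + 1e-12)


def select_features(X, y, k):
    """Stage 1: return the indices of the k highest-scoring columns."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def fit_centroids(X, y, cols):
    """Stage 2: nearest-centroid classifier trained on the selected columns only."""
    centroids = {}
    for label in set(y):
        rows = [row for row, t in zip(X, y) if t == label]
        centroids[label] = [mean(row[j] for row in rows) for j in cols]
    return centroids


def predict(centroids, cols, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    point = [row[j] for j in cols]

    def dist(center):
        return sum((p - q) ** 2 for p, q in zip(point, center))

    return min(centroids, key=lambda label: dist(centroids[label]))


# Toy data (made up): columns 0 and 2 separate the classes, column 1 is noise.
X = [
    [1.0, 5.0, 0.9],
    [1.2, 1.0, 1.1],
    [0.8, 3.0, 1.0],
    [3.0, 4.0, 3.1],
    [3.2, 2.0, 2.9],
    [2.8, 3.5, 3.0],
]
y = [0, 0, 0, 1, 1, 1]

selected = select_features(X, y, k=2)
model = fit_centroids(X, y, selected)
print(selected, predict(model, selected, [1.1, 4.5, 1.0]))
```

On real credit data the selection step would have to respect variable types (categorical versus numeric), as the abstract notes; this sketch assumes purely numeric columns and a binary label.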
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079933544
http://hdl.handle.net/11536/50109
Appears in collections: Theses