Title: A Comprehensive Comparison of Feature Selection and Feature Extraction for Corporate Financial Prediction
The Impact of Feature Selection and Feature Extraction on the Performance of Bankruptcy Prediction: A Comparative Study
Author: 林昭龍
Lin, Chao-Lung
王志軒
Department of Industrial Engineering and Management
Keywords: feature selection; feature extraction; classification accuracy; financial distress prediction; bankruptcy prediction
Publication date: 2011
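Abstract: Predicting corporate bankruptcy and financial distress gives financial institutions an important basis for lending decisions. In corporate financial data, however, many of the indicators may be correlated with one another, and some variables can interfere with classification results. This study uses a systematic approach to search for variables that are representative for classification and builds a better financial distress prediction model. In practice, financial data contain a large number of complex variables; redundant variables, and the collinearity caused by strong dependence among variables, not only increase model complexity but also degrade classification results. Feature selection and feature extraction for financial distress prediction therefore merit further investigation. Supervised feature selection removes redundant original variables through a series of statistical tests and selection criteria. Unsupervised feature extraction recombines the original variables into new variables to reduce the correlation among them. In this study, both selection and extraction are applied to reduce the variables of corporate financial data, and their performance is examined. Two datasets, one of Polish bankrupt companies and one from the Taiwan Economic Journal (台灣經濟新報), are used to validate the proposed approach, and three common performance indices, “classification accuracy”, “sensitivity”, and “specificity”, are used to compare the models. According to the experimental results, feature selection performs better when the variables differ markedly in discriminative power, whereas feature extraction performs better when the variables are highly correlated.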
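The three indices named above can all be computed from a binary confusion matrix. The following is a minimal Python sketch, assuming label 1 marks a bankrupt or distressed firm (the positive class) and 0 a healthy firm; the record does not state the label coding, and `evaluate` is a hypothetical helper name, not code from the thesis.

```python
from collections.abc import Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Return accuracy, sensitivity and specificity for binary labels.

    Assumption (not stated in the record): 1 = bankrupt/distressed firm
    (positive class), 0 = healthy firm.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # distressed, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # distressed, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, false alarm
    nan = float("nan")  # returned when a class is absent from y_true
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of distressed firms that are correctly flagged.
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        # Specificity: share of healthy firms that are correctly cleared.
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the thesis.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    print(evaluate(y_true, y_pred))
```

On the toy labels the classifier catches 2 of 3 distressed firms and clears 4 of 5 healthy ones, giving accuracy 0.75, sensitivity 2/3, and specificity 0.8. Reporting all three matters because a model can reach high accuracy on imbalanced data while missing most distressed firms.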
Corporate bankruptcy prediction is important for financial institutions deciding whether to grant loans to an enterprise. Because many financial indicators are redundant or mutually interdependent, a systematic approach for identifying the more representative variables is needed to construct a better bankruptcy prediction model. In general, superfluous variables or mutually dependent features can give rise to collinearity, degrading prediction performance and increasing computational complexity. Consequently, the impact of feature selection and feature extraction on bankruptcy prediction deserves further exploration. Supervised feature selection directly removes superfluous variables through a series of statistical tests or soft-computing criteria. By contrast, unsupervised feature extraction usually recombines the “old” variables to generate “new” features that are less correlated, thereby reducing the mutual dependency among the original variables. In this study, we combine various feature preprocessing schemes with typical classifiers and evaluate them concurrently to obtain a comprehensive understanding of their overall performance. Two empirical datasets, originating from Poland and Taiwan, are used to validate the proposed approach, and three common indices, “prediction accuracy”, “sensitivity”, and “specificity”, are adopted together to assess prediction performance. The experimental results suggest that feature selection is more appropriate when the input variables differ markedly in discriminant power, whereas feature extraction is more appropriate when the input variables are mutually dependent.
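The contrast between the two preprocessing families can be sketched in a few lines of Python. The record does not name the specific criteria or extraction method used in the thesis, so this sketch assumes two common stand-ins: a Welch t-statistic filter for supervised selection (keeps a subset of the original columns, using the labels) and principal component analysis via power iteration for unsupervised extraction (builds a new feature as a linear combination of all columns, ignoring the labels). The function names and toy data are hypothetical.

```python
import math
from statistics import mean, variance

# Hypothetical helpers: the Welch t filter and PCA are assumed stand-ins,
# not necessarily the methods evaluated in the thesis.


def t_scores(X, y):
    """|Welch t| of each column between class 1 and class 0 (supervised).

    Assumes each class has at least two rows.
    """
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        diff = abs(mean(a) - mean(b))
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        # Zero standard error with distinct means means perfect separation.
        scores.append(diff / se if se > 0 else (math.inf if diff > 0 else 0.0))
    return scores


def select_top_k(X, y, k):
    """Feature selection: keep the k ORIGINAL columns with the largest |t|."""
    scores = t_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


def first_principal_component(X, iters=200):
    """Feature extraction: project onto the leading covariance eigenvector.

    Labels are never used (unsupervised). Power iteration starts from an
    all-ones vector; assumes it is not orthogonal to the leading eigenvector.
    """
    n, d = len(X), len(X[0])
    mu = [mean(row[j] for row in X) for j in range(d)]
    Z = [[row[j] - mu[j] for j in range(d)] for row in X]  # centred data
    cov = [[sum(z[i] * z[j] for z in Z) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("degenerate covariance for this start vector")
        v = [x / norm for x in w]
    # New feature: a linear combination of all original (centred) variables.
    new_feature = [sum(z[j] * v[j] for j in range(d)) for z in Z]
    return v, new_feature


if __name__ == "__main__":
    # Toy data for illustration only; not the Polish or Taiwanese datasets.
    y = [1, 1, 1, 0, 0, 0]
    X = [[5, 1, 4], [6, 3, 5], [7, 2, 6], [1, 2, 3], [2, 1, 4], [3, 3, 5]]
    print(select_top_k(X, y, 2)[0])
    ratios = [[1, 2], [2, 4], [3, 6], [4, 8]]  # two perfectly collinear ratios
    print(first_principal_component(ratios)[0])
```

On the toy data, `select_top_k` keeps columns 0 and 2 and drops column 1, whose class means are identical, mirroring the scenario where variables differ in discriminant power. For the two perfectly collinear ratios, the extracted direction is about (0.447, 0.894), i.e. proportional to (1, 2), so a single new feature carries all of their shared variance, mirroring the mutually dependent scenario.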
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079933527
http://hdl.handle.net/11536/50091
Appears in collection: Theses