Title: Statistical and Neural Models for Supervised Classification
Authors: 李得盛
Li, Te-Sheng
蘇朝墩
Su, Chao-Ton
Department of Industrial Engineering and Management
Keywords: classification; k-nearest neighbor; linear discriminant analysis; Mahalanobis distance; genetic algorithm; backpropagation network; radial basis function network; learning vector quantization
Issue Date: 2001
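Abstract: Statistical and Neural Models for Supervised Classification. Student: 李得盛 (Li, Te-Sheng). Advisor: Dr. 蘇朝墩 (Su, Chao-Ton). Department of Industrial Engineering and Management, National Chiao Tung University. Multi-dimensional data are full of ambiguity and variation, and the relationships among the data are hard to clarify. Traditionally, building a data classification system relies on rules formed from the input variables, and forming such rules is especially difficult when the data set is large. Over the past decades, many supervised classification algorithms have been applied successfully to multi-dimensional information systems; they fall into two broad groups, statistical classifiers and neural network classifiers. This dissertation first compares different statistical and neural network classifiers applied to the original input variables. The statistical classifiers are k-nearest neighbor (KNN), linear discriminant analysis (LDA), and Mahalanobis distance (MD); the neural classifiers are the backpropagation (BP) neural network, the radial basis function (RBF) network, and the learning vector quantization (LVQ) network. The methods are compared mainly on classification accuracy. Next, to remove redundant variables from multi-dimensional information systems and so improve classification efficiency and accuracy, this dissertation proposes several variable reduction methodologies that screen out the important factors, so that classification can be carried out with fewer variables without degrading the original classification accuracy. They are the Mahalanobis-Taguchi system (MTS), input node selection (INS), BP-GA (backpropagation combined with a genetic algorithm), RBF-GA (radial basis function network combined with a genetic algorithm), and LVQ-GA (learning vector quantization combined with a genetic algorithm). The results of these variable reduction methods are compared with those of statistical stepwise discriminant analysis. This dissertation presents the foundations of the classifiers that use the original variables, the proposed variable reduction methodologies, and the classification results obtained by applying these methods to real cases. On the medical test data, the best result with the original variables is obtained by MD, and the better results among the reduced-variable models are obtained by MTS, INS, and LVQ-GA. In example two, the better results with the original variables are obtained by LDA and RBF; among the reduced-variable models, INS performs best, followed by MTS and LVQ-GA. The proposed methods can therefore remove redundant variables and screen out the important variables for future applications. Finally, the dissertation compares and explains the strengths and weaknesses of each method and the points to note in practical application.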
The relationships among multi-dimensional data with ambiguity and variation are difficult to explore. The traditional approach to building a data classification system requires formulating rules by which the input data can be analyzed, and formulating such rules is very difficult for large sets of input data. Various algorithms for supervised classification of multi-dimensional data have been implemented in the past decades. Among these algorithms, statistical and neural classifiers are the two major methodologies used in the literature. This dissertation first presents a comparison of different statistical and neural network algorithms that use all the original input variables for classification. Three statistical classifiers are considered: k-nearest neighbor (KNN), linear discriminant analysis (LDA), and Mahalanobis distance (MD). Three types of neural classifiers are also discussed so that their classification accuracy can be compared with that of the statistical classifiers: the back-propagation (BP) neural network, the radial basis function (RBF) network, and learning vector quantization (LVQ). Next, to eliminate redundant variables in a multi-dimensional data set and to increase classification efficiency and accuracy, we propose several variable reduction techniques: the Mahalanobis-Taguchi system (MTS), input node selection (INS), BP combined with a genetic algorithm (BP-GA), RBF combined with a genetic algorithm (RBF-GA), and LVQ combined with a genetic algorithm (LVQ-GA). A benchmark method, stepwise discriminant analysis, is used for comparison with the accuracy of the reduced models. This dissertation includes an introduction to the theoretical background of the classifiers, their implementation procedures, and two case studies that evaluate their performance. For both the full and the reduced models, neural networks and statistical models are demonstrated to be efficient and effective methods for multi-dimensional data classification.
In example one, MD outperforms the other statistical models and the neural classifiers among the full models. Compared with the full models, the MTS, INS, and LVQ-GA reduced models yield higher classification accuracy. In example two, LDA and RBF outperform the other full models. Compared with the full models, the INS, MTS, and LVQ-GA reduced models yield higher classification accuracy. The proposed variable reduction techniques are also shown to eliminate the redundant variables in the multi-dimensional system. Once a subset of variables has been selected, the more important variables can be used for rule extraction or future applications. In conclusion, these approaches are compared and discussed from both practical and theoretical points of view.
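The idea of comparing a full model with a reduced model can be illustrated with a minimal Python sketch. It is not the dissertation's MTS, INS, or GA procedure: it uses a plain k-nearest-neighbor classifier, leave-one-out accuracy as the score, and a hand-picked variable subset. The function names (`knn_predict`, `loo_accuracy`), the `cols` parameter, and the toy data are all assumptions made for illustration only.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3, cols=None):
    """Classify x by majority vote among its k nearest training points.

    cols (hypothetical parameter) restricts the distance computation to a
    subset of variable indices, i.e. a "reduced model".
    """
    cols = list(range(len(x))) if cols is None else list(cols)

    def project(row):
        return [row[c] for c in cols]

    px = project(x)
    # Sort training points by Euclidean distance on the chosen variables.
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: dist(project(pair[0]), px))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k=3, cols=None):
    """Leave-one-out classification accuracy for the given variable subset."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k, cols) == y[i]:
            hits += 1
    return hits / len(X)


# Made-up toy data: variable 0 separates the classes, variable 1 is noise.
X = [[1.0, 5.0], [1.2, 0.5], [0.8, 9.0], [1.1, 3.0],
     [3.0, 4.8], [3.2, 0.4], [2.9, 9.1], [3.1, 2.9]]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]

full = loo_accuracy(X, y, k=3)               # uses both variables
reduced = loo_accuracy(X, y, k=3, cols=[0])  # drops the noisy variable
print(f"full model: {full:.2f}, reduced model: {reduced:.2f}")
```

On this toy data the reduced model classifies every held-out point correctly, while the full model is misled by the noisy variable. A search procedure such as a genetic algorithm would explore many candidate values of `cols` instead of a hand-picked one.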
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT900031066
http://hdl.handle.net/11536/68185
Appears in Collections: Thesis