Title: | Variable Precision Rough Sets Theory and Its Application (變數精準粗略集之理論與應用) |
Authors: | Jyh-Hwa Hsu (許志華); Chao-Ton Su (蘇朝墩); Department of Industrial Engineering and Management |
Keywords: | data mining; Rough Set Theory (RST); β-reduct; discretization; Chi2 algorithm |
Issue Date: | 2003 |
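Abstract: | Abstract (Chinese version)
The Variable Precision Rough Sets (VPRS) theory is an important data mining tool and has been widely applied to knowledge acquisition in many fields. However, VPRS theory cannot be applied to classification problems whose data contain continuous attributes; it needs a method that discretizes those attributes as a pre-processing step. In addition, VPRS theory lacks a suitable method for determining the precision parameter (β) that fixes its β-reducts. This thesis proposes a new algorithm, called the extended Chi2 algorithm, which is developed from the Chi2 algorithm and overcomes Chi2's inability to determine the predefined misclassification rate (δ) from the training samples. The thesis also proposes a method for selecting reducts according to the precision parameter. The method first uses the least upper bound of the data misclassification error to determine the precision parameter value, and uses that value to find subsets of the information system. It then computes the quality of classification of each subset and uses this measure to remove redundant attributes from the subset; the subsets with their redundant attributes removed are the β-reducts.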
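Five numerical examples are analyzed with the decision tree software See5, and the results show that the proposed extended Chi2 algorithm outperforms the Chi2 algorithm. A simple example illustrates how the proposed β-reduct selection method is carried out, and a real medical case is analyzed and the results compared with those of a neural network; the proposed method performs better. Finally, an application case from the communication industry is analyzed, in which the VPRS theory as modified in this thesis is used to remove redundant radio frequency (RF) test items from the mobile phone manufacturing process. The results show that the number of RF test items is significantly reduced, and that the inspection accuracy obtained with the remaining test items is very close to that of the original, unreduced test procedure. VPRS also outperforms the decision tree approach.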
Keywords: data mining, rough set theory, reduct, discretization, Chi2 algorithm

Abstract
The Variable Precision Rough Sets (VPRS) theory is a powerful data mining tool and has been widely applied to knowledge acquisition. Despite its diverse applications in many domains, the VPRS theory cannot be applied directly to real-world classification tasks involving continuous attributes; such data must first be pre-processed with a discretization method. The VPRS theory also lacks a feasible method for determining the precision parameter (β) value that controls the choice of β-reducts. In this study we first propose a new algorithm, the extended Chi2 algorithm, which builds on the Chi2 algorithm and improves on it by calculating the predefined misclassification rate (δ) from the training data itself. In addition, an effective method is proposed for selecting β-reducts. First, we calculate the precision parameter value from the least upper bound of the data misclassification error and use it to obtain subsets of the information system. Next, we measure the quality of classification and remove redundant attributes from each subset. Five numerical examples are analyzed in this study; classification results obtained with the See5 software show that the proposed extended Chi2 algorithm performs better than the Chi2 algorithm. To show the effectiveness of the proposed β-reduct selection approach, a simple example and a real-world medical case are analyzed; compared with a neural network approach, the proposed approach performs better. Finally, a real example from the communication industry is analyzed: the VPRS theory, together with the proposed procedures, is applied to reduce the Radio Frequency (RF) test items in mobile phone manufacturing. Implementation results show that the number of test items is significantly reduced.
Using the remaining test items, the inspection accuracy is very close to that of the original test procedure. VPRS also performs better than the decision tree approach.
Keywords: data mining, Rough Set Theory (RST), β-reduct, discretization, Chi2 algorithm. |
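The abstract describes the extended Chi2 algorithm only at a high level. As orientation, the sketch below shows the general Chi2/ChiMerge idea it builds on: adjacent intervals of a continuous attribute are merged while their χ² statistic stays below a threshold, and Chi2 keeps raising that threshold (lowering the significance level) as long as the inconsistency rate of the discretized data does not exceed δ. This is a simplified sketch under stated assumptions, not the thesis's algorithm: only the first, all-attribute phase of Chi2 is mimicked (its per-attribute refinement phase is omitted), the function names are hypothetical, the explicit list of increasing thresholds stands in for Chi2's decreasing significance levels, and δ is a plain input because the abstract does not say how the extended algorithm derives it from the training data.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def chi2_stat(a, b):
    """Pearson chi-square for two adjacent intervals given as class-count Counters."""
    classes = set(a) | set(b)
    total = sum(a.values()) + sum(b.values())
    stat = 0.0
    for counts in (a, b):
        n = sum(counts.values())
        for k in classes:
            expected = n * (a[k] + b[k]) / total  # > 0 because k occurs in a or b
            stat += (counts[k] - expected) ** 2 / expected
    return stat


def chi_merge(values, labels, threshold):
    """Merge the adjacent pair with the smallest chi-square while it is below threshold.

    Returns the sorted lower boundaries of the remaining intervals.
    """
    per_value = defaultdict(Counter)
    for v, y in zip(values, labels):
        per_value[v][y] += 1
    bounds = sorted(per_value)
    counts = [per_value[v] for v in bounds]
    while len(counts) > 1:
        stats = [chi2_stat(counts[i], counts[i + 1]) for i in range(len(counts) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break
        counts[i] = counts[i] + counts[i + 1]  # pool the class counts
        del counts[i + 1]
        del bounds[i + 1]
    return bounds


def encode(columns, cuts):
    """Replace each raw value by the index of the interval it falls into."""
    coded = [[bisect_right(c, v) - 1 for v in col] for col, c in zip(columns, cuts)]
    return list(zip(*coded))


def inconsistency_rate(rows, labels):
    """Share of objects that disagree with the majority class of identical rows."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[row][y] += 1
    clashes = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return clashes / len(labels)


def chi2_discretize(columns, labels, thresholds, delta):
    """Try increasing thresholds; keep the coarsest cuts whose inconsistency <= delta.

    Returns None if even the first threshold already exceeds delta.
    """
    best = None
    for t in thresholds:
        cuts = [chi_merge(col, labels, t) for col in columns]
        if inconsistency_rate(encode(columns, cuts), labels) > delta:
            break
        best = cuts
    return best
```

For instance, with `xs = [1, 2, 3, 4, 10, 11, 12, 13]` and classes `["A"] * 4 + ["B"] * 4`, `chi_merge(xs, ys, 2.706)` (2.706 is roughly the χ² critical value for one degree of freedom at the 0.10 level) merges each class run into one interval and returns `[1, 10]`; a threshold of 100 would also merge those two, which `chi2_discretize` rejects when δ = 0 because the single remaining interval makes half the objects inconsistent.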
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009033809 http://hdl.handle.net/11536/38802 |
Appears in Collections: | Thesis |
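For the β-reduct selection step described in the abstract, a minimal sketch may help fix the VPRS vocabulary. It follows Ziarko's formulation with β read as the admissible classification error (0 ≤ β < 0.5): an equivalence class E of the condition attributes belongs to the β-positive region when its share of objects outside the majority decision class is at most β, and the quality of classification γ is the fraction of objects in that region. Two parts are assumptions, not the thesis's definitions: the "least upper bound of the data misclassification error" is taken as the largest such error over all equivalence classes, and redundant attributes are removed greedily, one at a time, whenever γ keeps its full-attribute value. All function names and the toy table are hypothetical, and the greedy pass returns one β-reduct candidate (order-dependent), not every β-reduct.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def partition(rows, attrs):
    """Equivalence classes (lists of object indices) induced by the given attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def class_error(block, decisions):
    """min over decision classes X of c(E, X) = 1 - |E & X| / |E|."""
    counts = Counter(decisions[i] for i in block)
    return 1 - Fraction(max(counts.values()), len(block))


def misclassification_bound(rows, decisions, attrs):
    """Assumed reading of the least upper bound of the misclassification error."""
    return max(class_error(b, decisions) for b in partition(rows, attrs))


def quality(rows, decisions, attrs, beta):
    """gamma^beta: fraction of objects in the beta-positive region."""
    pos = sum(len(b) for b in partition(rows, attrs) if class_error(b, decisions) <= beta)
    return Fraction(pos, len(rows))


def greedy_beta_reduct(rows, decisions, attrs, beta):
    """Drop attributes one at a time while gamma^beta keeps its full-set value."""
    target = quality(rows, decisions, attrs, beta)
    reduct = list(attrs)
    for a in attrs:
        trial = [x for x in reduct if x != a]
        if trial and quality(rows, decisions, trial, beta) == target:
            reduct = trial
    return reduct


# Hypothetical toy decision table: attribute c is determined by attribute a.
ROWS = [
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 0, "b": 1, "c": 1},
    {"a": 0, "b": 0, "c": 1},
]
DEC = ["yes", "yes", "no", "no", "no", "yes"]
ATTRS = ["a", "b", "c"]

beta = misclassification_bound(ROWS, DEC, ATTRS)
print(beta, quality(ROWS, DEC, ATTRS, beta), greedy_beta_reduct(ROWS, DEC, ATTRS, beta))
```

With β set to the computed bound (1/3 for this table), γ is 1 and attribute c alone preserves it, so the greedy pass returns `['c']`; at the stricter β = 0, γ drops to 1/2 and the pass keeps `['b', 'c']`. Exact `Fraction` arithmetic is used so that the γ comparisons are not disturbed by floating-point rounding.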