Title: Study on KDD Cup 2001 (2001年資料探勘競賽研究)
Authors: 張文賢
林心宇
Institute of Electrical and Control Engineering
Keywords: KDD Cup; Decision Tree; Majority Vote; "Primary-Secondary" classification system
Issue Date: 2003
Abstract: Data mining is an analysis process that helps discover patterns and knowledge in large databases. Because of the rapid growth of interest in mining biological databases, KDD Cup 2001 focused on data from genomics and drug design. We concentrate on a classification problem with three interesting features: (1) the dataset contains many missing values; (2) the dataset has a large number of attributes; and (3) the dataset is a mixture of two types of data. The classification method we are most interested in is the Decision Tree. We modify the Decision Tree algorithm and introduce majority voting to improve classification accuracy. To integrate these two classification methods, we develop the "Primary-Secondary" classification system.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009012537
http://hdl.handle.net/11536/80836
Appears in Collections: Thesis


Files in This Item:

  1. 253701.pdf
  2. 253702.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.