Title: Study on KDD Cup 2001
Authors: 張文賢
林心宇
Institute of Electrical and Control Engineering
Keywords: data mining competition; KDD Cup; Decision Tree; Majority vote; "Primary-Secondary" classification system
Publication date: 2003
Abstract: Data mining is an analytical process that helps discover patterns and knowledge in large databases. Because of rapidly growing interest in mining biological databases, KDD Cup 2001 focused on data from genomics and drug design. We worked on one of its classification problems, which has three interesting features: (1) the dataset contains many missing values; (2) it has a large number of attributes; and (3) it mixes two different types of data. The classification method we are most interested in is the decision tree. We modify the decision tree algorithm and introduce a majority vote to improve classification accuracy. To integrate these two classification methods, we develop a "Primary-Secondary" classification system.
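The record does not say how the majority vote is applied. As a minimal sketch, assuming several independently built classifiers (for example, decision trees trained on different attribute subsets) each output one label per sample, a plain majority vote could look like the code below. The function names `majority_vote` and `vote_per_sample`, the labels, and the tie-breaking rule are hypothetical illustrations, not the thesis's actual method.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by the most classifiers.

    Hypothetical tie-breaking rule: on a tie, the label seen first wins,
    because Counter.most_common orders equal counts by first occurrence.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


def vote_per_sample(all_predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine classifiers sample by sample.

    all_predictions[k][i] is classifier k's label for sample i;
    zip(*...) regroups the labels so each tuple holds one sample's votes.
    """
    return [majority_vote(votes) for votes in zip(*all_predictions)]


if __name__ == "__main__":
    # Three hypothetical classifiers, each labelling the same three samples.
    trees = [
        ["a", "b", "a"],
        ["a", "a", "b"],
        ["b", "b", "b"],
    ]
    print(vote_per_sample(trees))  # ['a', 'b', 'b']
```

Using an odd number of voters avoids ties for two-class problems; with more classes, ties can still occur, which is why the sketch fixes an explicit (assumed) tie-breaking rule.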
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009012537
http://hdl.handle.net/11536/80836
Appears in Collections: Thesis

