Title: Data Types Generalization for Data Mining Algorithms
Authors: Shan-Yi Liao (廖尚儀)
Dr. Shian-Shyong Tseng (曾憲雄)
Institute of Computer Science and Engineering
Keywords: Data Mining; Data Types; Merging Phase; Transforming Phase; Generalization
Issue Date: 1998
Abstract: As database applications have grown, mining interesting information from large databases has become a major concern, and a variety of mining algorithms have been proposed in recent years. The data processed in data mining may come from many sources that use different data types. No single algorithm, however, applies to every application, because each algorithm fits only certain data types. Selecting a mining algorithm therefore depends not only on the goal of the application but also on data fittability, that is, whether the data types suit the algorithm.
Transforming non-fitting data types into the target types is thus an important task in data mining, but it is often tedious and complex because so many data types exist in the real world. Merging similar data types into a generalized data type is a good way to reduce this transformation complexity. This thesis proposes a data type generalization process consisting of a merging phase and a transforming phase. In the merging phase, the original data types of the data sources to be mined are merged into generalized data types. The transforming phase then converts the generalized data types into the target types required by the selected mining algorithm. With this process, the user can select a mining algorithm based on the goal of the application alone, without considering data types.
The thesis discusses the data type fittability problem for six kinds of widely used data mining techniques and presents a complete analysis of it. It also shows that, with the proposed process, users can transform the data types of attributes in relations, so that the data fittability problem for relational databases is solved. Finally, the transformation strategies used in the process are explained through concise algorithms, and examples show that a prototype of the data type generalization process is practical.
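The two phases described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, since the abstract does not list the thesis's actual generalized types or transformation strategies: the `MERGE_MAP` table, the two generalized types `numeric` and `categorical`, the equal-width binning in `discretize`, the integer coding in `encode`, and all function names are hypothetical.

```python
# Hypothetical merging table: source column types -> generalized data types.
# The real thesis may use different source types and generalized types.
MERGE_MAP = {
    "INT": "numeric", "FLOAT": "numeric", "DECIMAL": "numeric",
    "CHAR": "categorical", "VARCHAR": "categorical", "BOOLEAN": "categorical",
}


def merge_type(source_type):
    """Merging phase: collapse a source data type into a generalized type."""
    return MERGE_MAP[source_type.upper()]


def discretize(values, bins=3):
    """numeric -> categorical: equal-width binning (an assumed strategy)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    return [f"bin{min(int((v - lo) / width), bins - 1)}" for v in values]


def encode(values):
    """categorical -> numeric: integer codes by sorted distinct value (assumed)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def transform(values, generalized_type, target_type):
    """Transforming phase: convert a generalized type into the target type."""
    if generalized_type == target_type:
        return list(values)
    if (generalized_type, target_type) == ("numeric", "categorical"):
        return discretize(values)
    if (generalized_type, target_type) == ("categorical", "numeric"):
        return encode(values)
    raise ValueError(f"no strategy for {generalized_type} -> {target_type}")


def generalize(values, source_type, target_type):
    """Run both phases for one attribute (column) of a relation."""
    return transform(values, merge_type(source_type), target_type)


if __name__ == "__main__":
    # An algorithm that needs categorical input, fed an INT column.
    print(generalize([18, 25, 40, 63], "INT", "categorical"))
    # An algorithm that needs numeric input, fed a VARCHAR column.
    print(generalize(["red", "blue", "red"], "VARCHAR", "numeric"))
```

The point of the design is that transformation strategies are written once per pair of generalized types rather than once per pair of source types, which is the complexity reduction the abstract attributes to merging.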
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT870394014
http://hdl.handle.net/11536/64152
Appears in Collections: Thesis