An Incremental Method for Knowledge Discovery in Databases

Title:	An Incremental Method for Knowledge Discovery in Databases (Chinese title: 一個漸進式資料庫知識擷取的方法)
Authors:	Sung, Jane-Yuh (宋振裕); Shu-Yuen Hwang (黃書淵), Suh-Yin Lee (李素瑛); Institute of Computer Science and Engineering
Keywords:	knowledge discovery; KDD; database learning; machine learning; unsupervised learning; data mining; data summarization; data filtering
Publication date:	1995
Abstract:	Because the amount of data and the number of databases are growing rapidly, interest in knowledge discovery in databases (KDD) has been increasing in recent years. If we can discover the information "hidden" in the data, it will help in many areas, such as business decision making, fraud detection, database schema refinement, integrity enforcement, semantic query optimization, and intelligent query handling. Knowledge discovery in databases is therefore necessary to make the best use of the stored data. There are two basic approaches to it: supervised learning and unsupervised learning. Supervised learning in databases is also called discovery of classification rules, and unsupervised learning in databases is also called data summarization or discovery of characteristic rules.

In this thesis, we propose an unsupervised learning method for data summarization. The method is based on two data filtering functions and a fast rule discovery algorithm. The discovery process consists of three main steps:
1. The data set is analyzed by the data filtering functions to select the parts that may contain hidden knowledge.
2. The selected data is sent to the discovery procedure, which finds the hidden knowledge.
3. The discovered rules are evaluated.
By applying the method recursively to the data set, we construct a tree structure, called a characteristic tree, that summarizes the data and makes the discovered knowledge easier to read and understand.

We compare our method with two other methods, Thought/KD1 and Rough. Because the discovery procedure is fast and has low computational complexity, our method is efficient and feasible for KDD. Experimental results also show that our method finds more useful data summarization knowledge in databases.
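The abstract gives only the outline of the method: filter the data, discover rules in the selected part, evaluate them, and apply the process recursively to build a characteristic tree. The Python sketch below illustrates that control flow under loudly stated assumptions. The functions `filter_groups`, `discover`, `evaluate` and `build_tree`, the thresholds `min_size` and `min_coverage`, the choice of split attribute, and the toy data are all hypothetical stand-ins. They are not the thesis's two data filtering functions or its rule discovery algorithm, which the abstract does not describe.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Hypothetical sketch: the thesis's actual filtering functions and rule
# discovery algorithm are not described in the abstract; the simple
# frequency-based stand-ins below are assumptions for illustration only.

Rule = tuple[str, str, float]  # (attribute, value, coverage within the subset)


@dataclass
class Node:
    """One node of a characteristic-tree-like summary (hypothetical structure)."""
    condition: tuple[str, str] | None  # (attribute, value) selecting this subset; None at the root
    size: int                          # number of records in the subset
    rules: list[Rule] = field(default_factory=list)
    children: list[Node] = field(default_factory=list)


def filter_groups(records, attr, min_size):
    """Step 1 (hypothetical filter): partition by `attr` and keep only the
    groups large enough to be worth summarizing."""
    groups = defaultdict(list)
    for r in records:
        groups[r[attr]].append(r)
    return {v: g for v, g in groups.items() if len(g) >= min_size}


def discover(records, attrs):
    """Step 2 (hypothetical discovery): for each attribute, report its most
    frequent value and the fraction of records that share it."""
    n = len(records)
    rules = []
    for a in attrs:
        value, count = Counter(r[a] for r in records).most_common(1)[0]
        rules.append((a, value, count / n))
    return rules


def evaluate(rules, min_coverage):
    """Step 3 (hypothetical evaluation): keep rules that hold for most records."""
    return [r for r in rules if r[2] >= min_coverage]


def build_tree(records, attrs, condition=None, min_size=2, min_coverage=0.8):
    """Apply discover -> evaluate to this subset, then filter it into groups
    and recurse. Splitting on the first remaining attribute is an assumption."""
    node = Node(condition, len(records), evaluate(discover(records, attrs), min_coverage))
    if not attrs:
        return node
    split, rest = attrs[0], attrs[1:]
    for value, group in filter_groups(records, split, min_size).items():
        if len(group) < len(records):  # a split that keeps every record adds nothing
            node.children.append(
                build_tree(group, rest, (split, value), min_size, min_coverage)
            )
    return node


def show(node, indent=0):
    """Print the tree, one node per line, indented by depth."""
    label = "ALL" if node.condition is None else "=".join(node.condition)
    rules = ", ".join(f"{a}={v} ({c:.0%})" for a, v, c in node.rules)
    print("  " * indent + f"{label} [n={node.size}] {rules or '-'}")
    for child in node.children:
        show(child, indent + 1)


# Toy relation (made-up example data, not from the thesis).
data = [
    {"dept": "sales", "level": "junior", "region": "north"},
    {"dept": "sales", "level": "junior", "region": "north"},
    {"dept": "sales", "level": "senior", "region": "north"},
    {"dept": "it", "level": "senior", "region": "south"},
    {"dept": "it", "level": "senior", "region": "south"},
]
tree = build_tree(data, ["dept", "level", "region"])
show(tree)
# Prints:
# ALL [n=5] -
#   dept=sales [n=3] region=north (100%)
#     level=junior [n=2] region=north (100%)
#   dept=it [n=2] level=senior (100%), region=south (100%)
```

In this sketch, recursion stops when the attributes run out or when a group falls below `min_size`, and each discovery pass is a single frequency count per attribute. That cheap counting step is meant only to echo the abstract's emphasis on a low-complexity discovery procedure, not to reproduce it.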
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT840392001 http://hdl.handle.net/11536/60341
Appears in Collections:	Thesis