Title: 一個漸進式資料庫知識擷取的方法
An Incremental Method for Knowledge Discovery in Databases
Author: 宋振裕
Sung, Jane-Yuh
黃書淵, 李素瑛
Shu-Yuen Hwang, Suh-Yin Lee
Institute of Computer Science and Engineering (資訊科學與工程研究所)
Keywords: knowledge discovery; KDD; learning in databases; machine learning; unsupervised learning; data mining; data summarization
Publication date: 1995
Abstract: Interest in knowledge discovery in databases (KDD) has
grown in recent years, driven by the rapid increase in the volume
of data and the number of databases. Discovering the information
hidden in the data helps in many areas, such as business decision
making, fraud detection, database schema refinement, integrity
enforcement, semantic query optimization, and intelligent query
handling. Knowledge discovery is therefore necessary to make the
best use of stored data. There are two basic approaches to
knowledge discovery in databases: supervised learning and
unsupervised learning. Supervised learning in databases is also
called discovery of classification rules, and unsupervised
learning is also called data summarization or discovery of
characteristic rules. This thesis proposes an unsupervised
learning method for data summarization. The method is based on
two data filtering functions and a fast rule discovery algorithm.
The discovery process has three main steps. First, the data
filtering functions analyze the data set and select the part
likely to contain hidden knowledge. Second, the selected data is
passed to the discovery procedure, which finds the rules.
Finally, the discovered rules are evaluated. Applying the method
recursively to the data set builds a tree structure, called a
characteristic tree, that summarizes the data and makes the
discovered knowledge easier to read and understand. We compare
our method with two others, Thought/KD1 and Rough. Because the
discovery procedure has low computational complexity, the method
is efficient and feasible for knowledge discovery. The
experimental results also show that it finds more useful data
summarization knowledge in databases.
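The filter, discover, evaluate loop and its recursive application can be illustrated with a minimal Python sketch. The abstract does not specify the two data filtering functions, the rule discovery algorithm, or the evaluation criterion, so everything below is an assumption made for illustration only: `filter_attributes`, `discover_rule`, `build_tree`, `Node`, and the `min_support` threshold are hypothetical names and placeholder choices, not the thesis's actual method.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node of an illustrative characteristic tree (hypothetical layout)."""
    rule: Optional[tuple]  # (attribute, value) describing this subset; None at the root
    size: int              # number of rows summarized by this node
    children: list = field(default_factory=list)


def filter_attributes(rows, used):
    """Placeholder filter (assumption): keep unused attributes that still vary."""
    candidates = [a for a in rows[0] if a not in used]
    return [a for a in candidates if len({r[a] for r in rows}) > 1]


def discover_rule(rows, attrs):
    """Placeholder discovery (assumption): most frequent (attribute, value) pair."""
    counts = Counter((a, r[a]) for r in rows for a in attrs)
    if not counts:
        return None
    (attr, value), n = counts.most_common(1)[0]
    return attr, value, n


def build_tree(rows, min_support=0.5, used=frozenset(), rule=None):
    """Apply filter -> discover -> evaluate, then recurse on each partition."""
    node = Node(rule, len(rows))
    if not rows:
        return node
    attrs = filter_attributes(rows, used)   # step 1: select promising data
    found = discover_rule(rows, attrs)      # step 2: discover a candidate rule
    if found is None:
        return node
    attr, _value, n = found
    if n / len(rows) < min_support:         # step 3: evaluate the rule
        return node
    groups = {}
    for r in rows:                          # partition rows by the chosen attribute
        groups.setdefault(r[attr], []).append(r)
    for v, subset in groups.items():
        node.children.append(
            build_tree(subset, min_support, used | {attr}, (attr, v))
        )
    return node
```

Calling `build_tree` on a list of dictionaries that share the same keys returns the root `Node`; each child holds the attribute and value that characterize its subset, so the tree can be read top-down as a summary of the data.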
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT840392001
http://hdl.handle.net/11536/60341
Appears in collections: Thesis