Title: 一个渐进式资料库知识撷取的方法
An Incremental Method for Knowledge Discovery in Databases
Author: 宋振裕
Sung, Jane-Yuh
黄书渊, 李素瑛
Shu-Yuen Hwang, Suh-Yin Lee
Institute of Computer Science and Engineering
Keywords: knowledge discovery; database learning; unsupervised learning; data summarization; machine learning; data filtering; KDD; data mining
Publication date: 1995
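Abstract: Because the volume of data and the number of databases are
growing rapidly, research on discovering knowledge from databases
has been increasing in recent years. Uncovering the information
hidden in databases would help in many areas, such as business
decision making, fraud detection, database schema refinement,
integrity enforcement, semantic optimization, and intelligent query
processing. To make the most effective use of stored data, knowledge
discovery in databases therefore becomes necessary. Knowledge
discovery in databases falls into two broad categories: supervised
learning and unsupervised learning. Supervised learning is also
called discovery of classification rules, while unsupervised learning
is also called data summarization or discovery of characteristic
rules. This thesis proposes a data summarization method composed of
two data filtering functions and a fast rule discovery algorithm. The
method has three main steps: 1. the data are first filtered to find
the parts that may contain hidden knowledge; 2. the selected data are
passed to the knowledge discovery module, which finds the hidden
knowledge; 3. the discovered knowledge is evaluated. Applying the
method recursively to a data set eventually builds a tree structure,
which we call a characteristic tree; it serves as a representation of
the summarized knowledge and makes the discovered knowledge easier to
read and understand. We compare our method with two other methods,
Thought/KD1 and Rough. Because the knowledge discovery module has low
computational complexity, our method is feasible and efficient for
KDD. Experimental results also show that our method can find more of
the data-summarization knowledge present in databases.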
There is growing interest in the research area of knowledge discovery
in databases due to the rapid increase in the amount of data and the
number of databases. If we can discover the information "hidden"
among the data, it will be helpful in many areas, such as business
decision making, fraud detection, database schema refinement,
integrity enforcement, semantic query optimization, and intelligent
query handling. Thus, it becomes necessary to perform knowledge
discovery in databases to make the best use of the stored data. There
are two basic ways to perform knowledge discovery in databases:
supervised learning and unsupervised learning. Supervised learning in
databases is also called discovery of classification rules, and
unsupervised learning in databases is also called data summarization
or discovery of characteristic rules. In this thesis, we propose an
unsupervised learning method for data summarization. The method is
based on two data filtering functions and a fast rule discovery
algorithm. Our discovery process includes three main steps. First,
the data set is analyzed by the data filtering functions. Second, the
selected part of the data is sent to the discovery procedure. Finally,
the rules are evaluated.
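
The three steps can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not define the two filtering functions or the rule discovery procedure, so `filter_attributes` (drops key-like attributes whose values are almost all distinct), `filter_values` (keeps attribute-value pairs above a support threshold), `discover_rules`, `evaluate`, `summarize`, and the thresholds `max_distinct_ratio` and `min_support` are all hypothetical stand-ins. Rows are assumed to be dicts of categorical values.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, str]                   # one tuple: attribute -> categorical value
Pair = Tuple[str, str]                 # (attribute, value)
Rule = Tuple[Tuple[Pair, ...], float]  # (conjunction of pairs, support)


def filter_attributes(rows: List[Row], max_distinct_ratio: float = 0.5) -> List[str]:
    """Hypothetical filter 1: drop key-like attributes whose values are
    almost all distinct, since they cannot summarize the data."""
    if not rows:
        return []
    limit = max_distinct_ratio * len(rows)
    return [a for a in rows[0] if len({r[a] for r in rows}) <= limit]


def filter_values(rows: List[Row], attrs: List[str], min_support: float) -> List[Pair]:
    """Hypothetical filter 2: keep attribute-value pairs occurring in at
    least a min_support fraction of the rows."""
    counts = Counter((a, r[a]) for r in rows for a in attrs)
    return [p for p, c in counts.items() if c / len(rows) >= min_support]


def discover_rules(rows: List[Row], pairs: List[Pair]) -> List[Rule]:
    """Score every surviving pair, and every conjunction of two pairs on
    different attributes, by the fraction of rows it covers."""
    candidates = [(p,) for p in pairs]
    candidates += [(p, q) for i, p in enumerate(pairs)
                   for q in pairs[i + 1:] if p[0] != q[0]]
    rules = []
    for conj in candidates:
        hits = sum(all(r[a] == v for a, v in conj) for r in rows)
        rules.append((conj, hits / len(rows)))
    return rules


def evaluate(rules: List[Rule], min_support: float) -> List[Rule]:
    """Keep rules above the threshold, highest support (then shortest) first."""
    kept = [rule for rule in rules if rule[1] >= min_support]
    return sorted(kept, key=lambda rule: (-rule[1], len(rule[0])))


def summarize(rows: List[Row], min_support: float = 0.5) -> List[Rule]:
    attrs = filter_attributes(rows)                  # step 1a: filter attributes
    pairs = filter_values(rows, attrs, min_support)  # step 1b: filter values
    rules = discover_rules(rows, pairs)              # step 2: discover rules
    return evaluate(rules, min_support)              # step 3: evaluate rules


if __name__ == "__main__":
    rows = [
        {"id": "1", "dept": "CS", "degree": "MS"},
        {"id": "2", "dept": "CS", "degree": "MS"},
        {"id": "3", "dept": "CS", "degree": "PhD"},
        {"id": "4", "dept": "EE", "degree": "MS"},
    ]
    for conj, support in summarize(rows):
        print(" AND ".join(f"{a}={v}" for a, v in conj), f"{support:.0%}")
```

On the four sample rows, the key-like `id` column is filtered out, and the sketch reports `dept=CS` and `degree=MS` at 75% each and their conjunction at 50%.
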
Recursively applying our method to the data set, we can construct a
tree structure, called a characteristic tree, to summarize the data.
We compare our method with two other methods, Thought/KD1 and Rough.
Because the discovery procedure is fast and has low computational
complexity, our method is efficient and feasible for knowledge
discovery. Experimental results also show that our method can find
more useful knowledge for data summarization in databases.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT840392001
http://hdl.handle.net/11536/60341
Appears in collections: Thesis