标题: | 在大型资料库中探勘容错频繁样式 Mining Fault-Tolerant Frequent Patterns in Large Databases |
作者: | 王圣舜 Sheng-Shun Wang 李素瑛 Suh-Yin Lee 资讯科学与工程研究所 |
关键字: | 资料探勘;容错频繁样式;Data mining;Fault tolerant frequent pattern;FT-contain;support |
公开日期: | 2001 |
摘要: | 有鉴于现实世界中的资料来源可能会因受到杂讯的干扰而导致资料的不正确。此外, 我们可能希望所找出来的资讯是较具普遍性。因此, FT-Aprori 用来解决在大型现实资料库中容错频繁样式(fault-tolerant frequent pattern)的探勘。然而,FT-Apriori是根据Apriori 性质采用候选样式产生及测试(candidate generating-and-testing)方式,因此较无效率。 在本篇论文中,我们采用样式成长(pattern growth)的方式发展较有效率的容错频繁样式探勘演算法FTP-mine。首先我们考虑所有的资料都能储存在记忆体的FTP-mine,我们设计一个STable用来计算具有相同字首样式的item support 和 FT-support. 对于无法将资料储存于记忆体的大型资料库, FTP-mine 利用切割资料库的方式来探勘容错频繁样式. 此外,考虑到所找出来的容错频繁样式可能会很多而且会有互相包含的情况,因此我们延伸FTP-mine 找出最大的容错频繁样式。 在本论文中,我们针对不同的support、tolerance和scalability设计各种实验,从结果可以看出FTP-mine在各种条件下都比FT-Apriori有较好的执行效率和linear scalability。 In view of real world data may be interfered with noise which leads data to contain faults. The data mining methods proposed previously may not be applicable. Besides, we may hope that the knowledge discovered is more general and can be applied to find more interesting information. Hence, FT-Aprori was proposed for fault-tolerant data mining to discover information over large real-world data. However, FT-Apriori which generates and tests candidates based on Apriori property is not so efficient. In this paper, we develop memory-based algorithm FTP-mine which is based on the concept of pattern growth to mine fault-tolerant frequent patterns efficiently. In FTP-mine, the table, STable, is designed to count the item support and FT-support of the k-length patterns which have the same prefix of length k-1 by comparing transaction once. As to mining in a large database which is too large to fit in memory, FTP-mine also can be adopted by means of database partition. In addition, since there might exist a large number of fault tolerant frequent patterns and some may be contained in others, we also focus on the finding of maximal FT-frequent patterns by extending the FTP-mine algorithm. Our study shows that FTP-mine has higher performance than FT-Apriori in all kinds of parameter settings, such as various supports, tolerance, and scalability. The empirical evaluations show that the proposed method has good linear scalability and outperforms FT-Apriori in the discovery of FT-frequent pattern. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT900392051 http://hdl.handle.net/11536/68465 |
显示于类别: | Thesis |