標題: | 設計高效能之頻繁封閉項目集維護演算法 Designing Efficient Mining algorithms for Frequent Closed Itemsets Maintenances |
作者: | 邱成樑 Cheng-Liang Chiu 曾憲雄 Shian-Shyong Tseng 資訊科學與工程研究所 |
關鍵字: | 封閉項目集;漸進式資料探勘;關聯式規則;資料探勘;closed itemsets;incremental mining;association rules;data mining |
公開日期: | 2004 |
摘要: | 如何從龐大且複雜的資料庫中萃取出有用的知識與資料是近來非常熱門的研究課題。在真實生活中,新資料的加入使得資料探勘的結果不斷改變。當新的資料新增時為了避免重新處理整個資料庫來獲得新的探勘結果;一些漸進式的資料探勘演算法利用儲存之前的探勘結果來降低新增資料時所需要處理的時間。
然而,傳統的漸進式資料探勘技術遇到頻繁項目集(Frequent Itemsets)長度很長時,大量頻繁項目集的保存將使得漸進式資料探勘有實做上的困難。此外,傳統的漸進式資料探勘技術每次都必須消耗額外的存取動作重新掃描一次資料庫以確定是否有新的頻繁項目集產生。
有鑑於此,此篇論文將封閉項目集(Closed Itemsets)與準大項目集(Pre-large Itemsets)的概念導入到漸進式資料探勘;封閉項目集的概念是在沒有遺失任何資訊的情況下將資料進行壓縮,透過此優點來將漸進式資料探勘預先保存的資料進行壓縮。而準大項目集則是用一種緩衝區的概念來避免項目集直接從有效項目集被判定為無效項目集,且反之亦然;這樣可避免常對資料庫進行重新掃描。基於這兩個概念,我們提出封閉項目集維護演算法(CIM)與準大封閉項目集維護演算法(CIM-P)兩個新的漸進式資料探勘演算法,CIM利用封閉項目集來有效取得探勘結果,減少維護於記憶體中的項目集。而CIM-P則利用緩衝區來降低CIM在每次加入新資料時,必須重新掃描資料庫的機率。 Recently, mining association rules from transaction databases has been one of the most interesting and popular research topics in data mining. In real-world applications, a database grows over time such that existing association rules may become invalid or new implicitly valid association rules may appear. Some researchers have thus developed incremental mining algorithms to maintain association rules without re-processing the entire database whenever the database is updated. The common idea among these approaches is to store previously mined itemsets in advance for later use. However, for a dense database, the performance of classical incremental mining algorithms will degrade dramatically due to a huge amount of pre-stored mining information. On the other hand, most incremental mining algorithms are required one scan of original database to discover new implicitly valid rules. When the original database is massive, this will result in excessive I/O cost. In this study, we attempt to utilize the concepts of closed itemsets and pre-large itemsets dealing with the two challenges, respectively. The closed itemsets can losslessly determine all the pre-stored mined itemsets and their exact support, but are orders of magnitude smaller than all pre-stored patterns. The pre-large patterns act as a buffer to avoid the movements of itemsets directly from valid to invalid and vice-versa when the database maintained. Based on the two concepts, two novel incremental mining algorithms called Closed Itemsets Maintenance (CIM) and CIM with Pre-large concept (CIM-P) are thus developed to efficiently maintain association rules, especially in a dense database. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009223567 http://hdl.handle.net/11536/76617 |
顯示於類別: | 畢業論文 |