Title: A Two-phase Algorithm for Association Rules Mining with Approximation on E-Mail Log
Original title: 兩階段演算法應用於電子郵遞近似性關連規則之探勘研究
Authors: 游適彰
Shih-Chang Yu
曾憲雄
Shian-Shyong Tseng
College of Computer Science, Computer Science Program
Keywords: e-mail log mining; approximation algorithm; spam mail; virus mail; data mining
Issue date: 2001
Abstract: As e-mail use on the Internet has grown, so have unsolicited bulk and commercial e-mail (UBE/UCE) and virus-carrying mail. The usual remedy is client-side filtering, but most users lack the resources and expertise to trace the sources of this unsolicited mail or to configure sound filtering rules, so few (if any) benefit from such tools; a misconfigured filter can even get in the way of a user's own legitimate mail use. Performing the filtering and control at the server avoids these drawbacks. In this thesis, we propose an approximation algorithm for mining e-mail logs to help system administrators deal with spam and virus mail. The focus of our work is a two-phase incremental mining process that analyzes associated attribute combinations approximately and, together with heuristic rules, uncovers the mail-sending behavior patterns embedded in the e-mail log, with the goal of detecting mass mailings and anomalous transactions in near real time. In the first phase, ECTL preprocessing extracts the important attribute fields from the e-mail log and stores them in a database as the basis for later analysis. In the second phase, an incremental mining algorithm with approximation analyzes the massively anomalous and high-frequency associated combinations to find suspected outliers. The results are recorded in an association-rule database (rule base), which system administrators can consult when making management decisions to prevent such abuse from continuing.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT901706019
http://hdl.handle.net/11536/69651
Appears in collections: Theses