Title: | An Online Subject-Based Spam Filter Using Natural Language Features |
Authors: | Lee, Chih-Ning; Chen, Yi-Ruei; Tzeng, Wen-Guey; Department of Computer Science |
Keywords: | Spam filter;Email;Subject;Naive Bayesian;Natural language |
Issue Date: | 1-Jan-2017 |
Abstract: | This paper proposes an online subject-based spam filter built upon an extended version of the weighted naive Bayesian (WNB) classifier. The spam filter checks email subjects only. It is faster than spam filters that scan the whole body of an email, and it remains useful even when spam senders tamper with email bodies to avoid filtering. In addition to the widely used bag-of-words feature, we further consider statistical and natural language features to discover new characteristics of email subjects. For online learning, we use an extended WNB classifier. It is not only computationally efficient, but also more adaptive to changes in spam caused by new malicious campaigns. The proposed classifier is immune to spam from malicious campaigns that were not anticipated. We evaluate the performance of our spam filter on eight well-known ham-spam email datasets from the TREC and Enron-Spam corpora. Our approach achieves an accuracy of 94.85% and an F1-measure of 95.8% on the TREC datasets, and an accuracy of 95.74% and an F1-measure of 97.2% on the Enron-Spam datasets. Compared with previous work along the same line, our approach achieves improvements of 2.43%, 2.3%, and 3.2% in accuracy, true positive rate, and false positive rate, respectively. |
URI: | http://hdl.handle.net/11536/150831 |
Journal: | 2017 IEEE CONFERENCE ON DEPENDABLE AND SECURE COMPUTING |
Start Page: | 479 |
End Page: | 484 |
Appears in Collections: | Conferences Paper |
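
The abstract describes a classifier that reads only the subject line and is updated online, one labeled email at a time. As a rough illustration of that idea, the sketch below implements a plain multinomial naive Bayes over bag-of-words subject tokens with incremental count updates. It is not the paper's extended WNB classifier: every feature has unit weight, the statistical and natural language features are omitted, and the class name `OnlineSubjectNB`, the tokenizer, and the smoothing constant `alpha` are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


class OnlineSubjectNB:
    """Plain multinomial naive Bayes over subject tokens (hypothetical sketch).

    Counts are updated one email at a time, so no retraining pass is needed.
    """

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.doc_counts = {label: 0 for label in self.LABELS}
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.vocab = set()

    @staticmethod
    def tokenize(subject):
        # Lowercased alphanumeric runs; a stand-in for real preprocessing.
        return re.findall(r"[a-z0-9]+", subject.lower())

    def update(self, subject, label):
        """Online step: fold one labeled subject line into the counts."""
        tokens = self.tokenize(subject)
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def log_score(self, subject, label):
        """Smoothed log prior plus the sum of smoothed token log likelihoods."""
        total_docs = sum(self.doc_counts.values())
        prior = (self.doc_counts[label] + self.alpha) / (
            total_docs + self.alpha * len(self.LABELS)
        )
        score = math.log(prior)
        total_tokens = sum(self.token_counts[label].values())
        # The +1 reserves probability mass for tokens never seen in training.
        denom = total_tokens + self.alpha * (len(self.vocab) + 1)
        for tok in self.tokenize(subject):
            score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, subject):
        return max(self.LABELS, key=lambda label: self.log_score(subject, label))


if __name__ == "__main__":
    clf = OnlineSubjectNB()
    clf.update("Cheap meds, buy now!!!", "spam")
    clf.update("Win a free prize now", "spam")
    clf.update("Meeting agenda for Monday", "ham")
    clf.update("Re: quarterly report draft", "ham")
    print(clf.predict("free meds now"))
    print(clf.predict("agenda for the report"))
```

Because only counters change in `update`, each new labeled subject costs time proportional to its token count, which is the property that makes this family of classifiers convenient for online use. A weighted variant would scale each token's log-likelihood term by a per-feature weight; how the paper chooses those weights is not stated in the abstract.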