An Online Subject-Based Spam Filter Using Natural Language Features

Title:	An Online Subject-Based Spam Filter Using Natural Language Features
Authors:	Lee, Chih-Ning; Chen, Yi-Ruei; Tzeng, Wen-Guey; Department of Computer Science
Keywords:	Spam filter; Email; Subject; Naive Bayesian; Natural language
Issue Date:	1-Jan-2017
Abstract:	This paper proposes an online subject-based spam filter built upon an extended version of the weighted naive Bayesian (WNB) classifier. The spam filter checks email subjects only. It is faster than spam filters that scan the whole body of an email, and it remains useful even when spam senders tamper with email bodies to avoid filtering. In addition to the widely used bag-of-words feature, we further consider statistical and natural language features to discover new characteristics of email subjects. For online learning, we use an extended WNB classifier. It is not only computationally efficient but also more adaptive to changes in spam brought by new malicious campaigns. The proposed classifier is immune to spam from malicious campaigns that were not anticipated in advance. We evaluate the performance of our spam filter on eight well-known ham-spam email datasets from the TREC and Enron-Spam corpora. Our approach achieves an accuracy of 94.85% and an F1-measure of 95.8% on the TREC datasets, and an accuracy of 95.74% and an F1-measure of 97.2% on the Enron-Spam datasets. Compared with previous works along the same line, our approach improves accuracy, true positive rate, and false positive rate by 2.43%, 2.3%, and 3.2%, respectively.
URI:	http://hdl.handle.net/11536/150831
Journal:	2017 IEEE CONFERENCE ON DEPENDABLE AND SECURE COMPUTING
Start page:	479
End page:	484
Appears in Collections:	Conferences Paper
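
The abstract describes a classifier that reads only the email subject and is updated online as new labeled messages arrive. As a rough illustration of that idea, the sketch below uses a plain multinomial naive Bayes over bag-of-words subject tokens with incremental count updates. It is not the authors' extended WNB classifier: the feature weighting and the statistical and natural language features mentioned in the abstract are omitted, and the class name `OnlineSubjectNB`, the tokenizer, the smoothing constant, and the example subjects are all assumptions made for this sketch.

```python
import math
import re
from collections import Counter


class OnlineSubjectNB:
    """Hypothetical sketch: multinomial naive Bayes over subject tokens,
    updated one labeled email at a time (no per-feature weights)."""

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}
        self.vocab = set()

    @staticmethod
    def tokenize(subject):
        # Lowercased alphanumeric runs; a simple stand-in for real feature extraction.
        return re.findall(r"[a-z0-9]+", subject.lower())

    def update(self, subject, label):
        """Online learning step: fold one labeled subject into the counts."""
        tokens = self.tokenize(subject)
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, subject, label):
        """Smoothed log prior plus summed smoothed log token likelihoods."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(
            (self.doc_counts[label] + self.alpha)
            / (total_docs + self.alpha * len(self.LABELS))
        )
        total_tokens = sum(self.token_counts[label].values())
        # The extra +1 in the vocabulary size reserves mass for unseen tokens.
        denom = total_tokens + self.alpha * (len(self.vocab) + 1)
        for tok in self.tokenize(subject):
            score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, subject):
        return max(self.LABELS, key=lambda label: self.log_score(subject, label))


# Made-up subjects, for illustration only.
clf = OnlineSubjectNB()
clf.update("Cheap meds online now", "spam")
clf.update("Win a free prize now", "spam")
clf.update("Meeting agenda for Monday", "ham")
clf.update("Project status report", "ham")
print(clf.predict("free meds now"))
print(clf.predict("status report for project"))
```

Because each update only increments counters, the per-message cost is proportional to the number of subject tokens, which is in line with the abstract's point that a subject-only online filter is computationally cheap.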