A Social Network Spammer Detection Mechanism Based on User's Profile and Posts: A Case Study of Sina Weibo

Title:	A Social Network Spammer Detection Mechanism Based on User's Profile and Posts: A Case Study of Sina Weibo
Authors:	楊元齊 (Yang, Yuan-Chi); 羅濟群 (Lo, Chi-Chun); Institute of Information Management
Keywords:	Social Network; Spammer; Machine Learning
Issue Date:	2016
Abstract:	In recent years, as the Internet has become widespread and mobile devices have developed rapidly, the number of social network users worldwide has kept growing, which has given rise to the problem of social network spammers. Social network spammers spread unwanted spam on social media, including malicious links, pornographic pages, phishing sites, and advertisements, which disrupts the environment for normal users. To reduce the impact of spam on normal users and give them a cleaner social network environment free from spam harassment, this thesis proposes a mechanism for detecting whether a social network user is a spammer. The mechanism extracts 20 features, such as post similarity and reputation, from a user's personal profile and the text of the user's published posts, and feeds them into machine learning classification algorithms to train a classification model. The trained model then automatically classifies each user as a spammer or a normal user. Experimental results show that, when detecting spammers among Sina Weibo users, the proposed mechanism achieves 96.25% accuracy, 96.3% precision, 96.2% recall, and a 96.2% F-measure. Compared with existing research methods, whose accuracy averages 89.79%, the proposed mechanism raises accuracy to 96.25%.
URI:	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070353423 http://hdl.handle.net/11536/143473
Appears in Collections:	Thesis
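
The abstract names post similarity and reputation as two of the 20 features and reports accuracy, precision, recall, and F-measure, but this record does not give their formulas. The Python sketch below shows one plausible way to compute them, not the thesis's actual definitions. Everything in it is an assumption made for illustration: post similarity is taken as the mean pairwise Jaccard similarity over character bigrams, reputation as followers / (followers + following), and the metrics use the standard binary definitions with spammer as the positive class. All function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def char_bigrams(text: str) -> set[str]:
    """Character bigrams with whitespace removed (no word tokenizer needed)."""
    compact = "".join(text.split())
    return {compact[i:i + 2] for i in range(len(compact) - 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def post_similarity(posts: list[str]) -> float:
    """ASSUMED definition: mean pairwise Jaccard similarity of a user's posts."""
    grams = [char_bigrams(p) for p in posts]
    pairs = list(combinations(grams, 2))
    if not pairs:
        return 0.0  # fewer than two posts: nothing to compare
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def reputation(followers: int, following: int) -> float:
    """ASSUMED definition: share of a user's links that are followers."""
    total = followers + following
    return followers / total if total else 0.0


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F-measure with label 1 = spammer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / denom if denom else 0.0,
    }
```

Under these assumed definitions, an account that repeats near-identical posts scores close to 1.0 on post similarity, and an account that follows many users but has few followers scores close to 0.0 on reputation. Values like these would be two entries in the 20-dimensional feature vector passed to a classifier.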