An Anomaly-based Android Malware Detection System Resisting Polluted Training Data

Title:	An Anomaly-based Android Malware Detection System Resisting Polluted Training Data
Authors:	楊偉晨 (Yang, Wei-Chen); 黃育綸; Institute of Electrical and Control Engineering
Keywords:	mobile phone; Android; malware; detection system; clustering; K-means
Issue Date:	2017
Abstract:	Android smart devices have grown rapidly in recent years, and the number of Android applications on the market has grown with them. The success of the Android platform has also drawn the attention of malware developers. Malware harms mobile users through information theft and financial loss. Because the Android platform faces a massive malware threat, many researchers have devoted themselves to malware detection systems. A malware detection system is a key element of mobile security: it captures the characteristic behavior of an application and judges from it whether the application behaves maliciously. In this thesis, we propose an anomaly-based Android malware detection system that resists polluted training data. The system uses the features of the training applications to systematically find a proper number of clusters and define a proper threshold, and it excludes clusters with anomalous features to improve resistance against polluted training applications. The system has two phases: training and testing. In the training phase, we extract the features of benign applications and train a model from them. A feature vector represents the behavior of a single application, and the model represents the standard behavior of all benign applications. Training applications with anomalous behavior are excluded to maintain detection accuracy. In the testing phase, we extract the features of the application under test in the same way and compare them with the model to determine whether the application behaves anomalously. To evaluate the system, we design four experiments using 2089 benign applications and 355 malware samples. The experiments cover feature selection, threshold definition, the influence of polluted applications on the system, and the system's resistance to pollution. The first experiment determines which feature selection performs best, with features drawn from permissions and API calls; permissions combined with a few key API calls give the best performance. The second experiment compares static and dynamic thresholds; the dynamic threshold performs better. The third experiment measures the effect of pollution and confirms that polluted training data degrades performance. The fourth experiment compares performance with and without the proposed pollution-resistance method and shows that the method effectively counters polluted training data and reduces its influence. Based on all experimental results, the system can detect zero-day malware, defines proper thresholds, and resists polluted training data.
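The two-phase flow described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not give the cluster-count search, the threshold formula, or the rule for excluding clusters. The sketch therefore assumes a fixed `k`, a per-cluster threshold of mean plus `alpha` standard deviations of member distances, and exclusion of clusters holding less than `min_share` of the training set. All function names, parameters, and the four-element feature layout are hypothetical.

```python
import math
from statistics import mean, pstdev

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centroids):
    """Group each point with its nearest centroid."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        groups[nearest].append(p)
    return groups

def kmeans(points, k, iters=50):
    """Plain K-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = assign(points, centroids)
        updated = [[mean(col) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def train(benign, k, alpha=3.0, min_share=0.2):
    """Training phase: cluster, drop small clusters, set one threshold per cluster."""
    centroids, groups = kmeans(benign, k)
    model = []
    for c, g in zip(centroids, groups):
        # Assumption: an unusually small cluster is treated as pollution.
        if not g or len(g) < min_share * len(benign):
            continue
        d = [dist(p, c) for p in g]
        # Assumption: threshold adapts to the spread of each cluster.
        model.append((c, mean(d) + alpha * pstdev(d)))
    return model

def is_anomalous(model, features):
    """Testing phase: compare an app with the nearest retained cluster."""
    centroid, threshold = min(model, key=lambda m: dist(features, m[0]))
    return dist(features, centroid) > threshold

# Hypothetical binary features:
# [INTERNET, READ_CONTACTS, SEND_SMS permissions, calls an SMS-sending API]
training = (
    [[1, 1, 0, 0]] * 4 + [[1, 0, 0, 0]]      # one family of benign apps
    + [[0, 0, 1, 0]] * 4 + [[0, 1, 1, 0]]    # another family of benign apps
    + [[1, 1, 1, 1]]                         # malware mislabeled as benign
)
robust = train(training, k=3)                # small clusters excluded
naive = train(training, k=3, min_share=0.0)  # every cluster kept

print(is_anomalous(robust, [1, 1, 1, 1]))
print(is_anomalous(naive, [1, 1, 1, 1]))
```

In this toy data the mislabeled sample forms its own one-member cluster. Keeping that cluster makes the same behavior look normal at test time, while dropping it lets the detector flag that behavior again, which is the effect the third and fourth experiments are described as measuring.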
URI:	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070460028 http://hdl.handle.net/11536/141653
Appears in Collections:	Thesis