Title: Early Blocking and Bypassing for Accelerating Web Content Filtering (網頁內容過濾初期阻擋與通過之加速演算法)
Authors: Ming-Dao Liu (劉明道)
Ying-Dar Lin (林盈達)
Institute of Computer Science and Engineering
Keywords: content filtering; text classification; Naive Bayes classification; early blocking; early bypassing
Publication date: 2002
Abstract: Real-time content analysis has low maintenance cost and low storage requirements, which makes it an important technique for Web content filtering. However, it also suffers from lower accuracy and longer processing time. In this work, we present two algorithms based on the Naïve Bayes method, named early blocking and early bypassing, that accelerate the classification process. The former makes the blocking decision as soon as there is enough evidence that the Web document belongs to a forbidden category, while the latter makes the bypassing decision as soon as the document is judged to be normal. Experiments performed on NetBSD 1.6 with a Pentium III 700 MHz CPU show that our algorithms improve throughput by more than a factor of four over the original Bayesian classifier, while the F1 measure shows that accuracy remains fairly good: 92% on forbidden traffic and 96% on normal traffic.
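The early-decision idea in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm, whose decision rules are not given in this record. It is a minimal illustration that assumes a two-class (forbidden/normal) multinomial Naive Bayes model, a running log-odds score updated token by token, and two hypothetical thresholds, `block_at` and `bypass_at`, that end the scan early. The function names `train` and `classify` are likewise made up for this example.

```python
import math
from collections import Counter


def train(docs, alpha=1.0):
    """Fit per-token log-likelihood ratios (forbidden vs. normal).

    docs: list of (tokens, label) pairs, label in {"forbidden", "normal"}.
    Assumes both classes appear at least once in docs.
    """
    counts = {"forbidden": Counter(), "normal": Counter()}
    n_docs = Counter()
    for tokens, label in docs:
        counts[label].update(tokens)
        n_docs[label] += 1
    vocab = set(counts["forbidden"]) | set(counts["normal"])
    totals = {c: sum(counts[c].values()) for c in counts}

    def log_prob(word, cls):
        # Laplace-smoothed multinomial estimate of P(word | cls).
        return math.log((counts[cls][word] + alpha) /
                        (totals[cls] + alpha * len(vocab)))

    llr = {w: log_prob(w, "forbidden") - log_prob(w, "normal") for w in vocab}
    prior = math.log(n_docs["forbidden"] / n_docs["normal"])
    return llr, prior


def classify(tokens, llr, prior, block_at=3.0, bypass_at=-3.0):
    """Scan tokens in order and stop once the log-odds cross a threshold.

    Returns (decision, tokens_read). The thresholds are illustrative
    assumptions, not values taken from the thesis.
    """
    score = prior
    seen = 0
    for seen, word in enumerate(tokens, start=1):
        score += llr.get(word, 0.0)  # unknown tokens carry no evidence
        if score >= block_at:
            return "block", seen     # early blocking
        if score <= bypass_at:
            return "bypass", seen    # early bypassing
    # No threshold crossed: fall back to the ordinary Naive Bayes decision.
    return ("block" if score > 0 else "bypass"), seen


if __name__ == "__main__":
    training = [
        (["casino", "bet", "win"], "forbidden"),
        (["bet", "casino", "poker"], "forbidden"),
        (["news", "weather", "sports"], "normal"),
        (["weather", "news", "school"], "normal"),
    ]
    llr, prior = train(training)
    print(classify(["casino", "bet", "casino"] + ["news"] * 50, llr, prior))
```

In this toy setup the decision is reached after three tokens rather than after all 53, which is where the throughput gain would come from. Tighter thresholds stop sooner at the cost of accuracy, the trade-off the abstract reports through its F1 figures.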
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT910394047
http://hdl.handle.net/11536/70218
Appears in Collections: Thesis