Full metadata record
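| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | 黃福祥 | en_US |
| dc.contributor.author | Fu-Hsiang Huang | en_US |
| dc.contributor.author | 林盈達 | en_US |
| dc.contributor.author | Ying-Dar Lin | en_US |
| dc.date.accessioned | 2014-12-12T02:04:47Z | - |
| dc.date.available | 2014-12-12T02:04:47Z | - |
| dc.date.issued | 2003 | en_US |
| dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT009123562 | en_US |
| dc.identifier.uri | http://hdl.handle.net/11536/53179 | - |
| dc.description.abstract | Real-time content analysis offers low maintenance cost and low storage requirements, which makes it a very important technique for Web content filtering, but it also suffers from lower accuracy and long processing time. Because multi-language Web pages further reduce accuracy, we use the N-gram algorithm on training samples to find keywords, add them to the content filter, and evaluate how much adding keywords affects accuracy. In addition, we propose an early decision algorithm with two parts, called early blocking and early bypassing. The former blocks the target Web page as soon as the classification process has enough evidence that it belongs to a forbidden category. Conversely, the latter decides to bypass early once the target Web page is found to belong to a normal category. Experimental results show that, on a Pentium III 1 GHz CPU running NetBSD 1.6, our approach raises throughput sixfold over the original approach and improves latency by more than three times. Meanwhile, the blocking ratio rises from the original 70% to 99%. | zh_TW |
| dc.description.abstract | Real-time content analysis is an important technique in Web content filtering and has two advantages: low maintenance cost and low storage requirements. However, it may also suffer from lower accuracy and longer processing time. Because Web pages in different languages can complicate content analysis, we extract keywords from training samples with the N-gram algorithm and evaluate the resulting accuracy. To shorten the processing time, we propose an early decision algorithm that has two parts: early blocking and early bypassing. The former makes the blocking decision as soon as we have enough confidence that the Web page belongs to a forbidden category, while the latter makes the bypassing decision as soon as the Web page is considered a normal one. Experiments performed on NetBSD 1.6 with a Pentium III 1 GHz CPU show that our algorithm raises throughput to about six times that of the original and reduces latency by two thirds. Furthermore, the blocking ratio is raised from 70% to 99%. | en_US |
| dc.language.iso | en_US | en_US |
| dc.subject | 內容過濾 (content filtering) | zh_TW |
| dc.subject | 文件分類 (text classification) | zh_TW |
| dc.subject | N-gram | zh_TW |
| dc.subject | 及早阻擋 (early blocking) | zh_TW |
| dc.subject | 及早通過 (early bypassing) | zh_TW |
| dc.subject | content filtering | en_US |
| dc.subject | text classification | en_US |
| dc.subject | N-gram | en_US |
| dc.subject | early blocking | en_US |
| dc.subject | early bypassing | en_US |
| dc.title | 一個針對多語系網頁內容過濾的快速精確之代理伺服器 (A Fast and Accurate Proxy Server for Multi-Language Web Content Filtering) | zh_TW |
| dc.title | A Fast Accurate Proxy for Multi-Language Text Webpage Classification | en_US |
| dc.type | Thesis | en_US |
| dc.contributor.department | 資訊科學與工程研究所 (Institute of Computer Science and Engineering) | zh_TW |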
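
The abstract names two ideas, N-gram keyword extraction and early decision (early blocking and early bypassing), without giving their details. The following is a minimal Python sketch of how such a pipeline might look, not the thesis's actual implementation. The function names, the frequency-difference keyword score, and the thresholds `block_at` and `bypass_after` are all assumptions made for illustration. Character n-grams are used because they need no word segmentation, which suits pages in multiple languages.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return all overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def extract_keywords(forbidden_samples, normal_samples, n=2, top_k=3):
    """Pick n-grams that occur more often in forbidden than in normal samples.

    The score (count difference) is a hypothetical choice for this sketch.
    """
    bad = Counter(g for s in forbidden_samples for g in char_ngrams(s, n))
    good = Counter(g for s in normal_samples for g in char_ngrams(s, n))
    scored = {g: count - good[g] for g, count in bad.items()}
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scored, key=lambda g: (-scored[g], g))
    return [g for g in ranked[:top_k] if scored[g] > 0]


def early_decision(chunks, keywords, block_at=3, bypass_after=2):
    """Scan a page chunk by chunk and stop as soon as a decision is reached.

    Returns (decision, chunks_read). Both thresholds are assumed values:
    - early blocking: stop once block_at keyword hits have accumulated;
    - early bypassing: stop once bypass_after chunks show no hit at all.
    """
    hits = 0
    read = 0
    for read, chunk in enumerate(chunks, start=1):
        hits += sum(chunk.count(k) for k in keywords)
        if hits >= block_at:
            return "block", read   # early blocking
        if read >= bypass_after and hits == 0:
            return "bypass", read  # early bypassing
    # Whole page read without reaching either threshold.
    return "pass", read


if __name__ == "__main__":
    keywords = extract_keywords(["aaab"], ["bbb"], n=2, top_k=1)
    print(keywords)
    print(early_decision(["aa aa", "aa", "never read"], keywords))
    print(early_decision(["hello", "world", "never read"], keywords))
```

In both calls to `early_decision` the third chunk is never inspected, which is where the processing-time saving would come from: the proxy stops analysing the page once either threshold is met.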
Appears in Collections: Thesis


Files in This Item:

  1. 356201.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.