Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | 林志鴻 | en_US |
| dc.contributor.author | Jyh-Horng Lin | en_US |
| dc.contributor.author | 林珊如 | en_US |
| dc.contributor.author | 劉旨峰 | en_US |
| dc.contributor.author | Dr. Sunny S. J. Lin | en_US |
| dc.contributor.author | Dr. Eric Zhi-Feng Liu | en_US |
| dc.date.accessioned | 2014-12-12T01:30:44Z | - |
| dc.date.available | 2014-12-12T01:30:44Z | - |
| dc.date.issued | 2003 | en_US |
| dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT009073539 | en_US |
| dc.identifier.uri | http://hdl.handle.net/11536/42613 | - |
| dc.description.abstract | As the number of documents on the Internet grows geometrically, precisely locating the documents one needs has become an important issue. Drawing on related research in automatic document classification, this thesis proposes a procedure and method for classifying hostile Chinese documents using the vector model. We sampled 5000 articles from the hardware discussion board (tw.bbs.comp.hardware) of academic-network BBS sites, first classified them manually by degree of hostility, and then ran automatic classification experiments. Several articles are first fed to the system, which extracts their key terms, computes term weights, and builds a central vector for hostile articles. The system then computes each input article's similarity to the hostile articles, and finally classifies articles whose similarity exceeds a threshold as hostile and all others as non-hostile. The findings are: (1) When articles on the same topic are used as training articles, the similarities of hostile and non-hostile articles to the hostile central vector differ markedly. (2) When the training articles are on a different topic, the computed similarities also differ. (3) Classifying with the optimal threshold of 0.17 obtained from the threshold experiments gives better precision for non-hostile articles, about 0.98, but poorer precision for hostile articles, about 0.25. (4) Lowering the threshold to 0.136 raises the HR value to 0.72. | zh_TW |
| dc.description.abstract | With the number of documents on the Web increasing drastically, how to precisely find the documents one needs has become an important issue. Drawing on relevant studies of automatic document classification, this thesis proposes using the vector model to classify Chinese hostile documents. We sampled 5000 articles from the hardware discussion board of the academic BBS (tw.bbs.comp.hardware) and first classified them manually by degree of hostility. We then carried out automatic classification experiments. Several articles are entered first; the system analyzes their key terms and calculates term weights in order to establish the central vector of the hostile articles. The system then calculates the similarity between an input article and the hostile articles. Finally, an article whose similarity is higher than the threshold is classified as hostile; otherwise it is classified as non-hostile. The observations from this study are as follows: 1. When articles on the same topic are used as training articles, the similarities of hostile and non-hostile articles to the hostile central vector are clearly different. 2. When the topic of the training articles is different, the similarities also differ. 3. Classifying with the optimum threshold value of 0.17 gives better accuracy for non-hostile articles, about 0.98, but worse classification accuracy for hostile articles, about 0.25. 4. A better HR can be obtained by decreasing the threshold. | en_US |
| dc.language.iso | zh_TW | en_US |
| dc.subject | information retrieval | zh_TW |
| dc.subject | vector model | zh_TW |
| dc.subject | flame war | zh_TW |
| dc.subject | hostility | zh_TW |
| dc.subject | document classification | zh_TW |
| dc.subject | information retrieval | en_US |
| dc.subject | vector space model | en_US |
| dc.subject | flame | en_US |
| dc.subject | hostility | en_US |
| dc.subject | document classification | en_US |
| dc.title | A Preliminary Study of an Automated Article Hostility Rating System | zh_TW |
| dc.title | A Pilot of Automatic Sorting System with Hostile Articles | en_US |
| dc.type | Thesis | en_US |
| dc.contributor.department | College of Science, Program in Technology and Digital Learning | zh_TW |
Appears in Collections: Thesis


Files in This Item:

  1. 353901.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.
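
The classification procedure summarized in the abstract of this record (term weights, a central vector for hostile articles, and a similarity threshold) can be illustrated with a minimal Python sketch. The record does not say which weighting scheme, similarity measure, or word segmenter the thesis uses, so relative term frequency, cosine similarity, and pre-tokenized input are assumptions made here for illustration only; all function names are hypothetical. The default threshold of 0.17 is the value reported in the abstract.

```python
import math
from collections import Counter


def term_weights(tokens):
    """Relative term frequency per token (an assumed weighting scheme)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def centroid(vectors):
    """Average several sparse term-weight vectors into one central vector."""
    summed = Counter()
    for vector in vectors:
        for term, weight in vector.items():
            summed[term] += weight
    return {term: weight / len(vectors) for term, weight in summed.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors (an assumed similarity measure)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_hostile(tokens, hostile_centroid, threshold=0.17):
    """Label an article hostile when its similarity exceeds the threshold."""
    return cosine(term_weights(tokens), hostile_centroid) > threshold


# Toy, already-tokenized training articles (hypothetical data, not from the thesis).
training = [["stupid", "idiot", "board"], ["idiot", "shut", "up"]]
hostile_centroid = centroid([term_weights(doc) for doc in training])

print(is_hostile(["you", "idiot"], hostile_centroid))
print(is_hostile(["which", "motherboard", "supports", "ddr"], hostile_centroid))
```

In this toy setup, an article that shares vocabulary with the training articles scores above the threshold, while one with no shared terms scores zero. Lowering `threshold` flags more articles as hostile, which is the direction of the trade-off the abstract reports.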