Title: 一個基於雲端計算的跨語言抄襲檢測系統
A Cross Language Plagiarism Detection Based on Cloud Computing
Authors: 阮登明
Nguyen Dang Minh
袁賢銘
Shyan-Ming Yuan
Institute of Computer Science and Engineering
Keywords: plagiarism detection; cross-language plagiarism detection; cloud computing; MapReduce
Issue Date: 2013
Abstract: Language barriers are falling, especially for students, most of whom know at least one foreign language. At the same time, cloud-based translation is becoming popular: many machine translation services are easily accessible over the Internet. These services are fast, accurate, and stable, and they support many languages. They are even embedded by default in web browsers and mobile devices. This makes translating a sentence from any language very easy for a student, and it makes cross-language plagiarism equally easy. Cross-language plagiarism is also the hardest kind of plagiarism to detect.
In this thesis, we use cloud services and the cloud computing model to detect cross-language plagiarism and to estimate the degree to which a short article has been plagiarized. We downloaded articles from a Vietnamese translation website and from a local Vietnamese news provider to use as the test dataset. Each article is first translated into English using translation services on the Web. Next, each translated sentence is sent to a web search engine to find related articles. We then compare the sentence with each sentence in the related articles to compute its plagiarism score. Finally, we compute the plagiarism score of the whole article. Experimental results show that the detection accuracy is slightly higher than 0.5.
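The pipeline described in the abstract (translate, search per sentence, compare sentences, aggregate) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation. The abstract does not name the similarity measure or the aggregation rule, so Jaccard word overlap and a plain mean are assumptions made here. The `translate` and `search` callables are hypothetical stand-ins for the web translation service and the search engine.

```python
import re
from typing import Callable, List, Set


def split_sentences(text: str) -> List[str]:
    """Naive splitter on '.', '!' and '?' (an assumption, not from the thesis)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(sentence: str) -> Set[str]:
    """Lowercased alphanumeric word set of a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard word overlap; an assumed similarity measure."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def sentence_score(sentence: str, related_articles: List[str]) -> float:
    """Best similarity between one sentence and any sentence in the related articles."""
    best = 0.0
    for article in related_articles:
        for candidate in split_sentences(article):
            best = max(best, jaccard(sentence, candidate))
    return best


def article_score(
    text: str,
    translate: Callable[[str], str],      # hypothetical: source language -> English
    search: Callable[[str], List[str]],   # hypothetical: sentence -> related article texts
) -> float:
    """Mean of per-sentence scores (assumed aggregation rule)."""
    sentences = split_sentences(translate(text))
    if not sentences:
        return 0.0
    # "Map" step: each sentence is scored independently of the others.
    scores = [sentence_score(s, search(s)) for s in sentences]
    # "Reduce" step: combine the per-sentence scores into one article score.
    return sum(scores) / len(scores)
```

Because each sentence is scored independently, the per-sentence step maps naturally onto a MapReduce-style job, which is presumably why MapReduce appears among the keywords. In a real deployment the two callables would wrap network services and would need rate limiting and error handling.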
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070156144
http://hdl.handle.net/11536/75714
Appears in Collections: Thesis