Title: Chinese Plagiarism Detection System Based on the Google Search Engine (以Google搜尋引擎為基礎之中文剽竊偵測系統)
Development of Chinese Plagiarism Detection System
Authors: Chang, Ya-Wen (張雅雯)
Ke, Hao-Ren (柯皓仁)
Lin, B.M.T. (林妙聰)
Institute of Information Management
Keywords: Google search engine; Plagiarism; Longest Common Subsequence (LCS)
Issue Date: 2009
Abstract: With the rapid development of information and network technology, powerful search engines have made sharing information very easy. However, users who lack respect for others' intellectual property rights often misuse information found on the Internet. Many plagiarism detection methods have been developed, each with its own strengths and weaknesses, but most of them target English, whose grammatical patterns are comparatively regular, rather than Chinese, whose patterns are less regular. This thesis builds a plagiarism detection system for Chinese documents based on the Google search engine. The system uses a revised form of the longest common subsequence (LCS) to compute the similarity between the results returned by the search engine and the Chinese document under examination. Experiments show that, compared with the unrevised LCS formula, the revised version substantially reduces the occurrence of false positives. We hope that putting the system into practice will have educational value in teaching students to respect the intellectual property rights of others.
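The abstract does not give the revised LCS formula, so the following is only a minimal sketch of the unrevised baseline it refers to: a character-level longest common subsequence between a search-result snippet and a document. The function names `lcs_length` and `lcs_similarity`, and the choice to normalize by the snippet length, are assumptions made for illustration and are not taken from the thesis.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (character level)."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(snippet: str, document: str) -> float:
    """Share of the snippet's characters found, in order, in the document.

    Normalizing by the snippet length is an assumption of this sketch,
    not the formula used in the thesis.
    """
    if not snippet:
        return 0.0
    return lcs_length(snippet, document) / len(snippet)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the functions.
    print(lcs_similarity("中文剽竊偵測", "中文的剽竊偵測系統"))
```

Character-level comparison avoids the need for Chinese word segmentation. Note that plain LCS counts characters matched in order even when they lie far apart in the document.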
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079734515
http://hdl.handle.net/11536/45480
Appears in Collections: Thesis


Files in This Item:

  1. 451501.pdf
