Title: 以網路為主之英對中專有名詞翻譯萃取 (Web-Based Extraction of English-to-Chinese Named Entity Translations)
Empirical Approach to Resolving English to Chinese Named Entity Translation
Authors: 張晨輝
Jhang, Chen-Huei
梁婷
Liang, Tyne
Institute of Computer Science and Engineering
Keywords: named entity translation; machine translation; web-based; natural language processing
Publication date: 2011
Abstract: Named entity translation plays an important role in many natural language processing (NLP) applications, such as cross-language information retrieval, machine translation, and question answering. Because web resources are abundant and frequently updated, most recent work extracts translation candidates from the snippets returned by a search engine and ranks them with supervised or unsupervised learning, using features such as the frequency of the candidate and the named entity in the search results, the distance between them, and the ratio of their lengths. However, named entities in each domain follow their own naming rules, which most previous work has not taken into account. This thesis therefore proposes an approach that extracts translation candidates from search results and uses naming rules for both query expansion and candidate evaluation.

We extract English-to-Chinese translations of named entities in four domains: book titles, movie titles, medicine names, and company names. The proposed approach has three stages. First, a support vector machine (SVM) with 13 features identifies the domain of each named entity. Second, the query is expanded according to the predefined naming rules for that domain, so that the search returns more relevant results. Finally, translation candidates are extracted with predefined surface patterns and ranked by frequency and naming rules. In experiments on 3,315 names, the proposed approach achieved an average top-1 inclusion rate of 82.3%, that is, the top-ranked candidate was the correct translation in 82.3% of cases.
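The abstract names three features that earlier web-based systems use to score a candidate: frequency, distance, and length ratio. The thesis record does not define them precisely, so the sketch below uses assumed definitions: frequency is the total number of occurrences of the candidate across all snippets, distance is the smallest character gap between the start of the candidate and the start of the English name within one snippet, and length ratio is Chinese characters per English word. The function name `candidate_features` and the example snippets are hypothetical.

```python
import re


def candidate_features(name: str, candidate: str, snippets: list[str]) -> dict[str, float]:
    """Compute three candidate features over search-result snippets.

    The definitions are assumptions for illustration only:
      frequency    - total occurrences of the candidate across all snippets
      min_distance - smallest gap, in characters, between the start of the
                     candidate and the start of the English name in one snippet
      length_ratio - Chinese characters in the candidate per English word
    """
    cand_re = re.compile(re.escape(candidate))
    name_re = re.compile(re.escape(name), re.IGNORECASE)  # English side is case-insensitive

    frequency = 0
    min_distance = float("inf")  # stays infinite if the two never co-occur
    for snippet in snippets:
        cand_starts = [m.start() for m in cand_re.finditer(snippet)]
        name_starts = [m.start() for m in name_re.finditer(snippet)]
        frequency += len(cand_starts)
        for c in cand_starts:
            for n in name_starts:
                min_distance = min(min_distance, abs(c - n))

    length_ratio = len(candidate) / max(len(name.split()), 1)
    return {
        "frequency": float(frequency),
        "min_distance": float(min_distance),
        "length_ratio": length_ratio,
    }


snippets = ["電影《全面啟動》(Inception)由諾蘭執導", "全面啟動(Inception)"]
print(candidate_features("Inception", "全面啟動", snippets))
# → {'frequency': 2.0, 'min_distance': 5.0, 'length_ratio': 4.0}
```

A supervised ranker would consume such feature vectors for every candidate; the thesis's own contribution is to add domain naming rules on top of signals like these.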
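Stages two and three of the approach (rule-based query expansion, then surface-pattern extraction and ranking) can be sketched as below, assuming the domain has already been identified and the search snippets are already in hand (no real search engine is called). Everything specific here is hypothetical and not the thesis's actual rule set: the `DOMAIN_RULES` table, the expansion keywords, the suffix lists, the two surface patterns `《中文》(English)` and `中文(English)`, and the `bonus` weight. The one real convention it relies on is that Chinese book and movie titles are normally set in 《》 title marks.

```python
import re
from collections import Counter

# Hypothetical per-domain naming rules (illustrative, NOT the thesis's rules):
#   expand   - Chinese keyword appended to the query to bias the search
#   titled   - whether names in this domain are conventionally set in 《》
#   suffixes - endings that names in this domain often carry
DOMAIN_RULES = {
    "book":     {"expand": "書",   "titled": True,  "suffixes": ()},
    "movie":    {"expand": "電影", "titled": True,  "suffixes": ()},
    "medicine": {"expand": "藥",   "titled": False, "suffixes": ("錠", "膠囊")},
    "company":  {"expand": "公司", "titled": False, "suffixes": ("公司", "集團")},
}

CJK_RUN = r"([\u4e00-\u9fff]{2,12})"            # 2-12 consecutive CJK ideographs
OPEN, CLOSE = r"\s*[(（]\s*", r"\s*[)）]"        # half- or full-width parentheses


def expand_query(name: str, domain: str) -> str:
    """Stage 2 (sketch): quote the English name and append a domain keyword."""
    return f'"{name}" {DOMAIN_RULES[domain]["expand"]}'


def extract_candidates(name: str, snippets: list[str]) -> tuple[Counter, Counter]:
    """Stage 3a (sketch): count Chinese strings matched by two assumed patterns."""
    eng = re.escape(name)
    titled = re.compile(r"《([^《》]{1,30})》" + OPEN + eng + CLOSE, re.IGNORECASE)
    plain = re.compile(CJK_RUN + OPEN + eng + CLOSE, re.IGNORECASE)
    titled_counts, plain_counts = Counter(), Counter()
    for snippet in snippets:
        titled_counts.update(titled.findall(snippet))
        plain_counts.update(plain.findall(snippet))
    return titled_counts, plain_counts


def rank_candidates(name: str, domain: str, snippets: list[str], bonus: float = 0.5) -> list[str]:
    """Stage 3b (sketch): rank by frequency plus a bonus for obeying the domain rule."""
    rules = DOMAIN_RULES[domain]
    titled_counts, plain_counts = extract_candidates(name, snippets)
    totals = titled_counts + plain_counts

    def score(cand: str) -> float:
        follows_rule = (rules["titled"] and cand in titled_counts) or cand.endswith(rules["suffixes"])
        return totals[cand] + (bonus if follows_rule else 0.0)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(totals, key=lambda c: (-score(c), c))


snippets = ["電影《全面啟動》(Inception)由諾蘭執導", "《全面啟動》（Inception）票房",
            "全面啟動(Inception)", "《盜夢空間》(inception)"]
print(expand_query("Inception", "movie"))          # → "Inception" 電影
print(rank_candidates("Inception", "movie", snippets))  # → ['全面啟動', '盜夢空間']
```

In this sketch frequency dominates and the naming rule only nudges the score, so with company snippets such as `宏碁公司(Acer)`, `宏碁(Acer)`, `宏碁(ACER)` the shorter, more frequent `宏碁` still ranks first. The plain pattern also takes at most 12 characters before the parenthesis, so it can clip long names or pick up preceding words; a real system would need better boundary detection.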
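The reported result is a top-1 inclusion rate: the share of test names whose correct translation is the first-ranked candidate. A minimal sketch of the general top-k version follows. It assumes each name may have several accepted translations (for example, Taiwanese and mainland titles) and micro-averages over all names; the record says only "average", so whether the thesis averages over names or over the four domains is not stated here. The function name is hypothetical.

```python
from collections.abc import Sequence


def top_k_inclusion_rate(
    ranked: Sequence[Sequence[str]],
    gold: Sequence[set[str]],
    k: int = 1,
) -> float:
    """Fraction of test names whose top-k candidates contain an accepted translation.

    ranked[i] is the ranked candidate list for name i; gold[i] is its set of
    accepted translations. Micro-averaged over names (an assumption).
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not ranked:
        return 0.0
    hits = sum(
        1 for cands, answers in zip(ranked, gold)
        if any(c in answers for c in cands[:k])
    )
    return hits / len(ranked)


ranked = [["全面啟動", "盜夢空間"], ["宏碁公司", "宏碁"], ["哈利波特"]]
gold = [{"全面啟動", "盜夢空間"}, {"宏碁"}, {"哈利波特"}]
print(top_k_inclusion_rate(ranked, gold, k=1))  # 2 of 3 names hit at rank 1
print(top_k_inclusion_rate(ranked, gold, k=2))  # → 1.0
```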
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079955615
http://hdl.handle.net/11536/50523
Appears in Collections: Theses