Full metadata record
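| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | 薛丞宏 | en_US |
| dc.contributor.author | Sih, Sing-Hong | en_US |
| dc.contributor.author | 張智星 | en_US |
| dc.contributor.author | 易志偉 | en_US |
| dc.contributor.author | Jang, Jyh-Shing | en_US |
| dc.contributor.author | Yi, Chih-Wei | en_US |
| dc.date.accessioned | 2015-11-26T00:55:16Z | - |
| dc.date.available | 2015-11-26T00:55:16Z | - |
| dc.date.issued | 2014 | en_US |
| dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT070156016 | en_US |
| dc.identifier.uri | http://hdl.handle.net/11536/125657 | - |
| dc.description.abstract | Taiwan is a multi-ethnic, multilingual country. Speaking and using one's mother tongue is the most basic of rights, yet computer applications for mother tongues are very few, so natural language processing research and the collection and curation of corpora need to be strengthened. Taiwan has a great many local languages; this thesis focuses on Southern Min and studies the characteristics of its translation corpora. Beyond Southern Min itself, we also hope the results will help other local languages. This thesis proposes a method for automatically processing corpora of the Chinese languages that fills in the missing information in incomplete corpora so that they deliver their greatest value; the BLEU score rises from 9.30 to 13.82. Experiments further show that when the parallel corpus has fewer than 100,000 sentences, adding data has a very large effect on translation quality: after the corpus grows from 64,121 to 99,147 sentences, the BLEU score improves from 13.82 to 19.33. | zh_TW |
| dc.description.abstract | Taiwan is a multicultural and multilingual country. Speaking one's mother tongue is a basic human right, but there are few computer applications for mother tongues. Such applications depend on corpora and on natural language processing research. There are many local languages in Taiwan. This thesis focuses on Southern Min Taiwanese, a major local language in Taiwan. It studies corpus preprocessing aimed at improving the performance of statistical machine translation. We hope it can help computational linguistic research on other local languages of Taiwan. This thesis introduces a method for preprocessing corpora whose information is incomplete. After refinement, the BLEU score rises from 9.30 to 13.82. Experiments in this thesis show that translation performance is sensitive to the size of the parallel corpus when it contains fewer than 100,000 sentences. The BLEU score rises from 13.82 to 19.33 as the number of sentences increases from 64,121 to 99,147. | en_US |
| dc.language.iso | zh_TW | en_US |
| dc.subject | 臺灣閩南語 (Taiwanese Southern Min) | zh_TW |
| dc.subject | 華語 (Mandarin) | zh_TW |
| dc.subject | 翻譯 (Translation) | zh_TW |
| dc.subject | 語料 (Corpus) | zh_TW |
| dc.subject | 斷詞 (Word segmentation) | zh_TW |
| dc.subject | 語言分類 (Language classification) | zh_TW |
| dc.subject | Southern Min | en_US |
| dc.subject | Taiwanese | en_US |
| dc.subject | Mandarin | en_US |
| dc.subject | Translation | en_US |
| dc.subject | Corpus | en_US |
| dc.subject | Segmentation | en_US |
| dc.subject | Language Identification | en_US |
| dc.title | 漢語間統計式機器翻譯語料處理-用臺灣閩南語示範 | zh_TW |
| dc.title | Corpus Preprocessing for Statistical Machine Translation between the Chinese Languages - Using Taiwan Southern Min as Examples | en_US |
| dc.type | Thesis | en_US |
| dc.contributor.department | 資訊科學與工程研究所 (Institute of Computer Science and Engineering) | zh_TW |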
Appears in Collections: Thesis