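Corpus Preprocessing for Statistical Machine Translation between the Chinese Languages - Using Taiwan Southern Min as Examples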

Full metadata record

DC Field	Value	Language
dc.contributor.author	薛丞宏	en_US
dc.contributor.author	Sih, Sing-Hong	en_US
dc.contributor.author	張智星	en_US
dc.contributor.author	易志偉	en_US
dc.contributor.author	Jang, Jyh-Shing	en_US
dc.contributor.author	Yi, Chih-Wei	en_US
dc.date.accessioned	2015-11-26T00:55:16Z	-
dc.date.available	2015-11-26T00:55:16Z	-
dc.date.issued	2014	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT070156016	en_US
dc.identifier.uri	http://hdl.handle.net/11536/125657	-
dc.description.abstract	臺灣是一个多元民族、多元語言的國家。講母語、使用母語是上基本的權利，毋過母語的電腦相關應用煞誠少，需要加強自然語言處理的研究佮語料收集整理。臺灣本土語言百百種，本論文是針對閩南語，研究伊翻譯語料的特性。除了閩南語本身以外，嘛希望研究結果對別的本土語言有幫助。本論文提出一个自動整理漢語語料的方法，予資訊無完整的語料庫補足資訊，發揮上大的價值， BLEU分數對9.30搝到13.82。另外閣用實驗證明平行語料數量無到十萬句的時，加語料對翻譯的效果影響非常大，原本64121句加到99147句了後， BLEU分數對13.82提昇到19.33。	zh_TW
dc.description.abstract	Taiwan is a multicultural and multilingual country. Speaking one's mother tongue is a basic human right, but there are few computer applications for mother tongues. Such applications depend on corpora and on natural language processing research. There are many local languages in Taiwan. This thesis focuses on Southern Min Taiwanese, a major local language in Taiwan, and studies corpus preprocessing for improving statistical machine translation performance. We hope the results can also help computational linguistic research on other local languages of Taiwan. This thesis introduces a method to automatically preprocess corpora whose information is incomplete. After this refinement, the BLEU score rises from 9.30 to 13.82. Experiments in this thesis also show that translation performance is sensitive to the size of the parallel corpus when it contains fewer than 100,000 sentences: the BLEU score rises from 13.82 to 19.33 as the number of sentences increases from 64,121 to 99,147.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	臺灣閩南語	zh_TW
dc.subject	華語	zh_TW
dc.subject	翻譯	zh_TW
dc.subject	語料	zh_TW
dc.subject	斷詞	zh_TW
dc.subject	語言分類	zh_TW
dc.subject	Southern Min	en_US
dc.subject	Taiwanese	en_US
dc.subject	Mandarin	en_US
dc.subject	Translation	en_US
dc.subject	Corpus	en_US
dc.subject	Segmentation	en_US
dc.subject	Language Identification	en_US
dc.title	漢語間統計式機器翻譯語料處理－用臺灣閩南語示範	zh_TW
dc.title	Corpus Preprocessing for Statistical Machine Translation between the Chinese Languages - Using Taiwan Southern Min as Examples	en_US
dc.type	Thesis	en_US
dc.contributor.department	資訊科學與工程研究所	zh_TW
Appears in Collections:	Thesis