Full metadata record
DC Field | Value | Language
dc.contributor.author | 李安琪 | en_US
dc.contributor.author | Lee, An-Chi | en_US
dc.contributor.author | 陳信宏 | en_US
dc.contributor.author | Chen Sin-Horng | en_US
dc.date.accessioned | 2014-12-12T02:15:42Z | -
dc.date.available | 2014-12-12T02:15:42Z | -
dc.date.issued | 1995 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#NT840435009 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/60759 | -
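dc.description.abstract | This thesis studies language models for syllable-to-character conversion in Chinese speech recognition systems, covering three models: the word bigram, the part-of-speech (POS) bigram, and the word-class bigram. Taking the standpoint of building a working system, we evaluate how each model's parameters are trained and how efficiently the resulting syllable-to-character system runs, and we establish a preliminary syllable-to-character framework. Our experiments used a 58,362-word lexicon, a training corpus of about 7 million words, and a test corpus of 760,000 characters. The average syllable-to-character accuracy was 94.7% for the word bigram model, 93.25% for the word-class bigram model, and 91.3% for the POS bigram model. Adding statistics on Po-in characters (characters with more than one pronunciation) raised accuracy by a further 0.23%. In addition, we designed a decision rule that accommodates tone changes in spoken Mandarin, making the syllable-to-character system more tolerant of such variation. This thesis presents a first study of Chinese language models for syllable-to-character conversion. Three statistical models are discussed. First, a word bigram model with 58,326 word entries is constructed from a large corpus of about 5 million words. Next, a POS bigram model with 46 POS entries is constructed from a manually tagged corpus of about 2 million words. Finally, a word-class bigram model is studied. For this model, all word classes are generated automatically from the 5-million-word corpus by an algorithm that aims to minimize the loss of mutual information. A test database of about 760,000 characters was used to evaluate the three models. The models achieved character accuracy rates of 94.7% (word bigram), 93.25% (word-class bigram), and 91.3% (POS bigram). Further improvements are also studied: one handles the Tone 3 sandhi rule, and another handles Po-in characters in monosyllabic words. Experimental results show that these yield slight performance improvements. | zh_TW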
dc.language.iso | zh_TW | en_US
dc.subject | 語言模型 (language model) | zh_TW
dc.subject | 雙連文 (bigram) | zh_TW
dc.subject | 詞類 (part-of-speech) | zh_TW
dc.subject | 詞群 (word class) | zh_TW
dc.subject | 語料庫 (corpus) | zh_TW
dc.subject | 詞庫 (lexicon) | zh_TW
dc.subject | language model | en_US
dc.subject | bigram | en_US
dc.subject | part-of-speech | en_US
dc.subject | word-class | en_US
dc.subject | corpus | en_US
dc.subject | lexicon | en_US
dc.title | 統計式中文語言模型之初步探討 | zh_TW
dc.title | A First Study on Statistical Chinese Language Models | en_US
dc.type | Thesis | en_US
dc.contributor.department | 電信工程研究所 (Institute of Communications Engineering) | zh_TW
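The abstract describes syllable-to-character conversion with word and class bigrams but does not spell out the search procedure. The sketch below assumes a standard Viterbi (dynamic-programming) search over every segmentation of a toneful syllable string into lexicon words. Each candidate path is scored either by a word bigram P(w_i | w_{i-1}) or by the common class-bigram factorization P(c(w_i) | c(w_{i-1})) * P(w_i | c(w_i)). Everything else is hypothetical and for illustration only: the lexicon, probabilities, and class labels; the `FLOOR` constant (a crude stand-in for real smoothing); and the names `decode`, `word_logp`, and `class_logp`. This is not the thesis's implementation.

```python
import math

# Hypothetical toy lexicon: tuple of toneful syllables -> homophonous candidate words.
LEXICON = {
    ("yu3",): ["語", "雨"],
    ("yan2",): ["言", "鹽"],
    ("yu3", "yan2"): ["語言"],
    ("mo2",): ["模", "磨"],
    ("xing2",): ["型", "行"],
    ("mo2", "xing2"): ["模型"],
}
MAX_WORD_LEN = max(len(k) for k in LEXICON)
FLOOR = 1e-6  # hypothetical probability for unseen events (no real smoothing)

# Hypothetical word-bigram probabilities P(w | prev).
WORD_BIGRAM = {
    ("<s>", "語言"): 0.2,
    ("語言", "模型"): 0.3,
    ("模型", "</s>"): 0.4,
    ("<s>", "雨"): 0.05,
}

# Hypothetical class assignments for P(w | prev) ~= P(c(w) | c(prev)) * P(w | c(w));
# words missing from WORD_CLASS fall into the class "UNK".
WORD_CLASS = {"<s>": "S", "</s>": "E", "語言": "N1", "模型": "N2"}
CLASS_BIGRAM = {("S", "N1"): 0.3, ("N1", "N2"): 0.4, ("N2", "E"): 0.5}
MEMBERSHIP = {"語言": 0.5, "模型": 0.5, "</s>": 1.0}  # P(w | c(w))


def word_logp(prev, word):
    """log P(word | prev) from the word-bigram table, floored for unseen pairs."""
    return math.log(WORD_BIGRAM.get((prev, word), FLOOR))


def class_logp(prev, word):
    """log P(c(word) | c(prev)) + log P(word | c(word)), floored for unseen events."""
    c_prev = WORD_CLASS.get(prev, "UNK")
    c_word = WORD_CLASS.get(word, "UNK")
    return (math.log(CLASS_BIGRAM.get((c_prev, c_word), FLOOR))
            + math.log(MEMBERSHIP.get(word, FLOOR)))


def decode(syllables, logp):
    """Viterbi search over all segmentations of `syllables` into lexicon words.

    Each state is (end position, last word) and keeps its best log score plus a
    back-pointer, so the bigram context of every extension is exact.
    """
    n = len(syllables)
    best = {(0, "<s>"): (0.0, None)}
    for i in range(n):
        # All states ending at i are complete here; snapshot because we add states at j > i.
        for (pos, prev), (score, _) in list(best.items()):
            if pos != i:
                continue
            for length in range(1, min(MAX_WORD_LEN, n - i) + 1):
                span = tuple(syllables[i:i + length])
                for word in LEXICON.get(span, []):
                    key = (i + length, word)
                    cand = score + logp(prev, word)
                    if key not in best or cand > best[key][0]:
                        best[key] = (cand, (pos, prev))
    # Close each complete path with the end-of-sentence token and keep the best.
    finals = [(s + logp(w, "</s>"), (p, w)) for (p, w), (s, _) in best.items() if p == n]
    if not finals:
        return []  # no segmentation covers the whole input
    _, state = max(finals)
    words = []
    while state[1] != "<s>":
        words.append(state[1])
        state = best[state][1]
    return list(reversed(words))


if __name__ == "__main__":
    syllables = ["yu3", "yan2", "mo2", "xing2"]
    print(decode(syllables, word_logp))   # ['語言', '模型']
    print(decode(syllables, class_logp))  # ['語言', '模型']
```

The search does not change when a different `logp` function is passed in, which is how the sketch swaps the word bigram for the class bigram. On this toy input, both scorers recover 語言 模型 ("language model"). A real system would replace `FLOOR` with proper smoothing and load the tables from a trained corpus.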
Appears in Collections: Thesis