Full metadata record
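DC Field | Value | Language
dc.contributor.author | 楊育菁 | en_US
dc.contributor.author | Yang, Yu-Ching | en_US
dc.contributor.author | 陳信宏 | en_US
dc.contributor.author | Chen, Sin-Horng | en_US
dc.date.accessioned | 2014-12-12T02:19:30Z | -
dc.date.available | 2014-12-12T02:19:30Z | -
dc.date.issued | 1997 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#NT863435024 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/63468 | -
dc.description.abstract | This thesis focuses on the language model used for linguistic decoding in a Mandarin speech recognition system. From the standpoint of building a working system, we study both the training of the word-class bigram language model parameters and the operation of the linguistic decoding system. We first build the language model: while taking grammatical structure into account, we incorporate the distinctive word-formation properties of Chinese and design a word-class bigram model based on how words vary syntactically with their preceding and following neighbors. We also build a preliminary linguistic decoding system. The language model is trained with a lexicon of 111246 words and a text corpus of about 9 million words, then combined with an acoustic decoding system. The speech database consists of a set of balanced-corpus sentences plus long and short sentences excerpted from newspaper articles. Conventional HMM recognition of the test speech yields base-syllable strings with a syllable recognition rate of 81%, from which a syllable lattice is generated and passed to the linguistic decoding system for final recognition. The baseline recognition rate is 57.69%; after adding a proper-noun lexicon, word-formation rules for numerals, and part-of-speech considerations, the recognition rate reaches 64.40%. | zh_TW
dc.description.abstract | In this thesis, a word-class bigram of Chinese is discussed for speech-to-text conversion. An algorithm is first proposed to partition all words of a large lexicon containing 111246 word entries into several hundred word classes. It considers many linguistic features of words, including part-of-speech, prefix, suffix, and length, so that words with the same characteristics are clustered together. Then a word-class bigram model is constructed using a text corpus containing 9 million words. The performance of the proposed word-class bigram model was examined in a simulation that combines it with an HMM-based base-syllable recognizer to convert speech into text. The base-syllable accuracy rate of the HMM recognizer was 81%. A character accuracy rate of 57.7% was achieved for the baseline system. By further including all proper nouns and some formation rules for compound words, the accuracy rate rose to 64.4%. | en_US
dc.language.iso | zh_TW | en_US
dc.subject | Chinese word classes | zh_TW
dc.subject | Bigram | zh_TW
dc.title | 中文詞群雙連文語言模型之初步研究 | zh_TW
dc.title | A First Study on Mandarin Word-class Bigram Language Model | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Communications Engineering | zh_TW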
Appears in Collections: Thesis