中文詞群雙連文語言模型之初步研究

Full metadata record

DC Field	Value	Language
dc.contributor.author	楊育菁	en_US
dc.contributor.author	Yang, Yu-Ching	en_US
dc.contributor.author	陳信宏	en_US
dc.contributor.author	Chen, Sin-Horng	en_US
dc.date.accessioned	2014-12-12T02:19:30Z	-
dc.date.available	2014-12-12T02:19:30Z	-
dc.date.issued	1997	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#NT863435024	en_US
dc.identifier.uri	http://hdl.handle.net/11536/63468	-
dc.description.abstract	本論文的研究重點在於中文語音辨認系統中語言解碼的語言模型。我們以實作系統的觀點，分別對詞群雙連文語言模型參數的訓練及語言解碼系統的運作進行研究。在本論文中，我們首先針對語言模型的建立，在兼顧文法結構下，加入中文的特殊構詞特性，利用詞與前後相連的語法變化設計一套詞群雙連文模型。另外建立了初步的語言解碼系統，使用111246詞的詞庫及約900萬詞的語料庫，建立語言模型，再結合聲學解碼系統，針對一套平衡語料句加上節錄報紙文章的長短句的語音資料庫，經過傳統的HMM辨認法對測試語音作辨認，得到音節辨認率為81%的基本音節串，產生格狀音節組，最後進入語言解碼系統做最後的辨認。得到的基本辨認率為57.69%；並且，在加入專有名詞辭庫、數詞構詞規則、詞類考量後，辨認率可達64.40%。	zh_TW
dc.description.abstract	In this thesis, a word-class bigram model of Chinese is discussed for speech-to-text conversion. An algorithm is first proposed to partition all words of a large lexicon containing 111246 word entries into several hundred word classes. It considers many linguistic features of words, including part of speech, prefix, suffix, and length, so that words with the same characteristics are clustered together. A word-class bigram model is then constructed using a text corpus containing 9 million words. The performance of the proposed word-class bigram model was examined by simulation, combining it with an HMM-based base-syllable recognizer to convert speech into text. The base-syllable accuracy rate of the HMM recognizer was 81%. A character accuracy rate of 57.7% was achieved for the baseline system. By further including all proper nouns and some formation rules for compound words, the accuracy rate rose to 64.4%.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	中文詞群	zh_TW
dc.subject	雙連文	zh_TW
dc.title	中文詞群雙連文語言模型之初步研究	zh_TW
dc.title	A First Study on Mandarin Word-class Bigram Language Model	en_US
dc.type	Thesis	en_US
dc.contributor.department	電信工程研究所	zh_TW
Appears in Collections:	Thesis