中語聲韻母之模型建立與辨識方法

Full metadata record

DC Field	Value	Language
dc.contributor.author	周嘉賢	en_US
dc.contributor.author	Chou, Chia-Shyan	en_US
dc.contributor.author	劉啟民	en_US
dc.contributor.author	Chi-Min Liu	en_US
dc.date.accessioned	2014-12-12T02:15:05Z	-
dc.date.available	2014-12-12T02:15:05Z	-
dc.date.issued	1995	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#NT840392031	en_US
dc.identifier.uri	http://hdl.handle.net/11536/60374	-
dc.description.abstract	在本篇論文中，我們將分三個部份來討論中文語音辨識上的一些問題。首先，我們要探討的問題在於中語聲韻母模型建立的過程中所需要考慮的因素，包括了聲韻母連音部份 (transition area) 的處理、隱藏性馬可夫模型 (HMM) 狀態個數對辨識效率的影響。最重要的一點，我們在這個部份利用了大量的測試資料來定義出中文聲母的混淆集合 (confusion sets)，也用已有的關於發聲器官變化的知識以及實驗的結果來佐證我們的定義。同時我們也提出了所謂的可接受的錯誤 (acceptable errors)，這些因為地域上、發音姿勢上或習慣上造成的差異將是無法避免的，只能在加入語言模型 (language model) 之後獲得改善。再者，我們針對在第一部份定義出來的混淆集合中，找出三個混淆集合，包括了上顎音（ㄐ、ㄑ、ㄒ）、捲舌音（ㄓ、ㄔ、ㄕ）、齒擦音（ㄗ、ㄘ、ㄙ），這三個混淆集合有一個相似的特性，那就是在同一個集合中的所有元素的發音長度 (duration) 相差很大。利用這個特性，我們在傳統的 Viterbi 辨識方法中引入了發音長度的特徵，針對特定語者系統降低了這些集合的大約 47% 的錯誤，整個系統的辨識率也提昇了大約 0.7%，至於在非特定語者系統中，我們得到大約 1% 的好處。最後一部份，我們嘗試著加入鼻音化韻母的考量，這是為了改善跟鼻音有關的聲母的混淆情形，可惜的是我們的推論並沒有成功。論文的最後我們建立了一套非特定語者 (speaker-independent) 系統，實驗的結果顯示整體的混淆情況與特定語者系統並不會有什麼差異，長度的限制也的確降低了系統的錯誤率。 This thesis addresses three issues in Mandarin speech recognition. First, we consider the modeling of Mandarin speech, including the basic modeling units, the coarticulation effect between INITIALs and FINALs, and the number of states in an HMM. Most importantly, we use a large amount of speech from two speakers to define the confusion sets of Mandarin INITIALs, and we support the definition with knowledge of articulatory gestures and with experimental results. From the experiments, we also introduce the concept of acceptable errors. An acceptable error is a recognition error that arises from factors such as improper or customary articulation habits shared by many speakers. Such errors can be treated as acceptable for syllable or word recognition and can be overcome with the help of language models. Second, we focus on three of the previously defined confusion sets: palatals (ㄐ、ㄑ、ㄒ), retroflexes (ㄓ、ㄔ、ㄕ), and dental sibilants (ㄗ、ㄘ、ㄙ). The common property of these three sets is that the elements within a set differ greatly in duration. We develop algorithms that incorporate duration information into the conventional Viterbi algorithm. The experimental results show an error reduction of 45% for the three sets and 0.7% for the total errors in speaker-dependent systems. The third issue is the confusion among nasal consonants. We try to solve the problem by introducing nasalized FINAL models; however, this does not appear to work, owing to the variance in nasalization level. Finally, all three issues are examined in speaker-independent experiments. The results show that the overall confusions do not change and that introducing duration improves performance.	zh_TW
dc.language.iso	zh_TW	en_US
dc.subject	聲韻母	zh_TW
dc.subject	中語	zh_TW
dc.subject	發音長度	zh_TW
dc.subject	非特定語者	zh_TW
dc.subject	Initial-Final	en_US
dc.subject	Mandarin	en_US
dc.subject	Duration	en_US
dc.subject	Speaker-Independent	en_US
dc.title	中語聲韻母之模型建立與辨識方法	zh_TW
dc.title	Initial-Final Modeling and Recognition of Mandarin Speech	en_US
dc.type	Thesis	en_US
dc.contributor.department	資訊科學與工程研究所	zh_TW
Appears in Collections:	Thesis