使用韻律信息之中文自發性語音辨認 (A Prosody-Assisted Mandarin Spontaneous Speech Recognition)

Full metadata record

DC Field	Value	Language
dc.contributor.author	黃仰駿	en_US
dc.contributor.author	Huang, Yang-Chun	en_US
dc.contributor.author	陳信宏	en_US
dc.contributor.author	Chen, Sin-Horng	en_US
dc.date.accessioned	2014-12-12T02:43:50Z	-
dc.date.available	2014-12-12T02:43:50Z	-
dc.date.issued	2014	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT070160268	en_US
dc.identifier.uri	http://hdl.handle.net/11536/75676	-
dc.description.abstract	近年來朗讀式語音辨識已有相當不錯的效能，但自發性語音辨認卻因為語速較快、語法不規則、語流不流暢等原因仍舊困難，本論文探討中文自發性語音辨認，研究重點在語言模型的建立及加入韻律信息的辨認過程。在語言模型建立上，考慮語者說話猶豫時所使用的感嘆詞及無意義的慣用插語，並利用語言模型調適來解決文字語料不足及文法語流特性和朗讀語音不同的問題，以建立一套自發性語言模型；在辨認過程上，使用兩階段辨認來加入韻律信息協助辨認，首先在第一階段辨認使用傳統聲學模型及bigram語言模型產生一個word lattice，接著在第二階段辨認先擴展語言模型為factored語言模型，再加入韻律邊界停頓資訊與音節韻律狀態資訊，經過重新評分後得到一條最佳路徑，並同時解碼出相關資訊。使用中研院MCDC語料作實驗，獲得詞、字及音節的辨識率分別為58.29%、64.94%及68.89%，較傳統只使用第一階段辨認的作法絕對辨認率改善了4.43%、4.6%及3.06%。經辨認結果分析發現，對於正常語流而言，加入韻律信息能夠改善搶詞及聲調辨認錯誤；但對於不正常語流來說，改善的效能非常有限。	zh_TW
dc.description.abstract	In recent years, Mandarin read-speech recognition technology has become quite mature. Spontaneous speech recognition, however, remains difficult because of high speaking rates and disfluent speech events. This thesis studies Mandarin spontaneous speech recognition, focusing on language model construction and on a prosody-assisted recognition process. For language modeling, two special word types, particles and markers, are added to the vocabulary to model the disfluency phenomena of spontaneous speech. In addition, language model adaptation is employed to address the shortage of spontaneous-speech text. For recognition, a two-stage process is adopted to incorporate prosodic information. In the first stage, an acoustic model and a bigram language model (LM) are used to generate a word lattice. In the second stage, the word lattice is first expanded so that the bigram LM can be replaced with a factored LM. Break-related models and prosodic-state-related models of a hierarchical prosodic model are then added in sequence to rescore all search paths and find the best word sequence. Experiments on the Academia Sinica MCDC corpus achieved word, character, and base-syllable accuracy rates of 58.29%, 64.94%, and 68.89%, respectively, which are absolute improvements of 4.43%, 4.6%, and 3.06% over the baseline system. Error analysis shows that prosodic information helps resolve word segmentation ambiguity and tone pattern confusion in fluent speech, but is less effective in disfluent speech.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	自發性語音	zh_TW
dc.subject	語音辨認	zh_TW
dc.subject	spontaneous speech	en_US
dc.subject	speech recognition	en_US
dc.title	使用韻律信息之中文自發性語音辨認	zh_TW
dc.title	A Prosody-Assisted Mandarin Spontaneous Speech Recognition	en_US
dc.type	Thesis	en_US
dc.contributor.department	電信工程研究所	zh_TW
Appears in Collections:	Thesis
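
Illustrative note: the two-stage process described in the abstract (generate a word lattice, then rescore its paths with additional knowledge sources) can be sketched roughly as below. This is a minimal sketch and not the thesis implementation. The toy lattice, the placeholder word labels, the per-word second-stage scores, and the function name rescore_best_path are all hypothetical. The thesis uses a factored LM and a hierarchical prosodic model in the second stage; here a single made-up log-score per word stands in for them.

```python
# Hypothetical toy lattice: (start_node, end_node, word, first_stage_log_score).
# Assumption: node ids are integers and every arc goes from a lower id to a higher id.
LATTICE = [
    (0, 1, "w1a", -1.0),
    (0, 1, "w1b", -1.2),
    (1, 2, "w2a", -0.5),
    (1, 2, "w2b", -0.4),
]

# Hypothetical second-stage log-scores per word, standing in for the
# extra LM and prosodic scores (values are invented for illustration).
SECOND_STAGE = {"w1a": -0.9, "w1b": -0.1, "w2a": -0.2, "w2b": -0.8}


def rescore_best_path(lattice, extra_scores, weight=1.0, start=0, end=None):
    """Return (score, words) of the best path after adding weighted extra scores."""
    if end is None:
        end = max(arc[1] for arc in lattice)
    # best[node] = (best log-score reaching node, word sequence on that path)
    best = {start: (0.0, [])}
    # Processing arcs by ascending start node makes best[start_node] final
    # before any arc leaving that node is considered.
    for s, e, word, score in sorted(lattice, key=lambda arc: arc[0]):
        if s not in best:
            continue  # node not reachable from the start node
        total = best[s][0] + score + weight * extra_scores.get(word, 0.0)
        if e not in best or total > best[e][0]:
            best[e] = (total, best[s][1] + [word])
    return best[end]


first_stage_only = rescore_best_path(LATTICE, SECOND_STAGE, weight=0.0)
rescored = rescore_best_path(LATTICE, SECOND_STAGE, weight=1.0)
print(first_stage_only[1], rescored[1])
```

In this toy example, setting weight to 0.0 reproduces the first-stage best path, while a nonzero weight lets the second-stage scores change which path wins. That is the general idea behind lattice rescoring.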

Files in This Item:

026801.pdf
