An Improvement on Prosody-Assisted Mandarin Spontaneous Speech Recognition

Title:	An Improvement on Prosody-Assisted Mandarin Spontaneous Speech Recognition (以韻律訊息輔助中文自發性語音辨認之改進)
Authors:	Wu, Meng-chian (吳孟謙); Chen, Sin-Horng (陳信宏); Institute of Communications Engineering (電信工程研究所)
Keywords:	speech recognition; Mandarin spontaneous speech; prosody model; language model
Issue Date:	2015
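Abstract:	This study uses prosodic information to assist Mandarin spontaneous speech recognition. It focuses on building the language model and on a recognition process that incorporates prosodic information through a pre-trained prosody model. Before the two-stage automatic speech recognition system is applied, several phenomena specific to spontaneous speech, including interjections, particles, and paralinguistic phenomena, are handled appropriately, and language model adaptation is used to address both the shortage of spontaneous-speech text data and the differences in grammar and flow between spontaneous text and ordinary written text. The first stage uses an HMM recognizer, in which the acoustic model and a trigram language model generate a word lattice. The second stage sequentially adds a factored language model, prosodic break (pause) information, and syllable prosodic-state information; rescoring then yields a single best path and simultaneously decodes related information, including parts of speech, the punctuation marks following words, and the two kinds of prosodic tags used to construct the hierarchical prosodic structure of the test data. The experimental corpus is the Academia Sinica MCDC corpus. The syllable, character, and word error rates fall from 35.6%, 40.2%, and 45.1% with only the acoustic model and trigram language model to 32.4%, 36.5%, and 41.8% after prosodic information is added. Analysis of the results shows that the system successfully corrects tone recognition errors and word segmentation errors.
A prosody-assisted ASR approach for spontaneous Mandarin speech is proposed. A well-trained hierarchical prosodic model (HPM) is used in two-stage speech recognition. Before recognition, the special terms in spontaneous speech, such as particles, markers, and paralinguistic phenomena, are processed first, and then maximum a posteriori adaptation is employed to generate an adapted LM. In the first-stage recognition, a word lattice is generated by the HMM method using a tri-phone AM and a bigram LM. The lattice is then extended by replacing the LM with a trigram model. A rescoring process is applied in the second-stage recognition to sequentially add the factored POS and PM LMs and the HPM. The method is evaluated on the MCDC database, which comprises 8 dialogues by 16 speakers with a total length of 9.09 hours. Syllable/character/word error rates were reduced from 35.6/40.2/45.1% with the baseline trigram HMM method to 32.4/36.5/41.8% with the proposed method. Error analysis shows that many tone recognition errors and word segmentation errors were corrected.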
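The second-stage rescoring described in the abstract can be pictured with a small sketch. This is an illustration only, not the thesis implementation: it rescores a two-entry N-best list rather than a full word lattice, it assumes the knowledge sources are combined as a weighted sum of log scores, and the names `Hypothesis`, `rescore`, the score keys, the weights, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: tuple   # placeholder word sequence (hypothetical)
    scores: dict   # log-domain score per knowledge source (toy values)


def rescore(hypotheses, weights):
    """Return the hypothesis with the highest weighted sum of log scores.

    Knowledge sources that are missing from `weights` get weight 0,
    i.e. they are ignored.
    """
    def total(hyp):
        return sum(weights.get(name, 0.0) * value
                   for name, value in hyp.scores.items())
    return max(hypotheses, key=total)


# Toy candidates standing in for paths through a word lattice.
# All scores are invented for illustration.
candidates = [
    Hypothesis(("w1", "w2"),
               {"am": -10.0, "trigram_lm": -5.0, "factored_lm": -3.0,
                "break": -3.0, "prosodic_state": -3.0}),
    Hypothesis(("w1", "w3"),
               {"am": -10.5, "trigram_lm": -5.0, "factored_lm": -2.5,
                "break": -1.0, "prosodic_state": -1.5}),
]

# First stage: acoustic model plus trigram language model only.
first_stage_weights = {"am": 1.0, "trigram_lm": 1.0}
first_pass = rescore(candidates, first_stage_weights)

# Second stage: also use the factored LM, break, and prosodic-state scores.
# Unit weights are an assumption; a real system would tune them.
second_stage_weights = dict(first_stage_weights,
                            factored_lm=1.0, prosodic_state=1.0)
second_stage_weights["break"] = 1.0
second_pass = rescore(candidates, second_stage_weights)

print("first stage best: ", first_pass.words)
print("second stage best:", second_pass.words)
```

With these toy numbers the best candidate changes once the additional scores are included, which is the kind of correction the rescoring stage is meant to allow.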
URI:	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070260261 http://hdl.handle.net/11536/139773
Appears in Collections:	Thesis