Title: | An Improvement on Prosody-Assisted Mandarin Spontaneous Speech Recognition (以韵律讯息辅助中文自发性语音辨认之改进) |
Authors: | Wu, Meng-chian (吴孟谦); Chen, Sin-Horng (陈信宏); Institute of Communications Engineering |
Keywords: | speech recognition; Mandarin spontaneous speech; prosody model; language model |
Publication date: | 2015 |
Abstract: | This thesis proposes a prosody-assisted automatic speech recognition (ASR) approach for spontaneous Mandarin speech. The work focuses on building the language models (LMs) and on incorporating prosodic information into a two-stage recognizer through a well-trained hierarchical prosodic model (HPM). Before recognition, phenomena specific to spontaneous speech, such as interjections, particles, discourse markers, and paralinguistic events, are processed first. Maximum a posteriori (MAP) adaptation is then used to build an adapted LM, which addresses both the scarcity of spontaneous-speech text and the grammatical and fluency differences between spontaneous speech and ordinary written text. In the first stage, an HMM recognizer with a tri-phone acoustic model (AM) and a bigram LM generates a word lattice, which is then expanded by replacing the bigram with a trigram LM. In the second stage, the lattice is rescored by sequentially adding factored part-of-speech (POS) and punctuation-mark (PM) LMs, prosodic-break (pause) information, and syllable prosodic-state information from the HPM. The best path is decoded together with its POS tags, the punctuation mark following each word, and the two types of prosodic tags used to construct the hierarchical prosodic structure of the test speech. The method is evaluated on the Academia Sinica MCDC corpus, which comprises 8 dialogues by 16 speakers with a total length of 9.09 hours. Syllable, character, and word error rates fall from 35.6%, 40.2%, and 45.1% for the baseline (AM plus trigram LM) to 32.4%, 36.5%, and 41.8% after prosodic information is added. Error analysis shows that the system corrects many tone recognition errors and word segmentation errors. |
URI: | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070260261 ; http://hdl.handle.net/11536/139773 |
Appears in collections: | Thesis |
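The abstract's second stage rescores the first-stage word lattice by adding factored POS/PM LM scores and HPM prosodic scores, then keeps the single best path. This record does not give the thesis's scoring formula, so the sketch below is only an illustration under stated assumptions: a log-linear (weighted-sum) combination of per-arc log-scores, a lattice whose node ids are topologically ordered, and arcs whose scores are already context-independent (as after the trigram expansion). The `Arc` class, `best_path` function, score names (`am`, `lm`, `break`, ...), and weights are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Arc:
    """One word hypothesis in the lattice, spanning nodes start -> end."""
    start: int
    end: int
    word: str
    # Hypothetical component log-scores, e.g. "am", "lm", "pos", "pm",
    # "break", "state"; a key missing from an arc contributes 0.
    scores: dict = field(default_factory=dict)


def best_path(arcs, weights, initial, final):
    """Return (total_score, words) of the highest-scoring path.

    Assumes start < end for every arc, so visiting nodes in ascending
    order processes every predecessor before its successors.
    """
    outgoing = defaultdict(list)
    nodes = set()
    for arc in arcs:
        outgoing[arc.start].append(arc)
        nodes.update((arc.start, arc.end))

    # best[node] = (accumulated weighted score, arc that reached node)
    best = {initial: (0.0, None)}
    for node in sorted(nodes):
        if node not in best:
            continue  # not reachable from the initial node
        score_so_far = best[node][0]
        for arc in outgoing[node]:
            # Log-linear combination: weighted sum of component log-scores.
            arc_score = sum(w * arc.scores.get(name, 0.0) for name, w in weights.items())
            total = score_so_far + arc_score
            if arc.end not in best or total > best[arc.end][0]:
                best[arc.end] = (total, arc)

    if final not in best:
        raise ValueError("final node is not reachable from the initial node")

    # Trace back from the final node to recover the word sequence.
    words, node = [], final
    while node != initial:
        arc = best[node][1]
        words.append(arc.word)
        node = arc.start
    return best[final][0], words[::-1]


# Toy lattice: "A" + "B" competes with the single word "AB" for the same span.
TOY_ARCS = [
    Arc(0, 1, "A", {"am": -2.0, "lm": -1.0, "break": -0.5}),
    Arc(1, 2, "B", {"am": -2.0, "lm": -1.0, "break": -3.0}),
    Arc(0, 2, "AB", {"am": -4.5, "lm": -1.8, "break": -0.2}),
]

if __name__ == "__main__":
    print(best_path(TOY_ARCS, {"am": 1.0, "lm": 1.0}, 0, 2))
    print(best_path(TOY_ARCS, {"am": 1.0, "lm": 1.0, "break": 1.0}, 0, 2))
```

With only the acoustic and LM weights, the toy lattice prefers the segmentation A + B. Adding the hypothetical break score, which makes a word boundary between A and B unlikely, switches the best path to the single word AB. This mirrors, in miniature, the kind of word-segmentation correction the abstract attributes to prosodic information.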