A Study on the Robust Techniques for Adverse Mandarin Speech Recognition

Title:	A Study on the Robust Techniques for Adverse Mandarin Speech Recognition
Authors:	Wei-Tyng Hong, Sin-Horng Chen (Institute of Communications Engineering)
Keywords:	Robust speech recognition; Noise compensation; Channel bias compensation; Speech/non-speech segmentation; Loudness adaptation; Robust training algorithm; Likelihood compensation
Publication date:	1998
Abstract:	This dissertation addresses several issues in adverse Mandarin speech recognition. First, it considers noisy-speech segmentation for on-line noise-model estimation. A new segmentation method based on a recurrent neural network (RNN) is proposed: the RNN classifies each input frame into one of three broad classes (syllable initial, syllable final, or non-speech). A noise model is then estimated recursively from the non-speech frames and used in parallel model combination (PMC) to adapt the clean-speech HMMs to the current noise environment. In addition, two new methods are proposed that use this broad-class information to assist PMC and signal bias removal (SBR). The first takes the RNN outputs directly as additional likelihood scores, helping the PMC recognizer reduce errors caused by broad-class confusion and misalignment with the test utterance in noise. The second uses the classification information to improve the accuracy of the bias estimate in SBR. Experimental results showed that the RNN-based segmentation method is effective: it segments speech from non-speech in noise, keeps tracking and estimating the noise model on line, and works under both constant and time-varying noise levels.
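The noise-tracking and PMC steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: each model is a single Gaussian with a diagonal covariance in the log filter-bank domain (practical PMC systems usually hold cepstral models and pass through an inverse DCT first), PMC uses the standard log-normal approximation, the noise variance is held fixed, and a per-frame non-speech probability `p_nonspeech` stands in for the RNN's non-speech output. The names `update_noise_mean` and `pmc_combine`, the 0.5 threshold, and the forgetting factor 0.95 are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def update_noise_mean(noise_mean: Vec, frame: Vec, p_nonspeech: float,
                      threshold: float = 0.5, forget: float = 0.95) -> Vec:
    """Recursive (exponentially weighted) update of the log-spectral noise mean.
    Only frames judged non-speech update it; speech frames leave it unchanged.
    threshold and forget are hypothetical values, not taken from the thesis."""
    if p_nonspeech < threshold:
        return noise_mean
    return [forget * m + (1.0 - forget) * x for m, x in zip(noise_mean, frame)]


def log_to_lin(mu_l: Vec, var_l: Vec) -> Tuple[Vec, Vec]:
    """Log-normal moments: linear-domain mean/variance of exp(Y), Y ~ N(mu_l, var_l)."""
    mu = [math.exp(m + v / 2.0) for m, v in zip(mu_l, var_l)]
    var = [u * u * (math.exp(v) - 1.0) for u, v in zip(mu, var_l)]
    return mu, var


def lin_to_log(mu: Vec, var: Vec) -> Tuple[Vec, Vec]:
    """Inverse of log_to_lin: moment-match back to a log-domain Gaussian."""
    var_l = [math.log(v / (u * u) + 1.0) for u, v in zip(mu, var)]
    mu_l = [math.log(u) - vl / 2.0 for u, vl in zip(mu, var_l)]
    return mu_l, var_l


def pmc_combine(speech_mu: Vec, speech_var: Vec, noise_mu: Vec, noise_var: Vec,
                gain: float = 1.0) -> Tuple[Vec, Vec]:
    """Log-normal PMC: speech (scaled by gain) and noise add in the linear domain."""
    s_mu, s_var = log_to_lin(speech_mu, speech_var)
    n_mu, n_var = log_to_lin(noise_mu, noise_var)
    y_mu = [gain * s + n for s, n in zip(s_mu, n_mu)]
    y_var = [gain * gain * s + n for s, n in zip(s_var, n_var)]
    return lin_to_log(y_mu, y_var)


if __name__ == "__main__":
    noise = [0.0, 0.0]
    # (frame, hypothetical RNN non-speech probability)
    for frame, p in [([1.0, 0.5], 0.9), ([4.0, 3.0], 0.1), ([1.2, 0.4], 0.8)]:
        noise = update_noise_mean(noise, frame, p)
    mu, var = pmc_combine([2.0, 1.5], [0.3, 0.2], noise, [0.1, 0.1])
    print("noisy-speech mean:", mu, "variance:", var)
```

Here `gain` corresponds to the single constant gain factor that conventional PMC applies to an entire test utterance.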
The two methods that use the RNN's broad-class segmentation information to assist PMC and SBR were both shown to be effective.

Second, the dissertation studies robust training. A robust environment-effect suppression training algorithm is proposed. It modifies the conventional segmental k-means training algorithm by incorporating a signal bias-compensation operation and a PMC noise-compensation operation into its iterative training procedure, jointly estimating the speech and environment models to obtain a set of environment-effect-suppressed HMMs. These models are expected to match better a robust recognizer that applies the same bias- and noise-compensation operations during recognition. Experimental results showed that HMMs trained with the proposed algorithm outperformed both clean-speech HMMs and HMMs trained with the conventional segmental k-means algorithm in adverse Mandarin speech recognition.

Lastly, the dissertation addresses gain modeling for PMC-based Mandarin speech recognition. A segment-based C0 adaptation scheme is proposed. It tracks the speech level of the current segment in noise and incorporates a new model of the speech signal's zeroth cepstral coefficient (C0) into PMC, improving gain matching between the clean-speech HMMs and the current test speech. Experimental results showed that it outperformed conventional PMC, which uses a single constant gain factor for the entire test utterance.
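The bias-compensation part of the robust training loop can be illustrated with a deliberately simplified sketch. It is a stand-in under stated assumptions, not the proposed algorithm. HMM states are replaced by a flat vector-quantization codebook, and segmental alignment is replaced by nearest-codeword assignment. The PMC step is omitted. Each utterance's bias is estimated SBR-style as the average residual between its frames and their nearest codewords. All function names and iteration counts are hypothetical.

```python
from typing import List

Vector = List[float]


def nearest_codeword(x: Vector, codebook: List[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to x."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - c) ** 2 for a, c in zip(x, codebook[k])))


def estimate_bias(utt: List[Vector], codebook: List[Vector], iters: int = 3) -> Vector:
    """SBR-style bias for one non-empty utterance: the mean residual between each
    frame and its nearest codeword, re-estimated with the current bias removed."""
    dim = len(utt[0])
    bias = [0.0] * dim
    for _ in range(iters):
        acc = [0.0] * dim
        for x in utt:
            y = [a - b for a, b in zip(x, bias)]
            c = codebook[nearest_codeword(y, codebook)]
            for d in range(dim):
                acc[d] += x[d] - c[d]
        bias = [s / len(utt) for s in acc]
    return bias


def train_with_bias_removal(utts: List[List[Vector]], codebook: List[Vector],
                            rounds: int = 5) -> List[Vector]:
    """k-means codebook training in which each utterance is bias-compensated
    with the current codebook before its frames are re-clustered."""
    codebook = [list(c) for c in codebook]
    dim = len(codebook[0])
    for _ in range(rounds):
        sums = [[0.0] * dim for _ in codebook]
        counts = [0] * len(codebook)
        for utt in utts:
            bias = estimate_bias(utt, codebook)
            for x in utt:
                y = [a - b for a, b in zip(x, bias)]  # bias-removed frame
                k = nearest_codeword(y, codebook)
                counts[k] += 1
                for d in range(dim):
                    sums[k][d] += y[d]
        for k in range(len(codebook)):
            if counts[k]:  # leave codewords with no assigned frames unchanged
                codebook[k] = [s / counts[k] for s in sums[k]]
    return codebook


if __name__ == "__main__":
    # Two toy "utterances" of the same two sounds, shifted by different channel biases.
    utts = [[[3.0, 3.0], [13.0, 3.0]], [[-3.0, -3.0], [7.0, -3.0]]]
    print(train_with_bias_removal(utts, [[1.0, 1.0], [9.0, -1.0]]))
```

The codebook is re-estimated from bias-removed frames. As a result, utterances recorded through different channels pull the same codewords toward a common, channel-independent location. The thesis applies this idea to HMM training together with PMC noise compensation.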
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT870435108 http://hdl.handle.net/11536/64568
Appears in Collections:	Thesis