Title: | 通道調適在電話語音辨識上的研究 A Study on Channel Adaptation for Telephone Speech Recognition |
Authors: | 吳佳錡 Jia-Chi Wu; 劉啟民 Chi-Min Liu; Institute of Computer Science and Engineering |
Keywords: | telephone speech;speech recognition;speaker adaptation;channel adaptation;channel effect;adaptation |
Issue Date: | 1998 |
Abstract: | This thesis models the telephone network channel and the speaker's vocal tract characteristics as a single channel effect on speech. For telephone speech recognition, it studies (1) the influence of the channel effect on speech features and on recognition accuracy, and (2) how well various adaptation methods perform channel adaptation. The methods are examined experimentally along four dimensions: (i) the type of adaptation parameters, (ii) the parameter estimation criterion, (iii) the domain in which the mismatch is compensated, and (iv) whether adaptation is supervised or unsupervised. Besides compensating the environmental mismatch with a single adaptation parameter set shared by all mixtures of all phones (phone-independent parameters), the thesis also examines how much mixture-dependent adaptation improves recognition.
In the experiments, a speaker-independent model trained on microphone speech is used. It reaches a syllable recognition rate of 70.45% on background-noise-free microphone speech, but the rate drops sharply to 28.49% when the same model is applied to telephone speech. After channel adaptation with one parameter set shared by all mixtures of all phones, supervised MLLR adaptation of the means and covariance matrices of the phone models performs best and raises the recognition rate to 49.60%. With mixture-dependent adaptation, in which each mixture of each phone state has its own adaptation parameter set, the VFS (Vector Field Smoothing) method raises the rate further to 51.42%. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT870392066 http://hdl.handle.net/11536/64089 |
Appears in Collections: | Thesis |
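The abstract's shared-parameter case (one adaptation parameter set for all mixtures of all phones) can be illustrated with a minimal sketch of MLLR-style mean adaptation. This is not the thesis's implementation. It assumes diagonal covariances, a hard frame-to-Gaussian alignment standing in for supervised adaptation, and mean-only adaptation (the covariance update reported in the abstract is omitted). The function names `estimate_global_mllr_mean_transform` and `adapt_means` are hypothetical.

```python
import numpy as np


def estimate_global_mllr_mean_transform(obs, align, means, variances):
    """Estimate one affine transform W = [b, A] shared by all Gaussians.

    obs:       (T, d) adaptation frames
    align:     (T,) index of the Gaussian assigned to each frame
               (hard alignment; an assumption of this sketch)
    means:     (M, d) Gaussian means of the unadapted model
    variances: (M, d) diagonal covariances of the unadapted model
    Returns W with shape (d, d + 1); adapted mean = W @ [1, mean].
    """
    d = obs.shape[1]
    # Extended mean vectors xi_m = [1, mu_m], one row per Gaussian.
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    X = xi[align]  # (T, d + 1): extended mean for each frame
    W = np.zeros((d, d + 1))
    for i in range(d):
        # With diagonal covariances each row of W is an independent
        # weighted least-squares problem with weights 1 / variance.
        inv_var = 1.0 / variances[align, i]
        G = (X * inv_var[:, None]).T @ X
        k = (X * (inv_var * obs[:, i])[:, None]).sum(axis=0)
        W[i] = np.linalg.solve(G, k)
    return W


def adapt_means(means, W):
    """Apply the shared transform to every Gaussian mean."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

Because a single transform is tied across every Gaussian, it can be estimated from a small amount of adaptation speech. A mixture-dependent scheme, as in the abstract's second case, would instead give each mixture its own parameters and therefore needs more data or some smoothing across mixtures.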