A Study on Channel Adaptation for Telephone Speech Recognition

Full metadata record

DC Field	Value	Language
dc.contributor.author	吳佳錡	en_US
dc.contributor.author	Jia-Chi Wu	en_US
dc.contributor.author	劉啟民	en_US
dc.contributor.author	Chi-Min Liu	en_US
dc.date.accessioned	2014-12-12T02:20:20Z	-
dc.date.available	2014-12-12T02:20:20Z	-
dc.date.issued	1998	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#NT870392066	en_US
dc.identifier.uri	http://hdl.handle.net/11536/64089	-
dc.description.abstract	本論文把電話網路通道和語者聲道特性對語音的影響，視為一個通道效應。探討在電話語音辨識的應用中，（1）通道效應對於語音特徵以及語音辨識的影響，（2）各項調適方法對於通道調適的效果。我們分別對以下四項分類：<1>調適參數的種類、<2>參數估測的準則、<3>補償差異的領域、<4>『監督式』或『非監督式』的調適限制，進行實驗加以探討。另外，我們除了以『所有音素（phone）共用一個調適參數（phone-independent parameter）』的方式，來調適補償環境的差異之外，也探討mixture-dependent的調適，對辨識效果的改進。在本論文的實驗中，使用以麥克風語料訓練成的語者無關（speaker-independent）模型，對『無背景噪音的麥克風語音』進行辨識，其音節辨識率為70.45%，此一語音模型一旦使用在電話語音辨識，其辨識率遽降到28.49%。通道調適之後，在所有的音素都共用一個調適參數的情況下，我們得出：以監督式MLLR方式，在語音模型上調適mean和covariance矩陣，其調適效果最好，可使辨識率提昇到49.60%；在mixture-dependent調適方式下，我們以VFS（Vector Field Smoothing）方法，可使辨識率進一步提昇至51.42%。	zh_TW
dc.description.abstract	In this thesis, the telephone network channel and speaker-specific vocal tract characteristics are modeled as a single channel effect on speech. For telephone speech recognition, we study two issues: (1) the influence of the channel effect on speech features and on recognition accuracy, and (2) the effectiveness of various channel adaptation methods. The methods are examined along four dimensions: the type of adaptation parameters, the parameter estimation criterion, the domain in which the mismatch is compensated, and whether adaptation is supervised or unsupervised. In addition to phone-independent adaptation, in which a single adaptation parameter set is shared by all mixtures of all phones, we also study mixture-dependent adaptation. In the experiments, a speaker-independent model trained on microphone speech achieves a syllable recognition rate of 70.45% on background-noise-free microphone speech, but the rate drops to 28.49% when the same model is applied to telephone speech. When the adaptation parameter set is shared by all mixtures of all phones, supervised MLLR adaptation of the means and covariance matrices of the speech models performs best, raising the recognition rate to 49.60%. In the mixture-dependent case, in which each mixture of each phone state has its own adaptation parameter set, the VFS (Vector Field Smoothing) method further raises the recognition rate to 51.42%.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	電話語音	zh_TW
dc.subject	語音辨識	zh_TW
dc.subject	語者調適	zh_TW
dc.subject	通道調適	zh_TW
dc.subject	通道效應	zh_TW
dc.subject	調適	zh_TW
dc.subject	telephone speech	en_US
dc.subject	speech recognition	en_US
dc.subject	speaker adaptation	en_US
dc.subject	channel adaptation	en_US
dc.subject	channel effect	en_US
dc.subject	adaptation	en_US
dc.title	通道調適在電話語音辨識上的研究	zh_TW
dc.title	A Study on Channel Adaptation for Telephone Speech Recognition	en_US
dc.type	Thesis	en_US
dc.contributor.department	資訊科學與工程研究所	zh_TW
Appears in Collections:	Thesis