Title: 以二維倒頻譜為基礎的噪音下多語者語音辨識
GA-based Noisy Speech Recognition using Two Dimensional Cepstrum
Authors: 胡景淵
Hwu, Jiing-Yuan
林進燈
Lin Chin-Teng
電控工程研究所
Keywords: 二維倒頻譜;噪音;類神經網路;基因遺傳演算法則;Two dimensional cepstrum;noise;neural network;Genetic Algorithm
Issue Date: 1996
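Abstract: Among the many speech parameters, the two-dimensional
cepstrum (TDC) captures both the local characteristics and the
global variation of a speech signal in a single coefficient
matrix. Analysis of clean speech shows that the parameters in
the low-order region of the coefficient matrix are the important
ones. Therefore only a small portion of the coefficients is
needed to form the feature vector that represents an utterance.
Each speech signal is represented by one feature vector rather
than a sequence of feature vectors, so less storage and less
computation are required. Analysis also shows, however, that the
recognition rate drops sharply in noisy environments. To solve
this problem, this thesis proposes a modified two-dimensional
cepstrum method, based on the TDC, to raise the recognition rate
in noise. The modified method applies a high-pass filter along
the frame direction to remove noise components, and uses a
genetic algorithm to search the resulting coefficient matrix for
noise-robust coefficients. In the experiments we use five noise
sources and speech data from ten speakers to recognize Mandarin
digits. The experimental results show that our method does
improve the recognition rate in noise. In Appendix A of this
thesis, we combine the modified two-dimensional cepstrum method
with a neural network system to further improve the recognition
rate in noisy environments. The experimental results show that
this combination does further improve the original system's
recognition rate in noise. The neural network we use is the
SONFIN fuzzy neural network.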
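The feature-extraction idea described above (a two-dimensional cepstrum matrix, high-pass filtering along the frame direction, and a single feature vector built from low-order coefficients) can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: it assumes the common definition of the TDC as the 2-D inverse DFT of the log-magnitude spectra, keeps only the real part as a simplification, and uses a first-order high-pass filter whose form and pole value are illustrative. The function names, frame length, hop size, and block size are all hypothetical.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Split the signal into windowed frames; return log-magnitude spectra
    as a (frames x frequency bins) matrix. Frame sizes are assumptions."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.fft(frames, axis=1))
    return np.log(magnitude + 1e-10)  # small constant avoids log(0)

def highpass_along_frames(log_spec, pole=0.9):
    """Illustrative first-order high-pass filter applied to each frequency
    bin across frames: y[t] = x[t] - x[t-1] + pole * y[t-1]."""
    out = np.zeros_like(log_spec)
    for t in range(1, log_spec.shape[0]):
        out[t] = log_spec[t] - log_spec[t - 1] + pole * out[t - 1]
    return out

def tdc_features(signal, n_time=4, n_quef=8, use_filter=True):
    """Return one fixed-length feature vector for a whole utterance."""
    log_spec = log_spectrogram(signal)
    if use_filter:
        log_spec = highpass_along_frames(log_spec)
    # 2-D inverse DFT over the (frame, frequency) axes; real part kept
    # as a simplification (assumption).
    tdc = np.real(np.fft.ifft2(log_spec))
    # Keep only the low-order block of the coefficient matrix.
    return tdc[:n_time, :n_quef].flatten()
```

Every utterance maps to a vector of `n_time * n_quef` values regardless of its duration, which is where the storage and computation savings come from. Because this particular filter cancels any constant offset in the log spectrum, scaling the input by a fixed gain leaves the filtered features essentially unchanged.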
Many kinds of parameters are used for speech feature
extraction, and the two-dimensional cepstrum (TDC) is one of
them. It can simultaneously represent several kinds of
information contained in the speech waveform: static and dynamic
features, as well as global and fine frequency structures.
Analysis suggests that the coefficients in the low-index portion
of the TDC matrix are more significant than the others. Hence,
an utterance can be represented by a single feature vector
formed from a few selected TDC coefficients, instead of a
sequence of feature vectors. This has the advantages of simple
computation and small storage requirements. However, our
experiments show that the TDC is quite sensitive to background
noise. To solve this problem, we propose the GA-based M_TDC
method in this dissertation to improve the performance of the
TDC under noisy conditions. In the GA-based M_TDC method, we use
a temporal filter to remove noise components in the feature
extraction phase, and we apply genetic algorithms (GAs) to find
the robust speech parameters in the M_TDC matrix. In experiments
with five noise types, the GA-based M_TDC gave better
recognition results than the TDC in noisy environments. In
Appendix A of this thesis, the GA-based M_TDC is combined with a
neural network to further improve the recognition rate in noisy
environments. In experiments with five noise types, this
combination gave better recognition results than the GA-based
M_TDC alone in noisy environments. The neural network used in
our system is the Self-cOnstructing Neural Fuzzy Inference
Network (SONFIN).
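The abstract describes using a genetic algorithm to pick noise-robust coefficients out of the coefficient matrix. A minimal sketch of that kind of search is shown below, treating each candidate as a binary mask over the coefficients. Everything here is an assumption for illustration: the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the parameter values, and the names `ga_select` and `toy_fitness` are hypothetical. In particular, `toy_fitness` is only a stand-in; in a real system the fitness would presumably be a recognition score measured on noisy speech.

```python
import random

def ga_select(n_coeffs, fitness, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Search for a binary mask (1 = keep coefficient) that maximizes
    the supplied fitness function. Returns (best_mask, history), where
    history holds the best fitness seen in each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_coeffs)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        next_pop = [best[:]]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_coeffs)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

# Stand-in fitness (assumption): agreement with a made-up target mask.
TARGET = [1 if i < 8 else 0 for i in range(32)]

def toy_fitness(mask):
    return sum(m == t for m, t in zip(mask, TARGET))
```

Calling `ga_select(32, toy_fitness)` returns a mask of length 32 together with the per-generation best fitness. Because the best mask is always copied into the next generation, that best fitness never decreases from one generation to the next.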
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT850327049
http://hdl.handle.net/11536/61706
Appears in Collections: Thesis