Title: | A Comparison of Different Front-End Techniques for Speaker-Independent Speech Recognition (針對非特定語者語音辨識使用不同前處理技術之比較) |
Authors: | 蕭依娜 陳永平 Institute of Electrical and Control Engineering |
Keywords: | Speech; Feature Extraction; Speaker-Independent |
Publication date: | 2003 |
Abstract: | This thesis compares several parametric representations of the speech signal, that is, feature extraction techniques, by their recognition performance in monophone-based and syllable-based speaker-independent speech recognition systems. The techniques fall into two groups: those based on speech production and those based on speech perception. The first group comprises Linear Predictive Coding (LPC), the LPC-derived Cepstrum (LPCC), and Reflection Coefficients (RC). The second group comprises Mel-frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) analysis. In the speaker-independent experiments, the perception-based features outperform the production-based features (LPC, LPCC, and RC): MFCC achieves a recognition rate of 78.3% in the monophone-based system and 98.5% in the syllable-based system, and PLP achieves 78.9% and 98.5%, respectively. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009112526 http://hdl.handle.net/11536/44791 |
Appears in Collections: | Thesis |