Title: | 基於聽覺感知模型的語音增強技術 Speech Enhancement Method based on Auditory Perceptual Model |
Authors: | 洪詠能 Yung-Neng Hung; 冀泰石 Tai-Shih Chi; Institute of Communications Engineering (電信工程研究所) |
Keywords: | 語音增強;聽覺感知;語音辨識;Speech Enhancement;Auditory Perception;Speech Recognition |
Issue Date: | 2007 |
Abstract: | Early speech signal processing work observed and processed signals in either the time domain or the frequency domain alone. Such methods perform very well at high signal-to-noise ratios (SNRs), but their performance degrades rapidly as the SNR falls, which severely limits practical applications. In contrast, human hearing processes sound in the time and frequency domains simultaneously, so it is far less affected by changes in environment and SNR; that is, it is more robust. This is the main reason recent speech processing research incorporates the perceptual properties of human hearing. In this thesis, we apply a previously proposed auditory perceptual model to speech enhancement. The model simulates the signal processing performed along the auditory pathway from the ear to the brain. First, perceptual speech features are extracted from the early, spectrum-estimation stage of the model. A continuous digit string recognizer based on these features is trained and evaluated using the Hidden Markov Model Toolkit (HTK). Second, we enhance speech signals by suppressing noise in the spectral-temporal analysis stage of the model, and we confirm the enhancement through the resulting improvement in recognition rate. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009513541 http://hdl.handle.net/11536/38384 |
Appears in Collections: | Thesis |