Title: 在感知訊號上使用子空間分析之語音增強技術
Subspace Decomposition of Perceptual Representations for Speech Enhancement
Authors: 蕭任伯
Hsiao, Jen-Po
冀泰石
Chi, Tai-Shih
Institute of Communications Engineering
Keywords: speech enhancement; perceptual; subspace; speech recognition; ACC; HTK
Issue Date: 2009
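Abstract: Early speech signal processing treated the time domain and the frequency domain as separate dimensions. In recent years, with the development of auditory models, it has been established that human hearing processes sound jointly along the temporal and spectral dimensions; owing to this higher-dimensional analysis, humans are more robust than any existing algorithm. This thesis uses the auditory perception model developed by the Neural Systems Laboratory (NSL) at the University of Maryland, which simulates the path a signal takes from the ear up to the auditory nerves of the midbrain. In the model's spectro-temporal analysis stage, the most salient speech regions are isolated first, and subspace analysis is then applied to further suppress the residual noise. Finally, the speech features extracted by the auditory model (Auditory Spectrogram Coefficients) are used for continuous-digit speech recognition with the Hidden Markov Model Toolkit (HTK), and the improvement in recognition rate confirms the robustness of the algorithm.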
In early years, conventional speech enhancement techniques were developed separately in the time domain and in the frequency domain. In recent years, with the introduction of auditory models, enhancement techniques have been developed in the joint spectro-temporal domain to incorporate aspects of hearing perception and improve robustness. In this thesis, we use the auditory model introduced by the Neural Systems Laboratory (NSL), University of Maryland, which simulates hearing physiology from the cochlea to the cortex. First, the speech regions of the spectrogram are selected in the cortical domain. Second, we adopt a subspace algorithm to filter out the noise that remains within the speech regions. Finally, Auditory Cepstrum Coefficients (ACC) are extracted for an HTK recognition task. The HTK evaluations demonstrate the robustness of the proposed algorithm.
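Note: the abstract does not specify which subspace algorithm the thesis uses. As a rough illustration of the general idea only, the sketch below shows a generic signal-subspace denoiser under a white-noise assumption: it eigendecomposes the covariance of noisy feature frames, keeps the directions whose eigenvalues exceed the noise variance, and applies a Wiener-like gain to each. The function name `subspace_denoise`, its parameters, and the white-noise model are assumptions made for this example, not details taken from the thesis.

```python
import numpy as np

def subspace_denoise(noisy, noise_var, rank=None):
    """Project a (channels x frames) matrix onto an estimated signal subspace.

    noisy     : 2-D array; rows are feature channels, columns are frames
    noise_var : assumed white-noise variance (hypothetical input, must be > 0)
    rank      : signal-subspace dimension; if None, keep eigenvalues > noise_var
    """
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")

    # Sample covariance of the mean-removed frames (channels x channels)
    mean = noisy.mean(axis=1, keepdims=True)
    centered = noisy - mean
    cov = centered @ centered.T / noisy.shape[1]

    # eigh returns eigenvalues in ascending order; reorder to descending
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    if rank is None:
        rank = int(np.sum(eigvals > noise_var))

    # Under white noise, clean-signal eigenvalues are roughly eigvals - noise_var
    clean_vals = np.maximum(eigvals[:rank] - noise_var, 0.0)
    gains = clean_vals / (clean_vals + noise_var)  # Wiener-like gain per direction

    # Keep only the retained directions, scaled by their gains
    basis = eigvecs[:, :rank]
    return basis @ (gains[:, None] * (basis.T @ centered)) + mean
```

In a pipeline like the one the abstract describes, a step of this kind would be applied to the representation that remains after the speech regions have been selected; how the thesis estimates the noise statistics and chooses the subspace dimension is not stated in this record.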
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079613550
http://hdl.handle.net/11536/41986
Appears in Collections: Thesis


Files in This Item:

  1. 355001.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.