Title: Spectro-Temporal Modulations for Robust Speech Emotion Recognition (使用時頻變化調變於強健語音情緒辨識)
Authors: Yeh, Lan-Ying (葉藍霙)
Chi, Tai-Shih (冀泰石)
Institute of Communications Engineering
Keywords: emotion; speech; recognition; classification; feature
Issue date: 2009
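Abstract: Speech emotion classification has emerged as a research topic in recent years, and most current studies focus on classification in clean speech. In this thesis, we use an auditory perception model to propose a new set of spectro-temporal modulation features, the joint Rate-Scale features (RS features), and use them to address speech emotion recognition in noisy conditions. We add white noise and babble noise at various signal-to-noise ratios (SNRs) to the Berlin Emotional Database and the FAU AIBO Database, and evaluate performance by training on clean data and testing on noisy data, simulating real applications in which the noise level cannot be known in advance. We also use Sequential Forward Floating Selection (SFFS) to examine the redundancy of the proposed features and to further reduce the required feature dimensionality. Experiments on the Berlin Emotional Database show that, compared with conventional prosodic features combined with Mel-frequency cepstral coefficients (MFCCs), the RS features achieve higher recognition rates, especially at low SNRs. On the FAU AIBO Database, the results show that at very high SNRs both the conventional features and the RS features are over-trained, so further dimensionality reduction and feature refinement are needed.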
Speech emotion recognition is mostly studied on clean speech. In this thesis, joint Rate-Scale features (RS features) are extracted from an auditory model and applied to detect the emotional state of noisy speech. The noisy speech is derived from the Berlin Emotional Speech database and the FAU AIBO database by adding white and babble noise at various SNR levels. A clean-train/noisy-test scenario is investigated to simulate conditions with unknown noise sources. The sequential forward floating selection (SFFS) method is adopted to demonstrate the redundancy of the RS features, and further dimensionality reduction is conducted. Compared with conventional MFCCs plus prosodic features, RS features show higher recognition rates on the Berlin database, especially in low SNR conditions. However, both conventional and RS features are over-trained in low SNR conditions on the AIBO database, so further feature selection or dimensionality reduction is required.
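The noisy test material described in the abstract is produced by adding noise to clean recordings at a chosen SNR. The following is a minimal sketch of that mixing step, not the thesis's actual code: the function names `mix_at_snr` and `measured_snr_db`, the synthetic sine "speech" signal, the Gaussian white noise, and the SNR levels 20, 10, and 0 dB are all illustrative assumptions, since the record does not state which levels or tools were used.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB, then add it to `clean`.

    Hypothetical helper for illustration; both inputs are equal-length sample lists.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if signal_power == 0 or noise_power == 0:
        raise ValueError("signal and noise must both be non-silent")
    # SNR_dB = 10 * log10(signal_power / (gain**2 * noise_power)), solved for gain.
    gain = math.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mixture, given the clean reference signal."""
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_energy / noise_energy)


# Stand-in for one clean utterance: one second of a 440 Hz tone at 16 kHz (assumed).
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 440.0 * t / 16000.0) for t in range(16000)]
white = [rng.gauss(0.0, 1.0) for _ in clean]

# Clean-train / noisy-test: the classifier would be trained on `clean` material
# only, while each entry below plays the role of a test condition.
noisy_sets = {snr: mix_at_snr(clean, white, snr) for snr in (20, 10, 0)}
```

Babble noise would be mixed the same way, with a recording of overlapping talkers passed in place of `white`.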
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079713521
http://hdl.handle.net/11536/44539
Appears in Collections: Thesis


Files in This Item:

  1. 352101.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.