Title: Robust Speech Emotion Recognition via Sparse Representation of Spectro-Temporal Modulations
Authors: 梁詠閎
Liang, Young-Hong
冀泰石
Chi, Tai-shih
Institute of Communications Engineering
Keywords: emotion; emotion recognition; spectro-temporal modulations; sparse representation
Issue Date: 2013
Abstract: Speech emotion recognition has been a popular research topic in recent years, but most studies focus on classifying emotions in clean speech. In this thesis, we use two spectro-temporal modulation feature sets (ACC384 and RS96), extracted with our laboratory's auditory perceptual model, to recognize the emotion categories of both clean and noisy speech. The noisy utterances are generated by adding white noise and babble noise at various signal-to-noise ratios (SNRs) to the Berlin Emotional Speech Database and the Aibo Emotional Corpus. We compare our feature sets at each SNR against a well-known feature set (inter384), proposed in the INTERSPEECH 2009 Emotion Challenge. Two classifiers are used in the experiments: sparse representation classification (SRC) and the support vector machine (SVM). The results show that our spectro-temporal modulation feature sets retain higher recognition rates under noise than the INTERSPEECH 2009 Emotion Challenge feature set, and that SRC outperforms SVM. Some discussion of SRC is also given in this thesis.
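The abstract mentions corrupting clean utterances with additive noise at different SNRs. As a rough illustration only, the sketch below scales a noise signal so that the mixture reaches a requested SNR in dB. The function names (`mix_at_snr`, `measured_snr_db`) and the stand-in signals (a sine tone and Gaussian white noise) are assumptions made for this example; the thesis's actual mixing procedure is not described in this record.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so the mixture has the requested SNR (dB)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must have non-zero power")
    # SNR_dB = 10*log10(p_speech / (gain^2 * p_noise)), solved for gain
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Measure the SNR of a mixture, given the clean speech it was built from."""
    p_speech = sum(x * x for x in speech)
    p_resid = sum((y - x) ** 2 for x, y in zip(speech, noisy))
    return 10.0 * math.log10(p_speech / p_resid)


rng = random.Random(0)
# Hypothetical stand-ins: a 200 Hz tone at 16 kHz and Gaussian white noise
speech = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
white = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(speech, white, snr_db=10.0)
```

Babble noise could be mixed the same way by passing a recorded babble segment as `noise`; only the power ratio matters for the scaling.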
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070060260
http://hdl.handle.net/11536/73550
Appears in Collections: Thesis


Files in This Item:

  1. 026001.pdf
  2. 026002.pdf
  3. 026003.pdf
