Title: Objective Assessment of Speech Quality by Perceptual Features (藉由感知特徵對語音品質做客觀的評量)
Authors: Ting-Yu Yen (顏廷宇)
Tai-Shih Chi (冀泰石)
Institute of Communications Engineering (電信工程研究所)
Keywords: speech quality; intrusive; non-intrusive; feature parameters; intelligibility; naturalness; pitch distortion; joint spectro-temporal; MOS
Issue Date: 2007
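Abstract: In this thesis, we use a model of human hearing that accounts for variation in both time and frequency to assess speech quality objectively. Our goal is to predict accurately the subjective mean opinion score (MOS) that listeners assign to speech quality. Objective assessment falls into two approaches: intrusive and non-intrusive. First, we observe and analyze clean speech, speech with added background noise, and speech processed by various speech coding standards at two stages of auditory perception: the first stage is the spectral estimation performed from the ear to the midbrain, and the second is the joint temporal and spectral analysis performed from the midbrain to the auditory cortex. Second, from these two stages we extract features that may perceptually influence a listener's judgment of speech quality and use them as parameters for objective assessment; the three features are intelligibility, naturalness, and pitch distortion. Finally, we use multiple regression analysis to combine the effects of the three feature parameters on speech quality, with the aim of assessing speech quality quickly and reliably from these three basic features.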
In this study, a joint spectro-temporal auditory model is used to assess speech quality objectively. The first stage of the model mimics the early cochlear function of spectrum estimation, and the second stage mimics the cortical function of multi-dimensional spectrum analysis. The goal of the study is to predict the subjective mean opinion score (MOS). Objective speech quality assessment can be done by two methods: intrusive and non-intrusive. First, we observe and analyze, at the two auditory stages, the patterns of clean speech, noisy speech with different background noises, and speech degraded by different codecs. Second, we derive an objective estimate of the MOS from data-driven perceptual parameters that are believed to reflect people’s judgment of speech quality. The three perceptual parameters considered are intelligibility, naturalness, and pitch distortion. Finally, we use multiple regression analysis to combine the effects of these perceptual parameters on speech quality and obtain the predicted MOS. We then demonstrate that the MOS can be characterized quickly and reliably by these three perceptual features.
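The multiple-regression step named in the abstract can be illustrated with a minimal sketch. This record does not state the thesis's actual regression form, feature scaling, or coefficients, so everything here is an assumption made for illustration: a linear model with an intercept, the function names, and all of the numbers, which are synthetic.

```python
# Hypothetical sketch: fit
#   MOS ~ b0 + b1*intelligibility + b2*naturalness + b3*pitch_distortion
# by ordinary least squares. All data below are synthetic, not from the thesis.


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(features, targets):
    """Return [intercept, w1, w2, ...] minimising the squared error."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_mos(weights, feature_vector):
    """Apply the fitted linear model to one feature vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_vector))


# Synthetic example: (intelligibility, naturalness, pitch_distortion) per utterance.
TRUE_WEIGHTS = [1.0, 2.0, 1.5, -1.0]  # assumed, for generating toy targets only
FEATURES = [
    (0.9, 0.8, 0.1),
    (0.7, 0.6, 0.3),
    (0.5, 0.7, 0.2),
    (0.3, 0.2, 0.6),
    (0.8, 0.4, 0.5),
    (0.2, 0.9, 0.4),
]
MOS = [predict_mos(TRUE_WEIGHTS, f) for f in FEATURES]  # noise-free toy targets
WEIGHTS = fit_multiple_regression(FEATURES, MOS)
```

Because the toy targets are generated without noise from `TRUE_WEIGHTS`, the fit recovers those weights up to floating-point error. With real listening-test scores the fit would instead be judged by how well the predicted MOS correlates with the subjective MOS.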
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009513537
http://hdl.handle.net/11536/38379
Appears in Collections: Thesis


Files in This Item:

  1. 353701.pdf
