Title: 感知訊號非侵入式客觀語音品質測量
Model-based Non-Intrusive Objective Speech Quality Measurement Using Perceptual Parameters
Authors: 余尚儒
Shang-Ju Yu
冀泰石
Tai-Shih Chi
Institute of Communications Engineering
Keywords: speech quality; objective speech quality; perceptual signals; non-intrusive speech quality measurement; model-based speech quality measurement; perceptual parameters for quality assessment; voice activity detector
Issue Date: 2007
Abstract: Assessing speech quality is an important issue in communication systems. Subjective speech quality measurement requires considerable human effort and cost, which created the need for objective measurement methods. Moreover, the original (reference) speech signal is often unavailable in practical measurements, so non-intrusive methods, which judge speech quality without the original signal, meet this need. Besides requiring little human effort, such methods allow efficient, real-time quality assessment. This work uses an auditory model, which mimics the signal processing of the human auditory pathway, to extract perceptual parameters from the received signal and build an objective, non-intrusive speech quality measure.

First, we propose a voice activity detector (VAD) algorithm based on the perceptual parameters of the auditory model; it classifies speech into three categories: voiced, unvoiced, and inactive. Next, cepstral coefficients are computed from the spectrum produced by the auditory model; we call these auditory cepstral coefficients (ACC) and use them as the non-intrusive quality-judging parameter. To judge quality without a reference signal, a Gaussian Mixture Model (GMM) trained on the ACC of clean speech serves as a statistical template that stands in for the absent reference.

To measure the quality of speech that has passed through different channels and codecs, the VAD first splits the distorted speech into the three categories, and the ACC are extracted. These parameters are compared with the clean-speech templates, and the resulting log-probability density function (log-pdf) values represent the distance between the degraded speech and ideal clean speech. A regression function then maps the distances from the three categories to a quality score. Finally, the correlation between these objective scores and subjective scores obtained from listeners is examined to validate the approach.
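The scoring stage described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several pieces are assumptions: the auditory model and VAD are taken as given, producing a per-frame auditory spectrum and a category label; ACC are assumed to be the DCT-II of the log auditory spectrum; each VAD category is assumed to have its own diagonal-covariance clean-speech GMM trained elsewhere; and the regression is assumed to be linear with hypothetical weights. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical diagonal-covariance GMM: list of (weight, mean, variance) components.
GMM = List[Tuple[float, List[float], List[float]]]


def auditory_cepstrum(auditory_spectrum: Sequence[float], n_coeffs: int) -> List[float]:
    """ACC-style features for one frame, assumed here to be the DCT-II of the
    log auditory spectrum (the thesis's exact definition may differ)."""
    n = len(auditory_spectrum)
    # Floor the spectrum to avoid log(0) on silent channels.
    log_spec = [math.log(max(v, 1e-12)) for v in auditory_spectrum]
    return [
        sum(log_spec[k] * math.cos(math.pi * q * (k + 0.5) / n) for k in range(n))
        for q in range(n_coeffs)
    ]


def gmm_log_pdf(x: Sequence[float], gmm: GMM) -> float:
    """log p(x) under a diagonal-covariance GMM, combined with log-sum-exp."""
    terms = []
    for weight, mean, var in gmm:
        ll = math.log(weight)
        for xi, mi, vi in zip(x, mean, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        terms.append(ll)
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def category_scores(
    frames: Sequence[Sequence[float]], labels: Sequence[str], models: Dict[str, GMM]
) -> Dict[str, float]:
    """Mean log-pdf of each VAD category's frames under that category's
    clean-speech GMM; lower values mean the frames look less like clean speech."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for x, label in zip(frames, labels):
        sums[label] = sums.get(label, 0.0) + gmm_log_pdf(x, models[label])
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_mos(scores: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Hypothetical linear regression from per-category scores to a quality score.
    Every category in `weights` must appear in `scores`."""
    return bias + sum(w * scores[label] for label, w in weights.items())


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between objective and subjective scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

In use, each test utterance would yield one `category_scores` dictionary, `predict_mos` would turn it into an objective score, and `pearson` would compare the objective scores of many utterances with their subjective scores. Fitting the regression weights (for example by least squares on a training set) is omitted here; averaging log-pdf values per category keeps utterances of different lengths comparable.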
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009513556
http://hdl.handle.net/11536/38399
Appears in Collections: Thesis

