Objective Assessment and Decomposition of Speech Quality

Title:	語音品質之客觀估測與分解演算法 Objective Assessment and Decomposition of Speech Quality
Authors:	Huang, Kuan-Lang (黃冠郎); Chi, Tai-Shih (冀泰石); Institute of Communications Engineering (電信工程研究所)
Keywords:	speech quality;objective assessment;speech quality decomposition;intelligibility;clarity;naturalness;continuity;noise intrusiveness
Publication Date:	2017
Abstract:	In this study, we hypothesize that overall speech quality is a multi-dimensional percept that can be decomposed into five abstract percepts: speech intelligibility, clarity, naturalness, continuity, and noise intrusiveness. We designed and conducted a subjective listening experiment to verify this hypothesis. From the analysis of the subjective data, we derived subjective weights that accurately predict the speech signal quality (SIG) and overall quality (OVL) ratings; these weights were then used in our objective intrusive speech quality model. To construct the objective model, we used a spectro-temporal auditory model to capture degradations of speech signals and to quantify the energy of the spectro-temporal modulations found in different ranges of the auditory spectrogram, from which deterioration measures of the abstract percepts were extracted. The deterioration measures of the four percepts other than noise intrusiveness, together with a temporal energy profile designed to indicate noise intrusiveness, were linearly combined with the subjective weights to estimate the subjective mean opinion score (MOS). Although this initial model, which predicts MOS-LQO from pre-defined deterioration measures combined with subjective weights, performs acceptably, we further explored more sophisticated approaches in the abstract percept estimator and in the judgment model, expecting to find non-linear mappings from the input scale-rate (spectro-temporal modulation) features to the abstract percepts and, ultimately, to overall speech quality. With neural networks (NN) incorporated in both the abstract percept estimator and the judgment model, the proposed model gives satisfactory results in assessing the abstract percepts and overall quality from non-linear combinations of scale-rate features in specific regions. A performance comparison with PESQ, the intrusive objective speech quality standard published by ITU-T, demonstrates the potential of the proposed quality assessment model.
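The linear judgment step described in the abstract (deterioration measures of the five percepts combined with subjective weights to estimate MOS) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `estimate_mos`, the bias term, the clamp to the 1 to 5 MOS scale, and every numeric value below are assumptions chosen only to show the shape of the computation.

```python
# Illustrative sketch only: names, the bias term, the clamping range, and all
# numeric values are hypothetical and are NOT taken from the thesis.

PERCEPTS = (
    "intelligibility",
    "clarity",
    "naturalness",
    "continuity",
    "noise_intrusiveness",
)


def estimate_mos(deterioration, weights, bias=0.0, lo=1.0, hi=5.0):
    """Linearly combine per-percept deterioration measures into a MOS estimate.

    deterioration: dict mapping each percept name to its deterioration measure.
    weights: dict mapping each percept name to its (assumed) subjective weight.
    bias: assumed offset, e.g. the score of an undegraded signal.
    lo, hi: assumed clamp limits matching a 1-to-5 opinion scale.
    """
    missing = [p for p in PERCEPTS if p not in deterioration or p not in weights]
    if missing:
        raise KeyError(f"missing percepts: {missing}")
    # Weighted sum over the five percepts, plus the offset.
    score = bias + sum(weights[p] * deterioration[p] for p in PERCEPTS)
    # Keep the estimate inside the opinion-score range.
    return min(hi, max(lo, score))


# Hypothetical placeholder values, chosen only to make the arithmetic easy to follow.
example_weights = {p: -1.0 for p in PERCEPTS}
example_deterioration = {p: 0.5 for p in PERCEPTS}
example_mos = estimate_mos(example_deterioration, example_weights, bias=4.5)
```

With these placeholder values, each percept lowers the score by 0.5 from the assumed offset of 4.5. The neural-network variants mentioned in the abstract would replace this weighted sum with a learned non-linear mapping.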
URI:	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT079613522 http://hdl.handle.net/11536/142645
Appears in Collections:	Thesis