Objective Assessment and Decomposition of Speech Quality (語音品質之客觀估測與分解演算法)

Full metadata record

DC Field	Value	Language
dc.contributor.author	黃冠郎	zh_TW
dc.contributor.author	冀泰石	zh_TW
dc.contributor.author	Huang, Kuan-Lang	en_US
dc.contributor.author	Chi, Tai-Shih	en_US
dc.date.accessioned	2018-01-24T07:42:32Z	-
dc.date.available	2018-01-24T07:42:32Z	-
dc.date.issued	2017	en_US
dc.identifier.uri	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT079613522	en_US
dc.identifier.uri	http://hdl.handle.net/11536/142645	-
dc.description.abstract	在這項研究中，我們假設整體語音品質是一個多重維度感知參數，它可以被進一步分解出五個抽象感知參數，包括語音可理解度、清晰度、自然性、連續性和噪聲干擾。我們設計並進行主觀聽覺實驗，以驗證我們的初始假設。並在主觀實驗數據分析中，推導出用於準確預測語音品質(SIG)和整體語音品質(OVL)估計值的主觀權重。這主觀權重隨後將用於我們開發的客觀侵入式整體語音品質估測模型。為了構建出客觀估測模型，我們透過一個能同時解析時、頻域的分析式聽覺模型來分析及捕捉語音信號的失真，並量化嵌入在聽覺頻譜圖中不同時、頻域範圍調變的頻譜能量，進而抽取出這些抽象感知參數的失真量。藉由這五個感知參數失真度量與主觀權重線性組合以估計主觀語音品質(MOS)。初始模型效能表現是可以接受的，我們更進一步將更為複雜的演算法應用於抽象感知參數估計模型和判斷模型中，期望發掘出從輸入的時、頻域調變特徵參數到抽象感知參數，甚至整體語音品質的非線性映射。透過類神經網絡(NN)的分析，我們提出的整體語音品質估測模型表現出令人滿意的結果。另外，與國際電信聯盟(ITU-T)所發佈的侵入式客觀語音品質的標準模型(PESQ)的效能比較，展示出我們所提出模型的潛力。因此我們在抽象感知參數估計模型和判斷模型中，將採納類神經網絡來模擬並估算出，在特定時、頻域調變特徵參數區域上的非線性組合，來預測各個抽象感知參數和整體語音品質。	zh_TW
dc.description.abstract	In this study, we hypothesize that integral speech quality is a multi-dimensional percept that can be decomposed into five abstract percepts: speech intelligibility, clarity, naturalness, continuity, and noise intrusiveness. A subjective listening experiment was designed and conducted to verify this hypothesis. From the subjective data, we derived subjective weights that accurately predict the speech quality (SIG) and overall speech quality (OVL) ratings; these weights were then used in our objective intrusive quality model. To construct the objective model, a spectro-temporal auditory model was used to capture degradations of speech signals and to measure the deterioration of the abstract percepts from different ranges of spectro-temporal energy modulations embedded in auditory spectrograms. Deterioration measures of four of the percepts, together with a measure based on the temporal energy profile designed to indicate noise intrusiveness, were linearly combined with the subjective weights to estimate the subjective MOS. Although this initial model, which combines pre-defined deterioration measures with subjective weights, predicts MOS-LQO with acceptable performance, we further explored more sophisticated approaches in the abstract percept estimator and in the judgment model, expecting non-linear mappings from the input scale-rate features to the abstract percepts and, eventually, to integral speech quality. With neural networks (NN) used in both the abstract percept estimator and the judgment model, the proposed model yields satisfactory results in assessing the abstract percepts and integral quality from non-linear combinations of scale-rate features in specific regions. A performance comparison with PESQ, the intrusive objective speech quality standard published by ITU-T, demonstrates the potential of the proposed model.	en_US
dc.language.iso	en_US	en_US
dc.subject	語音品質	zh_TW
dc.subject	客觀估測	zh_TW
dc.subject	語音品質分解	zh_TW
dc.subject	可理解度	zh_TW
dc.subject	清晰度	zh_TW
dc.subject	自然性	zh_TW
dc.subject	連續性	zh_TW
dc.subject	噪聲干擾	zh_TW
dc.subject	speech quality	en_US
dc.subject	objective assessment	en_US
dc.subject	speech quality decomposition	en_US
dc.subject	intelligibility	en_US
dc.subject	clarity	en_US
dc.subject	naturalness	en_US
dc.subject	continuity	en_US
dc.subject	noise intrusiveness	en_US
dc.title	語音品質之客觀估測與分解演算法	zh_TW
dc.title	Objective Assessment and Decomposition of Speech Quality	en_US
dc.type	Thesis	en_US
dc.contributor.department	電信工程研究所	zh_TW
Appears in Collections:	Thesis