Title: The Spectro-Temporal Multiresolution Auditory Model Based Objective Speech Quality Assessment (II)
Original title: 以時、頻域多重解析聽覺模型為基礎之客觀語音品質估測(II)
Author: 冀泰石 (CHI TAI-SHIH)
Department (Institute) of Communication Engineering, National Chiao Tung University
Keywords: Speech quality; non-intrusive objective speech quality; multi-resolution auditory model; multi-dimensional quality assessment
Issue date: 2008
Abstract: End users regard speech quality as a major aspect of quality of service (QoS), so telecommunication service providers have placed increasing emphasis on it over the past decade and more. Researchers have accordingly pursued reliable objective measures to replace time-consuming and costly subjective tests of speech quality over communication channels. Objective measurement can be performed either intrusively or non-intrusively. In the previous year, we addressed the intrusive objective measure, which compares the test signal against a clean reference. In the coming year, we will tackle the non-intrusive measure, in which no reference signal is available, using a statistical modeling approach.
Recent research shows that the paradigm for objective speech quality measures has shifted from distances based on waveforms or speech-production-model parameters to distances based on perception-model parameters. In this research, we will develop an objective measure based on features from a multi-resolution spectro-temporal auditory model. The model is based on the known biophysics of the peripheral and central auditory systems, so it captures most of the low-level perceptual properties of hearing along the auditory pathway from the ear up to the brain.
We will develop statistical models for the perceptual features that are crucial to speech quality assessment, such as those of vowels and consonants, which pertain to speech intelligibility and naturalness, respectively. In addition, we will build a high-level cognitive model of human scoring behavior; for example, it will include the "asymmetry weighting" that listeners apply to missing versus added speech components. Finally, we will validate our approach, which cascades low-level perceptual feature extraction with high-level cognitive scoring, by estimating subjective quality measurements. Furthermore, we will compare our estimates with those from other state-of-the-art objective measures in a large-scale database evaluation.
Official document number: NSC97-2221-E009-114
URI: http://hdl.handle.net/11536/101932
https://www.grb.gov.tw/search/planDetail?id=1676844&docId=288494
Appears in collections: Research Plans


Files in this item:

  1. 972221E009114.PDF
