Title: Multipitch Tracking by Factorial Hidden Markov Model Using Spectro-Temporal Modulations of Fourier Spectrogram
Authors: Hsieh, Kun-Yeh (謝坤燁)
Chi, Tai-Shih (冀泰石)
Institute of Communications Engineering
Keywords: pitch; hidden Markov model; spectro-temporal modulations
Issue Date: 2013
Abstract: Pitch has played an important role in speech and audio signal processing in recent years, and pitch tracking is used in a wide range of applications. Single-pitch tracking can already keep the error between the estimated pitch and the true pitch below 5% in 90% of frames, which leaves limited room for improvement, whereas single-channel multipitch tracking still has considerable room for improvement. In this thesis, we first apply the RAPT (Robust Algorithm for Pitch Tracking) single-pitch tracker to each speaker's individual signal and use the results to build the initial probabilities and the transition probability matrix of each speaker's pitch states. Next, following a model of the cortical stage of human auditory perception, we convert the spectrogram into a rate-scale representation and use it as the feature vector. Gaussian mixture models then describe the feature vector of each speaker at each pitch, which gives the observation probability of each speaker's pitch states. We then employ the mixture-maximization model to obtain the probability of the mixed signal's feature vector under each combination of pitches. Finally, a factorial hidden Markov model (FHMM) is applied to track the most likely pitch trajectories over time. The experimental results show that the pitch tracking system using rate-scale features is more robust to noise than the system using spectral features.
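The final decoding step described in the abstract can be illustrated with a minimal sketch of joint Viterbi decoding for a two-speaker factorial HMM. This is not the thesis's implementation: the function name `fhmm_viterbi` and all argument names are hypothetical, and the per-frame log-likelihoods `log_obs[t][i][j]` are assumed to have been computed beforehand (in the thesis, by the Gaussian mixture and mixture-maximization stages, which are not reproduced here).

```python
from itertools import product


def fhmm_viterbi(log_prior1, log_trans1, log_prior2, log_trans2, log_obs):
    """Joint Viterbi decoding for a two-chain factorial HMM (illustrative sketch).

    log_prior1[i], log_trans1[i][k]: log initial / transition probabilities
        of speaker 1's pitch states (hypothetical inputs).
    log_prior2[j], log_trans2[j][l]: the same for speaker 2.
    log_obs[t][i][j]: assumed precomputed log-likelihood of frame t given
        that speaker 1 is in pitch state i and speaker 2 in pitch state j.
    Returns the most likely list of (i, j) state pairs, one per frame.
    """
    n1, n2 = len(log_prior1), len(log_prior2)
    states = list(product(range(n1), range(n2)))

    # Initialisation: the two chains are independent a priori,
    # so their log-priors add.
    delta = {
        (i, j): log_prior1[i] + log_prior2[j] + log_obs[0][i][j]
        for i, j in states
    }
    backpointers = []

    for t in range(1, len(log_obs)):
        new_delta, bp = {}, {}
        for k, l in states:
            # The joint transition factorises across chains:
            # log a1[i][k] + log a2[j][l].
            def score(s, k=k, l=l):
                return delta[s] + log_trans1[s[0]][k] + log_trans2[s[1]][l]

            best = max(states, key=score)
            new_delta[(k, l)] = score(best) + log_obs[t][k][l]
            bp[(k, l)] = best
        delta = new_delta
        backpointers.append(bp)

    # Backtrack from the best final joint state.
    path = [max(states, key=lambda s: delta[s])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path
```

The sketch searches all pairs of joint states at every frame, so its cost grows with the square of the number of pitch-state combinations; it is meant only to show how the factorised priors and transitions combine with a joint observation score.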
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070060272
http://hdl.handle.net/11536/73980
Appears in Collections: Thesis