Title: 建構於隱藏式馬可夫模型之語者辨識
HMM-Based Speaker Recognition
Authors: 魏禎德
Wei, Jen-Der
劉啟民
Liu Chi-Min
Institute of Computer Science and Engineering
Keywords: Speaker Recognition; HMM; Speaker Identification; Speaker Verification
Issue Date: 1997
Abstract: Speaker recognition is the process of automatically recognizing a speaker on the basis of information obtained from speech waves. It is usually divided into two subclasses: speaker identification and speaker verification. A speech signal carries both phoneme characteristics and speaker characteristics: the former convey the phonetic content, and the latter convey information about the speaker (the voiceprint). This thesis examines the importance of these two kinds of information for speaker recognition. First, we discuss the effect of phoneme characteristics on speaker recognition. For each speaker we construct a single GMM and 10 HMMs that use digits as the modeling unit. The GMM represents recognition that disregards phoneme information, whereas the HMMs represent recognition that uses both phoneme information and speaker characteristics. We examine the performance of the two kinds of models under varying mixture numbers, amounts of training data, test-utterance lengths, and speaker population sizes. The HMMs perform better. With a total of 60 mixtures, the GMMs give an error rate (ER) of 7.08% in speaker identification and an equal error rate (EER) of 6.16% in speaker verification, while the HMMs reduce the ER to 6.69% and the EER to 5.86%.
Because considering both phoneme and speaker characteristics results in better performance, the second part proposes four HMM-based schemes for speaker recognition. The four schemes assign different weights to the phoneme characteristics. The best scheme is the model-combining method with the compensation modification, which combines the two kinds of models and achieves an ER of 5.73% in speaker identification and an EER of 5.15% in speaker verification. The second best is the frame-refining method, which adjusts the reliability of each frame of an input utterance by using speaker-independent HMMs to select the frames that discriminate between speakers. It achieves an ER of 6.56% in speaker identification and an EER of 5.78% in speaker verification.
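The two metrics quoted in the abstract, ER for identification and EER for verification, can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the score layout (one log-likelihood per test utterance and enrolled speaker), and the threshold sweep used to approximate the EER are all assumptions made for illustration, and the scores could come from either GMMs or HMMs.

```python
def identification_error_rate(scores, true_ids):
    """Fraction of test utterances assigned to the wrong speaker.

    scores[i][j] is the (assumed) log-likelihood of test utterance i
    under the model of enrolled speaker j; true_ids[i] is the index of
    the speaker who actually produced utterance i.
    """
    errors = 0
    for row, true_id in zip(scores, true_ids):
        # Identification picks the speaker whose model scores highest.
        predicted = max(range(len(row)), key=lambda j: row[j])
        if predicted != true_id:
            errors += 1
    return errors / len(true_ids)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over observed scores.

    A claim is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the false-acceptance rate (FAR) and
    the false-rejection rate (FRR) are closest, as their average.
    """
    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    toy_scores = [[-10, -12, -15], [-20, -11, -13], [-9, -8, -14]]
    print(identification_error_rate(toy_scores, [0, 1, 2]))
    print(equal_error_rate([2.0, 1.5, 0.5, 3.0], [0.0, 1.0, -1.0, 0.2]))
```

In this sketch the same score matrix serves both tasks: identification takes the best-scoring model per utterance, while verification compares the claimed speaker's score against a threshold, and the EER summarizes the operating point where the two verification error types balance.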
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT860392068
http://hdl.handle.net/11536/62803
Appears in Collections: Thesis