Title: Model-based learning for Gaussian Mixture Model and its application on speaker identification
Authors: 鄭士賢
Shi-Sian Cheng
傅心家
Hsin-Chia Fu
Institute of Computer Science and Engineering
Keywords: Gaussian mixture model; GMM; speaker identification; Bayesian information criterion; BIC; speaker segmentation; news anchor; clustering; news
Publication date: 2001
Abstract: This thesis studies the learning of Gaussian mixture models (GMMs) and their application to speaker identification. Previous work has shown that speaker-specific GMMs perform well for speaker identification, but it has not examined in depth the number of Gaussian components in each GMM or the type of covariance matrix (full or diagonal). We propose a self-growing learning method based on the Bayesian Information Criterion (BIC) that determines the number of Gaussian components of each GMM automatically. We also build speaker identifiers from full-covariance and diagonal-covariance GMMs separately and compare their experimental results. Our speaker database consists of speech from 19 female and 3 male news anchors, extracted from TV news programs that we recorded as MPEG files with a capture card. On this database, the full-covariance GMM speaker identifier attains an identification accuracy of 95.84%, and the diagonal-covariance GMM speaker identifier attains 97.90%. We further apply the GMM-based speaker identification method to detect where the news anchor appears in a news program and to segment the program into news stories. Using 7 hours of TV news programs as test data, anchor detection attains a precision of 90.20% and a recall of 92.5%.
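The abstract does not spell out the self-growing procedure, so the following is only a minimal sketch of the general idea of choosing the number of Gaussian components by BIC, not the thesis's algorithm. It assumes the common "lower is better" form BIC = -2 ln L + p ln N, and it assumes that candidate GMMs have already been fitted elsewhere so that their total training log-likelihoods are available. The function names and the numbers in the usage note are hypothetical.

```python
import math


def gmm_num_params(k, d, covariance="full"):
    """Count free parameters of a k-component GMM over d-dimensional features."""
    weights = k - 1          # mixture weights sum to 1
    means = k * d            # one d-dimensional mean per component
    if covariance == "full":
        covs = k * d * (d + 1) // 2   # symmetric d x d matrix per component
    elif covariance == "diag":
        covs = k * d                  # one variance per dimension per component
    else:
        raise ValueError("covariance must be 'full' or 'diag'")
    return weights + means + covs


def bic(log_likelihood, num_params, num_samples):
    """BIC in the 'lower is better' convention (an assumption of this sketch)."""
    return -2.0 * log_likelihood + num_params * math.log(num_samples)


def select_num_components(log_likelihoods, d, num_samples, covariance="diag"):
    """Pick the component count with the lowest BIC.

    log_likelihoods maps k -> total training log-likelihood of a GMM
    with k components that was fitted beforehand (not done here).
    """
    scores = {
        k: bic(ll, gmm_num_params(k, d, covariance), num_samples)
        for k, ll in log_likelihoods.items()
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

For example, with hypothetical log-likelihoods `{1: -1000.0, 2: -900.0, 3: -899.0}` for 100 two-dimensional samples, `select_num_components(..., d=2, num_samples=100)` picks 2 components: the third component improves the likelihood too little to offset its parameter penalty. The parameter count also shows why the covariance type matters, since a full-covariance model pays a much larger penalty per component than a diagonal one.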
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT900392076
http://hdl.handle.net/11536/68486
Appears in Collections: Thesis