Title: | 稀少性的輸入資訊下所造成的分佈不匹配問題在語者確認上的可靠度分析 Reliability Analysis Focusing on Sparse Input Data Caused Distribution Mismatch Problems for Speaker Verification |
Authors: | 羅文輝 Wen-Hui Lo; 陳信宏 Sin-Horng Chen; 電機學院電信學程 (College of Electrical Engineering, Telecommunications Program) |
Keywords: | distribution mismatch; GMM (Gaussian mixture model); sparse data; speaker verification; reliability |
Issue Date: | 2005 |
Abstract: | In speech recognition, a model often has to be adapted with a small amount of data to make it more robust. Speaker verification faces the same situation: speaker models frequently have to be trained or tested when very little data is available.
This study first shows that, when the input data are sparse, the distribution of the scores computed with a Gaussian mixture model (GMM) in speaker verification departs from the distribution originally assumed. We call this the "distribution mismatch" (DM) problem. Its core is that the range covered by the sparse input data does not fully map onto the originally assumed probability distribution function (PDF). To model this situation, we first propose approximating the scores with a truncated PDF. Building on that idea and on order statistics, we then derive a graph-based joint probability distribution model that describes both the complete PDF and the truncated PDF in probabilistic form.
The joint PDF is composed of five random variables: the input data, the minimum of the input data, the range of the input data, the cumulative probability over that range (the coverage), and the sample size. Combined with the sampling concept of Gaussian quadrature integration, it yields an estimation formula intended to be as accurate as possible with the fewest sampling points. The ultimate aim is to use this richer information to compensate for the larger standard error of estimation that sparse data cause in conventional statistical inference.
Finally, we apply a hypothesis test to the equal error rate (EER), using the per-frame average score of each utterance normalized to the universal background model (UBM), computed both for the true speaker and for impostors. The experimental results indicate that the hypothesis test effectively reduces the verification errors caused by sampling error.
The other main result of this study concerns the assumed distribution itself. An input random variable is usually taken to follow a given PDF, but with sparse input whether that assumption holds is a random event with a probability, not a deterministic fact. Our conclusion is that when the input sample size is less than 20, the assumption that "the coverage of the input samples matches the originally assumed PDF" must be described as a probabilistic event, and this study derives the formula that describes that event. |
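The idea that coverage is itself a random variable when samples are few can be illustrated with a standard order-statistics result. This is a minimal sketch, not the thesis's five-variable joint PDF. It assumes n independent draws from a continuous distribution (speech frames are not strictly independent, so this is an idealization). Under that assumption the coverage F(max) - F(min) follows a Beta(n-1, 2) distribution, whatever the underlying PDF is. The function names below are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist


def expected_coverage(n: int) -> float:
    """Mean of Beta(n-1, 2): expected probability mass lying between
    the sample minimum and maximum of n i.i.d. continuous draws."""
    return (n - 1) / (n + 1)


def prob_coverage_at_least(n: int, c: float) -> float:
    """P(coverage >= c) under the i.i.d. assumption.
    The Beta(n-1, 2) CDF at c is n*c**(n-1) - (n-1)*c**n."""
    return 1.0 - (n * c ** (n - 1) - (n - 1) * c ** n)


def simulate_coverage(n: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check with a standard normal as the assumed PDF
    (an arbitrary choice; the result does not depend on it)."""
    rng = random.Random(seed)
    std = NormalDist()  # mean 0, standard deviation 1
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += std.cdf(max(xs)) - std.cdf(min(xs))
    return total / trials


if __name__ == "__main__":
    for n in (5, 10, 19, 50):
        print(n, expected_coverage(n), prob_coverage_at_least(n, 0.95))
```

For n = 19 the expected coverage is 18/20 = 0.9, so on average a tenth of the assumed distribution's mass lies outside the observed range. This is consistent with treating the match between sample coverage and assumed PDF as a probabilistic event for sample sizes below 20.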
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT008867547 http://hdl.handle.net/11536/76546 |
Appears in Collections: | Thesis |