Title: 深層鑑別式流形學習於語者辨識之研究
Deep Discriminative Manifold Learning for Speaker Recognition
Authors: 陳靜懷
Chen, Ching-Huai
簡仁宗
Chien, Jen-Tzung
Department of Electrical Engineering
Keywords: manifold learning; nonlinear dimensionality reduction; deep neural network; discriminative learning; speaker recognition
Issue date: 2015
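Abstract: Speaker recognition determines or verifies a person's identity from the characteristics of their voice. It is an important topic in biometrics, with applications including access control systems, information security control, e-commerce, and forensics. Speaker recognition systems based on i-vectors and probabilistic linear discriminant analysis (PLDA) have been among the common approaches in recent years. PLDA finds a linear transformation such that, when the original data are projected onto a low-dimensional feature space, the discriminability among features is preserved; that is, the projections of different speakers' data are pulled apart in the feature space. However, the linearity assumption of this method is insufficient for handling complex nonlinear relationships in the data. We therefore introduce a nonlinear model to describe the relationship between the original data and the low-dimensional features. Recent advances in deep learning offer a new option. Deep learning has been a popular research topic in machine learning in recent years and has been applied successfully to speech recognition, natural language processing, and computer vision, with breakthrough progress. A deep neural network consists of multiple layers of nonlinear transformations and can model complex nonlinear relationships in data. This thesis proposes a manifold learning algorithm based on a deep neural network architecture, which combines the conventional probabilistic linear discriminant model with the highly nonlinear nature of deep neural networks to extract discriminative features. Neighbor embedding is a family of unsupervised manifold learning algorithms that does not use class information during dimensionality reduction and suffers from the out-of-sample problem. The probabilistic linear discriminant model does take class information into account, but it is limited by its linearity assumption and cannot effectively capture complex nonlinear relationships in the data. Our method therefore combines a discriminative model with a nonlinear model and uses the deep neural network architecture to solve the out-of-sample problem. Experimental results on the MNIST, USPS, and NIST i-vector Machine Learning Challenge datasets show that the proposed method improves accuracy.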
Speaker recognition has been a crucial research topic in biometrics for many years. The technology can be used to build voice-based biometric systems for applications such as physical access control, network security control, tele-commerce, and forensics. Speaker recognition based on i-vectors and probabilistic linear discriminant analysis (PLDA) is generally regarded as the state-of-the-art approach, achieving the most competitive performance among different models. PLDA is a linear model for dimensionality reduction that describes the relationship between observed data and latent classes. One weakness of PLDA is its linearity assumption, which constrains recognition performance. This motivates us to generalize dimensionality reduction from a linear model to a nonlinear model based on manifold learning. In this thesis, we present a new nonlinear dimensionality reduction method that uses supervised manifold learning with stochastic neighbor embedding. Given the power of deep neural networks (DNNs) for capturing highly complicated real-world data, we present a deep and discriminative manifold learning method in which class information is preserved in the transformed low-dimensional space. Importantly, the objective function for deep manifold learning is formed as the Kullback-Leibler divergence between the probability measures of the labeled samples in the high-dimensional and low-dimensional spaces. Unlike conventional methods, the derived objective does not require an empirically tuned parameter. The objective is optimized to pull samples from the same class close together while simultaneously pushing samples from different classes far apart. A DNN is trained accordingly and viewed as a parametric mapping between the original data and the low-dimensional discriminative representations.
In the experiments, we illustrate the effectiveness of the proposed deep discriminative manifold learning on the MNIST and USPS datasets in terms of visualization and classification performance. Furthermore, we investigate different methods for speaker verification on the NIST i-vector Machine Learning Challenge task.
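The kind of objective the abstract describes, a Kullback-Leibler divergence between pairwise probabilities in the high-dimensional and low-dimensional spaces with a network acting as the parametric mapping, can be illustrated with a minimal NumPy sketch. This is not the thesis's formulation: the label-masked Gaussian affinities, the Student-t low-dimensional kernel borrowed from t-SNE, the fixed bandwidth `sigma` (the abstract states that the derived objective needs no empirically tuned parameter, so this is a simplification), and every function name here are assumptions made for illustration. Training (gradient descent on the network weights) is omitted.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)  # clip tiny negatives from round-off

def joint_probs_high(X, labels, sigma=1.0):
    """Assumed supervised affinities: Gaussian kernel kept only for
    same-class pairs, normalized to sum to 1. Requires at least two
    samples in some class so the normalizer is nonzero."""
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    W = W * (labels[:, None] == labels[None, :])
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def joint_probs_low(Y):
    """Student-t affinities in the embedding space, as in t-SNE."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), summed over the pairs where P is nonzero."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

def mlp_embed(X, weights):
    """Hypothetical DNN mapping: tanh hidden layers, linear output.
    Because it is a function of X, unseen samples can be embedded
    without re-running the optimization (the out-of-sample case)."""
    H = X
    for W, b in weights[:-1]:
        H = np.tanh(H @ W + b)
    W, b = weights[-1]
    return H @ W + b

# Toy data: two tight classes of three 5-dimensional points each.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 1])
centers = np.array([[0.0] * 5, [3.0] * 5])
X = np.repeat(centers, 3, axis=0) + 0.1 * rng.standard_normal((6, 5))
P = joint_probs_high(X, labels)

# An embedding that groups classes, and one that mixes them.
Y_good = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [10.0, 0.0], [10.1, 0.0], [10.0, 0.1]])
Y_bad = Y_good[[0, 3, 1, 4, 2, 5]]

# An untrained two-layer network used only to show the mapping's shape.
weights = [(0.5 * rng.standard_normal((5, 8)), np.zeros(8)),
           (0.5 * rng.standard_normal((8, 2)), np.zeros(2))]

print("KL, classes grouped:", kl_divergence(P, joint_probs_low(Y_good)))
print("KL, classes mixed:  ", kl_divergence(P, joint_probs_low(Y_bad)))
print("KL, untrained DNN:  ", kl_divergence(P, joint_probs_low(mlp_embed(X, weights))))
```

In this sketch, an embedding that keeps same-class points together yields a lower divergence than one that mixes the classes, which is the behavior the objective is meant to reward. Minimizing the divergence with respect to the network weights would be the training step.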
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070250737
http://hdl.handle.net/11536/127178
Appears in Collections: Thesis