Title: Stabilization in Optimization for DNN Speech Recognition
Authors: Huang, Pei-Wen (黃珮雯); Chien, Jen-Tzung (簡仁宗)
Department of Electrical Engineering
Keywords: deep learning; stochastic gradient descent; Nesterov's accelerated stochastic gradient descent; stochastic dual coordinate ascent; optimization algorithm; speech recognition
Issue Date: 2016
Abstract: Deep neural networks (DNNs) are now widely used in speech recognition; replacing the conventional Gaussian mixture model (GMM) acoustic model with a DNN substantially improves recognition performance. How to update the large number of DNN weight parameters efficiently and reliably is therefore a crucial research topic. The optimizer most commonly used in deep learning is stochastic gradient descent (SGD). Its updates are stable, but in flat regions of the error surface the gradient becomes small, so learning slows down or even stalls, and the search is easily trapped in local minima. Nesterov's accelerated stochastic gradient descent (NAG) uses the current momentum to correct the gradient direction, accelerates the parameter updates, converges faster than SGD, and is less likely to be trapped in a local minimum. The first part of this thesis combines the stability of SGD with the faster, momentum-corrected updates of NAG in a hybrid algorithm that automatically switches between the two update rules according to the convergence rate measured on validation data, so as to achieve better convergence.

The second part of this thesis investigates the relationship between gradient variance and model training. The literature indicates that variance reduction improves the stability of parameter updates. Building on the inherent variance-reduction property of stochastic dual coordinate ascent (SDCA) and combining it with the momentum method, we derive the accelerated SDCA without duality (dual-free ASDCA). By replacing the dual objective with the primal objective when updating the dual variables, the method no longer relies on duality and can therefore handle non-convex objectives such as those of DNNs. We give a theoretical convergence analysis and the time complexity of the algorithm. The resulting optimizer accelerates parameter updates while retaining high update stability, which yields a more robust acoustic model.

In the experiments, we first use simulated data to visualize how common optimizers and the proposed methods update parameters on simulated functions. For speech recognition, the proposed algorithms are implemented in the open-source Kaldi toolkit, and several optimizers are compared on the CUSENT and Aurora-4 corpora.
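To make the two update rules concrete, the following minimal sketch contrasts SGD with NAG and illustrates one possible validation-driven switching heuristic in the spirit of the hybrid algorithm described above. The function names (`grad_fn`, `val_loss_fn`), the tolerance `tol`, and the specific switching rule are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def sgd_step(w, grad_fn, lr=0.1):
    # Plain SGD: stable, but slow once gradients become small.
    return w - lr * grad_fn(w)

def nag_step(w, v, grad_fn, lr=0.1, mu=0.9):
    # Nesterov's accelerated gradient: take the gradient at the
    # look-ahead point w + mu*v, then update velocity and weights.
    v = mu * v - lr * grad_fn(w + mu * v)
    return w + v, v

def hybrid_train(w, grad_fn, val_loss_fn, epochs=50, tol=1e-4):
    # Hypothetical hybrid: run NAG while the validation loss keeps
    # improving, fall back to SGD when progress stalls.
    v = np.zeros_like(w)
    use_nag, prev = True, np.inf
    for _ in range(epochs):
        if use_nag:
            w, v = nag_step(w, v, grad_fn)
        else:
            w = sgd_step(w, grad_fn)
        cur = val_loss_fn(w)
        use_nag = (prev - cur) > tol   # switch rule once improvement is small
        prev = cur
    return w
```

For example, with `grad_fn = lambda w: 2 * w` and `val_loss_fn = lambda w: float(np.sum(w ** 2))`, calling `hybrid_train(np.ones(3), grad_fn, val_loss_fn)` drives the weights toward the origin, starting with the faster NAG updates and finishing with plain SGD steps.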
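The dual-free ASDCA update can be sketched as follows, under assumptions. The sketch follows the dual-free SDCA idea of keeping one pseudo-dual vector per training example and updating it with the primal gradient, with a simple momentum term standing in for the acceleration; the exact accelerated update and step sizes in the thesis may differ, and `per_example_grad` is an assumed callback.

```python
import numpy as np

def dual_free_asdca(per_example_grad, n, dim, lam=0.1, eta=0.01,
                    mu=0.9, epochs=5, seed=0):
    # Sketch of dual-free (accelerated) SDCA.
    # per_example_grad(i, w): gradient of the i-th per-example loss at w.
    # One pseudo-dual vector alpha[i] per example; the primal gradient
    # is used to update it, so no dual objective (and no convexity)
    # is required.
    rng = np.random.default_rng(seed)
    alpha = np.zeros((n, dim))                     # pseudo-dual variables
    w = alpha.sum(axis=0) / (lam * n)              # primal-dual relation
    v = np.zeros(dim)                              # momentum buffer (assumed acceleration)
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = per_example_grad(i, w) + alpha[i]  # variance-reduced direction
            alpha[i] -= eta * lam * n * g          # pseudo-dual update
            v = mu * v - eta * g                   # momentum on the primal step
            w = w + v
    return w
```

The update direction `per_example_grad(i, w) + alpha[i]` shrinks as each `alpha[i]` approaches the negative per-example gradient at the optimum, which is the source of the variance reduction highlighted in the abstract.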
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070350718
http://hdl.handle.net/11536/142450
Appears in Collections: Thesis