Title: Bayesian recurrent neural networks for language modeling
Authors: Ku, Yuan-Chu (顧原竹)
Chien, Jen-Tzung (簡仁宗)
Department: Institute of Communications Engineering
Keywords: Bayesian learning; recurrent neural network; Hessian matrix; rapid approximation; language model; speech recognition
Issue Date: 2013
Abstract: This thesis proposes a Bayesian learning framework for constructing the recurrent neural network (RNN) language model and applies it to large-vocabulary continuous speech recognition. The goal of this framework is to address the model regularization problem in RNN language models: by compensating for the uncertainty of the model parameters, the robustness and recognition performance of the speech recognition system are improved. Our approach represents the randomness of the neural network parameters with a Gaussian prior; the hyperparameter, or regularization parameter, of this distribution is estimated by maximizing the marginal likelihood, while the RNN parameters are obtained by maximizing the posterior probability (maximum a posteriori estimation). The posterior probability depends on the regularization parameter, and its negative logarithm is equivalent to a regularized cross-entropy error function. This algorithm yields a regularized RNN model; however, its implementation requires computing a large number of outer products of high-dimensional gradient vectors in order to obtain the Hessian matrix of second derivatives with respect to the model parameters. We propose a rapid approximation that keeps only a small number of salient outer-product terms when computing the Hessian matrix, which greatly reduces the cost of implementing the Bayesian neural network model. Preliminary experiments on the Wall Street Journal large-vocabulary continuous speech corpus, the Penn Treebank, and the 1-Billion-Word benchmark show that the rapid Bayesian learning method effectively improves the perplexity and speech recognition accuracy of the RNN language model.
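A minimal sketch of the maximum a posteriori objective described above: the negative log posterior of the RNN parameters under a Gaussian prior reduces to the cross-entropy error plus a weight penalty scaled by the prior precision. The function name `regularized_cross_entropy` and the fixed `alpha` argument are illustrative assumptions; in the thesis the regularization parameter is itself estimated from the marginal likelihood.

```python
# Sketch only, not the thesis implementation: negative log posterior
# = cross-entropy error + Gaussian-prior penalty on the weights.
import numpy as np

def regularized_cross_entropy(probs, targets, weights, alpha):
    """probs   : (T, V) predicted next-word distributions from the RNN-LM
    targets : (T,)   indices of the observed next words
    weights : flat vector of all RNN parameters
    alpha   : precision of the Gaussian prior (regularization parameter, assumed fixed here)
    """
    # Cross-entropy error of the predicted next-word probabilities
    cross_entropy = -np.sum(np.log(probs[np.arange(len(targets)), targets] + 1e-12))
    # Quadratic penalty arising from the Gaussian prior on the weights
    gaussian_prior_penalty = 0.5 * alpha * np.dot(weights, weights)
    return cross_entropy + gaussian_prior_penalty
```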
This study presents a Bayesian framework to construct the recurrent neural network language model (RNN-LM) for speech recognition. Our idea is to regularize the RNN-LM by compensating for the uncertainty of the estimated model parameters, which is represented by a Gaussian prior. The objective function in the Bayesian RNN is the negative logarithm of the posterior distribution, or equivalently a regularized cross-entropy error function. The regularized model is constructed not only by training the model parameters according to the maximum a posteriori criterion but also by estimating the Gaussian hyperparameters according to the type-2 maximum likelihood method. The Hessian matrix is calculated to implement the Bayesian RNN. However, a critical issue in the Bayesian RNN-LM is the heavy computation of the Hessian matrix, which is formed as the sum of a large number of outer products of high-dimensional gradient vectors. We present a rapid approximation that reduces the redundancy due to the curse of dimensionality and speeds up the calculation by summing only a small set of salient outer products. Experiments on the Wall Street Journal, Penn Treebank and 1B Word Benchmark corpora show that rapid Bayesian learning for the RNN-LM consistently improves the perplexity and word error rate in comparison with the standard RNN-LM.
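A hedged sketch of the rapid approximation described in the abstract: the Hessian is approximated as a sum of outer products of per-sample gradient vectors, and only a few salient terms are retained. The selection rule shown here (the k gradients with the largest Euclidean norm) is an assumption for illustration; the thesis's actual saliency criterion may differ.

```python
# Sketch under assumptions: outer-product Hessian built from only the
# k most salient (largest-norm) per-sample gradient vectors.
import numpy as np

def approximate_hessian(gradients, k):
    """gradients: (N, D) array of per-sample gradient vectors; keep the k most salient."""
    norms = np.linalg.norm(gradients, axis=1)
    salient = np.argsort(norms)[-k:]               # indices of the k largest-norm gradients
    dim = gradients.shape[1]
    H = np.zeros((dim, dim))
    for i in salient:
        H += np.outer(gradients[i], gradients[i])  # accumulate salient outer products only
    return H
```

Summing k outer products instead of all N cuts the cost from O(N * D^2) to O(k * D^2), which is the source of the speed-up claimed for the rapid Bayesian learning method.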
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070160261
http://hdl.handle.net/11536/75226
Appears in Collections: Thesis