Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 顧原竹 | en_US |
dc.contributor.author | Ku, Yuan-Chu | en_US |
dc.contributor.author | 簡仁宗 | en_US |
dc.contributor.author | Chien, Jen-Tzung | en_US |
dc.date.accessioned | 2014-12-12T02:42:47Z | - |
dc.date.available | 2014-12-12T02:42:47Z | - |
dc.date.issued | 2013 | en_US |
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT070160261 | en_US |
dc.identifier.uri | http://hdl.handle.net/11536/75226 | - |
dc.description.abstract | This thesis proposes a Bayesian learning approach for constructing recurrent neural network (RNN) language models and applies it to large-vocabulary continuous speech recognition. The approach addresses the model regularization problem in RNN language models: by compensating for the uncertainty of the model parameters, it improves the robustness and recognition performance of the speech recognition system. We represent the randomness of the neural network parameters with a Gaussian prior whose hyperparameters, or regularization parameters, are estimated by maximizing the marginal likelihood. The RNN parameters themselves are obtained by maximizing the posterior probability (maximum a posteriori estimation), where the posterior depends on the regularization parameters. The negative logarithm of the posterior is equivalent to a regularized cross-entropy error function, so the algorithm yields a regularized RNN model. However, implementing the method requires computing the Hessian matrix of second derivatives with respect to the model parameters, which is formed from a large number of outer products of high-dimensional gradient vectors. We propose a rapid approximation that retains only a small set of salient outer-product terms in the Hessian computation, substantially reducing the cost of realizing the Bayesian neural network model. Preliminary experiments on the Wall Street Journal large-vocabulary continuous speech corpus, the Penn Treebank, and the 1-Billion-Word benchmark show that rapid Bayesian learning effectively improves the perplexity and speech recognition accuracy of RNN language models. | zh_TW |
dc.description.abstract | This study presents a Bayesian framework for constructing recurrent neural network language models (RNN-LMs) for speech recognition. Our idea is to regularize the RNN-LM by compensating for the uncertainty of the estimated model parameters, which is represented by a Gaussian prior. The objective function of the Bayesian RNN is the negative logarithm of the posterior distribution, or equivalently a regularized cross-entropy error function. The regularized model is constructed not only by training the regularized parameters according to the maximum a posteriori criterion but also by estimating the Gaussian hyperparameters with the type-2 maximum likelihood method. A Hessian matrix is calculated to implement the Bayesian RNN. However, a critical issue in the Bayesian RNN-LM is the heavy computation of this Hessian matrix, which is formed as the sum of a large number of outer products of high-dimensional gradient vectors. We present a rapid approximation that reduces the redundancy due to the curse of dimensionality and speeds up the calculation by summing only a small set of salient outer products. Experiments on the Wall Street Journal, Penn Treebank, and 1-Billion-Word Benchmark corpora show that rapid Bayesian learning for the RNN-LM consistently improves perplexity and word error rate compared with the standard RNN-LM. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Bayesian learning | zh_TW |
dc.subject | recurrent neural network | zh_TW |
dc.subject | Hessian matrix | zh_TW |
dc.subject | rapid approximation | zh_TW |
dc.subject | language model | zh_TW |
dc.subject | speech recognition | zh_TW |
dc.subject | Bayesian learning | en_US |
dc.subject | recurrent neural network | en_US |
dc.subject | Hessian matrix | en_US |
dc.subject | rapid approximation | en_US |
dc.subject | language model | en_US |
dc.subject | speech recognition | en_US |
dc.title | Bayesian recurrent neural networks for language modeling | zh_TW |
dc.title | Bayesian recurrent neural networks for language modeling | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Institute of Communications Engineering | zh_TW |
Appears in Collections: | Thesis |
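
The abstract describes two computations concretely enough to sketch: the MAP objective, i.e. the cross-entropy error plus a Gaussian-prior regularizer, and the approximation of the Hessian by a small set of salient outer products of gradient vectors. The NumPy sketch below is a minimal illustration of those two ideas, not the thesis implementation: the function names, the shapes, and the choice of selecting salient terms by gradient norm are all illustrative assumptions (the abstract does not specify the actual saliency criterion).

```python
import numpy as np

def map_objective(log_probs, theta, alpha):
    """Regularized cross-entropy: the negative log-posterior under a
    zero-mean Gaussian prior with precision alpha (illustrative)."""
    cross_entropy = -np.sum(log_probs)         # negative log-likelihood of the training words
    regularizer = 0.5 * alpha * theta @ theta  # negative log Gaussian prior, up to a constant
    return cross_entropy + regularizer

def approx_hessian(grads, k):
    """Approximate H ~ sum_t g_t g_t^T using only the k gradient vectors
    with the largest norms (one plausible notion of 'salient' terms)."""
    norms = np.linalg.norm(grads, axis=1)
    salient = grads[np.argsort(norms)[-k:]]    # keep the k most salient gradients
    return salient.T @ salient                 # sum of k outer products

# Toy usage: 1000 gradient vectors of dimension 50; keep 20 salient terms
# instead of summing all 1000 outer products.
rng = np.random.default_rng(0)
grads = rng.normal(size=(1000, 50))
H = approx_hessian(grads, k=20)

theta = rng.normal(size=50)
log_probs = np.log(rng.uniform(0.1, 1.0, size=200))
loss = map_objective(log_probs, theta, alpha=0.1)
```

Selecting terms by gradient norm is only one way to pick a "small set of salient outer-products"; whatever the exact rule, the payoff is the same: the cost of forming the Hessian drops from the full number of training gradients to the k retained terms.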