Title: | Deep Long Short-Term Memory Networks for Speech Recognition |
Authors: | Chien, Jen-Tzung; Misbullah, Alim; Department of Electrical and Computer Engineering |
Keywords: | speech recognition;acoustic modeling;hybrid neural network;long short-term memory |
Issue Date: | 1-Jan-2016 |
Abstract: | Speech recognition has been significantly improved by applying acoustic models based on deep neural networks, which can be realized as a feedforward NN (FNN) or a recurrent NN (RNN). In general, an FNN can project the observations onto a deep invariant feature space, while an RNN is beneficial for capturing the temporal information in sequential data for speech recognition. An RNN based on long short-term memory (LSTM) is capable of storing inputs over a long time period and thus exploits a self-learned mechanism for long-range temporal context. Considering the complementary modeling capabilities of FNN and RNN, this paper presents a deep model which is constructed by stacking LSTM and FNN. Through the cascade of LSTM cells and fully-connected feedforward units, we explore the temporal patterns and summarize the long history of previous inputs in a deep learning machine. The experiments on the 3rd CHiME challenge and Aurora-4 show that the stacks of the hybrid model with an FNN post-processor outperform stand-alone FNN and LSTM and the other hybrid models for robust speech recognition. |
URI: | http://hdl.handle.net/11536/146706 |
Journal: | 2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) |
Appears in Collections: | Conferences Paper |
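
The abstract describes a cascade of LSTM cells and fully-connected feedforward units. As a rough illustration only, the forward pass of one such stack (an LSTM layer followed by a feedforward post-processor and a softmax output) might be sketched as below. This is not the authors' implementation: the layer sizes, the ReLU activation, the random initialization, and all names (`LSTMLayer`, `FeedforwardLayer`, `hybrid_forward`) are assumptions made for the example, and training is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Return W @ x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


class LSTMLayer:
    """Standard LSTM cell (no peepholes), unrolled over a frame sequence."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        # One weight matrix per gate, applied to the concatenation [x_t ; h_{t-1}].
        # Keys: i = input gate, f = forget gate, o = output gate, c = candidate.
        self.W = {g: rand_matrix(n_hid, n_in + n_hid, rng) for g in "ifoc"}
        self.b = {g: [0.0] * n_hid for g in "ifoc"}

    def forward(self, frames):
        h = [0.0] * self.n_hid  # hidden state
        c = [0.0] * self.n_hid  # memory cell, carries long-range context
        outputs = []
        for x in frames:
            z = list(x) + h
            i = [sigmoid(v) for v in affine(self.W["i"], z, self.b["i"])]
            f = [sigmoid(v) for v in affine(self.W["f"], z, self.b["f"])]
            o = [sigmoid(v) for v in affine(self.W["o"], z, self.b["o"])]
            g = [math.tanh(v) for v in affine(self.W["c"], z, self.b["c"])]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class FeedforwardLayer:
    """Fully-connected layer applied to each frame independently (ReLU assumed)."""

    def __init__(self, n_in, n_out, rng):
        self.W = rand_matrix(n_out, n_in, rng)
        self.b = [0.0] * n_out

    def forward(self, frames):
        return [[max(0.0, v) for v in affine(self.W, x, self.b)] for x in frames]


def hybrid_forward(frames, layers, W_out, b_out):
    """Pass frames through the stacked layers, then a per-frame softmax."""
    for layer in layers:
        frames = layer.forward(frames)
    return [softmax(affine(W_out, x, b_out)) for x in frames]


rng = random.Random(0)
# Hypothetical input: 5 frames of 8-dimensional acoustic features.
frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
layers = [LSTMLayer(8, 16, rng), FeedforwardLayer(16, 16, rng)]
W_out = rand_matrix(4, 16, rng)  # 4 hypothetical output classes
b_out = [0.0] * 4
posteriors = hybrid_forward(frames, layers, W_out, b_out)
```

Appending further `LSTMLayer` and `FeedforwardLayer` instances to `layers` gives a deeper stack of the same kind; the record above does not specify which orderings or depths the paper evaluated.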