Title: Using Cross-Lingual Data to Improve the Performance of DNN-Based Mandarin Speech Recognition (使用跨語語料改進深層類神經網路中文語音辨認器之效能)
Authors: Lin, Chien-Ting (林建廷)
Wang, Yih-Ru (王逸如)
Institute of Communications Engineering (電信工程研究所)
Keywords: Deep neural network; Mandarin speech recognition; Cross-lingual
Publication date: 2016
Abstract: Deep neural networks (DNNs) have become a very active research topic in automatic speech recognition. This thesis uses the Kaldi speech recognition toolkit to train a shared-hidden-layer DNN (SHL-DNN) acoustic model. The hidden layers of a DNN can be viewed as a cascade of complex nonlinear feature transformations that capture the articulatory characteristics of speech sounds, so they can be shared across languages. Because Mandarin speech data are scarce (only a few small Taiwanese Mandarin corpora are available), we use a much larger English corpus and apply cross-lingual model transfer to improve Mandarin recognition accuracy and to reduce the variation in recognition rate across speakers, measured as the standard deviation of the phone error rate. We then add a trigram language model to build a complete recognition system, tune the decoding parameters while taking the real-time factor (RTF) into account, and select the operating point that best balances real-time operation and recognition accuracy. For training, the 960-hour English corpus Librispeech serves as the source language and the roughly 24-hour Mandarin corpus TCC300 serves as the target language. To test the robustness of the system, the test data include, in addition to TCC300, 1.9 hours from the Sinica COSPRO02 corpus.
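The shared-hidden-layer idea described in the abstract can be illustrated with a minimal sketch. This is not the Kaldi recipe used in the thesis: the layer sizes, the sigmoid activation, the class name SharedHiddenLayerDNN, and the helper real_time_factor are all assumptions chosen for illustration. The sketch only shows the structure: one stack of hidden layers shared by every language, one softmax output layer per language, and a new target-language output layer attached to the already-trained shared stack. It also shows the usual definition of RTF as decoding time divided by audio duration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x):
    """Affine transform: one output value per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class SharedHiddenLayerDNN:
    """Hypothetical SHL-DNN: shared hidden layers, one output layer per language."""

    def __init__(self, n_in, hidden_sizes, outputs_per_language, seed=0):
        rng = random.Random(seed)
        sizes = [n_in] + list(hidden_sizes)
        # Hidden layers shared by all languages.
        self.shared = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        # Language-specific softmax layers (one per language).
        self.heads = {
            lang: make_layer(sizes[-1], n_out, rng)
            for lang, n_out in outputs_per_language.items()
        }

    def add_language(self, lang, n_out, seed=1):
        """Cross-lingual transfer: keep the shared layers, add a new output layer."""
        n_hidden = len(self.shared[-1][1])  # width of the last shared layer
        self.heads[lang] = make_layer(n_hidden, n_out, random.Random(seed))

    def forward(self, x, lang):
        h = x
        for layer in self.shared:
            h = sigmoid(dense(layer, h))
        return softmax(dense(self.heads[lang], h))


def real_time_factor(decode_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    return decode_seconds / audio_seconds


# Toy dimensions (assumed): 4 input features, two hidden layers of 8 units.
model = SharedHiddenLayerDNN(4, [8, 8], {"english": 5})
model.add_language("mandarin", 3)
posteriors = model.forward([0.1, 0.2, 0.3, 0.4], "mandarin")
```

In an actual transfer setup the shared layers would first be trained on the source-language data and then reused, typically with further fine-tuning, when the target-language output layer is trained; the sketch omits training entirely.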
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070360313
http://hdl.handle.net/11536/139182
Appears in Collections: Thesis