Title: | 基於兩階段之聽覺感知模型之類神經網路應用於語者識別 Two-stage attentional auditory model inspired neural network and its application to speaker identification |
Authors: | 羅玉雯 (Lo, Yu-Wen); 冀泰石 (Chi, Tai-Shih); Institute of Communications Engineering |
Keywords: | Auditory model; speaker identification; neural network |
Publication date: | 2017 |
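Abstract: | In this thesis, we draw on neurobiological findings: once a sound signal enters the ear, it is decomposed by frequency to produce an auditory spectrogram, and studies of attentional hearing together with biological auditory experiments have revealed the patterns of neural activity in the auditory cortex. Combining these findings with currently popular neural network learning, we devise a distinctive neural network model and examine it on the task of speaker identification, in the hope that knowledge from neurophysiology can help solve engineering problems effectively. The model uses two convolutional neural network (CNN) layers of different dimensionality to simulate the early cochlear stage and the cortical stage respectively. The convolution kernels are given designed initial values, namely a bank of 1-D frequency-decomposition filters for the cochlear stage and 2-D filters that jointly resolve temporal and spectral information, so that the model converges quickly. During training, the model automatically adjusts its parameters according to the task and the environmental conditions so that the input data are mapped to the target form. We also compare several variants of the proposed architecture and find that, given these initial values, the model converges to a better state even when training is insufficient. Psychophysical and neurophysiological studies have revealed that the cochlea analyzes incoming sound in the time and logarithmic-frequency domains. The resulting neural activity then passes through the auditory pathway to the primary auditory cortex (A1) for further analysis. From a functional point of view, the cochlea produces a 2-D auditory spectrogram and A1 analyzes that spectrogram. In this thesis, we propose a neural network (NN) that simulates an attentional auditory model and apply it to speaker identification. The proposed NN consists of 1-D and 2-D convolutional neural networks, which mimic the functions of the cochlea and the cortex respectively. By deriving the initial kernels of the convolutional layers from the neurophysiological auditory model, we demonstrate that the proposed NN converges quickly and with high performance. In addition, even without training, the proposed system with auditory-model-based kernels outperforms a randomly initialized NN in speaker identification. |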
URI: | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070460233 http://hdl.handle.net/11536/142441 |
Appears in Collections: | Thesis |
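The two-stage structure described in the abstract (a bank of 1-D filters producing a channels-by-time spectrogram, followed by 2-D filtering of that spectrogram) can be illustrated with a minimal forward-pass sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the Hann-windowed cosine used as a stand-in cochlear filter, the log-spaced center frequencies, the 3x3 second-stage kernel, and all function names. No training is performed; the sketch only shows fixed kernels of the kind that could serve as initial values for the convolutional layers.

```python
import math

def conv1d(signal, kernel):
    """Valid-mode 1-D correlation of a signal with a kernel."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(n - k + 1)]

def conv2d(image, kernel):
    """Valid-mode 2-D correlation; image and kernel are lists of rows."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + a][c + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for c in range(ow)] for r in range(oh)]

def bandpass_kernel(center_hz, sample_rate, length=64):
    """Hypothetical stand-in for a cochlear filter: a Hann-windowed cosine."""
    return [0.5 * (1 - math.cos(2 * math.pi * n / (length - 1)))
            * math.cos(2 * math.pi * center_hz * n / sample_rate)
            for n in range(length)]

def cochlear_stage(signal, sample_rate, centers_hz):
    """Stage 1: one 1-D filter per channel, rectified, giving channels x time."""
    return [[abs(v) for v in conv1d(signal, bandpass_kernel(fc, sample_rate))]
            for fc in centers_hz]

def cortical_stage(spectrogram, kernel2d):
    """Stage 2: a single 2-D kernel spanning frequency (rows) and time (columns)."""
    return conv2d(spectrogram, kernel2d)

sample_rate = 8000
# Assumed log-spaced center frequencies, two channels per octave from 250 Hz.
centers_hz = [250 * 2 ** (i / 2) for i in range(8)]
# Test input: a 1 kHz tone, 400 samples long.
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(400)]

spec = cochlear_stage(tone, sample_rate, centers_hz)
energy = [sum(row) / len(row) for row in spec]
best_channel = max(range(len(centers_hz)), key=lambda i: energy[i])

# Illustrative kernel that responds to edges along the frequency axis.
kernel2d = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [-1.0, -1.0, -1.0]]
features = cortical_stage(spec, kernel2d)
```

In this sketch the channel with the most energy is the one centered at 1 kHz, which is the sense in which the first stage behaves like a frequency analyzer. In a trainable version, the 1-D and 2-D kernels would be the initial weights of the two convolutional layers and would then be updated by backpropagation.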