Title: | Phonetic Segmentation using Sample-based Acoustic Parameters (使用取樣點式聲學參數之音素分段) |
Authors: | Lin, You-Yu (林宥余); Wang, Yih-Ru (王逸如); Institute of Communications Engineering |
Keywords: | Sample-based Acoustic Parameters;Phone Segmentation;Phone Boundary Detection |
Issue Date: | 2009 |
Abstract: | Accurate automatic speech segmentation is considered valuable for improving the performance of many speech recognition and speech synthesis systems. Manual labeling is the most precise approach, but labeling and segmenting a large corpus by hand is very time-consuming and labor-intensive. This study therefore aims to build an accurate phone boundary detector and automatic speech segmentation system in order to improve the performance of speech recognition and synthesis systems.
This thesis proposes several sample-based acoustic parameters that describe the characteristics of different phones in the speech signal: six sub-band signal envelopes, the rate of rise of the acoustic parameters, spectral entropy, and a sample-based spectral KL distance. These parameters are incorporated into the phone boundary detection and automatic speech segmentation framework. The sample-based KL distance is used to pre-select boundary candidates, and target functions are defined separately for phone boundary detection and for automatic speech segmentation; each specifies the state transitions between classes that are pre-defined according to the transcription level, that is, the basic speech unit chosen. A feed-forward neural network (multilayer perceptron) is then trained with a semi-supervised method to build the phone boundary detector. Finally, phone boundary detection experiments and an analysis of automatic speech segmentation performance are reported for utterances from different corpora. |
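The spectral entropy and KL distance named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the naive DFT, the 32-sample window, the symmetric form of the KL distance, the 1e-12 floor, and all function names are assumptions made for illustration.

```python
import cmath
import math

WIN = 32  # assumed analysis window length in samples (not from the thesis)


def power_spectrum(frame):
    """Normalized power spectrum (bins 0..N/2) of a real frame via a naive DFT."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        power.append(abs(acc) ** 2 + 1e-12)  # small floor avoids log(0)
    total = sum(power)
    return [p / total for p in power]  # sums to 1, like a probability mass


def spectral_entropy(p):
    """Shannon entropy (in nats) of a normalized spectrum."""
    return -sum(pi * math.log(pi) for pi in p)


def symmetric_kl(p, q):
    """Symmetric KL distance KL(p||q) + KL(q||p) (an assumed variant)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def kl_contour(signal, win=WIN):
    """For each sample t, compare the window ending at t with the one starting at t."""
    out = []
    for t in range(win, len(signal) - win + 1):
        left = power_spectrum(signal[t - win:t])
        right = power_spectrum(signal[t:t + win])
        out.append((t, symmetric_kl(left, right)))
    return out


# Toy signal: the spectral content changes abruptly at sample 64.
signal = [math.cos(2 * math.pi * 2 * t / WIN) for t in range(64)]
signal += [math.cos(2 * math.pi * 6 * t / WIN + 1.0) for t in range(64, 128)]

contour = kl_contour(signal)
peak_t = max(contour, key=lambda item: item[1])[0]  # boundary candidate
print(peak_t)
```

On this toy signal the largest distance is expected at or next to sample 64, where the spectrum changes. Evaluating the distance at every sample rather than once per frame is what makes a contour like this sample-based; picking its peaks is one plausible reading of the boundary candidate pre-selection the abstract describes.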
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079713564 http://hdl.handle.net/11536/44581 |
Appears in Collections: | Thesis |