Title: | 中文語音屬性偵測之研究 A Research of Mandarin Speech Attribute Detection |
Authors: | 許見偟 (Chien-Huang Hsu); 陳信宏 (Sin-Horng Chen); Institute of Communications Engineering |
Keywords: | Mandarin speech attribute detection; Next Generation ASR; manner of articulation; place of articulation |
Issue Date: | 2006 |
Abstract: | The next-generation automatic speech recognition (ASR) architecture is a knowledge-based and data-driven paradigm. Its front end is a bank of speech attribute and event detectors, which extract different speech feature parameters to detect the attributes and events present in each segment of the speech signal and to find any cues useful for recognition. These outputs are passed to the next stage, where speech events and knowledge are integrated for evidence verification and decision making. The approach is expected to exceed the current state-of-the-art HMM-based ASR.
Based on this concept, this thesis first uses an English corpus to build Bayesian speech attribute detectors based on Gaussian mixture models (GMMs), including manner-of-articulation and place-of-articulation detectors, and examines whether combining the results of the two classes of detectors improves performance. It then builds speech attribute detectors from a Mandarin corpus. Because the Mandarin corpus has no precise phoneme boundaries, the corpus is segmented automatically, starting from the Mandarin syllable boundaries, and the resulting auto-labels from the HMM system are used as phoneme labels for the Mandarin detectors. During syllable segmentation, a universal background model (UBM) is built to describe the distribution of the speakers' breathing sounds, background noise, and background voices in the corpus, in the hope of separating speech, silence, and noise effectively so that more accurate models can be trained.
Finally, a performance and error analysis is carried out for both the English and Mandarin speech attribute detectors. |
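The abstract names GMM-based Bayesian detectors but gives no formulas, so the following is only a minimal sketch of how such a detector could score one feature vector. It assumes diagonal-covariance GMMs whose parameters are already trained, one model for the attribute and one for everything else. The function names, the `NASAL` and `NON_NASAL` toy models, and the equal prior are all hypothetical and are not taken from the thesis.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def detect(x, target_gmm, anti_gmm, prior_target=0.5):
    """Bayes decision: is the attribute more probable than its absence?

    Returns (decision, score), where score is the log posterior odds.
    """
    score = (log_gmm(x, target_gmm) + math.log(prior_target)
             - log_gmm(x, anti_gmm) - math.log(1.0 - prior_target))
    return score > 0.0, score

# Hypothetical 2-D toy models (assumed values, for illustration only).
NASAL = [(0.6, [1.0, 1.0], [0.5, 0.5]), (0.4, [2.0, 0.5], [0.5, 0.5])]
NON_NASAL = [(1.0, [-1.0, -1.0], [1.0, 1.0])]

if __name__ == "__main__":
    for frame in ([1.0, 1.0], [-1.0, -1.0]):
        decision, score = detect(frame, NASAL, NON_NASAL)
        print(frame, decision, round(score, 3))
```

In a real system each manner or place class would have its own pair of models, and the per-frame scores would be passed to the later knowledge-integration stage instead of being thresholded in isolation.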
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009313532 http://hdl.handle.net/11536/78349 |
Appears in Collections: | Thesis |