Title: | 應用隱性馬可夫模型於國語單音辨認之研究 Hidden Markov Models for Isolated Mandarin Word Recognition |
Authors: | 賓少煌 Pin, Shao-Huang; 陳永平 Dr. Chen Yon-Ping; Institute of Electrical and Control Engineering |
Keywords: | 隱性馬可夫模型;單音辨認;線性預估係數;連續性機率;改良式K-群分類演算法;維特比演算法;HMM;Isolated Word Recognition;Linear Prediction Coefficients;Continuous Density;Modified K-means Algorithm;Viterbi Algorithm |
Issue Date: | 1999 |
Abstract: | This thesis applies continuous-density hidden Markov models (CDHMM) to isolated-word speech recognition of a small set of Mandarin syllables and the digits 0-9. Linear prediction coefficients (LPC) are extracted from each speech frame, converted to LPC-derived cepstral coefficients, and augmented with the differences between successive cepstral vectors; together these form the feature vectors used for recognition. Each word is modeled by a left-to-right HMM, a topology well suited to isolated syllables, with observation probabilities given by continuous mixture density functions. Training uses the segmental k-means algorithm: in each iteration the Viterbi algorithm decodes the optimal hidden state sequence, the frames assigned to each state are reclustered with the modified k-means (MKM) algorithm to produce the next generation of model parameters, and the iterations drive each model closer to the optimal model for its word. In recognition, the Viterbi algorithm scores the input word against every model, and the model with the maximum likelihood is selected. The speech database consists of utterances by a fixed group of six speakers. Experiments varying the LPC order, the number of states, and the number of mixtures were run to test recognition accuracy; the results can serve as a reference for future voice-command applications. (Illustrative code sketches of the main steps follow the contents listing below.)

Contents:
Chinese Abstract
English Abstract
Contents
List of Tables
List of Figures
1 Introduction
1.1 Motivation
1.2 Brief History of Speech Recognition Approaches
1.3 Overview
2 Theories for Speech Recognition
2.1 Speech Signal Analysis
2.1.1 Phoneme Classification
2.1.2 Coarticulation
2.2 Speech Signal Modeling
2.3 Theories of Hidden Markov Models
2.3.1 Hidden Markov Models of Speech
2.3.2 Three Basic Problems
2.3.3 The Evaluation Problem
2.3.3.1 The Forward Procedure
2.3.4 The Decoding Problem
2.3.4.1 Optimal State Sequence
2.3.4.2 Viterbi Algorithm
2.3.5 The Reestimation Problem
2.3.6 Speech Recognition Using HMM
3 Isolated Word Recognition System Using Hidden Markov Models with Continuous Densities
3.1 Speech Feature Extraction
3.1.1 The LPC Model
3.1.2 Solution Method
3.1.3 LPC Processor for Speech Recognition
3.1.4 LPC to Cepstrum Conversion
3.1.5 Temporal Cepstral Derivatives
3.1.6 Distance Measure for Feature Vectors
3.2 The Continuous Mixture Density HMM
3.2.1 Elements of HMM
3.2.2 Training Algorithms for HMM
3.2.2.1 The Segmental K-means Algorithm
3.2.2.2 Classification Algorithms
3.2.2.3 The Viterbi Algorithm
3.2.3 Isolated Word Recognition
4 Experimental Results
4.1 Speech Detection Test
4.1.1 The Short-Term Features
4.2 Linear Predictive Coefficients
4.2.1 The Choice of LPC Order
4.2.2 Time-Frequency View of LPC Spectra
4.3 HMM Recognition Results
4.3.1 Test of the Clustering Algorithm (MKM)
4.3.2 The Effects of Different LPC Orders
4.3.3 The Effects of Mixture Numbers
4.3.4 The Effects of State Numbers
5 Conclusions and Discussion
5.1 Strengths and Limitations
5.2 Future Work
Bibliography |
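The LPC-to-cepstrum conversion and the delta-cepstrum features named in the abstract (Sections 3.1.4 and 3.1.5) follow a standard textbook recursion. The sketch below is a minimal Python/NumPy rendering, not the thesis's own code; the predictor sign convention, the cepstrum length Q, and the simple first-order difference window are assumptions.

```python
import numpy as np

def lpc_to_cepstrum(a, Q):
    """LPC-derived cepstrum via the standard recursion
    c_n = a_n + sum_{k=1..n-1} (k/n) c_k a_{n-k}    (1 <= n <= p)
    c_n =       sum_{k=n-p..n-1} (k/n) c_k a_{n-k}  (n > p).
    a: predictor coefficients a[1..p]; returns c[1..Q]."""
    p = len(a)
    a = np.concatenate(([0.0], np.asarray(a, float)))  # shift to 1-based indexing
    c = np.zeros(Q + 1)                                # c[0] (log gain term) omitted
    for n in range(1, Q + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]

def add_delta(C):
    """Append differences of successive cepstral vectors to each frame
    (a simple delta; the thesis's exact difference window is not stated here)."""
    C = np.asarray(C, float)
    delta = np.diff(C, axis=0, prepend=C[:1])          # first frame gets a zero delta
    return np.concatenate([C, delta], axis=1)
```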
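For the continuous mixture densities, each state j scores a frame o as a weighted sum of Gaussians, b_j(o) = sum_m c_jm N(o; mu_jm, Sigma_jm). A hedged sketch with diagonal covariances follows; the diagonal structure is a common simplification and an assumption here, since the abstract does not state the covariance form used.

```python
import numpy as np

def mixture_loglik(o, weights, means, variances):
    """log b_j(o) for one state: mixture weights (M,), means (M, d),
    diagonal variances (M, d); evaluated stably in the log domain."""
    o = np.asarray(o, float)
    d = o.shape[-1]
    log_norm = -0.5 * (d * np.log(2.0 * np.pi) + np.sum(np.log(variances), axis=1))
    log_exp = -0.5 * np.sum((o - means) ** 2 / variances, axis=1)
    comp = np.log(weights) + log_norm + log_exp        # per-component log c_jm N(.)
    m = comp.max()
    return m + np.log(np.exp(comp - m).sum())          # log-sum-exp
```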
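The Viterbi algorithm appears twice in the abstract: in segmental k-means training, to segment each utterance's frames by state before MKM reclustering, and in recognition, to compute the likelihood of the input against each word model. A minimal log-domain sketch, assuming frame log-likelihoods log_B[t, j] precomputed (for example with mixture_loglik above):

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely state path for one utterance.
    log_pi: (N,)   initial-state log probabilities
    log_A:  (N, N) transition log probabilities (a left-to-right model
                   simply carries -inf on its disallowed transitions)
    log_B:  (T, N) per-frame observation log-likelihoods log b_j(o_t)
    Returns (state path of length T, path log-likelihood)."""
    T, N = log_B.shape
    delta = log_pi + log_B[0]                 # best score ending in state j at t = 0
    psi = np.zeros((T, N), dtype=int)         # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_A       # scores[i, j]: best path into j via i
        psi[t] = np.argmax(scores, axis=0)
        delta = scores[psi[t], np.arange(N)] + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):             # trace the backpointers
        path[t - 1] = psi[t, path[t]]
    return path, float(np.max(delta))
```

In training, the returned path assigns frames to states for per-state MKM reclustering; in recognition, the input utterance is scored against every word model and the model with the highest returned log-likelihood is selected, matching the maximum-likelihood decision described in the abstract.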
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT880591071 http://hdl.handle.net/11536/66304 |
Appears in Collections: | Thesis |