Title: Development of an Interactive Chinese Learning System Combining Speech and Lip Shape Recognition Techniques
The Prompt of Lip Shape Modification of Cacology Based on the Speech Evaluation Techniques---A Case of Basic Chinese Learning
Author: Hsieh Chi-Wen (謝奇文)
Department of Electrophysics, National Chiao Tung University
Keywords: interactive Chinese learning system; speech scoring; lip shape correction; neural network
Issue Date: 2008
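Abstract: In this project, we develop an interactive Chinese learning system based on speech signals and lip shape images. For speech scoring, linear prediction analysis and cepstral analysis are used to extract three parameters that represent the voiceprint: linear prediction coefficients (LPC), line spectrum pair (LSP) coefficients, and mel-frequency cepstral coefficients (MFCC). In addition, the pitch contour and the energy curve are extracted to represent the tone and the intensity of the speech, respectively. For the lip shape images, region growing, morphology, RGB vector-space segmentation, and ellipse fitting are used to extract the height and width of the lips as image parameters. Dynamic time warping (DTW) is then used to compute the differences between the standard speech and other speech in LPC, LSP, MFCC, pitch contour, and energy curve, and these differences are combined with fuzzy theory, a radial basis function neural network (RBFNN), and a probabilistic neural network (PNN) to establish a set of rules for judging the learner's level. The learner's lip shape images are likewise compared with the standard lip shape images by DTW to measure how far the two differ in height and width, and the result is used to tell the user what needs improvement, so that the learning interaction is as effective as possible. Preliminary simulations show that, among the three voiceprint parameters LPC, LSP, and MFCC, MFCC is the best at distinguishing good speech from poor speech, with an accuracy of about 84%. Adding the pitch contour and the energy curve raises this accuracy noticeably: the best result comes from using MFCC, pitch contour, and energy curve as parameters with DTW and a PNN, which reaches an accuracy of 90%. We also used the ROC curve to evaluate the feasibility of the lip shape suggestion method for Chinese learning.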
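The DTW comparison described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `dtw_distance`, the Euclidean local cost, the unconstrained step pattern, and the sample lip measurements are all assumptions made for the example.

```python
import math

def dtw_distance(reference, test):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each sequence is a list of equal-length tuples, e.g. one
    (lip_height, lip_width) pair per video frame, or one MFCC vector
    per speech frame. The Euclidean local cost is an assumption.
    """
    n, m = len(reference), len(test)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning
    # reference[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(reference[i - 1], test[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # reference frame repeated
                cost[i][j - 1],      # test frame repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]

# Hypothetical (height, width) lip measurements in pixels.
standard_lips = [(10.0, 40.0), (14.0, 38.0), (18.0, 36.0)]
learner_lips = [(10.0, 40.0), (11.0, 40.0), (13.0, 39.0), (16.0, 37.0)]
print(dtw_distance(standard_lips, learner_lips))
```

A larger returned cost means the learner's sequence departs further from the standard one, even when the two recordings differ in length or speaking rate.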
An interactive Chinese learning assistance system is proposed. The system is based on speech identification and lip movement correction. A test database of synchronized speech signals and lip shape images was provided by experts in Chinese language learning. During the learning process, the system first plays a demonstration speech and video, then captures the learner's repeated speech and a video sequence of the mouth, analyzes and evaluates the learner's utterance, shows the user the correct lip movement and utterance, and prompts for repeated practice if the evaluation is poor. Linear prediction coefficients (LPC), line spectrum pairs (LSP), and mel-frequency cepstral coefficients (MFCC) were examined as voiceprint parameters for speech identification. In addition, the pitch contour and the energy curve were adopted as the parameters for the tone and the magnitude of the speech signal, respectively. The height and width of the lips were used as the parameters for lip shape analysis. In the scoring stage, the dynamic time warping (DTW) algorithm was combined with fuzzy theory, a radial basis function neural network (RBFNN), and a probabilistic neural network (PNN) to determine whether the test speech was acceptable. The DTW comparison between the standard database and an unacceptable speech sample was used to give users a quantitative prompt for correcting their lip shape. In simulation, MFCC was the best of the three voiceprint parameters, and MFCC with DTW processing and PNN classification achieved a correct rate of 84%. Combining MFCC with the pitch contour and the energy curve improved the classification accuracy further, to as much as 90%.
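To make the PNN scoring stage concrete, here is a minimal sketch of a probabilistic neural network with Gaussian kernels. The feature layout (one DTW distance each for MFCC, pitch contour, and energy curve), the class labels, the smoothing parameter `sigma`, and all numbers are hypothetical; the abstract does not specify them.

```python
import math

def pnn_classify(training_set, x, sigma=1.0):
    """Classify feature vector x with a probabilistic neural network.

    training_set maps a class label to a list of training vectors.
    Returns (best_label, per_class_scores). sigma is the Gaussian
    smoothing parameter (an assumed value, to be tuned on real data).
    """
    scores = {}
    for label, patterns in training_set.items():
        # Pattern layer: one Gaussian kernel per training example.
        activations = [
            math.exp(-math.dist(x, p) ** 2 / (2.0 * sigma ** 2))
            for p in patterns
        ]
        # Summation layer: average the kernel outputs of the class.
        scores[label] = sum(activations) / len(activations)
    # Decision layer: pick the class with the largest score.
    best_label = max(scores, key=scores.get)
    return best_label, scores

# Hypothetical features: (MFCC DTW cost, pitch DTW cost, energy DTW cost),
# already scaled to a comparable range.
training_set = {
    "good": [(0.10, 0.20, 0.10), (0.20, 0.10, 0.15)],
    "poor": [(0.90, 1.00, 0.80), (1.10, 0.80, 0.90)],
}
label, class_scores = pnn_classify(training_set, (0.15, 0.15, 0.10), sigma=0.3)
print(label, class_scores)
```

A PNN needs no iterative training: the stored examples are the network, so adding newly graded utterances only means appending them to the relevant class.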
Finally, the receiver operating characteristic (ROC) curve was used to quantitatively evaluate the sensitivity and specificity of the proposed algorithm.
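As an illustration of an ROC evaluation, the sketch below sweeps a decision threshold over a set of scores and reports sensitivity and specificity at each threshold. The function name `roc_points`, the convention that a higher score means "lip shape needs correction", and the sample data are assumptions for the example, not values from the project.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score.

    scores: higher means more likely positive (e.g. a DTW lip shape cost).
    labels: 1 for positive (needs correction), 0 for negative.
    A sample is predicted positive when its score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels)
                       if s >= threshold and y == 1)
        true_neg = sum(1 for s, y in zip(scores, labels)
                       if s < threshold and y == 0)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        points.append((threshold, sensitivity, specificity))
    return points

# Hypothetical DTW lip shape costs and expert judgements.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 1, 0, 0]
for threshold, sens, spec in roc_points(example_scores, example_labels):
    print(threshold, sens, spec)
```

Plotting sensitivity against 1 - specificity for these points gives the ROC curve; the threshold closest to the top-left corner is a common choice of operating point.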
Official Document #: NSC97-2218-E415-002
URI: http://hdl.handle.net/11536/102141
https://www.grb.gov.tw/search/planDetail?id=1667190&docId=286364
Appears in Collections: Research Plans