Title: | Speaker-Independent English Vowel Recognition Technique Based on Acoustic-Phonetics and Fuzzy Neural Networks |
Authors: | Ying-Shih Hung (洪英士); Chi-Cheng Jou (周志成); Chin-Teng Lin (林进灯); Institute of Electrical and Control Engineering |
Keywords: | speech recognition; spectrum analysis; speaker-independent; acoustic-phonetic; fuzzy neural network |
Issue Date: | 2003 |
Abstract: | In this thesis, we propose a novel speaker-independent English vowel recognition technique based on acoustic-phonetics and fuzzy neural networks. First, we propose a new feature set called acoustic-enhanced discrete cosine series coefficients (AE-DCSC). It applies findings from acoustic-phonetic research on English vowels to enhance the spectrum so that the features become more representative and discriminative. Spectrum-level normalization balances the amplitude differences between formants, since acoustic-phonetic research indicates that formant positions matter more than formant amplitudes. Enhancement of spectral peaks suppresses the small spectral variations in the valleys between harmonics, which makes the spectrum more robust. To preserve the temporal variation of the vowel spectrum within a limited feature dimension, we use discrete cosine series coefficients (DCSC). This technique has adjustable frequency and time warping scales, which lets us find the most representative features according to the properties of the signal. An on-line self-constructing neural fuzzy inference network (SONFIN) is adopted as the main classifier. SONFIN constructs and tunes its structure and parameters automatically, and its fuzzy neural inference process yields better classification results. Finally, we propose an acoustic-checking procedure: for ambiguous recognition results, acoustic-phonetic characteristics are extracted and compared with the models in a pre-built knowledge base to find the most credible result. Experiments on the TIMIT database show that the system achieves a recognition rate of 74.75%, which is higher than other published results for the same task. This demonstrates the potential and effectiveness of the proposed system. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009112533 http://hdl.handle.net/11536/44868 |
Appears in Collections: | Thesis |
Files in This Item:
If it is a zip file, please download the file and unzip it, then open index.html in a browser to view the full text content.
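
The abstract describes DCSC features as a compact encoding of how a vowel's spectrum changes over time. The following is a minimal sketch of that general idea only, not the thesis's implementation: it expands a log-spectrogram with plain cosine bases first over frequency and then over time. The function names, the number of terms, and the unwarped bases are all assumptions made for illustration; the frequency/time warping, spectrum-level normalization, and spectral-peak enhancement mentioned in the abstract are omitted.

```python
import numpy as np


def cosine_basis(n_points, n_terms):
    """Return a DCT-II style cosine basis of shape (n_terms, n_points).

    Hypothetical helper: the thesis uses warped basis vectors, which are
    not reproduced here.
    """
    k = np.arange(n_terms)[:, None]
    n = np.arange(n_points)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_points)


def dcsc_features(log_spec, n_freq_terms=4, n_time_terms=3):
    """Encode a (n_frames, n_bins) log-spectrogram as a short feature vector.

    Step 1 expands each frame over frequency (spectral shape per frame).
    Step 2 expands each of those coefficients over time (their trajectory).
    """
    n_frames, n_bins = log_spec.shape
    freq_basis = cosine_basis(n_bins, n_freq_terms)    # (F, n_bins)
    time_basis = cosine_basis(n_frames, n_time_terms)  # (T, n_frames)
    per_frame = log_spec @ freq_basis.T / n_bins       # (n_frames, F)
    over_time = time_basis @ per_frame / n_frames      # (T, F)
    return over_time.ravel()


# A spectrogram that is flat in both frequency and time carries no shape or
# trajectory information, so only the first coefficient is non-zero.
flat = np.full((10, 32), 2.0)
features = dcsc_features(flat)
```

With 4 frequency terms and 3 time terms, each vowel segment maps to a fixed 12-dimensional vector regardless of its duration, which is the kind of fixed-size input a classifier such as SONFIN would need.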