Title: | Spectro-temporal modulation energy based mask for robust speaker identification |
Authors: | Chi, Tai-Shih; Lin, Ting-Han; Hsu, Chung-Chien; Department of Electrical and Computer Engineering |
Issue Date: | 1-May-2012 |
Abstract: | Spectro-temporal modulations of speech encode speech structures and speaker characteristics. An algorithm that distinguishes speech from non-speech based on spectro-temporal modulation energies is proposed and evaluated in robust text-independent closed-set speaker identification simulations using the TIMIT and GRID corpora. Simulation results show that the proposed method produces much higher speaker identification rates in all signal-to-noise ratio (SNR) conditions than the baseline system using mel-frequency cepstral coefficients. In addition, the proposed method outperforms the system that uses auditory-based nonnegative tensor cepstral coefficients [Q. Wu and L. Zhang, "Auditory sparse representation for robust speaker recognition based on tensor structure," EURASIP J. Audio, Speech, Music Process. 2008, 578612 (2008)] in low SNR (≤ 10 dB) conditions. © 2012 Acoustical Society of America |
URI: | http://hdl.handle.net/11536/16329 |
ISSN: | 0001-4966 |
Journal: | JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA |
Volume: | 131 |
Issue: | 5 |
End Page: | EL368 |
Appears in Collections: | Articles |