Spectro-temporal modulation energy based mask for robust speaker identification

Chi, Tai-Shih; Lin, Ting-Han; Hsu, Chung-Chien

Date

2012-05-01

Authors

Chi, Tai-Shih

Lin, Ting-Han

Hsu, Chung-Chien

Abstract

Spectro-temporal modulations of speech encode speech structures and speaker characteristics. An algorithm that distinguishes speech from non-speech based on spectro-temporal modulation energies is proposed and evaluated in robust, text-independent, closed-set speaker identification simulations using the TIMIT and GRID corpora. Simulation results show that the proposed method produces much higher speaker identification rates than the baseline system using mel-frequency cepstral coefficients in all signal-to-noise ratio (SNR) conditions. The proposed method also outperforms the system that uses auditory-based nonnegative tensor cepstral coefficients [Q. Wu and L. Zhang, "Auditory sparse representation for robust speaker recognition based on tensor structure," EURASIP J. Audio, Speech, Music Process. 2008, 578612 (2008)] in low-SNR (≤ 10 dB) conditions. © 2012 Acoustical Society of America.
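The abstract reports results across SNR conditions but does not say how the noisy test signals were built. As a minimal sketch, assuming the standard definition SNR(dB) = 10 log10(P_speech / P_noise), the following shows one common way to scale a noise signal so that a mixture reaches a target SNR. The function names (`power`, `mix_at_snr`, `measure_snr_db`) and the toy signals are hypothetical and are not taken from the paper.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must have non-zero power")
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power we need.
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measure_snr_db(speech, noise):
    """SNR in dB between a clean signal and the noise that was added to it."""
    return 10 * math.log10(power(speech) / power(noise))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins (assumptions): a 200 Hz tone at 16 kHz for "speech", Gaussian noise.
    speech = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0, 1) for _ in range(16000)]
    for target in (20, 10, 0):
        _, scaled = mix_at_snr(speech, noise, target)
        print(target, round(measure_snr_db(speech, scaled), 6))
```

In this sketch the "low SNR (≤ 10 dB)" condition mentioned in the abstract corresponds to calling `mix_at_snr` with `snr_db` at or below 10, where the noise power is at least one tenth of the speech power.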

URI

https://ir.lib.nycu.edu.tw/handle/11536/16329

Collections

Journal Articles
