Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, Na | en_US |
dc.contributor.author | Mak, Man-Wai | en_US |
dc.contributor.author | Lin, Wei-Wei | en_US |
dc.contributor.author | Chien, Jen-Tzung | en_US |
dc.date.accessioned | 2018-08-21T05:54:11Z | - |
dc.date.available | 2018-08-21T05:54:11Z | - |
dc.date.issued | 2017-09-01 | en_US |
dc.identifier.issn | 0885-2308 | en_US |
dc.identifier.uri | http://dx.doi.org/10.1016/j.csl.2017.04.001 | en_US |
dc.identifier.uri | http://hdl.handle.net/11536/145646 | - |
dc.description.abstract | Although i-vectors together with probabilistic LDA (PLDA) have achieved a great success in speaker verification, how to suppress the undesirable effects caused by the variability in utterance length and background noise level is still a challenge. This paper aims to improve the robustness of i-vector based speaker verification systems by compensating for the utterance-length variability and noise-level variability. Inspired by the recent findings that noise-level variability can be modeled by a signal-to-noise ratio (SNR) subspace and that duration variability can be modeled as additive noise in the i-vector space, we propose to add an SNR factor and a duration factor to the PLDA model. In this framework, we assume that i-vectors derived from utterances with comparable durations share similar duration-specific information and that i-vectors extracted from utterances within a narrow SNR range have similar SNR-specific information. Based on these assumptions, an i-vector can be represented as a linear combination of four components: speaker, SNR, duration, and channel. A variational Bayes algorithm is developed to infer this latent variable model via a discriminative subspace training procedure. In the testing stage, different variabilities are compensated for when computing the likelihood ratio. Experiments on Common Conditions 1 and 4 in NIST 2012 SRE show that the proposed model outperforms the conventional PLDA and SNR-invariant PLDA. Results also show that the proposed model performs better than the uncertainty-propagation PLDA (UP-PLDA) for long test utterances. (C) 2017 Elsevier Ltd. All rights reserved. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Speaker verification | en_US |
dc.subject | Duration variation | en_US |
dc.subject | SNR mismatch | en_US |
dc.subject | Variational Bayes | en_US |
dc.subject | I-vector | en_US |
dc.subject | PLDA | en_US |
dc.title | Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification | en_US |
dc.type | Article | en_US |
dc.identifier.doi | 10.1016/j.csl.2017.04.001 | en_US |
dc.identifier.journal | COMPUTER SPEECH AND LANGUAGE | en_US |
dc.citation.volume | 45 | en_US |
dc.citation.spage | 83 | en_US |
dc.citation.epage | 103 | en_US |
dc.contributor.department | 電機工程學系 | zh_TW |
dc.contributor.department | Department of Electrical and Computer Engineering | en_US |
dc.identifier.wosnumber | WOS:000403510500005 | en_US |
Appears in Collections: | Journal Articles |