Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Chien, Jen-Tzung | en_US |
dc.contributor.author | Peng, Kang-Ting | en_US |
dc.date.accessioned | 2019-09-02T07:46:17Z | - |
dc.date.available | 2019-09-02T07:46:17Z | - |
dc.date.issued | 2019-11-01 | en_US |
dc.identifier.issn | 0885-2308 | en_US |
dc.identifier.uri | http://dx.doi.org/10.1016/j.csl.2019.06.003 | en_US |
dc.identifier.uri | http://hdl.handle.net/11536/152673 | - |
dc.description.abstract | This paper presents the adversarial learning approaches to deal with various tasks in speaker recognition based on probabilistic linear discriminant analysis (PLDA) which is seen as a latent variable model for reconstruction of i-vectors. The first task aims to reduce the dimension of i-vectors based on an adversarial manifold learning where the adversarial neural networks of generator and discriminator are merged to preserve neighbor embedding of i-vectors in a low-dimensional space. The generator is trained to fool the discriminator with the generated samples in latent space. A PLDA subspace model is constructed by jointly minimizing a PLDA reconstruction error, a manifold loss for neighbor embedding and an adversarial loss caused by the generator and discriminator. The second task of adversarial learning is developed to tackle the imbalanced data problem. A PLDA based generative adversarial network is trained to generate new i-vectors to balance the size of training utterances across different speakers. An adversarial augmentation learning is proposed for robust speaker recognition. In particular, the minimax optimization is performed to estimate a generator and a discriminator where the class conditional i-vectors produced by generator could not be distinguished from real i-vectors via discriminator. A multiobjective learning is realized for a specialized neural model with the cosine similarity between real and fake i-vectors as well as the regularization for Gaussianity. Experiments are conducted to show the merit of adversarial learning in subspace construction and data augmentation for PLDA-based speaker recognition. (C) 2019 Elsevier Ltd. All rights reserved. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Probabilistic linear discriminant analysis | en_US |
dc.subject | Adversarial learning | en_US |
dc.subject | Manifold learning | en_US |
dc.subject | Data augmentation | en_US |
dc.subject | Speaker recognition | en_US |
dc.title | Neural adversarial learning for speaker recognition | en_US |
dc.type | Article | en_US |
dc.identifier.doi | 10.1016/j.csl.2019.06.003 | en_US |
dc.identifier.journal | COMPUTER SPEECH AND LANGUAGE | en_US |
dc.citation.volume | 58 | en_US |
dc.citation.spage | 422 | en_US |
dc.citation.epage | 440 | en_US |
dc.contributor.department | 電機工程學系 | zh_TW |
dc.contributor.department | Department of Electrical and Computer Engineering | en_US |
dc.identifier.wosnumber | WOS:000477663800023 | en_US |
dc.citation.woscount | 0 | en_US |
Appears in Collections: | Journal Articles |