Full metadata record
DC Field | Value | Language
dc.contributor.author | Lin, Wei-wei | en_US
dc.contributor.author | Mak, Man-Wai | en_US
dc.contributor.author | Chien, Jen-Tzung | en_US
dc.date.accessioned | 2019-04-02T05:58:17Z | -
dc.date.available | 2019-04-02T05:58:17Z | -
dc.date.issued | 2018-12-01 | en_US
dc.identifier.issn | 2329-9290 | en_US
dc.identifier.uri | http://dx.doi.org/10.1109/TASLP.2018.2866707 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/148095 | -
dc.description.abstract | Like many machine learning tasks, the performance of speaker verification (SV) systems degrades when training and test data come from very different distributions. Moreover, both the training and test data themselves can be composed of heterogeneous subsets. These multisource mismatches are detrimental to SV performance. This paper proposes incorporating maximum mean discrepancy (MMD) into the loss function of autoencoders to reduce these mismatches. MMD is a non-parametric method for measuring the distance between two probability distributions. With a properly chosen kernel, MMD can match up to infinitely many moments of the data distributions. We generalize MMD to measure the discrepancies among multiple distributions and call this generalized MMD domainwise MMD. Using domainwise MMD as an objective function, we propose two autoencoders, namely the nuisance-attribute autoencoder (NAE) and the domain-invariant autoencoder (DAE), for multisource i-vector adaptation. The NAE encodes the features that cause most of the multisource mismatch measured by domainwise MMD. The DAE directly encodes the features that minimize the multisource mismatch. Using these MMD-based autoencoders as a preprocessing step for PLDA training, we achieve a relative improvement of 19.2% in EER on the NIST 2016 SRE compared to PLDA without adaptation. We also find that MMD-based autoencoders are more robust to unseen domains. In the domain robustness experiments, MMD-based autoencoders show 6.8% and 5.2% improvements over IDVC on female and male Cantonese speakers, respectively. | en_US
dc.language.iso | en_US | en_US
dc.subject | Speaker verification | en_US
dc.subject | domain adaptation | en_US
dc.subject | i-vectors | en_US
dc.subject | maximum mean discrepancy | en_US
dc.title | Multisource I-Vectors Domain Adaptation Using Maximum Mean Discrepancy Based Autoencoders | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.1109/TASLP.2018.2866707 | en_US
dc.identifier.journal | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | en_US
dc.citation.volume | 26 | en_US
dc.citation.spage | 2412 | en_US
dc.citation.epage | 2422 | en_US
dc.contributor.department | 電機工程學系 (Department of Electrical Engineering) | zh_TW
dc.contributor.department | Department of Electrical and Computer Engineering | en_US
dc.identifier.wosnumber | WOS:000443761500003 | en_US
dc.citation.woscount | 1 | en_US
Appears in Collections: Articles
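The abstract above builds on MMD, a kernel-based distance between probability distributions, and on a multi-domain generalization ("domainwise MMD") used as an autoencoder objective. The following is a minimal NumPy sketch of the underlying quantities, not the paper's implementation: it assumes an RBF kernel with a fixed bandwidth, uses the biased MMD estimator, and treats the domainwise extension as the average of pairwise MMDs over domains, which may differ from the paper's exact formulation. The autoencoder loss that combines this term with a reconstruction error is not shown.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, clipped at zero for numerical safety.
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of the squared MMD between samples X and Y."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

def domainwise_mmd2(domains, gamma=1.0):
    """Average squared MMD over all domain pairs (an assumed multi-domain extension)."""
    total, pairs = 0.0, 0
    for i in range(len(domains)):
        for j in range(i + 1, len(domains)):
            total += mmd2(domains[i], domains[j], gamma)
            pairs += 1
    return total / max(pairs, 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for i-vectors from three domains with shifted means.
    domains = [rng.normal(loc=m, scale=1.0, size=(200, 50)) for m in (0.0, 0.5, 1.0)]
    print("MMD^2, domain 0 vs 1:", mmd2(domains[0], domains[1]))
    print("domainwise MMD^2 over 3 domains:", domainwise_mmd2(domains))
```

In the paper's setting, a term of this kind would be evaluated on the autoencoder's hidden (or reconstructed) representations of i-vectors grouped by domain and either minimized (DAE-style, to remove mismatch) or maximized in the encoded nuisance directions (NAE-style), with the kernel bandwidth chosen from the data rather than fixed as above.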