Local Wavelet Acoustic Pattern: A Novel Time-Frequency Descriptor for Birdsong Recognition

doi:10.1109/TMM.2018.2834866

Full metadata record

DC Field	Value	Language
dc.contributor.author	Hsu, Sheng-Bin	en_US
dc.contributor.author	Lee, Chang-Hsing	en_US
dc.contributor.author	Chang, Pei-Chun	en_US
dc.contributor.author	Han, Chin-Chuan	en_US
dc.contributor.author	Fan, Kuo-Chin	en_US
dc.date.accessioned	2019-04-02T05:59:13Z	-
dc.date.available	2019-04-02T05:59:13Z	-
dc.date.issued	2018-12-01	en_US
dc.identifier.issn	1520-9210	en_US
dc.identifier.uri	http://dx.doi.org/10.1109/TMM.2018.2834866	en_US
dc.identifier.uri	http://hdl.handle.net/11536/148467	-
dc.description.abstract	Investigating the identity, distribution, and evolution of bird species is important for both biodiversity assessment and environmental conservation. The discrete wavelet transform (DWT) has been widely exploited to extract time-frequency features for acoustic signal analysis. Traditional approaches usually compute statistical measures (e.g., maximum, mean, standard deviation) of the DWT coefficients in each subband independently to yield the feature descriptor, without considering the intersubband correlation. A new acoustic descriptor, called the local wavelet acoustic pattern (LWAP), is proposed to characterize the correlation of the DWT coefficients in different subbands for birdsong recognition. First, we divide a variable-length birdsong segment into a number of fixed-duration texture windows. For each texture window, several LWAP descriptors are extracted. The vector of locally aggregated descriptors (VLAD) is then used to aggregate the set of LWAP descriptors into a single VLAD vector. Finally, principal component analysis (PCA) plus linear discriminant analysis (LDA) are employed to reduce the feature dimensionality for classification purposes. Experiments on two birdsong datasets show that the proposed LWAP descriptor outperforms other local descriptors, including linear predictive coding cepstral coefficients, Mel-frequency cepstral coefficients, perceptual linear prediction cepstral coefficients, chroma features, and prosody features. Furthermore, the proposed LWAP descriptor, followed by VLAD encoding, PCA plus LDA feature extraction, and a simple distance-based classifier, yields promising results that are competitive with those obtained by the state-of-the-art convolutional neural networks.	en_US
dc.language.iso	en_US	en_US
dc.subject	Birdsong recognition	en_US
dc.subject	discrete wavelet transform (DWT)	en_US
dc.subject	vector of locally aggregated descriptors (VLAD)	en_US
dc.title	Local Wavelet Acoustic Pattern: A Novel Time-Frequency Descriptor for Birdsong Recognition	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1109/TMM.2018.2834866	en_US
dc.identifier.journal	IEEE TRANSACTIONS ON MULTIMEDIA	en_US
dc.citation.volume	20	en_US
dc.citation.spage	3187	en_US
dc.citation.epage	3199	en_US
dc.contributor.department	資訊工程學系	zh_TW
dc.contributor.department	Department of Computer Science	en_US
dc.identifier.wosnumber	WOS:000450212600001	en_US
dc.citation.woscount	0	en_US
Appears in Collections:	Articles
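
Illustrative note on the VLAD step named in the abstract: the abstract states that the set of LWAP descriptors from a texture window is aggregated into a single VLAD vector, but the record gives no implementation details. The following is a minimal sketch of generic VLAD encoding, not the authors' code. It assumes a codebook of cluster centres has already been learned (for example by k-means), and it applies a final L2 normalization, which is common practice for VLAD but is not confirmed by this record. The function name `vlad_encode` and the toy data are hypothetical.

```python
import math


def vlad_encode(descriptors, codebook):
    """Aggregate local descriptors into one VLAD vector.

    descriptors: list of d-dimensional vectors (e.g. the local descriptors
                 of one texture window; toy values are used here).
    codebook:    list of k d-dimensional cluster centres (assumed to be
                 learned beforehand, e.g. with k-means).
    Returns a flat list of length k * d.
    """
    k, d = len(codebook), len(codebook[0])
    # One residual accumulator per codebook centre.
    residual_sums = [[0.0] * d for _ in range(k)]

    for x in descriptors:
        # Assign the descriptor to its nearest centre (squared Euclidean).
        nearest = min(
            range(k),
            key=lambda i: sum((a - c) ** 2 for a, c in zip(x, codebook[i])),
        )
        # Accumulate the residual (descriptor minus centre) for that centre.
        for j in range(d):
            residual_sums[nearest][j] += x[j] - codebook[nearest][j]

    # Concatenate the per-centre residual sums into a single vector.
    vlad = [v for row in residual_sums for v in row]

    # L2-normalize (assumption: common VLAD practice, not stated in the record).
    norm = math.sqrt(sum(v * v for v in vlad))
    return [v / norm for v in vlad] if norm > 0 else vlad


# Toy usage: three 2-D descriptors, two centres.
toy_codebook = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
toy_vlad = vlad_encode(toy_descriptors, toy_codebook)
```

The output length is fixed at k × d regardless of how many descriptors a window yields, which is what allows variable-size descriptor sets to be passed on to a dimensionality-reduction stage such as the PCA plus LDA mentioned in the abstract.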