Title: CNN BASED TWO-STAGE MULTI-RESOLUTION END-TO-END MODEL FOR SINGING MELODY EXTRACTION
Authors: Chen, Ming-Tso; Li, Bo-Jun; Chi, Tai-Shih
Affiliation: Department of Electrical and Computer Engineering
Keywords: Melody extraction; multi-resolution; convolutional neural network; end-to-end learning; music information retrieval
Publication date: 1-Jan-2019
Abstract: Inspired by human auditory perception, this paper proposes a two-stage multi-resolution end-to-end model for singing melody extraction. Convolutional neural networks (CNNs) form the core of the model and generate its multi-resolution representations. Multi-resolution analysis is carried out in two successive stages: 1-D CNN kernels of different lengths operate on the waveform, and 2-D CNN kernels of different sizes then operate on the resulting spectrogram-like graphs. Because the 1-D kernels of different lengths run in parallel, they produce multi-resolution spectrogram-like graphs without being bound to a single trade-off between spectral and temporal resolution. The 2-D CNNs extract features from spectro-temporal envelopes at different scales. Experimental results show that the proposed model outperforms the three compared systems on three of the five public databases.
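The abstract describes the architecture only at a high level, so the Python sketch below is a hypothetical illustration of the two-stage multi-resolution idea, not the authors' model. Everything concrete in it is an assumption: the 1-D kernel lengths (64, 256, 1024 samples), the 2-D kernel sizes (3x3, 5x5), the hop of 128 samples, the Hann-windowed cosine filters standing in for learned 1-D kernels, the box-shaped kernels standing in for learned 2-D kernels, and all function names. It shows how parallel 1-D kernels of different lengths turn one waveform into several spectrogram-like graphs on a shared frame grid, and how 2-D kernels of different sizes then summarize each graph at several spectro-temporal scales.

```python
import math


def hann(length):
    # Hann window used to taper each hypothetical 1-D kernel.
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def make_filter_bank(kernel_len, num_filters, sample_rate, f_min, f_max):
    # Stand-in for learned 1-D CNN kernels (assumption): Hann-windowed cosines
    # at log-spaced centre frequencies between f_min and f_max.
    win = hann(kernel_len)
    bank = []
    for k in range(num_filters):
        freq = f_min * (f_max / f_min) ** (k / max(num_filters - 1, 1))
        bank.append([w * math.cos(2 * math.pi * freq * n / sample_rate)
                     for n, w in enumerate(win)])
    return bank


def conv1d_graph(wave, bank, hop):
    # Strided 1-D cross-correlation centred on every hop position. Zero padding
    # keeps the frame count identical for every kernel length.
    n = len(wave)
    graph = []
    for kernel in bank:
        half = len(kernel) // 2
        row = []
        for centre in range(0, n, hop):
            acc = 0.0
            for j, w in enumerate(kernel):
                idx = centre - half + j
                if 0 <= idx < n:
                    acc += w * wave[idx]
            row.append(abs(acc))  # rectification gives a magnitude-like value
        graph.append(row)
    return graph  # shape: num_filters x num_frames


def conv2d_same(graph, kh, kw):
    # 2-D convolution with a uniform (box) kh x kw kernel and zero padding, so
    # the output keeps the input size. The box is a placeholder for learned weights.
    rows, cols = len(graph), len(graph[0])
    out = [[0.0] * cols for _ in range(rows)]
    scale = 1.0 / (kh * kw)
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - kh // 2, c + j - kw // 2
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += graph[rr][cc]
            out[r][c] = max(acc * scale, 0.0)  # ReLU (no-op here: inputs are >= 0)
    return out


def two_stage_features(wave, sample_rate, kernel_lens=(64, 256, 1024),
                       sizes_2d=((3, 3), (5, 5)), num_filters=16, hop=128):
    # Stage 1: one spectrogram-like graph per 1-D kernel length.
    graphs = [conv1d_graph(wave, make_filter_bank(length, num_filters, sample_rate,
                                                  80.0, 2000.0), hop)
              for length in kernel_lens]
    # Stage 2: every graph is analysed with every 2-D kernel size.
    return [conv2d_same(g, kh, kw) for g in graphs for (kh, kw) in sizes_2d]


if __name__ == "__main__":
    sr = 8000
    wave = [math.sin(2 * math.pi * 440 * n / sr) for n in range(4000)]  # 0.5 s, 440 Hz
    feats = two_stage_features(wave, sr)
    print(len(feats), len(feats[0]), len(feats[0][0]))  # 6 16 32
```

In this sketch the short kernels localize events in time but blur frequency, while the long kernels resolve frequency but smear time; keeping several lengths side by side is one simple way to avoid committing to a single point on that trade-off, which is the property the abstract attributes to the 1-D stage.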
URI: http://hdl.handle.net/11536/152923
ISBN: 978-1-4799-8131-1
ISSN: 1520-6149
Published in: 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
Start page: 1005
End page: 1009
Appears in Collections: Conferences Paper