Title: | CNN BASED TWO-STAGE MULTI-RESOLUTION END-TO-END MODEL FOR SINGING MELODY EXTRACTION |
Authors: | Chen, Ming-Tso; Li, Bo-Jun; Chi, Tai-Shih; Department of Electrical and Computer Engineering |
Keywords: | Melody extraction; multi-resolution; convolution neural network; end-to-end learning; music information retrieval |
Issue Date: | 1-Jan-2019 |
Abstract: | Inspired by human hearing perception, we propose a two-stage multi-resolution end-to-end model for singing melody extraction. Convolutional neural networks (CNNs) form the core of the proposed model and generate its multi-resolution representations. 1-D and 2-D multi-resolution analyses are carried out successively on the waveform and on spectrogram-like graphs, using 1-D and 2-D CNN kernels of different lengths and sizes, respectively. The 1-D CNNs with kernels of different lengths produce multi-resolution spectrogram-like graphs without suffering from the trade-off between spectral and temporal resolutions. The 2-D CNNs with kernels of different sizes extract features from spectro-temporal envelopes of different scales. Experimental results show that the proposed model outperforms three compared systems on three out of five public databases. |
URI: | http://hdl.handle.net/11536/152923 |
ISBN: | 978-1-4799-8131-1 |
ISSN: | 1520-6149 |
Journal: | 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) |
Start page: | 1005 |
End page: | 1009 |
Appears in Collections: | Conferences Paper |
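
Illustration: the first stage described in the abstract (1-D kernels of different lengths applied to the raw waveform) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the kernel lengths, channel count, and hop size are hypothetical, and fixed Hann-windowed cosines stand in for kernels that the paper's model would learn end-to-end. The 2-D stage is not shown.

```python
import math

def make_kernels(length, n_channels):
    """Hypothetical stand-in for learned 1-D kernels: Hann-windowed cosines."""
    kernels = []
    for c in range(n_channels):
        # Centre frequency in cycles per sample, kept below Nyquist (0.5).
        freq = (c + 1) / (2.0 * (n_channels + 1))
        kernels.append([
            0.5 * (1.0 - math.cos(2.0 * math.pi * n / (length - 1)))
            * math.cos(2.0 * math.pi * freq * n)
            for n in range(length)
        ])
    return kernels

def conv1d_branch(waveform, kernels, hop):
    """Strided 1-D convolution followed by ReLU.

    The waveform is zero-padded so that every branch yields the same number
    of frames regardless of kernel length.
    """
    length = len(kernels[0])
    pad = length // 2
    padded = [0.0] * pad + list(waveform) + [0.0] * pad
    n_frames = len(waveform) // hop
    output = []
    for kernel in kernels:
        row = []
        for t in range(n_frames):
            segment = padded[t * hop : t * hop + length]
            row.append(max(0.0, sum(a * b for a, b in zip(segment, kernel))))
        output.append(row)
    return output  # n_channels rows, n_frames columns

def multi_resolution(waveform, kernel_lengths=(16, 64, 256), n_channels=8, hop=8):
    """One spectrogram-like map per kernel length (all values are assumptions)."""
    return {
        length: conv1d_branch(waveform, make_kernels(length, n_channels), hop)
        for length in kernel_lengths
    }

# Usage: a 512-sample test tone analysed at three resolutions.
waveform = [math.sin(2.0 * math.pi * 0.05 * n) for n in range(512)]
maps = multi_resolution(waveform)
for length, graph in maps.items():
    print(length, len(graph), len(graph[0]))
```

Short kernels respond to fine temporal detail and long kernels to fine spectral detail; because every branch shares the same hop size, the resulting maps are time-aligned and could be stacked as input channels for a subsequent 2-D stage.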