Full metadata record
DC Field | Value | Language
dc.contributor.author | Chen, Ming-Tso | en_US
dc.contributor.author | Li, Bo-Jun | en_US
dc.contributor.author | Chi, Tai-Shih | en_US
dc.date.accessioned | 2019-10-05T00:09:44Z | -
dc.date.available | 2019-10-05T00:09:44Z | -
dc.date.issued | 2019-01-01 | en_US
dc.identifier.isbn | 978-1-4799-8131-1 | en_US
dc.identifier.issn | 1520-6149 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/152923 | -
dc.description.abstract | Inspired by human auditory perception, in this paper we propose a two-stage multi-resolution end-to-end model for singing melody extraction. A convolutional neural network (CNN) forms the core of the proposed model and generates multi-resolution representations. 1-D and 2-D multi-resolution analyses are carried out successively on the waveform and on spectrogram-like representations, using 1-D and 2-D CNN kernels of different lengths and sizes. The 1-D CNNs with kernels of different lengths produce multi-resolution spectrogram-like representations without suffering from the trade-off between spectral and temporal resolution. The 2-D CNNs with kernels of different sizes extract features from spectro-temporal envelopes at different scales. Experimental results show that the proposed model outperforms the three compared systems on three out of five public databases. | en_US
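The abstract's first stage — 1-D convolutions with kernels of several lengths applied directly to the waveform, each yielding a spectrogram-like map at its own time/frequency trade-off — can be illustrated with a minimal numpy sketch. This is not the authors' model: the filter bank here is random, and the function name, filter count, and hop size are assumptions chosen only to show how short kernels retain more time frames while long kernels trade them for finer spectral selectivity.

```python
import numpy as np

def multires_conv1d(waveform, kernel_lengths, n_filters=4, hop=64, seed=0):
    """Toy multi-resolution 1-D front-end (illustrative only): convolve a
    waveform with random filter banks of several kernel lengths, then
    rectify and downsample each output into a spectrogram-like map."""
    rng = np.random.default_rng(seed)
    maps = []
    for length in kernel_lengths:
        kernels = rng.standard_normal((n_filters, length))
        # valid-mode convolution of each kernel with the waveform
        out = np.stack([np.convolve(waveform, k, mode="valid") for k in kernels])
        # rectify and downsample in time, like a crude magnitude spectrogram
        maps.append(np.abs(out)[:, ::hop])
    return maps

# A short test signal; each map has shape (n_filters, time_frames), and
# shorter kernels leave more time frames than longer ones.
x = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 4096))
maps = multires_conv1d(x, kernel_lengths=[32, 256, 1024])
print([m.shape for m in maps])
```

Because each kernel length is analyzed in its own branch, no single branch has to commit to one spectral/temporal resolution — which is the trade-off the abstract says the model avoids.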
dc.language.iso | en_US | en_US
dc.subject | Melody extraction | en_US
dc.subject | multi-resolution | en_US
dc.subject | convolution neural network | en_US
dc.subject | end-to-end learning | en_US
dc.subject | music information retrieval | en_US
dc.title | CNN BASED TWO-STAGE MULTI-RESOLUTION END-TO-END MODEL FOR SINGING MELODY EXTRACTION | en_US
dc.type | Proceedings Paper | en_US
dc.identifier.journal | 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | en_US
dc.citation.spage | 1005 | en_US
dc.citation.epage | 1009 | en_US
dc.contributor.department | 電機工程學系 | zh_TW
dc.contributor.department | Department of Electrical and Computer Engineering | en_US
dc.identifier.wosnumber | WOS:000482554001047 | en_US
dc.citation.woscount | 0 | en_US
Appears in Collections: Conference Papers