Full metadata record
DC Field: Value (Language)
dc.contributor.author: Lin, Jen-Chun (en_US)
dc.contributor.author: Wei, Wen-Li (en_US)
dc.contributor.author: Yang, James (en_US)
dc.contributor.author: Wang, Hsin-Min (en_US)
dc.contributor.author: Liao, Hong-Yuan Mark (en_US)
dc.date.accessioned: 2019-10-05T00:09:42Z
dc.date.available: 2019-10-05T00:09:42Z
dc.date.issued: 2017-01-01 (en_US)
dc.identifier.isbn: 978-1-4503-4906-2 (en_US)
dc.identifier.uri: http://dx.doi.org/10.1145/3123266.3123399 (en_US)
dc.identifier.uri: http://hdl.handle.net/11536/152906
dc.description.abstract: An automated process that can suggest a soundtrack for a user-generated video (UGV) and turn the UGV into a music-compliant, professional-like video is challenging but desirable. To this end, this paper presents an automatic music video (MV) generation system that conducts soundtrack recommendation and video editing simultaneously. Given a long UGV, it is first divided into a sequence of fixed-length short (e.g., 2-second) segments, and then a multi-task deep neural network (MDNN) is applied to predict the pseudo acoustic (music) features (also called the pseudo song) from the visual (video) features of each video segment. In this way, the distance between any pair of video and music segments of the same length can be computed in the music feature space. Second, the sequence of pseudo acoustic (music) features of the UGV and the sequence of acoustic (music) features of each music track in the music collection are temporally aligned by the dynamic time warping (DTW) algorithm with a pseudo-song-based deep similarity matching (PDSM) metric. Third, for each music track, the video editing module selects and concatenates the segments of the UGV based on the target and concatenation costs given by a pseudo-song-based deep concatenation cost (PDCC) metric according to the DTW-aligned result, to generate a music-compliant professional-like video. Finally, all the generated MVs are ranked, and the best MV is recommended to the user. The MDNN for pseudo song prediction and the PDSM and PDCC metrics are trained on an annotated official music video (OMV) corpus. The results of objective and subjective experiments demonstrate that the proposed system performs well and can generate appealing MVs with better viewing and listening experiences. (en_US)
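The temporal alignment step described in the abstract follows the standard DTW recursion. As a minimal sketch only: the lambda distance below is a hypothetical stand-in for the learned PDSM metric, and the 1-D toy features stand in for the pseudo-song and music feature vectors; neither reflects the paper's actual implementation.

```python
# Minimal DTW sketch of the alignment step described in the abstract.
# `dist` stands in for the learned PDSM metric (hypothetical placeholder).

def dtw(seq_a, seq_b, dist):
    """Return the cumulative cost of temporally aligning two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # seq_a segment repeated
                                 cost[i][j - 1],      # seq_b segment repeated
                                 cost[i - 1][j - 1])  # one-to-one match
    return cost[n][m]

# Toy example: 1-D "features" with absolute difference as the distance.
pseudo_song = [1.0, 2.0, 3.0, 3.0]
music_track = [1.0, 3.0, 3.0]
print(dtw(pseudo_song, music_track, lambda a, b: abs(a - b)))  # -> 1.0
```

In the paper's pipeline, each element would be a segment-level feature vector in the music feature space and the distance would come from the PDSM network rather than an absolute difference.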
dc.language.iso: en_US (en_US)
dc.subject: Automatic music video generation (en_US)
dc.subject: cross-modal media retrieval (en_US)
dc.subject: deep neural networks (en_US)
dc.title: Automatic Music Video Generation Based on Simultaneous Soundtrack Recommendation and Video Editing (en_US)
dc.type: Proceedings Paper (en_US)
dc.identifier.doi: 10.1145/3123266.3123399 (en_US)
dc.identifier.journal: PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17) (en_US)
dc.citation.spage: 519 (en_US)
dc.citation.epage: 527 (en_US)
dc.contributor.department: Published under the name of National Chiao Tung University (zh_TW)
dc.contributor.department: National Chiao Tung University (en_US)
dc.identifier.wosnumber: WOS:000482109500061 (en_US)
dc.citation.woscount: 0 (en_US)
Appears in Collections: Conference Papers