Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lin, Jen-Chun | en_US |
dc.contributor.author | Wei, Wen-Li | en_US |
dc.contributor.author | Yang, James | en_US |
dc.contributor.author | Wang, Hsin-Min | en_US |
dc.contributor.author | Liao, Hong-Yuan Mark | en_US |
dc.date.accessioned | 2019-10-05T00:09:42Z | - |
dc.date.available | 2019-10-05T00:09:42Z | - |
dc.date.issued | 2017-01-01 | en_US |
dc.identifier.isbn | 978-1-4503-4906-2 | en_US |
dc.identifier.uri | http://dx.doi.org/10.1145/3123266.3123399 | en_US |
dc.identifier.uri | http://hdl.handle.net/11536/152906 | - |
dc.description.abstract | An automated process that can suggest a soundtrack for a user-generated video (UGV) and turn the UGV into a music-compliant, professional-like video is challenging but desirable. To this end, this paper presents an automatic music video (MV) generation system that conducts soundtrack recommendation and video editing simultaneously. Given a long UGV, it is first divided into a sequence of short fixed-length (e.g., 2-second) segments, and a multi-task deep neural network (MDNN) is applied to predict the pseudo acoustic (music) features (the "pseudo song") from the visual (video) features of each video segment. In this way, the distance between any pair of video and music segments of the same length can be computed in the music feature space. Second, the sequence of pseudo acoustic (music) features of the UGV and the sequence of acoustic (music) features of each music track in the music collection are temporally aligned by the dynamic time warping (DTW) algorithm with a pseudo-song-based deep similarity matching (PDSM) metric. Third, for each music track, the video editing module selects and concatenates segments of the UGV based on the target and concatenation costs given by a pseudo-song-based deep concatenation cost (PDCC) metric, following the DTW-aligned result, to generate a music-compliant professional-like video. Finally, all the generated MVs are ranked, and the best MV is recommended to the user. The MDNN for pseudo-song prediction and the PDSM and PDCC metrics are trained on an annotated official music video (OMV) corpus. The results of objective and subjective experiments demonstrate that the proposed system performs well and can generate appealing MVs with better viewing and listening experiences. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Automatic music video generation | en_US |
dc.subject | cross-modal media retrieval | en_US |
dc.subject | deep neural networks | en_US |
dc.title | Automatic Music Video Generation Based on Simultaneous Soundtrack Recommendation and Video Editing | en_US |
dc.type | Proceedings Paper | en_US |
dc.identifier.doi | 10.1145/3123266.3123399 | en_US |
dc.identifier.journal | PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17) | en_US |
dc.citation.spage | 519 | en_US |
dc.citation.epage | 527 | en_US |
dc.contributor.department | Published under the name of NCTU | zh_TW |
dc.contributor.department | National Chiao Tung University | en_US |
dc.identifier.wosnumber | WOS:000482109500061 | en_US |
dc.citation.woscount | 0 | en_US |
Appears in Collections: | Conference Papers |
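The abstract's second step temporally aligns the UGV's pseudo-song feature sequence against each candidate music track with dynamic time warping. A minimal sketch of that alignment step, assuming random placeholder features and plain Euclidean distance standing in for the learned PDSM metric (the real system's metric, feature dimensions, and segment counts are not specified in this record):

```python
import numpy as np

def dtw_align(video_feats, music_feats, dist):
    """Align two feature sequences with dynamic time warping.

    Returns the total alignment cost and the warping path as a
    list of (video_index, music_index) pairs.
    """
    n, m = len(video_feats), len(music_feats)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(video_feats[i - 1], music_feats[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1],  # match
                                 cost[i - 1, j],      # video segment repeated
                                 cost[i, j - 1])      # music segment repeated
    # Backtrack the optimal warping path from the end of both sequences.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(cost[n, m]), path[::-1]

# Euclidean distance as a stand-in for the learned PDSM similarity.
euclidean = lambda a, b: float(np.linalg.norm(a - b))

rng = np.random.default_rng(0)
pseudo_song = rng.normal(size=(8, 16))   # pseudo acoustic features, one row per 2-s video segment
track_feats = rng.normal(size=(12, 16))  # acoustic features of one candidate music track
total_cost, path = dtw_align(pseudo_song, track_feats, euclidean)
```

In the full system this alignment would be run once per track in the music collection; the per-track `total_cost` could then feed the final ranking step, while `path` tells the editing module which video segments line up with which parts of the track.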