Full metadata record
DC Field | Value | Language
dc.contributor.author | Chou, Chien-Li | en_US
dc.contributor.author | Chen, Hua-Tsung | en_US
dc.contributor.author | Lee, Suh-Yin | en_US
dc.date.accessioned | 2017-04-21T06:56:36Z | -
dc.date.available | 2017-04-21T06:56:36Z | -
dc.date.issued | 2017-02 | en_US
dc.identifier.issn | 1520-9210 | en_US
dc.identifier.uri | http://dx.doi.org/10.1109/TMM.2016.2614426 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/133182 | -
dc.description.abstract | Traditional video annotation approaches focus on annotating keyframes/shots or whole videos with semantic keywords. However, the extraction processes of keyframes/shots might lack semantic meanings, and it is hard to use a few keywords to describe the content of a long video with multiple topics. In this work, near-scenes, which contain similar concepts, topics, or semantic meanings, are designed for better video content understanding and annotation. We propose a novel framework of hierarchical video-to-near-scene annotation not only to preserve but also to purify the semantic meanings of near-scenes. To detect near-scenes, a pattern-based prefix tree is first constructed to quickly retrieve near-duplicate videos. Then, the videos containing similar near-duplicate segments and similar keywords are clustered with consideration of multimodal features, including visual and textual features. To enhance the precision of near-scene detection, a pattern-to-intensity-mark (PIM) method is proposed to perform precise frame-level near-duplicate segment alignment. For each near-scene, a video-to-concept distribution model is designed to analyze the representativeness of keywords and the discrimination of clusters by the proposed potential term frequency, inverse document frequency, and entropy. Tags are ranked according to video-to-concept distribution scores, and the tags with the highest scores are propagated to the detected near-scenes. Extensive experiments demonstrate that the proposed PIM outperforms the compared state-of-the-art approaches in terms of quality of segments and quality of frames for near-scene detection. Furthermore, the proposed framework of hierarchical video-to-near-scene annotation achieves high-quality near-scene annotation in terms of mean average precision. | en_US
dc.language.iso | en_US | en_US
dc.subject | Near-duplicate segment alignment | en_US
dc.subject | near-duplicate video retrieval | en_US
dc.subject | near-scene detection | en_US
dc.subject | near-scene annotation | en_US
dc.subject | video annotation | en_US
dc.title | Multimodal Video-to-Near-Scene Annotation | en_US
dc.identifier.doi | 10.1109/TMM.2016.2614426 | en_US
dc.identifier.journal | IEEE TRANSACTIONS ON MULTIMEDIA | en_US
dc.citation.volume | 19 | en_US
dc.citation.issue | 2 | en_US
dc.citation.spage | 354 | en_US
dc.citation.epage | 366 | en_US
dc.contributor.department | 交大名義發表 | zh_TW
dc.contributor.department | 資訊工程學系 | zh_TW
dc.contributor.department | National Chiao Tung University | en_US
dc.contributor.department | Department of Computer Science | en_US
dc.identifier.wosnumber | WOS:000395795800011 | en_US
Appears in Collections: Journal Articles
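The abstract above ranks candidate tags for each near-scene by combining the proposed potential term frequency, inverse document frequency, and entropy, and propagates the highest-scoring tags to the detected near-scenes. The record does not give the exact formulations, so the following is only a minimal, hypothetical Python sketch of that general idea using standard TF, IDF, and entropy definitions; the function name rank_tags and the toy tag lists are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: score candidate tags of one near-scene cluster by
# tf * idf * (1 - normalized entropy). Standard textbook formulations are
# used as placeholders for the paper's potential TF / IDF / entropy terms.

import math
from collections import Counter


def rank_tags(cluster_tags, all_clusters):
    """Return tags of one cluster sorted by a tf-idf-entropy score.

    cluster_tags: list of tags attached to the videos of one near-scene cluster.
    all_clusters: list of tag lists, one per near-scene cluster in the corpus.
    """
    n_clusters = len(all_clusters)
    tf = Counter(cluster_tags)
    total = sum(tf.values())

    scores = {}
    for tag, count in tf.items():
        # Term frequency of the tag within this cluster.
        tf_score = count / total

        # Inverse document frequency over the clusters containing the tag.
        df = sum(1 for tags in all_clusters if tag in tags)
        idf = math.log((1 + n_clusters) / (1 + df)) + 1.0

        # Entropy of the tag's distribution across clusters: a tag spread
        # evenly over many clusters (high entropy) discriminates poorly.
        counts = [tags.count(tag) for tags in all_clusters]
        s = sum(counts)
        probs = [c / s for c in counts if c > 0]
        entropy = -sum(p * math.log(p) for p in probs)
        max_entropy = math.log(n_clusters) if n_clusters > 1 else 1.0
        discrimination = 1.0 - entropy / max_entropy

        scores[tag] = tf_score * idf * discrimination

    # The highest-scoring tags would be propagated to the near-scene.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy clusters of tags, purely for illustration.
    clusters = [
        ["soccer", "goal", "soccer", "replay"],
        ["soccer", "news", "anchor"],
        ["concert", "music", "music"],
    ]
    print(rank_tags(clusters[0], clusters))
```

In this toy example, "goal" and "replay" outrank "soccer" for the first cluster because "soccer" also appears in another cluster and is therefore less discriminative for that near-scene, which is the behavior the entropy term is meant to capture.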