Motion Activity Based Shot Identification and Closed Caption Localization for Video Structuring

Title:	Motion Activity Based Shot Identification and Closed Caption Localization for Video Structuring
Authors:	Shu-Jiuan Lin (林淑娟); Suh-Yin Lee (李素瑛); Institute of Computer Science and Engineering
Keywords:	Motion Activity; Closed Caption; Video Structuring
Issue Date:	2001
Abstract:	This thesis proposes a new approach for generating a table of video content for MPEG-2 compressed sports videos, based on a motion activity description of each shot and the textual information in closed captions. To speed up scene change detection, the video stream is first checked GOP by GOP rather than frame by frame; only when a GOP is flagged as a possible scene change are the actual scene boundaries located at the frame level. Each segmented shot is described by the proposed object-based motion activity descriptor, which is computed from a 2D histogram of objects that takes into account the long-term consistency of the spatio-temporal relationships of moving objects within the shot. Using these motion activity features, the proposed shot identification algorithm classifies each shot as a serve shot, a full-court shot, or a close-up shot. The shots of interest (the serve shots) are then selected, and the proposed closed caption localization mechanism detects the captions in them. In addition, a filter based on the SOM (Self-Organizing Map) algorithm is designed to distinguish superimposed closed captions from highly textured background regions. Finally, a sports video content visualization system is constructed that provides the table of video content as a hierarchical structure composed of story units (consecutive serve, full-court, and close-up shots) and closed captions. The system also offers users a tree structure of the video content that can be adjusted dynamically to their needs. The experimental results show the effectiveness of the proposed system and the feasibility of hierarchically structuring video content.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT900392028 http://hdl.handle.net/11536/68441
Appears in Collections:	Thesis
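
The two-level scene change detection described in the abstract (check GOP by GOP first, then locate the boundary at the frame level) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes each frame is already available as a normalized histogram, whereas the thesis works on the MPEG-2 compressed stream, and the names `hist_diff` and `find_scene_cuts`, the GOP size of 12, and the threshold are hypothetical choices made for illustration only.

```python
from typing import List, Sequence


def hist_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute bin differences between two frame histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def find_scene_cuts(
    frames: Sequence[Sequence[float]],
    gop_size: int = 12,      # assumed GOP length, not taken from the thesis
    threshold: float = 0.5,  # assumed decision threshold
) -> List[int]:
    """Return the indices of frames that start a new shot.

    Coarse pass: compare the first frame of each GOP with the first frame
    of the next GOP. Fine pass: only inside flagged GOPs, compare
    consecutive frames to locate the actual boundary.
    """
    cuts: List[int] = []
    for start in range(0, len(frames) - 1, gop_size):
        end = min(start + gop_size, len(frames) - 1)
        # Coarse pass: skip this GOP if its two ends look alike.
        if hist_diff(frames[start], frames[end]) <= threshold:
            continue
        # Fine pass: find the largest frame-to-frame change in the GOP.
        best = max(
            range(start + 1, end + 1),
            key=lambda i: hist_diff(frames[i - 1], frames[i]),
        )
        if hist_diff(frames[best - 1], frames[best]) > threshold:
            cuts.append(best)
    return cuts


if __name__ == "__main__":
    # Synthetic two-bin histograms: 30 frames of one shot, then 30 of another.
    video = [[1.0, 0.0]] * 30 + [[0.0, 1.0]] * 30
    print(find_scene_cuts(video))
```

With a GOP size of 12, the 60-frame synthetic sequence needs five coarse comparisons plus twelve fine comparisons in the single flagged GOP, instead of 59 frame-to-frame comparisons. A gradual transition spread over a GOP can pass the coarse test yet fail the fine one, in which case this sketch reports nothing for that GOP.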