Title: 視訊資料庫中內容擷取之研究
A Study on Content-Based Retrieval for Video Databases
Authors: 鄭卜壬
Pu-Jien Cheng
楊維邦
Wei-Pang Yang
Institute of Computer Science and Engineering
Keywords: video database; content-based retrieval; video indexing; MPEG7; similarity retrieval
Issue Date: 2000
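Abstract: A video database system must not only manage large volumes of video data but also give users content-based retrieval. Content-based retrieval has become one of the most important research topics in video databases, and although much related work has appeared in recent years, many important problems still merit further study. First, previous work was designed mainly for specific application environments and lacks a model framework that suits a range of system requirements. Second, it lacks an effective way to integrate the high-level semantic content of video data with low-level visual features. Third, it overlooks operations for composing video content features to express their rich spatial and temporal arrangements. Fourth, it does not fully exploit the features of compressed video to search for similar images directly in the compressed format and thereby improve execution efficiency. Fifth, there is still no systematic evaluation method for assessing the performance of video access methods. In this dissertation we propose a semi-automatic content-based retrieval method that meets these requirements. It includes a two-layer data model (a semantic inference model and a visual feature aggregation model) describing the semantic content and the visual feature content of video data, respectively. Building on existing scene change detection and object tracking methods, the model provides eight operations that help users describe different semantic levels and object compositions, and we design a description language with high expressive power and low computational complexity. Together with domain knowledge and index structures, we propose query processing algorithms covering semantic, temporal, similarity, fuzzy, and hybrid queries, and provide both structured semantic querying and similarity-based query by example on visual features. In addition, to improve retrieval performance on compressed video, we propose two methods, filtering and prediction. Unlike conventional indexing techniques, they suit environments that require on-line video retrieval, such as live broadcast, security surveillance systems, and videoconferencing. The filtering and prediction methods have been applied to scene detection, full-image query by example, and partial-image query by example in MPEG video. Experiments show that the proposed content-based retrieval method (data model and query processing) achieves higher recall and precision, and that the filtering and prediction methods substantially reduce the execution time needed for queries on compressed video.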
A video database system must not only manage a collection of video data but also provide users with content-based access. Supporting content-based retrieval is one of the most important topics in video database systems, and much research has addressed it in recent years. However, some critical issues remain open. First, the systems proposed in previous work suit only particular applications; a system that adapts to various system environments is required. Second, high-level semantic concepts should be integrated with low-level audio-visual features to associate what users desire with what systems can deliver. Third, video features should be manipulated and queried efficiently. Fourth, to obviate the need to decompress video data, it is more efficient to search and index video data in compressed form. Fifth, a formal computational model that simulates user-defined preference relations among video data is required to judge the retrieval performance of such systems. This dissertation proposes a semi-automatic content-based video retrieval system that meets the requirements above. The system provides a two-layered conceptual model with two major components, a semantic inference model and a visual aggregation model, for describing the semantic and visual contents of video data, respectively. Building on available automatic scene segmentation and object tracking algorithms, the proposed model supports eight operations for manipulating the metadata at various levels of semantic abstraction and object granularity. An annotation language is designed to describe scenarios in video data and can be analyzed efficiently. With the assistance of domain knowledge and index organizations, this investigation also develops algorithms to efficiently process five common types of video queries: semantic query, temporal query, similarity query, fuzzy query, and hybrid query.
In addition, query-by-example and an SQL-like query language for video retrieval are provided. We also propose two novel methods, filtering and prediction, to improve the performance of video retrieval by example images in the compressed domain. In contrast to conventional off-line image and video indexing techniques, these methods focus on efficient on-line video retrieval for applications with time constraints, such as live broadcast, surveillance, and videoconferencing. Based on the proposed methods, three algorithms are developed for MPEG streams: scene change detection, full image query, and partial image query. Without decompressing the video archive, the filtering and prediction approaches avoid superfluous computation by filtering out dissimilar images and predicting similar images, respectively. The filtering approach employs DCT features, while the prediction approach uses macroblock and motion vector information. A series of simulation experiments shows that the system outperforms previously proposed models and matching algorithms in recall and precision, and that the filtering and prediction methods significantly improve computational performance.
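The abstract says only that the filtering approach relies on DCT features. As a rough illustration of the general idea, and not the dissertation's actual algorithm, the sketch below assumes each frame has already been reduced to a list of per-block DC coefficients (extraction from the MPEG stream is not shown). The function names, the mean-absolute-difference measure, and the thresholds are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def dc_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two frames' DC coefficients.

    Assumption: both frames are given as equal-length lists holding one
    DC coefficient per block. This measure is illustrative only.
    """
    if len(a) != len(b):
        raise ValueError("frames must have the same number of blocks")
    if not a:
        raise ValueError("frames must contain at least one block")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_scene_changes(dc_frames: Sequence[Sequence[float]],
                         threshold: float) -> List[int]:
    """Return indices of frames that differ sharply from their predecessor."""
    return [
        i for i in range(1, len(dc_frames))
        if dc_distance(dc_frames[i - 1], dc_frames[i]) > threshold
    ]


def filter_candidates(query_dc: Sequence[float],
                      dc_frames: Sequence[Sequence[float]],
                      threshold: float) -> List[int]:
    """Cheap filter: keep only frames close enough to the query image.

    Frames that fail this test are discarded before any costlier matching.
    """
    return [
        i for i, frame in enumerate(dc_frames)
        if dc_distance(query_dc, frame) <= threshold
    ]


# Hypothetical data: four frames of four blocks each, with a cut before frame 2.
frames = [
    [100, 102, 98, 101],
    [101, 103, 97, 100],
    [30, 28, 35, 31],
    [31, 29, 34, 30],
]

scene_changes = detect_scene_changes(frames, threshold=20)
candidates = filter_candidates([32, 30, 33, 29], frames, threshold=5)
```

With this toy data, `scene_changes` flags the frame where the coefficients drop abruptly, and `candidates` keeps only the two frames that resemble the query. A real system would tune the thresholds and distance measure against actual MPEG data.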
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT890394014
http://hdl.handle.net/11536/66914
Appears in Collections: Thesis