Title: 強健式的視覺追蹤
Robust Visual Tracking
Authors: 唐政元
Cheng-Yuan Tang
陳稔
洪一平
Zen Chen
Yi-Ping Hung
Institute of Computer Science and Engineering
Keywords: Visual Tracking; Robust Estimation; Motion Segmentation; Detection and Tracking of Human Heads; Optical Flow Estimation; Model-Based Tracking
Issue Date: 1998
Abstract: Visual tracking is an important research topic in computer vision. Because of mismatched correspondences and measurement errors, the data observed in visual tracking usually contain outliers, so robust estimators are often used to resist them. Chapter 2 of this dissertation introduces and compares two robust estimators commonly used in computer vision, LMedS and RANSAC, both of which are used in the later chapters.
Chapters 3 and 4 describe general-purpose methods. Chapter 3 proposes a two-stage approach to robust estimation of visual motion: the pixel-level component of the displacement is estimated mainly by template matching, and the subpixel component is estimated by the noise-resistant LMedS-differential approach. Chapter 4 proposes a RANSAC-based clustering method for motion segmentation and estimation that uses the principle of rigid body consensus. The resulting feature-clustering algorithm provides a systematic way to manage the splitting, merging, appearance, and disappearance of multiple moving rigid objects. Because the motion estimates obtained with the RANSAC-based method serve as the measurements for the Kalman filters (KF), linear KFs can be used for predictive visual tracking in place of the commonly used extended Kalman filters (EKF).
Chapters 5, 6, and 7 describe special-purpose methods. Chapter 5 proposes a fast maximum likelihood (ML) head detector that locates human heads in images with complex backgrounds. Detecting the position and size of the heads in a 512x512 image takes about 0.02 seconds on a Sparc 20 workstation (not including image acquisition time). The ML head detector is insensitive to illumination, noise, and the scale and rotation of the heads. Chapter 6 presents a method for head orientation estimation. Starting from the head detected by the method of Chapter 5, facial features are extracted automatically in the first image. These features, such as the eyes, nostrils, and mouth, are then tracked, and their 3D positions are combined with RANSAC to estimate the head orientation. Chapter 7 presents a robust and efficient model-based method for tracking known 3D rigid objects in an image sequence. The method combines P3P-ICP with LMedS to compute the 3D position of the object relative to the camera throughout the sequence, and the tracking results are applied to augmented reality.
Chapter 8 summarizes the dissertation and discusses directions for future research.
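As a rough illustration of the two robust estimators named in the abstract, the sketch below applies LMedS and RANSAC to a toy problem: estimating a single 2D displacement from feature displacements that include gross outliers. This is not the method of the dissertation. The pure-translation motion model, the one-sample minimal set, the threshold, the trial count, the synthetic data, and all function names are assumptions made only for this example.

```python
import random
from statistics import median


def squared_residuals(t, flows):
    """Squared distance between each observed displacement and candidate t."""
    return [(dx - t[0]) ** 2 + (dy - t[1]) ** 2 for dx, dy in flows]


def lmeds_translation(flows, trials=50, seed=0):
    """LMedS: keep the candidate with the least median of squared residuals."""
    rng = random.Random(seed)
    best_t, best_median = None, float("inf")
    for _ in range(trials):
        t = rng.choice(flows)  # minimal sample: one displacement (assumed model)
        med = median(squared_residuals(t, flows))
        if med < best_median:
            best_t, best_median = t, med
    return best_t


def ransac_translation(flows, threshold=1.0, trials=50, seed=0):
    """RANSAC: keep the candidate with the largest consensus set, then refine."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(trials):
        t = rng.choice(flows)
        inliers = [f for f, r in zip(flows, squared_residuals(t, flows))
                   if r <= threshold ** 2]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the estimate as the mean of the consensus set.
    n = len(best_inliers)
    estimate = (sum(f[0] for f in best_inliers) / n,
                sum(f[1] for f in best_inliers) / n)
    return estimate, best_inliers


# Synthetic data: 14 displacements near (3, -2) plus 6 gross outliers.
inlier_flows = [(3.0 + 0.1 * ((i % 3) - 1), -2.0 + 0.1 * ((i % 5) - 2))
                for i in range(14)]
outlier_flows = [(-8.0, 5.0), (10.0, 10.0), (0.0, 7.5),
                 (-4.0, -9.0), (6.5, 4.0), (-7.0, -1.0)]
flows = inlier_flows + outlier_flows

lmeds_estimate = lmeds_translation(flows)
ransac_estimate, consensus = ransac_translation(flows)
print("LMedS estimate:", lmeds_estimate)
print("RANSAC estimate:", ransac_estimate, "inliers:", len(consensus))
```

In this sketch, LMedS needs no inlier threshold but relies on outliers making up less than half of the data, while RANSAC needs a threshold and returns an explicit consensus set that can be reused, for example to refine the estimate.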
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT870392102
http://hdl.handle.net/11536/64130
Appears in Collections: Thesis