Title: | Evaluation of Disparity Estimation Schemes using Captured RGB-D Images |
Authors: | Liu, Che-Wei; Hang, Hsueh-Ming | Department of Electronics Engineering; Institute of Electronics |
Keywords: | depth estimation; Kinect; 3D video |
Issue Date: | 2013 |
Abstract: | In 3D image processing, depth estimation based on a given left and right image pair (the so-called stereo matching algorithm) is widely used in many 3D applications. One type of application tracks body motion and/or poses with the aid of depth information, and different applications demand different levels of depth-map accuracy. How to evaluate depth estimation algorithms for different applications therefore becomes an issue. The conventional evaluation method uses a small set of computer-generated test images, which is insufficient to reflect the problems found in real-world applications. In this study, we design a number of scenes and capture them using RGB-D cameras; the resulting dataset consists of stereo image pairs and their corresponding ground truth disparity maps. The dataset covers two categories of factors that may affect the performance of stereo matching algorithms: image content factors and image quality factors. The image content group includes simple and complex backgrounds, different numbers of objects, different hand poses, and clothing with various color patterns. In the image quality group, we create images with different PSNR levels and different rectification errors. In addition, each stereo pair has its ground truth disparity map. All images and depth maps are captured by a pair of Kinect devices. To generate appropriate images for the test dataset, we calibrate and rectify the captured RGB image pairs; we also process the captured depth maps and create the so-called trimaps for evaluation purposes. Because the left and right color images come from different sensors, we perform camera calibration to obtain the camera parameters, and color calibration to match the colors of the two images. We then align the left and right images using an existing image rectification technique.
To generate the ground truth disparity map, we first capture the raw depth map from the Kinect and warp it from the view of the IR camera to that of the RGB camera. These depth maps contain many black holes due to the Kinect sensing mechanism. To make the ground truth disparity map more reliable, we propose an adaptive hole-filling algorithm that automatically classifies the holes into three types and fills each type accordingly. Last, we adopt the matting segmentation concept to create a tri-value map (trimap) that classifies image pixels into foreground, background, and in-between regions. Our error metrics are the bad-matching pixel rate and the mean square error between the ground truth disparity map and the estimated disparity map, with emphasis on performance in the foreground region. In our experiments, three stereo matching algorithms are evaluated on our dataset using the proposed methodology, and we analyze these algorithms based on the collected data. |
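The trimap-based evaluation described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: the foreground label value (`fg_label=255`) and the bad-pixel disparity threshold (`bad_thresh=1.0`) are assumed conventions, and the function names are hypothetical.

```python
import numpy as np

def evaluate_disparity(est, gt, trimap, fg_label=255, bad_thresh=1.0):
    """Compute the two error metrics from the abstract over the
    trimap foreground: bad-matching pixel rate and mean square error.

    `fg_label` and `bad_thresh` are illustrative assumptions, not
    values taken from the thesis.
    """
    fg = trimap == fg_label                        # foreground mask from the trimap
    diff = np.abs(est[fg].astype(float) - gt[fg].astype(float))
    bad_rate = float(np.mean(diff > bad_thresh))   # fraction of bad-matching pixels
    mse = float(np.mean(diff ** 2))                # mean square disparity error
    return bad_rate, mse
```

Restricting the masks to the trimap foreground reflects the abstract's emphasis on foreground performance; the in-between (unknown) band around object boundaries is excluded so that mixed pixels do not dominate the error counts.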
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT070050266 http://hdl.handle.net/11536/72903 |
Appears in Collections: | Thesis |