Title: | Depth Refinement for View Synthesis using Depth Sensor and RGB Camera (使用景深感應器與RGB相機對用於影像合成的景深改善)
Authors: | Chiou, Yi-Wen (邱義文); Hang, Hsueh-Ming (杭學鳴); Tsai, Jang-Jer (蔡彰哲) | Institute of Electronics
Keywords: | Depth Camera; View Synthesis; Calibration; Depth Map Refinement; Alignment; Kinect
Issue Date: | 2011
Abstract: | In recent years, three-dimensional (3D) video has become a trend, spurred by the very popular science-fiction movie Avatar, released in 2009; 3D movies, TV sets, and even mobile-phone screens with stereoscopic effects have since been developed. View synthesis is an essential element of a 3D video system. The approach adopted by the international MPEG committee for the 3DVC standard generates virtual-viewpoint images from the received 2D views and their associated depth information, so depth information plays an important role in 3D view synthesis.
A common way to obtain depth is passive depth estimation: take two 2D texture images from different viewpoints, find corresponding blocks between them by stereo matching, and estimate a depth map from the resulting disparities (a minimal stereo-matching sketch is given after this abstract). This approach often produces erroneous depth values in regions with few features, and the occlusion regions that inevitably arise between the two camera viewpoints lead to "hole" defects in the synthesized views.
In this thesis, to overcome these weaknesses, we adopt an active depth sensor, the Kinect, to capture depth information. An active sensor yields fairly accurate depth in textureless regions and runs fast enough for real-time processing. We use a pair of Kinect sensors as the left and right cameras for view synthesis (a capture sketch follows the abstract). Treating the depth sensor as another camera, we propose a set of algorithms that account for the sensors' 3D spatial relationship and repair the edge noise and defects in the acquired depth maps; the refined depth maps and the color images are then used for synthesis, and the synthesized images are used to evaluate the results.
In the calibration step, we use the information shared by the two color images to estimate the 3D geometric relationship between the two Kinect sensors, and, because the two sensors respond differently to the light source, we also adjust for their color differences (see the calibration sketch below). Because the depth sensor and the color camera sit at different viewpoints, we propose an alignment procedure that matches the coordinates of the depth image and the texture image: besides the Kinect SDK functions, we use a disparity model to correct the remaining horizontal and vertical offsets, so that the two images are projected onto the same viewpoint (see the re-projection sketch below). Finally, to handle the noise and defects in the Kinect-acquired depth maps, we apply a joint bilateral filter, guided by the color image, to remove noise, fill "holes," and repair defects (sketched below). In the experiments, compared with the depth maps estimated by the MPEG Depth Estimation Reference Software (DERS), the captured and refined depth maps clearly provide more accurate depth information, and the refined maps are also better than the raw sensor output. Comparing the views synthesized from the original and the refined depth maps shows that the refinement noticeably improves the synthesized image quality.
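For reference, a minimal sketch of the passive baseline the thesis compares against: semi-global stereo matching with OpenCV. The file names, matcher parameters, and camera constants below are illustrative assumptions, not the DERS configuration used in the experiments.

```python
# Passive depth estimation sketch: semi-global block matching (OpenCV).
import cv2
import numpy as np

left = cv2.imread("left_view.png", cv2.IMREAD_GRAYSCALE)    # hypothetical inputs
right = cv2.imread("right_view.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,          # search range; must be divisible by 16
    blockSize=7,                # matching window; unreliable in textureless areas
    P1=8 * 7 * 7,               # penalty for small disparity changes
    P2=32 * 7 * 7,              # penalty for large disparity changes
    uniquenessRatio=10,
)

# OpenCV returns disparity as fixed-point values scaled by 16.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# Depth is inversely proportional to disparity: Z = f * B / d.
f, B = 525.0, 0.075             # assumed focal length (px) and baseline (m)
depth = np.where(disparity > 0, f * B / disparity, 0.0)
```

Textureless regions yield ambiguous matches here, which is exactly the failure mode the active sensor avoids.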
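The thesis captures frames through the Microsoft Kinect SDK; purely as an illustration, the sketch below grabs a depth/color pair from each of two Kinects using the open-source libfreenect Python bindings instead (an assumption, not the toolchain used in the work).

```python
# Hypothetical two-Kinect capture via libfreenect's synchronous API
# (the thesis itself uses the Microsoft Kinect SDK).
import freenect
import numpy as np

def grab(index):
    depth, _ = freenect.sync_get_depth(index=index)  # raw 11-bit depth frame
    color, _ = freenect.sync_get_video(index=index)  # RGB frame
    return np.asarray(depth, dtype=np.uint16), np.asarray(color)

left_depth, left_color = grab(0)     # device indices 0/1 assumed to be
right_depth, right_color = grab(1)   # the left and right sensors
```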
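A sketch of the calibration step under common assumptions: a chessboard is viewed by both Kinects' color cameras, per-camera intrinsics are estimated first, and cv2.stereoCalibrate then recovers the rotation R and translation T between the two sensors. The board geometry and file-name patterns are hypothetical, and the thesis's color-difference adjustment is omitted here.

```python
# Estimate the 3-D geometry (R, T) between the two Kinects' color cameras.
import glob
import cv2
import numpy as np

pattern, square = (9, 6), 0.025      # assumed inner corners and square size (m)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, pts1, pts2 = [], [], []
pairs = zip(sorted(glob.glob("cam1_*.png")), sorted(glob.glob("cam2_*.png")))
for f1, f2 in pairs:
    g1 = cv2.imread(f1, cv2.IMREAD_GRAYSCALE)
    g2 = cv2.imread(f2, cv2.IMREAD_GRAYSCALE)
    ok1, c1 = cv2.findChessboardCorners(g1, pattern)
    ok2, c2 = cv2.findChessboardCorners(g2, pattern)
    if ok1 and ok2:                  # keep views where both cameras see the board
        obj_pts.append(objp)
        pts1.append(c1)
        pts2.append(c2)

size = g1.shape[::-1]                # image size as (width, height)
_, K1, D1, _, _ = cv2.calibrateCamera(obj_pts, pts1, size, None, None)
_, K2, D2, _, _ = cv2.calibrateCamera(obj_pts, pts2, size, None, None)

# R, T map points from camera 1's frame into camera 2's frame.
_, _, _, _, _, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, pts1, pts2, K1, D1, K2, D2, size, flags=cv2.CALIB_FIX_INTRINSIC)
```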
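The alignment step can be viewed as re-projecting every depth pixel into the color camera's image plane so the two images share one viewpoint. A minimal sketch, assuming the depth-camera intrinsics K_d, color-camera intrinsics K_c, and depth-to-color extrinsics R, t are known; all names are illustrative, whereas the thesis builds on the Kinect SDK mapping plus horizontal/vertical offset corrections.

```python
# Re-project a depth map into the color camera's viewpoint.
import numpy as np

def align_depth_to_color(depth, K_d, K_c, R, t, color_shape):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32).ravel()
    u, v = u.ravel().astype(np.float32), v.ravel().astype(np.float32)

    valid = z > 0                          # skip sensor holes (zero depth)
    u, v, z = u[valid], v[valid], z[valid]

    # Back-project pixels to 3-D points in the depth camera frame.
    x = (u - K_d[0, 2]) * z / K_d[0, 0]
    y = (v - K_d[1, 2]) * z / K_d[1, 1]
    pts = np.stack([x, y, z], axis=1)

    # Rigid transform into the color camera frame, then project with K_c.
    pts_c = pts @ R.T + t
    zc = pts_c[:, 2]
    front = zc > 1e-6                      # drop points behind the color camera
    pts_c, zc = pts_c[front], zc[front]
    uc = np.round(pts_c[:, 0] / zc * K_c[0, 0] + K_c[0, 2]).astype(int)
    vc = np.round(pts_c[:, 1] / zc * K_c[1, 1] + K_c[1, 2]).astype(int)

    aligned = np.zeros(color_shape[:2], np.float32)
    ok = (uc >= 0) & (uc < color_shape[1]) & (vc >= 0) & (vc < color_shape[0])
    # Write far points first so nearer surfaces overwrite them (z-buffering).
    order = np.argsort(-zc[ok])
    aligned[vc[ok][order], uc[ok][order]] = zc[ok][order]
    return aligned
```

The z-buffer write keeps the nearest surface when several depth pixels land on the same color pixel; pixels never written are the occlusion holes that the refinement step must fill.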
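A sketch of the refinement step: a joint (cross) bilateral filter that smooths depth noise and fills holes using the aligned color image as guidance. It assumes the opencv-contrib-python package for cv2.ximgproc; the normalized-convolution trick for ignoring invalid pixels and the sigma values are illustrative choices, not necessarily the thesis's exact procedure.

```python
# Color-guided depth refinement with a joint bilateral filter.
import cv2
import numpy as np

def refine_depth(depth, color, d=9, sigma_color=25.0, sigma_space=9.0):
    depth = depth.astype(np.float32)
    guide = color.astype(np.float32)          # aligned RGB image (same size)
    valid = (depth > 0).astype(np.float32)    # zero pixels mark sensor holes

    # Normalized convolution: filter the masked depth and the mask with the
    # same color-guided weights, then divide, so each output pixel is a
    # weighted average of *valid* neighbors only -- holes get filled and
    # noisy measurements get smoothed without blurring across color edges.
    num = cv2.ximgproc.jointBilateralFilter(guide, depth * valid, d,
                                            sigma_color, sigma_space)
    den = cv2.ximgproc.jointBilateralFilter(guide, valid, d,
                                            sigma_color, sigma_space)
    return np.where(den > 1e-6, num / np.maximum(den, 1e-6), 0.0)
```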
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079911634 http://hdl.handle.net/11536/49161 |
Appears in Collections: | Thesis