Title: Depth Coding for 3D Video (應用於立體視訊之景深資訊編碼)
Authors: Yang, Kai-Hsiang (楊凱翔)
Hang, Hsueh-Ming (杭學鳴)
Department: Institute of Electronics (電子研究所)
Keywords: Depth Coding; 3D Video
Issue Date: 2011
Abstract: In recent years, more and more three-dimensional (3D) movies and video devices have been developed. The current trend is to synthesize one or more virtual views from the received or recorded views and their associated depth information. Consequently, depth information plays an essential role in virtual-view (or free-viewpoint) 3D video systems, and transmitting it efficiently is one of the most important issues in 3D video coding.

A depth map may be treated as a gray-level image and can thus be compressed with conventional video coding schemes such as the H.264/AVC standard. However, a depth map has a distinctive structure: it mainly consists of a few large objects made up of nearly flat regions separated by high-contrast boundaries. The Discrete Cosine Transform (DCT) based coding adopted by the standard often blurs these object boundaries, leading to noticeable artifacts around them in the synthesized images. Unfortunately, broken object edges are among the defects most easily noticed by viewers. It was also observed that a high percentage of the bits is spent on the depth information around object boundaries. Encoding the depth information near object boundaries with good quality at an acceptable bit rate therefore becomes a challenge.

In this thesis, we propose a new algorithm to code a depth map so that it is appropriate for virtual view synthesis. The idea is to use H.264/AVC to represent the global shape (including depth values) of the depth map, and then to transmit additional side information that improves the depth values around object boundaries. We first propose a method to extract the boundaries of foreground objects and their dominant depth values. Then, we propose 128 3x3 block patterns to approximate the extracted object boundaries and their associated depth values; the goal of this coded-block-pattern representation is to improve the synthesized image quality at a virtual viewpoint. Both the H.264/AVC-compressed depth map and the side information carrying the object-boundary enhancement are sent to the receiver, and we design a binary syntax to represent the coded block patterns. The complete encoding and decoding simulation system was built on the H.264/AVC reference software (JM 18.0). In our experiments, three proposed tools are enabled individually or in combination, yielding four coding modes, all of which are tested. The collected data show that the proposed tools improve either coding efficiency or synthesized image quality; some tools work best on simple test sequences, while others work best on complex ones. With proper parameter settings, the overall quality of the rendered virtual views is noticeably improved.
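As background on why boundary fidelity dominates synthesis quality: in depth-image-based rendering, each pixel is shifted horizontally by a disparity proportional to its inverse depth, so a coding error on a depth edge displaces the foreground/background transition in the rendered view. A minimal Python sketch of this standard relation follows; the focal length and baseline values are illustrative, not taken from the thesis.

    def disparity_px(depth_mm, focal_px=1000.0, baseline_mm=50.0):
        # Rectified-camera disparity: d = f * b / Z. A depth coding error on
        # an object edge changes d and thus moves that edge by several pixels
        # in the synthesized view.
        return focal_px * baseline_mm / depth_mm

    # A foreground pixel at 1.0 m vs. the same pixel mis-coded to 1.2 m:
    print(disparity_px(1000.0), disparity_px(1200.0))  # 50.0 vs ~41.7 px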
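To make the boundary-pattern idea concrete, the sketch below binarizes a 3x3 depth block into a foreground/background mask, matches it against a codebook of edge-like patterns, and packages a pattern index plus two dominant depth values as side information. The codebook construction, the binarization rule, and all names here are hypothetical stand-ins; the thesis defines its own fixed set of 128 patterns and its own binary syntax. Note that 128 patterns fit exactly in a 7-bit index.

    import numpy as np

    def build_edge_codebook(n_angles=32, offsets=(-1.0, -0.5, 0.0, 0.5), size=3):
        # Generate up to 128 edge-like binary 3x3 patterns by thresholding
        # oriented half-planes (an illustrative stand-in for the thesis's set).
        ys, xs = np.mgrid[0:size, 0:size] - (size - 1) / 2.0
        seen, patterns = set(), []
        for k in range(n_angles):
            theta = np.pi * k / n_angles
            nx, ny = np.cos(theta), np.sin(theta)
            for off in offsets:
                mask = (xs * nx + ys * ny >= off).astype(np.uint8)
                key = mask.tobytes()
                if key not in seen and 0 < mask.sum() < size * size:
                    seen.add(key)
                    patterns.append(mask)
        return patterns[:128]

    def best_pattern(mask, codebook):
        # Nearest codebook entry in Hamming distance.
        return int(np.argmin([np.count_nonzero(mask != p) for p in codebook]))

    # Encoder-side sketch for one 3x3 depth block straddling an object edge.
    codebook = build_edge_codebook()
    block = np.array([[200, 200, 80],
                      [200,  90, 80],
                      [ 95,  80, 80]])               # toy depth values
    fg_mask = (block > block.mean()).astype(np.uint8)  # crude binarization
    idx = best_pattern(fg_mask, codebook)
    side_info = (idx,                                  # 7-bit pattern index
                 int(np.median(block[fg_mask == 1])),  # dominant foreground depth
                 int(np.median(block[fg_mask == 0])))  # dominant background depth
    print(side_info)

With a shared codebook, the decoder can rebuild a sharp two-level approximation of the block from just the index and the two depth values, which, as far as the abstract describes it, is the spirit of the proposed boundary enhancement.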
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079811537
http://hdl.handle.net/11536/46717
Appears in Collections: Thesis