Title: | Applications of Spatial Correlation in Image and Video Coding Standards |
Authors: | Chung-Neng Wang (王俊能); Chi-Min Liu (劉啟民); Tihao Chiang (蔣迪豪); Institute of Computer Science and Engineering |
Keywords: | Image Coding Standards; Video Coding Standards; N-Queen lattice; Fast Motion Estimation; Perceptual Dithering; Double Transformation; MPEG; H.264 |
Publication date: | 2002 |
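Abstract: | Within an image or video frame, data values in a small neighborhood are similar in magnitude, which means the data are spatially correlated. This spatial correlation can be exploited to improve, and thereby maximize, the performance of existing image/video coding standards such as MPEG-x, H.26x, JPEG, and JPEG2000. This dissertation therefore investigates how to apply spatial correlation to existing image/video coding standards with two goals: maximizing coding efficiency and accelerating block matching.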
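To maximize coding efficiency through spatial correlation, we start from the similar patterns among spectral coefficients, which indicate coding redundancy, and propose two new redundancy-removal methods: a statistical method and a perceptual method. The perceptual method, called perceptual dithering, uses a model of the human visual system to remove coding redundancy among image data. We have shown that perceptual dithering can transform the original image into another image that is visually indistinguishable from it but whose data have increased statistical correlation, and this increased correlation can be used directly to raise the compression ratio of image and video signals. Applying perceptual dithering within a coding standard is called perceptual dithering coding; this dissertation demonstrates, both theoretically and experimentally, that it substantially improves compression performance. The statistical method, called double transformation, passes the image data through two spatial-to-frequency transforms, making better use of spatial correlation to improve compression without transmitting any extra decoding information. We found that with a single transform, compression performance is poor for images containing texture and object contours, because the energy of the data in a small region is spread over coefficients in several high-frequency bands, and this dispersion lowers the efficiency of the entropy coding module. Double transformation removes this remaining spatial correlation and concentrates the energy of the data in a region into fewer coefficients, which raises the efficiency of the entropy coder and therefore the compression performance of the whole encoder. Evaluated on the MPEG-4 reference software, both perceptual dithering coding and double transformation reduce the bitstream length by more than 10% relative to MPEG-4 Visual Texture Coding at the same quality.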
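The energy-compaction argument for double transformation can be illustrated with a toy example. This is not the dissertation's DTC, whose transforms the abstract does not specify. It is a minimal sketch that assumes, hypothetically, an orthonormal 8-point DCT-II applied to each block of a 1-D signal, followed by a second 8-point DCT across the co-located coefficients of neighboring blocks. The helper names (`dct_ii`, `double_transform`, `significant`) and the threshold of 1.0 are illustrative choices.

```python
import math


def dct_ii(x):
    """Orthonormal 1-D DCT-II of the sequence x."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def significant(coeffs, threshold=1.0):
    """Count coefficients whose magnitude exceeds the threshold."""
    return sum(1 for c in coeffs if abs(c) > threshold)


def double_transform(signal, block=8):
    """Hypothetical two-stage transform: a DCT of each block, then a DCT
    across blocks of every co-located coefficient index."""
    blocks = [signal[i:i + block] for i in range(0, len(signal), block)]
    first = [dct_ii(b) for b in blocks]          # one row of coefficients per block
    second = [[0.0] * block for _ in blocks]
    for k in range(block):
        column = [row[k] for row in first]       # coefficient k taken from every block
        for j, c in enumerate(dct_ii(column)):
            second[j][k] = c
    return first, second


# Idealized texture: the same sharp edge pattern repeats in every 8-sample block.
pattern = [0, 0, 0, 100, 100, 100, 100, 0]
signal = pattern * 8

first, second = double_transform(signal)
n_first = significant([c for row in first for c in row])
n_second = significant([c for row in second for c in row])
print(n_first, n_second)
```

The script prints `56 7`. After the block transform, each block keeps 7 significant coefficients. The second pass gathers that energy into a single row, and total energy is unchanged because both transforms are orthonormal. Real textures repeat only approximately, so the compaction on real images is smaller.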
To accelerate block matching through spatial correlation, we compute the block distortion from only a subset of the most representative pixels in each block. To find these pixels, we propose a new evaluation criterion. We applied it to existing pixel-subsampling methods, and analysis of the results confirms that the criterion is valid. Guided by this criterion, we also propose a new pixel-subsampling method, the N-Queen pixel-sampling method, which preserves as much of the spatial character of the block's pixels as possible in all directions. The method can also be combined hierarchically to generate sampling points for block matching on rectangular blocks. The N-Queen method produces an irregular sampling pattern. Even so, the proposed data rearrangement and storage scheme lets these samples be accessed efficiently in memory, and a fast block-matching algorithm based on this sampling can be built with simple hardware. To verify ease of hardware implementation, we have also implemented a fast motion-estimation module based on 4-Queen sampling on a Single Instruction Multiple Data (SIMD) processor. Finally, simulations on the MPEG-4 reference software confirm that the N-Queen method achieves an N-fold speedup over motion estimation that uses all pixels for block matching. The loss in video quality, measured by Peak-Signal-to-Noise-Ratio (PSNR), is negligible, so the N-Queen method outperforms other existing pixel-sampling methods.

Neighboring image/video source data are similar in magnitude; that is, the source material exhibits strong inter-element spatial correlation. Existing image/video standards, including MPEG-1/2/4, H.261/2/3/4, JPEG, and JPEG2000, exploit this spatial correlation to maximize coding efficiency. This dissertation investigates spatial correlation further. The goals are to improve coding efficiency and to speed up the search for the minimally discrepant (best-matching) block for each block of the input data. For maximal coding efficiency, spatial correlation is removed by mapping the input data into spectral coefficients that are statistically less correlated. This dissertation examines spectral coefficients with visually similar patterns, which indicate redundancy. To reduce this redundancy, we propose novel approaches that account for spatial correlation from both a statistical and a perceptual perspective. We show that the original image can be perceptually dithered to form a visually equivalent image with increased interband correlation, which can then be used to achieve higher compression. To remove the redundancy among the sibling subbands of an image, we propose perceptual dithering coding (PDC), an entropy reduction technique that exploits psychovisual effects.
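The abstract does not state the Gaussian theorem behind PDC. As a standard information-theoretic illustration only, and not the dissertation's own theorem, assume a coefficient $X$ and an already-coded sibling-subband coefficient $Y$ are jointly Gaussian. $X$ has variance $\sigma_X^2$, and $\rho$ is the correlation coefficient between $X$ and $Y$. Conditioning on $Y$ lowers the differential entropy of $X$ by an amount that grows with $|\rho|$. This is why raising interband correlation, as perceptual dithering is described as doing, can lower the bit rate:

$$
h(X) = \tfrac{1}{2}\log_2\!\left(2\pi e\,\sigma_X^2\right), \qquad
h(X \mid Y) = \tfrac{1}{2}\log_2\!\left(2\pi e\,\sigma_X^2\,(1-\rho^2)\right),
$$

$$
h(X) - h(X \mid Y) = -\tfrac{1}{2}\log_2\!\left(1-\rho^2\right)\ \text{bits}.
$$

For example, raising $\rho$ from 0.6 to 0.8 increases the saving from about 0.32 to about 0.74 bits per coefficient. Differences in differential entropy approximate differences in coded bits only under fine (high-rate) uniform quantization.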
The theoretical basis for the entropy reduction is a theorem proven for the Gaussian distribution. Our results show that perceptual interband redundancy exists even when the original interband correlation is statistically small. Moreover, for images with significant texture and sharp contrast, the energy of the data in a local area is spread over many spectral coefficients. A double transformation coding (DTC) approach is used to remove the correlation among the spectral coefficients and compact the energy into fewer coefficients. For performance verification, PDC and DTC have been evaluated on the MPEG-4 reference software. At perceptually transparent image quality, PDC achieves bit savings of 11% to 34% over MPEG-4 Visual Texture Coding (VTC) while remaining compatible with the MPEG-4 standard. At the same Peak-Signal-to-Noise-Ratio (PSNR), DTC achieves bit savings over MPEG-4 VTC of about 11.28% to 15.71% for finer quantizers and about 0% to 14.65% for coarser quantizers.

In addition, spatial correlation is exploited to speed up the search for the minimally discrepant block for each block. The speedup comes from computing the distortion measure on a representative subset of the data. To spatially subsample a block of pixels, we present a novel N-Queen lattice. Although this lattice is pertinent to many applications, we focus on speeding up motion search with minimal loss of coding efficiency. The N-Queen lattice is constructed to characterize spatial features in all directions, and it can be organized hierarchically for motion estimation with variable, non-square block sizes. Despite the randomized lattice structure, we demonstrate that a compact data storage architecture can provide efficient memory access and a simple hardware implementation.
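The following sketch shows the core idea of subsampled block matching with an N-Queen lattice. It assumes that the lattice is any N-Queen placement inside an N×N block: one sample per row and per column, with no two samples on a shared diagonal or anti-diagonal. It also assumes that distortion is the sum of absolute differences (SAD) over those samples in an exhaustive search. The backtracking construction, the function names, and the synthetic frames are illustrative assumptions, not the dissertation's procedure.

```python
import random


def n_queen_lattice(n):
    """One N-Queen placement as (row, col) offsets inside an n x n block:
    one sample per row and per column, none sharing a diagonal."""
    cols = []

    def place(row):
        if row == n:
            return True
        for c in range(n):
            if all(c != pc and abs(c - pc) != row - pr for pr, pc in enumerate(cols)):
                cols.append(c)
                if place(row + 1):
                    return True
                cols.pop()
        return False

    if not place(0):
        raise ValueError(f"no N-Queen placement exists for n={n}")
    return list(enumerate(cols))


def sad(cur, ref, bx, by, dx, dy, points):
    """SAD over the given (row, col) offsets between the block at (bx, by)
    in cur and the block at (bx + dx, by + dy) in ref."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c]) for r, c in points)


def full_search(cur, ref, bx, by, size, radius, points):
    """Exhaustive motion search; returns (cost, dx, dy) of the best candidate."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if not (0 <= by + dy <= height - size and 0 <= bx + dx <= width - size):
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, points)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic frames: cur is ref shifted so the true motion vector is (2, 1).
rng = random.Random(1)
W = H = 32
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[y + 1][x + 2] if y + 1 < H and x + 2 < W else 0 for x in range(W)]
       for y in range(H)]

size = 8
all_points = [(r, c) for r in range(size) for c in range(size)]
queen_points = n_queen_lattice(size)   # 8 samples instead of 64

full = full_search(cur, ref, 8, 8, size, 4, all_points)
fast = full_search(cur, ref, 8, 8, size, 4, queen_points)
print(full, fast, len(all_points) // len(queen_points))
```

Both searches should recover the motion vector (2, 1) with zero cost. The 8-Queen search evaluates 8 absolute differences per candidate instead of 64, which matches the N-fold reduction the abstract reports. For N = 4 the construction yields the familiar placement (0, 1), (1, 3), (2, 0), (3, 2).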
Our simulations show that the N-Queen lattice outperforms several existing sampling techniques, speeding up motion search by about a factor of N with a small loss in PSNR. This dissertation uses the 4-Queen lattice to explain how the compact storage architecture built on the N-Queen lattice is implemented. |
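The hierarchical use of the lattice on variable, non-square blocks can be sketched by tiling one 4-Queen pattern over the block. The sampled pixels are then gathered into a flat vector so that the distortion loop reads contiguous data. This is only a hypothetical stand-in for the dissertation's storage architecture, which the abstract does not describe. The pattern `QUEEN4`, the tiling rule, and the helper names are assumptions for illustration.

```python
# Hypothetical 4-Queen row pattern: QUEEN4[r] is the column of the single
# sample in row r of a 4x4 tile (one valid 4-Queen solution).
QUEEN4 = (1, 3, 0, 2)


def tile_lattice(width, height, pattern=QUEEN4):
    """Tile an n x n Queen pattern over a width x height block (both
    multiples of n), giving width * height / n (row, col) sample offsets."""
    n = len(pattern)
    if width % n or height % n:
        raise ValueError("block dimensions must be multiples of the lattice size")
    return [(r, tile_x + pattern[r % n])
            for r in range(height)
            for tile_x in range(0, width, n)]


def pack_samples(frame, x0, y0, lattice):
    """Gather the sampled pixels of the block at (x0, y0) into one flat list
    in a fixed order, so the distortion loop reads contiguous data."""
    return [frame[y0 + r][x0 + c] for r, c in lattice]


def packed_sad(a, b):
    """SAD between two packed sample vectors of equal length."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Variable, non-square block sizes reuse the same 4x4 pattern.
for w, h in [(4, 4), (8, 8), (16, 8), (8, 16), (16, 16)]:
    lattice = tile_lattice(w, h)
    print(f"{w}x{h}: {len(lattice)} of {w * h} pixels, "
          f"{w * h // len(lattice)}x fewer differences")

frame_a = [[10] * 32 for _ in range(32)]
frame_b = [[13] * 32 for _ in range(32)]
lat = tile_lattice(16, 8)
print(packed_sad(pack_samples(frame_a, 0, 0, lat), pack_samples(frame_b, 4, 4, lat)))
```

Every block size samples one pixel in four, so the number of absolute differences drops by 4 regardless of block shape. The final line prints 96, which is 32 samples times a difference of 3. Packing works because the pattern places exactly one sample per row of each tile, so the sample order is fixed and independent of the candidate displacement.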
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT910392006 http://hdl.handle.net/11536/70076 |
Appears in Collections: | Thesis |