Title: | 結合調色盤與轉換編碼之視窗影像壓縮 Screen Content Coding Combining Transform and Palette-based Coding |
Authors: | 朱弘正 Jhu, Hung-Cheng; 彭文孝 Peng, Wen-Hsiao; Institute of Multimedia Engineering |
Keywords: | High Efficiency Video Coding; HEVC; Screen Content Coding; SCC; Palette-based Coding |
Issue Date: | 2013 |
Abstract: | Screen content, a composite of text, computer graphics, and natural images, generally contains signals with characteristics very different from those of natural-scene video. In particular, it includes many sharp, high-frequency features, which are precisely the parts the human eye notices most readily (e.g., text edges). Transform-based hybrid video codecs represent natural-scene signals efficiently, but they have proven ineffective at forming sparse representations of such mixed content with respect to the 2-D DCT basis: the coefficient energy spreads into the high-frequency bands, and quantization then distorts those bands, so the content is neither compressed efficiently nor preserved well in subjective quality. This thesis proposes a screen content coding technique that combines transform and palette-based coding. Pixels within each mixed-content block are segmented into two groups: one contains pixels with natural-scene characteristics and is processed by transform coding, while the other consists of graphics and text and can be represented by a palette with a few base colors. A fast yet effective pixel segmentation is therefore needed to separate the pixels for the two coding paths, and four methods are evaluated. K-means separates the samples into k clusters such that each sample belongs to the cluster with the nearest mean, minimizing the sum of squared errors. The spectral method constructs a similarity matrix between samples and binarizes its principal eigenvector, so that a group of pixels with high mutual similarity can be extracted. The agglomerative method starts by treating each sample as its own cluster and repeatedly merges the two clusters judged closest until the desired number of clusters is reached. The final method builds maximally stable extremal regions (MSER): extremal regions are connected components of pixels that are all higher (or all lower) in value than the pixels on the region's boundary, and the maximally stable ones are selected by a stability function defined as the region's relative change of area over an interval of intensity levels. Experiments on the HEVC screen content test sequences show that, relative to the HM-10.1+RExt-3.0 anchor, the best of these methods achieves an average BD-rate saving of 23.1%, with a maximum of 50.5%. |
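As an illustration of the block-level pixel segmentation described in the abstract, the following is a minimal Python sketch, not the thesis implementation: it splits a block's pixels into two groups with a simple 2-means clustering and flags the tighter group as the one a few-color palette could represent, leaving the rest for transform coding. All function and variable names (segment_block, palette_cluster, etc.) are hypothetical.

```python
# Illustrative sketch only: 2-means segmentation of a block into a
# "palette" group and a "transform" group. Names are hypothetical.
import numpy as np

def segment_block(block, iters=10):
    """2-means clustering on the pixel intensities of a block.

    Returns a boolean mask: True marks pixels assigned to the tighter
    cluster (a plausible palette group), False marks pixels left for
    conventional transform coding.
    """
    pixels = block.astype(np.float64).ravel()
    # Initialize the two means at the block's extreme intensities.
    means = np.array([pixels.min(), pixels.max()])
    for _ in range(iters):
        # Assign each pixel to its nearest mean, then update the means.
        labels = np.abs(pixels[:, None] - means[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                means[k] = pixels[labels == k].mean()
    # Pick the cluster with the smaller spread as the palette group.
    spreads = [pixels[labels == k].var() if np.any(labels == k) else np.inf
               for k in range(2)]
    palette_cluster = int(np.argmin(spreads))
    return (labels == palette_cluster).reshape(block.shape)

# Example: a synthetic 8x8 block mixing a noisy background with a flat,
# text-like foreground region.
rng = np.random.default_rng(0)
block = rng.integers(90, 160, size=(8, 8)).astype(np.float64)
block[2:6, 2:6] = 250.0  # sharp, text-like foreground
mask = segment_block(block)
print("palette-coded pixels:", int(mask.sum()), "of", mask.size)
```

Choosing the lower-variance cluster as the palette group is only one plausible criterion for this sketch; the thesis itself evaluates K-means, a spectral method, agglomerative clustering, and MSER for the segmentation step.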
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT070056617 http://hdl.handle.net/11536/73949 |
Appears in Collections: | Thesis |