標題: 唯一可解一對一編碼之分析與實作
Analysis and Practice of Uniquely Decodable One-to-One Code
作者: 劉勁甫
Liu, Chin-Fu
陳伯寧
陸曉峯
Chen, Po-Ning
Lu, Hsiao-feng (Francis)
電信工程研究所
關鍵字: 一對一碼;資料壓縮;data compression;one-to-one code;source coding;uniquely decodable
公開日期: 2012
摘要: 在這篇論文中,我們提出可藉由在連續的一對一字碼(One-to-One Code)之間安插特定字元(Unique Word)來分開一對一字碼,使得連續的一對一字碼符合唯一可解(Uniquely Decodable)性質,藉此產生一種新的編碼方式,我們稱為「唯一可解一對一碼」(Uniquely Decodable One-to-One Code)。我們接著分析唯一可解一對一碼的組合特性與壓縮能力,並由此推導出給定長度n的唯一可解一對一碼的個數公式。經由此公式,我們即可計算給定統計特性之信息源的唯一可解一對一碼的最佳平均碼長。我們另外推導出三個平均碼長的上限公式,與兩個平均碼長的極限值的上限公式,據此,我們證明當特定字元(Unique Word)的長度與信息源字元的整合個數皆趨近無限大時,唯一可解一對一碼的平均碼長的極限值可以達到信息源的理論極限熵值(Entropy)。論文接著提供在不同條件下的快速編碼與快速解碼的演算法。我們的數值實驗結果顯示,相對於赫夫曼碼(Huffman Code),在相同的解碼複雜度的前提下,唯一可解一對一碼具有可與其相比的低壓縮率。相對於藍柏吉夫碼(Lembel-Ziv Code),唯一可解一對一碼則具有較佳的平均碼長,此數值結果說明了唯一可解一對一碼在實際應用的可行性。
In this thesis, we consider the so-called uniquely decodable one-to-one code (UDOOC) that is formed by inserting a “comma” indicator, termed the unique word (UW), between consecutive one-to-one codewords for separation. Along this research direction, we first investigate the general combinatorial properties of UDOOCs, in particular the enumeration of UDOOC codewords for any (finite) codeword length. Based on the obtained formula on the number of length-n codewords for a given UW, the average codeword length of the optimal UDOOC for a given source statistics can be computed. Upper bounds on the average codeword length of UDOOCs are next established. The analysis on bounds of the average codeword length then leads to two asymptotic bounds on ultimate per-letter average codeword length, one of which is achievable and hence tight for a certain source statistics and UW, and the other of which proves the achievability of source entropy rate of UDOOCs when both the block size of source letters for UDOOC compression and UW length go to infinity. Efficient encoding and decoding algorithms for UDOOCs are subsequently given. Numerical results show that when grouping three English letters as a block, the UDOOCs with UW = 0001, 0000, 000001 and 000000 can respectively reach the compression rates of 3.531, 4.089, 4.115 and 4.709 bits per English letter (with the lengths of UWs included), where the source stream to be compressed is the book titled Alice’s Adventures in Wonderland. In comparison with the first-order Huffman code, the second-order Huffman code, the third-order Huffman code and the Lembel-Ziv code, which respectively achieve the compression of 3.940, 3.585, 3.226 and 6.028 bits per single English letter, the proposed UDOOCs can potentially in comparable compression rate to the Huffman code under similar decoding complexity and yield a better average codeword length than that of the Lempel-Ziv code, thereby confirming the practicability of the simple idea of separating OOC codewords by UWs.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070060201
http://hdl.handle.net/11536/71799
顯示於類別:畢業論文