Title: Variable-Length VLIW Encoding & its Decoding Architectures (超長指令集架構之可變長度編碼及解碼架構)
Authors: Chia-Hsien Liu (劉佳憲)
Chih-Wei Liu (劉志尉)
Institute of Electronics
Keywords: VLIW; instruction encoding; DSP; decoding architecture
Issue date: 2004
Abstract: VLIW-based architectures are popular in high-performance DSP processors because of their relatively simple hardware and predictable execution times. They need more program memory, however, because of (1) fixed-length instruction encoding, (2) NOP insertion when instruction-level parallelism is limited, and (3) repetitive code produced by loop unrolling. This thesis presents a novel instruction encoding scheme that addresses these three problems and improves VLIW code density. First, each instruction is variable-length encoded according to the number of operands it requires and its frequency of occurrence. The effective instructions issued in the same cycle are then grouped with a 'CAP' into one VLIW packet, and NOP instructions are not encoded at all. For similar and repeated code, the scheme provides a SIMD mode and differential encoding to remove the redundancy. In our simulations, the proposed encoding reduces code size by 68% to 70%. Moreover, to simplify memory access to the variable-length VLIW packets, the caps and the head-tails are placed at the two ends of a fixed-length instruction bundle. Bundling introduces some overhead bits, which lower the encoding efficiency, so the bundle size is a trade-off between hardware complexity and code compression ratio. In our experiments, a 512-bit bundle gives the best trade-off: it reduces code size by 65% to 67% and costs 10% of the processor core area. Finally, the proposed encoding scheme has been adopted in PicaCHIP, a prototype 4-way VLIW DSP processor implemented in UMC 0.13 um CMOS technology with a maximum operating frequency of 333 MHz.
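The abstract does not give the bit-level format, so the following Python sketch only illustrates the two ideas it names: dropping NOPs by recording slot occupancy in a cap, and filling a fixed-length bundle from both ends. It assumes a cap that is a plain 4-bit occupancy mask stored in one byte and treats the instruction bodies as opaque bytes. The names `encode_packet`, `decode_packet` and `pack_bundle` are hypothetical and are not taken from the thesis.

```python
from typing import List, Optional, Tuple

SLOTS = 4  # 4-way VLIW, matching the prototype described in the abstract


def encode_packet(slots: List[Optional[str]]) -> Tuple[int, List[str]]:
    """Drop NOPs (None) and record which slots were occupied.

    Assumption: the cap is a bitmask with bit i set when slot i holds
    a real instruction. The actual CAP format is not given in the abstract.
    """
    if len(slots) != SLOTS:
        raise ValueError("expected exactly %d slots" % SLOTS)
    cap = 0
    ops = []
    for i, op in enumerate(slots):
        if op is not None:
            cap |= 1 << i
            ops.append(op)
    return cap, ops


def decode_packet(cap: int, ops: List[str]) -> List[Optional[str]]:
    """Rebuild the full issue packet, reinserting None for each NOP slot."""
    it = iter(ops)
    return [next(it) if (cap >> i) & 1 else None for i in range(SLOTS)]


def pack_bundle(packets: List[Tuple[int, bytes]],
                bundle_bytes: int = 64) -> Tuple[bytes, int]:
    """Fill a fixed-length bundle from both ends (64 bytes = 512 bits).

    Assumption: one cap byte per packet grows from the front, and the
    opaque instruction bytes grow from the back. Bytes left unused in
    the middle correspond to the bundling overhead.
    Returns the bundle and the number of packets that fit.
    """
    bundle = bytearray(bundle_bytes)
    front, back = 0, bundle_bytes
    count = 0
    for cap, payload in packets:
        if front + 1 + len(payload) > back:
            break  # the next packet would collide with the other end
        bundle[front] = cap
        front += 1
        back -= len(payload)
        bundle[back:back + len(payload)] = payload
        count += 1
    return bytes(bundle), count
```

For example, `encode_packet(["add", None, None, "ld"])` keeps only the two real instructions and a cap with bits 0 and 3 set, and `decode_packet` restores the four-slot packet from that pair. Because both fill positions in `pack_bundle` start at fixed ends of the bundle, a decoder under these assumptions can locate the first cap and the first instruction bytes without scanning.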
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009211606
http://hdl.handle.net/11536/66802
Appears in collections: Thesis