Title: 具階層式暫存器組及封裝指令之嵌入式數位訊號處理器設計
An Embedded Digital Signal Processor Design with Hierarchical Register File & Packed Instructions
Authors: 李承家
Chen-Chia Lee
劉志尉
Chih-Wei Liu
Institute of Electronics
Keywords: very long instruction word (VLIW); register file; digital signal processor (DSP); instruction set
Issue date: 2003
Abstract: Wireless and multimedia applications demand ever more computing power, and microprocessors can no longer keep pace by raising the clock speed alone. A common solution is to integrate more concurrent functional units (FUs) into the processor and to increase the instruction issue width. However, the complexity of a conventional centralized register file (RF), in terms of access time, silicon area, and power consumption, grows dramatically with the number of FUs. This thesis presents a ping-pong RF suited to digital signal processing, which partitions the centralized RF into several sub-blocks and uses register permutation for data exchange between them, significantly reducing the hardware complexity. To further reduce hardware complexity and obtain deterministic execution times, we adopt the statically scheduled Very Long Instruction Word (VLIW) architecture as the basis of our multi-issue digital signal processor (DSP). The price is poor code density, so this thesis also presents a hierarchical instruction encoding method and an efficient decoder architecture to reduce the code size. Simulation and implementation results show that the proposed ping-pong RF completes most DSP algorithms in a comparable number of clock cycles, while saving 91.46% of the silicon area and reducing the access time by 68.57% relative to the centralized RF. The proposed hierarchical instruction encoding reduces program memory usage by 70.8% to 75.8%. Finally, we have designed the instruction set architecture (ISA) of a DSP prototype, PicaCHIP (Packed Instruction and Cluster Architecture), and completed its simulation. We have also implemented the processor in UMC 1P6M 0.18 µm CMOS technology and taped it out through CIC. The chip measures 3.23 mm × 3.23 mm, including 16 KB of instruction memory and 16 KB of data memory, and operates at up to 185 MHz with an average power consumption of 102 mW.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009111614
http://hdl.handle.net/11536/43791
Appears in collections: Theses