Title: Design of Energy-Aware Function Units for Embedded Processors
Authors: 劉晉宏
Chin-Hung Liu
劉志尉
Chih-Wei Liu
Institute of Electronics
Keywords: clock gating; throughput; energy awareness; datapath; function unit
Publication date: 2004
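Abstract: Clock gating is a technique commonly used in current circuit design. By ANDing the clock signal with a gate-control signal, it turns off the circuits the system does not currently need, and therefore effectively reduces clock power consumption. Although clock gating effectively reduces the clock power consumed in pipeline registers, pipeline latches, and dynamic logic circuits, it still wastes power updating valid input data when the system throughput drops. To further reduce redundant clock transitions, this thesis proposes two implementations of on-demand pipelining: a register-based technique and a latch-based technique. Both can be regarded as a real-time, fine-grain, reconfigurable pipeline architecture; that is, the number of pipeline stages can be dynamically adjusted cycle by cycle to the number required, so the datapath dissipates only the necessary energy in synchronization elements. Compared with the register-based on-demand pipelining architecture, the latch-based architecture further reduces the area and power overhead of the multiplexers. In addition, a latch needs only half the gate count of a register, so the latch-based architecture saves nearly half of the gate count of the required synchronization elements. Experimental results in UMC 0.18μm CMOS technology, using 2D-DCT as the computation kernel and a 512x512 grayscale image (Lena) as input, show that the register-based architecture reduces energy dissipation by 41% relative to a conventional pipeline and by 5% relative to an architecture using clock gating only. Likewise, the latch-based architecture reduces energy dissipation further, by 48% and 16% relative to the conventional pipeline and the clock-gating-only architecture, respectively. Moreover, the proposed techniques keep the input-to-output latency unchanged across operation modes, which effectively reduces the complexity of system integration. For the same task, as the allowed computation time grows, the proposed architecture saves an increasing share of the energy that a conventional pipeline dissipates. In other words, the architecture offers the best energy awareness in applications that dynamically adjust their throughput.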
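The ANDing of the clock with a gate-control signal can be illustrated with a minimal behavioral sketch. It assumes a common latch-plus-AND gating cell, where the enable is captured by a latch that is transparent only while the clock is low, so that enable changes during the high phase cannot clip the gated clock. The thesis does not specify its gating cell here, and the function names `gated_clock` and `captured_values` are hypothetical.

```python
# Hypothetical behavioral sketch of clock gating; not the thesis's circuit.
# Waveforms are lists of 0/1 samples, one sample per clock half-period.

def gated_clock(clk, enable):
    """Return clk AND enable, with enable captured by a latch that is
    transparent only while clk is low (assumed glitch-free gating cell)."""
    latched_en = 0
    gclk = []
    for c, en in zip(clk, enable):
        if c == 0:
            latched_en = en  # latch follows enable during the low phase
        # Changes of enable while clk is high are ignored by the latch.
        gclk.append(c & latched_en)
    return gclk


def captured_values(gclk, data):
    """Values a rising-edge register captures when clocked by gclk."""
    captured = []
    prev = 0
    for g, d in zip(gclk, data):
        if prev == 0 and g == 1:  # rising edge of the gated clock
            captured.append(d)
        prev = g
    return captured


if __name__ == "__main__":
    clk = [0, 1] * 6
    enable = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1]
    data = [10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60]
    gclk = gated_clock(clk, enable)
    print(gclk)                         # [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
    print(captured_values(gclk, data))  # [10, 40, 60]
```

In this example the register is clocked three times instead of six. That is the saving clock gating provides; however, every valid item is still latched at every pipeline stage, which is the remaining overhead that on-demand pipelining addresses.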
Clock gating is a well-known technique for effectively reducing clock power. By ANDing the clock with a gate-control signal, clock gating disables the clock to a circuit whenever the circuit is not in use. Specifically, clock gating targets the clock power consumed in pipeline latches/registers and in dynamic circuits, which are used for their speed and area advantages over static logic. However, it still latches the valid input data cycle by cycle, which is inefficient when a pipelined datapath does not reach its peak throughput. To further reduce, or even eliminate, redundant clock transitions, this thesis proposes two on-demand pipelining schemes: a register-based technique and a latch-based technique. Both can be considered a real-time, fine-grain, reconfigurable pipeline architecture. That is, the pipeline stages can be smoothly reconfigured cycle by cycle at run time, so the datapath consumes only the necessary energy in synchronization elements. Compared with the register-based scheme, the latch-based on-demand pipelining architecture further reduces the multiplexer overhead and saves about half of the gate count of the synchronization elements. Experimental results for 2D-DCT on a 512x512 grayscale image (Lena) in UMC 0.18μm CMOS technology show that the proposed register-based on-demand pipelining saves up to 57% of the energy dissipated by a conventional pipeline and 5% of that dissipated by a pipeline with clock gating only. In the same case, latch-based on-demand pipelining saves up to 61% and 13%, respectively. Moreover, both schemes have identical input-to-output latency in all operation modes, which effectively simplifies system integration. Note that as the allowed computation time for a given task increases, conventional clock gating dissipates a constant amount of energy, whereas the proposed on-demand pipelining dissipates less and less.
In other words, the proposed schemes have better energy awareness over varying throughput requirements.
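The trend described above can be illustrated with a toy count of clock events in pipeline registers. This is a hypothetical model, not the thesis's energy model. It assumes S equal stages and one input every k cycles, and that on-demand pipelining merges groups of k adjacent stages by making the intermediate synchronization elements transparent. It ignores multiplexer, latch, and combinational energy, so its ratios are not comparable to the measured percentages. The names `clock_events` and `MODES` are illustrative only.

```python
import math

MODES = ("conventional", "clock_gating", "on_demand")


def clock_events(n_items, stages, interval, mode):
    """Toy count of pipeline-register clock events (hypothetical model).

    n_items  -- number of data items processed
    stages   -- pipeline depth S at peak throughput
    interval -- cycles between successive inputs, k >= 1 (throughput 1/k)
    mode     -- one of MODES
    """
    if n_items < 0 or stages < 1 or interval < 1:
        raise ValueError("need n_items >= 0, stages >= 1, interval >= 1")
    cycles = n_items * interval  # pipeline fill and drain are ignored
    if mode == "conventional":
        # Every stage register is clocked on every cycle.
        return stages * cycles
    if mode == "clock_gating":
        # Idle cycles are gated off, but each item is still latched once
        # per stage, independent of k.
        return stages * n_items
    if mode == "on_demand":
        # Assumption: k adjacent stages are merged into one, so each item
        # is latched ceil(S / k) times (latency is assumed unchanged).
        return math.ceil(stages / interval) * n_items
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Counts (conventional, clock_gating, on_demand) for 1000 items
    # through an 8-stage pipeline:
    # k=1 -> 8000, 8000, 8000
    # k=2 -> 16000, 8000, 4000
    # k=4 -> 32000, 8000, 2000
    # k=8 -> 64000, 8000, 1000
    for k in (1, 2, 4, 8):
        print(k, [clock_events(1000, 8, k, m) for m in MODES])
```

As k grows (a longer allowed computation time for the same task), the clock-gating count stays constant while the on-demand count falls, which mirrors the qualitative behavior stated in the abstract. In this toy model the conventional count grows, because its registers keep toggling during idle cycles.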
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009211622
http://hdl.handle.net/11536/66979
Appears in Collections: Thesis