Title: | Design of Multi-Stage Computing Engine for DSP Applications |
Authors: | Chen, Chih-Ching (陳志清); Liu, Chih-Wei (劉志尉); Department of Electronics Engineering, Institute of Electronics |
Keywords: | Multi-stage Computing Engine; Stream Interface Unit |
Issue Date: | 2013 |
Abstract: | Reduced Instruction Set Computer (RISC) processors have been the mainstream of processor design for many years. Their simple, regular datapath makes pipelining easy and thereby raises performance. However, because each issued instruction executes only one primitive operation, hardware utilization is low. Multi-issue (VLIW) processors were then proposed to raise hardware utilization by exploiting instruction-level parallelism (ILP), but the area of their register file (RF) grows dramatically as the number of functional units increases, which makes them costly. In recent years, Application-Specific Instruction-set Processors (ASIPs) have been widely used. An ASIP cascades several functional units in a customized order so that a single issued instruction executes several consecutive primitive operations, which improves hardware utilization. Because the number of ports to the centralized register file stays limited, the ASIP's area does not grow sharply as functional units are added. In addition, the cascaded datapath performs several operations after fetching its operands and only then writes the result back, so the register file is accessed less often and power consumption drops. However, cascading functional units lengthens the critical path, which makes the approach unsuitable for applications that need a high clock rate, and the fixed cascading order does not fit every application. In this thesis, we propose a multi-stage architecture that inserts a Stream Interface Unit (SIU) between the register file and the functional units. The SIU stores the intermediate outputs of the functional units and provides forwarding paths among them, so the execution order within the datapath can be adjusted and applications are not constrained by a fixed cascading order. It also further reduces accesses to the centralized register file, which lowers power consumption. For several typical DSP kernels, the hardware utilization (operations per cycle) of the multi-stage architecture reaches 1.57 on average, compared with 1.00 for RISC. The architectures were synthesized in a UMC 90 nm process. At the same performance, the multi-stage architecture occupies about 22% less area than VLIW; at higher performance requirements, its area is also smaller than that of RISC and ASIP. Its power consumption is about 7% to 25% lower than that of RISC, VLIW, and ASIP. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079911680 http://hdl.handle.net/11536/75237 |
Appears in Collections: | Thesis |
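
Note on the register file argument in the abstract: the following is a toy count, not taken from the thesis. It assumes a hypothetical chain of dependent two-operand operations (for example a multiply followed by an add) and compares centralized register file traffic when each operation is its own RISC-style instruction against a datapath that keeps intermediate results out of the register file, as a cascaded ASIP datapath or an SIU-like buffer would. The function names and the counting model are illustrative assumptions only.

```python
def risc_rf_accesses(chain_length):
    """RF (reads, writes) when every operation is a separate instruction.

    Assumed model: each instruction reads two operands from the
    register file and writes one result back.
    """
    if chain_length < 1:
        raise ValueError("chain_length must be at least 1")
    return 2 * chain_length, chain_length


def chained_rf_accesses(chain_length):
    """RF (reads, writes) when intermediates bypass the register file.

    Assumed model: the first operation reads two operands, every later
    operation reads one new operand (its other input is forwarded),
    and only the final result is written back.
    """
    if chain_length < 1:
        raise ValueError("chain_length must be at least 1")
    return chain_length + 1, 1


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, risc_rf_accesses(k), chained_rf_accesses(k))
```

Under these assumptions a two-operation chain needs 4 reads and 2 writes in the RISC-style model but 3 reads and 1 write when the intermediate is forwarded. The model says nothing about the area, clock rate, or power figures reported in the abstract.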