Title: | Area-Efficient Design and Implementation of Deep-Pipeline Latency Datapath |
Authors: | 呂進德 Chin-Te Lu 劉志尉 Chih-Wei Liu Institute of Electronics |
Keywords: | deep-pipeline latency;datapath;area-efficient |
Issue Date: | 2008 |
Abstract: | The datapath is usually the part of a processor that most strongly affects its performance. Its allocation and design depend on the application requirements. Generally speaking, for high-performance processors such as Intel's Pentium and IBM's Cell, designers push the datapath's operating frequency as high as possible using a broad range of VLSI techniques. For lightweight applications such as embedded systems, by contrast, the datapath is optimized for low power and small chip area. The same instruction set architecture (ISA) can therefore be implemented with different datapaths for different applications. This thesis proposes a design flow that automatically generates an area-efficient datapath for a given performance requirement. The area-efficient datapath generator works in two phases: spatial optimization and temporal optimization. It reuses the ISA of an existing high-performance processor such as IBM's Cell, together with its software toolchain and application programs, and systematically optimizes the processor datapath for the performance the application requires.
Spatial optimization means using the spatial domain efficiently through shared datapaths, and comprises function modeling and cycle-accurate modeling. Temporal optimization analyzes instruction latency and systematically builds a mathematical formulation to obtain the minimum-area micro-architecture. We take the datapath of the Cell synergistic processor unit (SPU) as our design example, analyze the optimization space of the SPU ISA implementation, and use the proposed design flow to find the most area-efficient micro-architecture. In our experiments on datapath designs for embedded processors targeting 100MHz to 800MHz, the micro-architecture produced by the proposed flow reduces area by about 15-20% compared with designs produced by CAD tools. Finally, we use the design flow to implement an SPU DSP in the UMC 90nm 1P9M CMOS process. The silicon area is 2.5mm x 2.5mm and the clock rate is 400MHz. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009511695 http://hdl.handle.net/11536/38215 |
Appears in Collections: | Thesis |