Title: | Area-Efficient Design and Implementation of Deep-Pipeline Latency Datapath |
Authors: | Chin-Te Lu (吕进德); Chih-Wei Liu (刘志尉); Institute of Electronics |
Keywords: | deep-pipeline latency; datapath; area-efficient |
Issue Date: | 2008 |
Abstract: | The datapath is usually the component that most strongly affects a processor's performance. Its allocation and design depend on the requirements of the application. Generally speaking, for high-performance processors such as Intel's Pentium and IBM's Cell, designers use a broad range of VLSI techniques to push the datapath's operating frequency as high as possible. In contrast, for lightweight applications such as embedded systems, datapath design is optimized for low power and small chip area. The same instruction set architecture (ISA) therefore calls for different datapath implementations in different applications. This thesis proposes a design flow that automatically generates an area-efficient datapath for a given performance requirement. The area-efficient datapath generator works in two phases, spatial optimization and temporal optimization. It reuses the ISA of an existing high-performance processor such as IBM's Cell, together with its software toolchain and application programs, and systematically optimizes the processor datapath for the performance the application requires.
Spatial optimization means efficient use of the spatial domain through datapath sharing; it covers function modeling and cycle-accurate modeling. Temporal optimization analyzes instruction latency and systematically builds a mathematical formulation to obtain the minimum-area micro-architecture. We take the Cell synergistic processor unit (SPU) as our datapath design example, analyze the optimization space of the SPU ISA implementation, and find the most area-efficient micro-architecture using the proposed design flow. In our experiments, for datapaths of embedded processors targeting 100MHz to 800MHz, the proposed design flow reduces area by about 15-20% compared with using CAD tools. Finally, we use the design flow to implement the SPU DSP in the UMC 90nm 1P9M CMOS process. The silicon area is 2.5mm x 2.5mm and the clock rate is 400MHz. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009511695 http://hdl.handle.net/11536/38215 |
Appears in Collections: | Thesis |
Files in This Item:
If it is a zip file, please download the file and unzip it, then open index.html in a browser to view the full text content.