Title: Intelligent Autonomous Instruction Memory Design (智慧型自主指令记忆体设计)
Author: Li-Ming Wang (王立铭)
Chung-Ping Chung (钟崇斌)
College of Computer Science, Computer Science Program
Keywords: dynamic branch predictor; instruction address bus; instruction memory; branch target buffer; return stack
Publication date: 2005
Abstract: The main concept of the Intelligent Autonomous Instruction Memory (iAIM) is to give the top-level instruction memory a “program flow tracing” capability by incorporating the dynamic branch predictor into it. With the help of the dynamic branch predictor, the instruction memory can, most of the time, determine where to fetch the next instruction without the CPU core supplying an instruction address. The purpose is to reduce instruction address traffic between the CPU and the instruction memory to a minimum. Realizing this concept may save more energy on the instruction address bus than many known instruction address bus encoding techniques. When the dynamic branch predictor is moved from the CPU to the instruction memory, additional auxiliary hardware and an efficient control bus communication protocol between the CPU and the instruction memory are essential to maintain program flow correctness and the original operation of the dynamic branch predictor. A simple iAIM design based on this concept is proposed first, followed by an enhanced design that adds a partial instruction decoder capable of decoding branch instructions and calculating their branch target addresses. Finally, a further enhanced design that adds both a partial instruction decoder and a return stack is proposed. Experimental results show that, compared with a conventional architecture, the three designs reduce instruction address transmissions by 97.71%, 98.49%, and 99.99%, and total bit transitions by 84.99%, 86.54%, and 92.01%, respectively. All three designs greatly outperform the T0 encoding technique, and the third design slightly outperforms the T0 DAT technique with 128 entries.
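The idea in the abstract can be illustrated with a toy model. The sketch below is not the thesis design: it assumes a hypothetical memory-side predictor that remembers only the last non-sequential successor of each address, a fixed 4-byte instruction size, and an address bus that holds its last value while idle. Control bus traffic is ignored. The names `simulate` and `bit_transitions` are invented for this illustration.

```python
def bit_transitions(addresses, width=32):
    """Count bus lines that toggle between consecutive transmitted addresses.

    Assumption: the bus starts at 0 and holds its value when nothing is sent.
    """
    mask = (1 << width) - 1
    total = 0
    prev = 0
    for addr in addresses:
        total += bin((prev ^ addr) & mask).count("1")
        prev = addr
    return total


def simulate(trace, step=4):
    """Return the addresses the CPU must still send in the toy model.

    The memory predicts the next fetch address itself. The CPU sends an
    address only when that prediction differs from the actual next address.
    """
    btb = {}           # pc -> last observed non-sequential successor (toy BTB)
    sent = [trace[0]]  # the first fetch address always comes from the CPU
    for pc, nxt in zip(trace, trace[1:]):
        predicted = btb.get(pc, pc + step)
        if predicted != nxt:
            sent.append(nxt)          # misprediction: CPU supplies the address
            if nxt != pc + step:
                btb[pc] = nxt         # remember the taken target
            else:
                btb.pop(pc, None)     # branch fell through: forget the target
    return sent


# A three-instruction loop executed ten times, then a fall-through exit.
trace = [0x100, 0x104, 0x108] * 10 + [0x10C]
sent = simulate(trace)

print("addresses sent, conventional:", len(trace))
print("addresses sent, toy model:   ", len(sent))
print("bit transitions, conventional:", bit_transitions(trace))
print("bit transitions, toy model:   ", bit_transitions(sent))
```

In this toy trace the CPU supplies an address only for the first fetch, the first taken back-edge, and the loop exit. The percentages reported in the abstract come from the thesis experiments, not from a model like this one.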
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009367588
http://hdl.handle.net/11536/80112
Appears in Collections: Thesis


Files in This Item:

  1. 758801.pdf
