标题: | 超纯量处理之研究 Study of superscalar processing |
作者: | 萧裕弘 XIAO, YU-HONG 钟崇斌 资讯科学与工程研究所 |
关键字: | 超纯量处理 |
公开日期: | 1992 |
摘要: | 本论文研究超纯量指令排程的问题,因指令排程好坏对超纯量处理机之效能有决定 性之影响。相对其重要性,超纯量指令排程的研究目前仍感缺乏,或基于不合实际 条件的假设,或仅适合某一特殊架构。在本论文中,吾人研究三个相关问题,分别 为:(1)现存排程演算法则之分析;(2)最佳排程方法之设计;(3)有效的 启发式排程演算法则的设计。 首先吾人设计两个抽象模型,一个为超纯量处理机模型,另一个为程式模型。这两 个模型能反映大部份超纯量指令排程问题所具有的特性及统一研究工作。 吾人首先分析排程演算法则本身。大部份现存排程演算法则之评估是采取测试程式 方式。但此举费时且受样本数目之限制或代表性样本选择之困难,此法并不能有效 发掘程式之效能。为此,吾人以最坏情况分析的方法来评估排程演算法则。吾人发 现,现存排程演算法则皆为贪心排程方式,且最佳排程亦可转换成贪心排程。因此 ,吾人分析任意一个贪心排程的最坏情况。令w 表示一个贪心排程的长度, greedy 而w 表示一个最佳排程的长度。吾人研究 w /w 的最大值。吾人导 opt greedy opt 出w /w <k+1-1/max{m }(z+1) ,其中k为处理机型式之数目,m 为 greedy opt ▔ i i 第i种处理机之数目,而z为指令执行所产生最大的延迟周期。此成果具广泛性能 涵盖大部份此类问题,且当问题模型简化后,此成果涵盖前人在单位时间工作排程 研究之成果。 然而所导出w /w 之比值令人失望,当k=4 (如IBMRS/6000)时,此比值接 greedy opt 近5。因此需要一较好之演算法则。在此,吾人研究最佳排程的演算法则,因并无 令人满意之现存方法。吾人相信如能有效利用程式中的一些特性,达成最佳排程法 并不困难。经吾人彻底研究后,定义了指令间及部份排程间的优势及相等关系。这 些关系可用来大幅缩小答案空间。吾人亦实际实验此方法,其结果显示此方法之优 越性。 为实际应用起见,一个有效的启发式演算法则仍有存在之价值。因此,吾人设计了 一个排程演算法则,称为ECG-演算法则。相对于一般关键路径法,此演算法则在指 定指令排程优先权时作较多的考虑。吾人以最坏情况分析的方法来证明ECG-演算法 则之优越性。令w 表示一个ECG-排程的长度。吾人证明w /w <k+1-2/ ECG ECG opt ▔ max{m }(z+1) 。此值为目前已之者中最小者。 i 本研究成果令人满意与鼓舞。尤有甚者,此问题之模式、解题方法及分析技术皆具 广泛性,能适用于大部份此类问题。 In this dissertation, to eliminate or reduce the penalties caused by the possible hurdles, we study the following topics: 1) modeling of superscalar processing; 2) performance improvements provided by various hardware enhancements; 3) performance improvements provided by various techniques of software instruction scheduling; 4) design of a new live range definition and a new register allocation algorithm especially for superscalar processors. At the beginning of this dissertation, three superscalar architectural models, the XPCB, XXPB, and X4P2 models, are first proposed and used as target machine models in this study. Performance evaluations are made to determine which of these three architectural models is the most cost- effective. Considering cost-effectiveness and to avoid excessive blocking due to flow dependencies, simulation results show that the XXPB model is a judicious choice in the design of a superscalar processor system. Register renaming, operand copying, out-of-order instruction issuing, and branch prediction are four hardware methods for enhancing the performance of a superscalar processor. We study the improvement in execution time provided by these four enhancements in superscalar processing. The performance of various software instruction scheduling techniques are also studied in this dissertation. Finally, since the live range definition for a symbolic register and the corresponding register allocation algorithm affect the instruction parallelism exploitable by the architectures, these issues must also be studied. In this dissertation, the effects of hardware instruction scheduling, software instruction scheduling, and several related topics on performance improvement for superscalar processing are studied. We hope the information presented in this research can contribute to the design of future high- performance superscalar systems. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT812392001 http://hdl.handle.net/11536/57218 |
显示于类别: | Thesis |