Title: Stack Instruction Folding of Java Processors: Modeling and Realization
Authors: Lung-Chung Chang (张隆昌); Chung-Ping Chung (钟崇斌)
Institute of Computer Science and Engineering
Keywords: Java Processor; Java Virtual Machine; Stack Operations Folding; Data Dependency; Stack Instruction Folding
Publication date: 2001
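Abstract: Because it is relatively secure, portable across platforms, and compact (each instruction averages about 1.8 bytes), the Java language is widely used on the Internet and in embedded controllers. However, the performance of Java processors is severely limited by the data dependencies inherent in the Java virtual machine. To overcome this problem, this thesis presents two instruction-folding approaches: static (pattern-based) folding and dynamic (rule-based) folding. In static folding, the combinations of foldable instructions must be identified in advance and then matched one by one against the current instruction stream. In dynamic folding, instructions are folded together automatically according to their folding attributes. Dynamic folding can be further divided into continuous-instruction folding and discontinuous-instruction folding: the former can only fold a run of consecutive foldable instructions, whereas the latter can fold instructions that are separated by non-stack instructions or by instructions that have already been folded. This thesis proposes a continuous dynamic folding method, POC folding, together with its architecture design. Initial simulation statistics show that the 4-foldable strategy is already close to the folding upper bound and eliminates 84% of stack operations, and that the 2-, 3-, and 4-foldable strategies achieve speed-ups of 1.22, 1.32, and 1.34, respectively, over execution without folding. This method can be realized with the simplest hardware design. In addition, this thesis proposes Hybrid-EPOC folding, a discontinuous dynamic folding model that extends POC folding. It can be realized through software rescheduling on the existing simple POC hardware plus one extended instruction (the P’ bytecode). Under the 4-foldable strategy it eliminates 94.8% of stack operations (using the second benchmark set, SPECjvm98 with the s10 data set), which is 14.7 percentage points more than the 80.1% of POC folding, with 13% higher performance. Its IIPC (Issued Instructions Per Cycle) under the 2-, 3-, and 4-foldable strategies is 1.60, 1.70, and 1.71, respectively, and 1.71 reaches 96.6% of the theoretical upper bound of 1.77. Meanwhile, the additional P’ bytecode code it requires is less than 8% of the total size of the original virtual machine, which makes the approach attractive for high-performance SoC designs with relatively small ROM area. By comparison, Sun’s picoJava-II uses a static folding method that eliminates only 39.6% of stack operations, with an IIPC of 1.25. This research currently leads the world and makes leading contributions in both methodology and architecture. It has been granted patents in the United States and the Republic of China (Taiwan) and has been referenced by other academic research; for example, Kim’s Advanced POC folding model is based on this POC folding model.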
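The rule-based, continuous folding idea described above can be illustrated with a small Python sketch. It is a simplified model, not the thesis's POC hardware or its exact folding rules: reading POC as Producer / Operator / Consumer, the single-letter categories (with T for "not foldable"), the tiny opcode subset in `category`, the "at most two producers per operator" rule, the greedy left-to-right scan, and the function names `category`, `group_length`, and `fold` are all hypothetical choices made for illustration. Each returned group stands for one issued operation under an n-foldable strategy.

```python
from typing import List

# Hypothetical, simplified POC categories (an assumption for illustration only):
#   P = producer (pushes a local variable or constant onto the operand stack)
#   O = operator (pops its operands, pushes a result)
#   C = consumer (pops a value into a local variable)
#   T = not foldable here (ends any group)
OPERATORS = {"iadd", "isub", "imul", "iand", "ior", "ixor"}


def category(op: str) -> str:
    """Map a small, hypothetical subset of JVM bytecodes to a POC-style category."""
    if op.startswith(("iload", "iconst", "bipush", "sipush")):
        return "P"
    if op in OPERATORS:
        return "O"
    if op.startswith("istore"):
        return "C"
    return "T"


def group_length(cats: List[str], i: int, n: int) -> int:
    """Length (>= 1) of the foldable group starting at index i, capped at n instructions."""
    end = len(cats)
    j = i
    # Absorb up to two producers: the operands of a binary operator.
    while j < end and j - i < min(2, n) and cats[j] == "P":
        j += 1
    producers = j - i
    # An operator folds with the producers gathered so far ...
    if j < end and j - i < n and cats[j] == "O":
        j += 1
        # ... and with an optional consumer that stores the operator's result.
        if j < end and j - i < n and cats[j] == "C":
            j += 1
        return j - i
    # A producer followed directly by a consumer folds into a move (e.g. iload_1; istore_2).
    if producers >= 1 and j < end and cats[j] == "C" and n >= 2:
        # Only the producer adjacent to the consumer joins it; an earlier one issues alone.
        return 1 if producers == 2 else 2
    # Anything else issues by itself.
    return 1


def fold(ops: List[str], n: int) -> List[List[str]]:
    """Greedy, continuous n-foldable grouping; each group models one issued operation."""
    cats = [category(op) for op in ops]
    groups: List[List[str]] = []
    i = 0
    while i < len(ops):
        k = group_length(cats, i, n)
        groups.append(ops[i:i + k])
        i += k
    return groups


if __name__ == "__main__":
    body = ["iload_1", "iload_2", "iadd", "istore_3"]  # c = a + b
    for n in (1, 2, 3, 4):
        groups = fold(body, n)
        print(n, len(groups), groups)  # 4, 3, 2, and 1 groups for n = 1..4
```

For `c = a + b` (`iload_1; iload_2; iadd; istore_3`), the sketch issues 4, 3, 2, and 1 groups for n = 1 to 4. A non-stack instruction such as `iinc` placed between a producer and a consumer stops the continuous scan, which is the kind of gap that discontinuous folding is meant to bridge.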
Java has been widely adopted in Internet and embedded applications because of its robustness, cross-platform portability, and small code size (each instruction averages about 1.8 bytes). However, the performance of Java processors is severely limited by the true data dependencies inherited from the Java virtual machine. Two kinds of folding models are introduced to overcome this problem: pattern-based folding and rule-based folding. In the pattern-based folding model, folding patterns are identified in advance and then compared with the incoming instructions. In the rule-based folding model, bytecode instructions are first classified, and the folding algorithm then automatically performs folding checks based on the folding attributes of each bytecode instruction. Rule-based folding can be further divided into continuous and discontinuous instruction folding. The former can only fold consecutive instructions; the latter can also fold instructions that are separated by non-stack instructions or by previously folded instructions. POC folding, a continuous rule-based folding model, is proposed together with its corresponding architecture. Initial simulations show that the 4-foldable strategy, which nearly reaches the performance upper bound, eliminates 84% of all stack operations. The 2-, 3-, and 4-foldable strategies yield overall program speed-ups of 1.22, 1.32, and 1.34, respectively, compared with a stack machine without folding. This model can be implemented with simple hardware. Hybrid-EPOC folding, a discontinuous rule-based folding model that extends POC folding, is also proposed. It can be implemented by software rescheduling on simple POC folding hardware, with the support of one extra extended bytecode (the P’ bytecode). A 4-foldable Hybrid-EPOC folding model folds 94.8% of stack operations on the second benchmark set (SPECjvm98 with the s10 data set).
This is 14.7 percentage points more than the 80.1% folded by the POC folding model, and it yields a 13% performance gain. The IIPC (Issued Instructions Per Cycle) of the Hybrid-EPOC folding model for the 2-, 3-, and 4-foldable strategies is 1.72, 1.73, and 1.74, respectively; 1.74 is 98.3% of the theoretical upper bound of 1.77. Meanwhile, the code added for the P’ bytecode is less than 8% of the total virtual machine code size. This model is therefore suitable for high-performance system-on-chip (SoC) designs with limited ROM area. By comparison, Sun Microsystems’ picoJava-II uses a pattern-based folding model that, under the 4-foldable strategy, eliminates 39.6% of all stack operations and achieves an IIPC of 1.25. Our research leads this field in both methodology and architecture. We have obtained US and ROC patents, and the results have been referenced by other researchers, including in Kim’s Advanced POC model.
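The folding-rate and IIPC figures quoted above can be checked with simple arithmetic. The 14.7 figure is an absolute difference between two folding rates (percentage points), not a relative increase; the relative ratio of the two rates is computed alongside it for comparison:

```latex
\begin{aligned}
94.8\% - 80.1\% &= 14.7 \text{ percentage points}, & \frac{94.8}{80.1} &\approx 1.18,\\
\frac{1.74}{1.77} &\approx 0.983 = 98.3\%, & \frac{1.71}{1.77} &\approx 0.966 = 96.6\%.
\end{aligned}
```

The first IIPC ratio uses the 4-foldable value given in the English abstract and the second uses the value given in the Chinese abstract; both are measured against the same theoretical upper bound of 1.77.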
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT900392010
http://hdl.handle.net/11536/68424
Appears in Collections: Thesis