Full metadata record
DC Field | Value | Language
dc.contributor.author | 唐立人 | en_US
dc.contributor.author | Lee-Ren Ton | en_US
dc.contributor.author | 鍾崇斌 | en_US
dc.contributor.author | Chung-Ping Chung | en_US
dc.date.accessioned | 2014-12-12T02:27:43Z | -
dc.date.available | 2014-12-12T02:27:43Z | -
dc.date.issued | 2001 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#NT900392106 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/68514 | -
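dc.description.abstract | In recent years, Java has become one of the most widely used languages on the Internet. Because Java's execution platform, the Java Virtual Machine (JVM), is a stack-based architecture, the execution performance of its machine code (bytecode) is limited by the true data dependencies that arise when stack data are accessed. In this dissertation, we study various stack operation folding techniques for Java processors with a single functional unit in order to improve the execution performance of the JVM. These techniques are: (1) template-based stack operation folding of continuous bytecodes; (2) rule-based stack operation folding of continuous bytecodes; and (3) rule-based stack operation folding of all-purpose bytecodes. Each Java instruction is classified as one of three types, Producer, Operator, or Consumer (POC), for foldability analysis. We first propose the POC template-based continuous (POC-TC) technique, which uses the occurrence percentage of each folded stack operation pattern to decide whether that pattern is built into the instruction decoder of the Java processor. Depending on template size, we propose three folding strategies, 2-foldable, 3-foldable, and 4-foldable, to meet different performance and cost considerations. Statistics show that these three strategies eliminate 67%, 78%, and 79% of stack operation access instructions, respectively, and achieve overall program speedups of 1.42, 1.52, and 1.52, respectively, relative to a stack machine without folding. In the second technique, we derive stack operation folding rules for the continuous Java bytecode stream, called the POC rule-based continuous (POC-RC) folding model. The basic idea is to examine the POC types, operand sources and destinations, data types, and widths of two consecutive stack operation instructions N and N+1 to determine whether they can be folded. If they can, a Folded Bytecode Instruction (FBI) is generated. The FBI carries POC type information synthesized from N and N+1 and becomes the new instruction N', which is then checked against the following instruction N'+1 with the same POC-RC folding model for further folding. This step repeats until the POC-RC folding model reports an end-of-folding state. Statistics show that 69%, 82%, and 83% of stack operation access instructions are eliminated by the 2-foldable, 3-foldable, and 4-foldable strategies, respectively, with overall program speedups of 1.42, 1.53, and 1.54, respectively, relative to a stack machine without folding. Further folding performance is obtained with the third technique, the POC rule-based all-purpose (POC-RA) folding model. In this model, discontinuous bytecodes that are executed serially under the POC-RC folding model are further folded through a newly designed hardware structure, the Stack Operation Stack (SOS). In a Java processor with an SOS, decoded bytecodes are logged sequentially into the SOS before they are issued for execution, and the stack operation folding unit in the instruction decoder checks the POC types of the instructions in both the SOS and the instruction buffer to decide whether folding is possible. The SOS extends the time that discontinuously foldable stack operations remain in the folding window of the folding unit, which provides multiple folding opportunities and therefore higher execution performance. Statistics show that with an instruction buffer of 7 bytes and an SOS of 8 entries, 91%, 97%, and 99% of stack operation access instructions are eliminated by the 2-foldable, 3-foldable, and 4-foldable strategies, respectively, with overall program speedups of 1.71, 1.73, and 1.74, respectively, relative to a stack machine without folding. With these folding techniques, the stack operation folding unit of a single-pipelined Java processor can be designed according to different performance and cost requirements in order to achieve higher instructions per cycle (IPC). At the end of the dissertation we also discuss how stack operation folding can improve performance in higher-performance multi-pipelined Java processors. This research is currently the first of its kind in the world and makes the best contribution in both method and architecture. It has been granted patents in the United States and the Republic of China and has been referenced by other academic research; for example, Austin Kim's Advanced POC folding model is based on the POC folding model of this research. In addition, some of the instruction folding concepts and principles will be adopted in the next Java processor design of the Computer and Communications Research Laboratories of the Industrial Technology Research Institute. We hope that the results of this dissertation can serve as a design reference for future high-performance, low-cost Java processors. | zh_TW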
dc.description.abstract | In recent years, Java has become one of the most popular languages used over the Internet. Because the Java Virtual Machine (JVM) has a stack-based architecture, its execution performance is limited by the true data dependencies caused by stack accesses. In this dissertation, various stack operations folding techniques for Java processors are studied to enhance the execution performance of the JVM. The stack operations folding techniques include: i) template-based folding for continuous bytecodes, ii) rule-based folding for continuous bytecodes, and iii) rule-based folding for all-purpose bytecodes. Each bytecode is mapped to one of the Producer, Operator, and Consumer (POC) types for folding analysis. In the first folding technique, template-based folding for continuous bytecodes (POC-TC), we count the occurrence percentages of folded bytecode patterns to determine which foldable templates are built into the instruction decoder. With different template sizes, three folding strategies (2-foldable, 3-foldable, and 4-foldable) are available for different performance/cost tradeoffs. Statistical data show that 67%, 78%, and 79% of stack operations can be eliminated by the 2-, 3-, and 4-foldable strategies, respectively. The corresponding overall program speedups are 1.42, 1.52, and 1.52, as compared to a traditional stack machine without folding. Considering the design tradeoffs between instruction decoder width and folding performance, a cost-effective folding unit is recommended. In the second folding technique, we derive folding rules for the continuous bytecode stream, called the POC-RC folding model. The basic concept of the POC-RC folding model is to check whether bytecodes I and I + 1 are foldable, based on their POC types, operand sources, operand destinations, data types, and widths. If they are foldable, a Folded Bytecode Instruction (FBI) is generated and becomes the new bytecode I, which is then checked for further foldability with the following bytecode I + 1. This process repeats until an ending case of the POC-RC folding states is encountered. Statistical data show that 69%, 82%, and 83% of stack operations can be eliminated by the 2-, 3-, and 4-foldable strategies, respectively. The corresponding overall program speedups are 1.42, 1.53, and 1.54, as compared to a traditional stack machine without folding. Considering the design tradeoffs between instruction decoder width and folding performance, a cost-effective POC-RC folding unit is recommended. Further folding performance can be achieved by the third folding technique, called the POC-RA folding model. In this model, discontinuous bytecodes that would be issued in series under the POC-RC folding model are further folded with the proposed Stack Operation Stack (SOS). With the SOS structure, all stack operations are logged sequentially into the SOS before being issued. With an instruction buffer size of 7 bytes and an SOS size of 8 entries, statistical data show that 91%, 97%, and 99% of stack operations can be eliminated by the 2-, 3-, and 4-foldable strategies, respectively. The corresponding overall program speedups are 1.71, 1.73, and 1.74, as compared to a traditional stack machine without folding. Considering the design tradeoffs between instruction decoder width and folding performance, a cost-effective POC-RA folding unit is recommended. With the stack operations folding techniques discussed in this dissertation, a cost-effective folding unit that exploits more IPC (instructions per cycle) can be built for a high-performance single-pipelined Java processor. We hope the efforts in this research can contribute to the design of future cost-effective, high-performance Java processors. | en_US
dc.language.iso | en_US | en_US
dc.subject | 爪哇虛擬機器 | zh_TW
dc.subject | 爪哇處理器 | zh_TW
dc.subject | 堆疊運算摺疊 | zh_TW
dc.subject | POC摺疊模型 | zh_TW
dc.subject | 堆疊運算堆疊 | zh_TW
dc.subject | Java Virtual Machine | en_US
dc.subject | Java Processor | en_US
dc.subject | Stack Operations Folding | en_US
dc.subject | POC Folding Model | en_US
dc.subject | Stack Operation Stack | en_US
dc.title | 具有非連續摺疊能力之爪哇堆疊運算摺疊 | zh_TW
dc.title | Java Stack Operations Folding with Discontinuous Folding Capability | en_US
dc.type | Thesis | en_US
dc.contributor.department | 資訊科學與工程研究所 | zh_TW
Appears in Collections: Thesis
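
The abstract describes rule-based folding of continuous bytecodes: two adjacent bytecodes are checked for foldability by POC type, the folded result is checked against the next bytecode, and the process repeats until an ending case is reached. The following Python sketch illustrates that general idea only. The `POC` table, the function name `fold`, and the specific rules are assumptions made for illustration; they are not the dissertation's actual rule set, and the sketch ignores operand sources, destinations, data types, and widths.

```python
# Hypothetical POC classification for a few bytecodes (illustrative only).
# "P" pushes one value, "O" pops `pops` values and pushes one result,
# "C" pops one value and stores it.
POC = {
    "iload": ("P", 0),
    "iconst": ("P", 0),
    "iadd": ("O", 2),
    "imul": ("O", 2),
    "ineg": ("O", 1),
    "istore": ("C", 1),
}


def fold(bytecodes, max_fold=4):
    """Group consecutive bytecodes into folded instructions.

    max_fold plays the role of the 2-, 3-, or 4-foldable strategy.
    The rules below are simplified assumptions, not the thesis rules.
    """
    groups = []
    i = 0
    while i < len(bytecodes):
        best = 1        # length of the longest useful fold starting at i
        state = None    # POC type of the group folded so far
        produced = 0    # values supplied by folded producers
        for j in range(i, min(i + max_fold, len(bytecodes))):
            kind, pops = POC[bytecodes[j]]
            if state is None:
                state = kind
                produced = 1 if kind == "P" else 0
            elif state == "P" and kind == "P":
                produced += 1
            elif state == "P" and kind == "O" and produced <= pops:
                state = "O"   # producers feed the operator directly
            elif state == "P" and kind == "C" and produced == 1:
                state = "C"   # load followed by store acts as a move
            elif state == "O" and kind == "C":
                state = "C"   # operator result goes straight to the store
            else:
                break         # ending case: no rule applies
            if state in ("O", "C") and j > i:
                best = j - i + 1   # a fold is useful once it reaches O or C
            if state == "C":
                break         # a consumer closes the group
        groups.append(tuple(bytecodes[i:i + best]))
        i += best
    return groups
```

Under these assumed rules, the sequence `iload, iload, iadd, istore` is issued as one folded instruction with `max_fold=4`, as two with `max_fold=3`, and as three with `max_fold=2`, which mirrors how a wider folding window removes more stack accesses.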