Title: Simulation and Performance Evaluation of NSC98 Microprocessor: Register Renaming and Out-of-order Execution
Original title: NSC98微處理器之模擬與效能評估:暫存器更名與逸序執行
Authors: 謝明燈 (Hsieh, Ming-Deng)
曾建超 (Chien-Chao Tseng)
Institute of Computer Science and Engineering
Keywords: NSC98 Microprocessor; trace-driven simulator; register renaming; out-of-order execution
Issue date: 1996
Abstract: Superscalar design is one of the current trends in microprocessors, and Intel's x86 series is the most popular family of PC microprocessors. However, many mechanisms for improving microprocessor performance, such as register renaming, out-of-order execution, and branch prediction, are considerably harder to implement on an x86-compatible microprocessor because of the CISC nature of x86. In this thesis we build a trace-driven simulator for NSC98, a planned x86-compatible superscalar microprocessor. The work focuses on the implementation of register renaming and out-of-order execution, and the simulator is used for performance evaluation that can serve as a reference for the actual NSC98 design. The simulator is built with an object-oriented approach, so it is systematic and modular. Together with its flexibly modifiable configuration, this allows it to be adapted to other microprocessors of similar architecture with only minor changes.

Our simulation results show that the most serious bottleneck in the original NSC98 design lies in the execution of load/store instructions. Besides the insufficient number of load/store units (LSUs), NSC98 decomposes x86 instructions into simple POPs for execution, so for an x86 instruction that both reads and writes memory the LSU computes the same memory address twice, which reduces effective LSU utilization. This point may deserve further investigation in future x86-compatible microprocessor designs.

Our simulation results also show that the parallelism available in existing real application programs is limited. The performance gain is bounded by the instruction-level parallelism in the programs and does not increase linearly as more functional units are added, so the number of execution units should be kept moderate; beyond that point, additional units are largely wasted.
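The last observation in the abstract can be illustrated with a toy trace-driven model. The sketch below is not the NSC98 simulator and uses no NSC98 data: the function `schedule_cycles`, the trace format, the register names, and the unit-latency assumption are all hypothetical. It assumes ideal register renaming (only true read-after-write dependencies remain), an unlimited instruction window, and identical functional units.

```python
def schedule_cycles(trace, num_units):
    """Return the cycles needed to run `trace` on `num_units` identical units.

    Each trace entry is (dest_register, [source_registers]).
    Assumptions (hypothetical, for illustration only): every instruction
    takes one cycle, renaming removes all WAR/WAW hazards, and an
    instruction may issue as soon as its sources are ready and a unit is free.
    """
    ready_cycle = {}  # register -> first cycle in which its newest value is usable
    used = {}         # cycle -> number of units already occupied in that cycle
    finish = 0
    for dest, sources in trace:
        # Earliest cycle allowed by true data dependencies.
        cycle = max((ready_cycle.get(s, 0) for s in sources), default=0)
        # Slide to the first cycle that still has a free functional unit.
        while used.get(cycle, 0) >= num_units:
            cycle += 1
        used[cycle] = used.get(cycle, 0) + 1
        ready_cycle[dest] = cycle + 1
        finish = max(finish, cycle + 1)
    return finish


# Made-up trace: two independent dependency chains, four instructions each,
# so the available instruction-level parallelism is 2.
TRACE = [
    ("r1", []), ("r2", []),
    ("r1", ["r1"]), ("r2", ["r2"]),
    ("r1", ["r1"]), ("r2", ["r2"]),
    ("r1", ["r1"]), ("r2", ["r2"]),
]

for units in (1, 2, 4, 8):
    print(units, schedule_cycles(TRACE, units))
```

On this made-up trace the cycle count falls from 8 to 4 when going from one unit to two and then stays at 4, because the two dependency chains, not the number of units, set the lower bound.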
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT850392005
http://hdl.handle.net/11536/61751
Appears in Collections: Thesis