Title: | The Design of the Decoding Unit with High Issue Rate for an X86 Superscalar Microprocessor |
Authors: | Cheng, Shin-Ki (鄭信基); Shann, Jyh-Jium (單智君); Institute of Computer Science and Engineering |
Keywords: | Superscalar; Primitive operations (POPs); Decoding unit; Store buffer; Decoding rule |
Publication date: | 1997 |
Abstract: | Superscalar architecture is a technique that many current microprocessors use to improve performance. In the new generation of x86 microprocessors, superscalar techniques achieve higher performance by executing multiple instructions in parallel. For x86 instruction-set compatibility and higher execution parallelism, the decoding units of these microprocessors translate each x86 instruction into one or more primitive operations (POPs). Translation strategies differ mainly in how they handle address generation: some decoding units emit address generation as separate operations, while others merge it into the load/store operations. In this thesis, we examine these two strategies. Simulation results show that, in decoding units with a high issue rate, emitting separate address-generation operations improves performance by about 20% to 25% compared with merging them. In addition, because load/store operations make up a large fraction of an x86 program, we can improve the mechanism of the load/store unit (LSU) to increase performance. We find that enhancing the store buffer so that it can snoop the result buses is important for high-issue-rate decoding units, improving performance by up to 30% over a store buffer without this ability. Considering the trade-off between hardware cost and performance, we then examine decoding rules for designing a decoding unit, and based on the simulation results we suggest a decoding rule suited to current commercial programs.
Finally, we design a next-generation x86 decoding unit according to this decoding rule. |
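To make the two translation strategies in the abstract concrete, here is a minimal sketch of how a register-memory instruction such as `add eax, [ebx+ecx*4+8]` and a store `mov [mem], reg` might be split into POPs under each strategy. The POP names (`AGEN`, `LOAD`, `STORE`), the temporaries `t_addr` and `t_val`, and the helper functions are hypothetical illustrations chosen for this example. They are not the thesis's actual POP format or decoder design.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Mem:
    """x86 memory operand [base + index*scale + disp]."""
    base: Optional[str] = None
    index: Optional[str] = None
    scale: int = 1
    disp: int = 0

    def regs(self) -> Tuple[str, ...]:
        # Registers the address computation depends on.
        return tuple(r for r in (self.base, self.index) if r is not None)


@dataclass(frozen=True)
class Pop:
    """Hypothetical primitive operation: dst <- op(srcs), optionally carrying address fields."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...]
    mem: Optional[Mem] = None  # address fields, held by AGEN or by a merged LOAD/STORE


def translate_alu_mem(op: str, reg: str, mem: Mem, separate_agen: bool) -> List[Pop]:
    """Translate 'op reg, [mem]' (reg = reg op M[mem]) into hypothetical POPs."""
    if separate_agen:
        return [
            Pop("AGEN", "t_addr", mem.regs(), mem),  # address generation is its own POP
            Pop("LOAD", "t_val", ("t_addr",)),       # load only needs the computed address
            Pop(op.upper(), reg, (reg, "t_val")),
        ]
    return [
        Pop("LOAD", "t_val", mem.regs(), mem),       # merged: load computes its own address
        Pop(op.upper(), reg, (reg, "t_val")),
    ]


def translate_store(mem: Mem, reg: str, separate_agen: bool) -> List[Pop]:
    """Translate 'mov [mem], reg' into hypothetical POPs."""
    if separate_agen:
        return [
            Pop("AGEN", "t_addr", mem.regs(), mem),
            Pop("STORE", None, ("t_addr", reg)),
        ]
    # Merged: one STORE POP depends on the address registers and the data register.
    return [Pop("STORE", None, mem.regs() + (reg,), mem)]


if __name__ == "__main__":
    m = Mem(base="ebx", index="ecx", scale=4, disp=8)
    print([p.op for p in translate_alu_mem("add", "eax", m, separate_agen=True)])
    # prints ['AGEN', 'LOAD', 'ADD']
    print([p.op for p in translate_alu_mem("add", "eax", m, separate_agen=False)])
    # prints ['LOAD', 'ADD']
```

In this sketch, separating address generation costs one extra POP per memory access, while the load or store POP itself then depends only on the computed address temporary. Merging saves that POP, but the load/store POP carries the address fields and depends directly on the base and index registers.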
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT860392072 http://hdl.handle.net/11536/62807 |
Appears in Collections: | Thesis |