Study of Instruction Fetching and Decoding for X86 Superscalar Processors

Title:	Study of Instruction Fetching and Decoding for X86 Superscalar Processors (X86超純量處理器指令擷取及解碼之研究)
Authors:	Cheng, Che-Sheng (鄭哲聖); Chung, Chung-Ping (鍾崇斌); Institute of Computer Science and Engineering
Keywords:	superscalar processor; predecoded instruction cache
Issue Date:	1995
Abstract:	New-generation 80x86 processors all use RISC-like superscalar cores. They require sufficiently high instruction fetch and decode bandwidth to keep multiple functional units busy. Although the 80x86 complex instruction set reduces the instruction fetch bandwidth requirement, it introduces two major difficulties in decoding 80x86 instructions. First, variable instruction lengths make it difficult for the decoder to determine the lengths of several sequential instructions, and so to identify more than one 80x86 instruction, within a single clock cycle. Second, variable instruction formats make it difficult for the decoder to extract the instruction fields it needs. These two difficulties make the design of a decoder for 80x86 superscalar processors a challenging task, and may constrain the performance of 80x86 processors. In this thesis, we first study three existing approaches to improving instruction decode bandwidth: (1) the Intel P6 approach, (2) the AMD K5 approach, and (3) the DIC (decoded instruction cache) approach.
We then propose our own approach, called the PDIC (predecoded instruction cache) approach, in two variations: the DIC-like1 and DIC-like2 PDIC schemes. The instructions stored in the PDIC are predecoded and have fixed length and format, an idea based on the DIC approach. The only difference between the two schemes is how predecoded instructions are mapped into the PDIC. To evaluate the performance of each approach, we establish a simple but realistic machine model for each. The major performance metric in our study is decoding rate. The instruction cache hit rate is also measured to see how it relates to decoding rate in each machine model. Each approach is evaluated by trace-driven simulation of five SPEC95 benchmarks. The simulation results show that the DIC-like2 PDIC scheme, combined with a small L1 instruction cache, achieves a better decoding rate than the other approaches.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT840392010 http://hdl.handle.net/11536/60351
Appears in Collections:	Thesis