The Study of Multilevel Branch Prediction (多層分支預測之研究)

Title:	The Study of Multilevel Branch Prediction (多層分支預測之研究)
Authors:	Gi-Dung Liang (梁桔端); Chang-Jiu Chen (陳昌居); Institute of Computer Science and Engineering (資訊科學與工程研究所)
Keywords:	branch prediction; branch; multilevel; lookahead
Issue date:	1999
Abstract:	Student: Gi-Dung Liang. Advisor: Dr. Chang-Jiu Chen. Department of Computer Science and Information Engineering, National Chiao Tung University.

Branch instructions have long been a performance bottleneck in modern pipelined superscalar processors because they interrupt the steady flow of instructions through the pipeline. Various branch prediction schemes have been proposed to address this problem, and three are widely used today. The simplest, the bimodal (bimod) predictor, uses 2-bit saturating counters to record the outcome history of each branch instruction. The 2-level adaptive predictor uses a two-level structure to track the correlation among nearby branch outcomes. The most complex, the combination predictor, combines the bimod and 2-level predictors and uses a meta-table to choose which of the two predictions to use.

Furthermore, data and instruction prefetching now require looking further ahead in the instruction stream than a single branch. Techniques such as trace processors and decoupled-access DRAM raise instruction-level parallelism (ILP), but to be effective they need sufficient lookahead, i.e., they must be far enough ahead of processor execution when requesting data.

In this thesis, we propose several multilevel branch prediction mechanisms. Mubp-Like (Multilevel branch predictor-Like) with not taken BTB (branch target buffer) uses the last predicted target as the index into the not taken BTB, which reduces the size of the not taken BTB. Mubp-Like with taken BTB uses the last predicted path as the index into the taken BTB, which reduces the size of the taken BTB. Mubp-Like with RIP (reduce interference predictor) uses an auxiliary mechanism, RIP, to reduce the interference in the predictor table caused by loop instructions.

We simulate our designs with the SimpleScalar tool set and compare them with the original Mubp scheme proposed by A. Veidenbaum on some of the SPEC95 benchmarks. The simulation results show that Mubp-Like with not taken BTB achieves higher accuracy while reducing hardware cost by about 30%. Mubp-Like with taken BTB achieves approximately the same accuracy while reducing hardware cost by about 60%. Mubp-Like with RIP improves accuracy by about 1% to 2% across all the benchmark programs.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT880392048 http://hdl.handle.net/11536/65447
Appears in Collections:	Theses
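
The three baseline schemes named in the abstract (bimod, 2-level adaptive, and combination) can be illustrated with a minimal Python sketch. It is not the thesis's implementation and not SimpleScalar code: the table sizes, the modulo indexing, the initial counter values, and the choice of a single global history register for the 2-level predictor are all assumptions made for illustration, and the class and function names are hypothetical.

```python
def update_counter(value, taken):
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    return min(value + 1, 3) if taken else max(value - 1, 0)


class Bimodal:
    """One 2-bit counter per table entry, indexed by the branch address."""

    def __init__(self, entries=1024):  # table size is an assumption
        self.table = [1] * entries     # start weakly not taken (assumption)

    def predict(self, pc):
        return self.table[pc % len(self.table)] >= 2

    def update(self, pc, taken):
        i = pc % len(self.table)
        self.table[i] = update_counter(self.table[i], taken)


class TwoLevel:
    """Level 1: global history of recent outcomes. Level 2: counters
    indexed by that history (one global register is an assumption)."""

    def __init__(self, history_bits=8):
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.table = [1] * (1 << history_bits)

    def predict(self, pc):
        return self.table[self.history] >= 2

    def update(self, pc, taken):
        self.table[self.history] = update_counter(self.table[self.history], taken)
        # Shift the newest outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


class Combined:
    """Meta-table of 2-bit counters chooses between the two predictors."""

    def __init__(self, entries=1024, history_bits=8):
        self.bimodal = Bimodal(entries)
        self.two_level = TwoLevel(history_bits)
        self.meta = [1] * entries      # >= 2 means "trust the 2-level predictor"

    def predict(self, pc):
        use_two_level = self.meta[pc % len(self.meta)] >= 2
        chosen = self.two_level if use_two_level else self.bimodal
        return chosen.predict(pc)

    def update(self, pc, taken):
        bimodal_ok = self.bimodal.predict(pc) == taken
        two_level_ok = self.two_level.predict(pc) == taken
        if bimodal_ok != two_level_ok:
            # Train the chooser only when exactly one component was right.
            i = pc % len(self.meta)
            self.meta[i] = update_counter(self.meta[i], two_level_ok)
        self.bimodal.update(pc, taken)
        self.two_level.update(pc, taken)


def accuracy(predictor, trace):
    """Fraction of correct predictions over a list of (pc, taken) pairs."""
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(trace)


if __name__ == "__main__":
    # Synthetic trace: one branch whose outcome strictly alternates.
    trace = [(0x40, i % 2 == 0) for i in range(1000)]
    for name, p in [("bimod", Bimodal()), ("2-level", TwoLevel()), ("comb", Combined())]:
        print(name, accuracy(p, trace))
```

On this synthetic alternating trace, the bimodal sketch mispredicts nearly every time because its counter oscillates around the threshold, while the two history-based sketches learn the pattern after a short warm-up. Real benchmark behavior, such as the SPEC95 results reported in the thesis, depends on table sizes and aliasing that this sketch does not model.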