Title: 在發生分支預測錯誤時利用檢查呼叫與返回指令之歷史記錄以更正返回位址推疊之機制
Mechanism for Return-Address-Stack Recovery under Branch Misprediction by Checking Call/Return History
Authors: 邱冠穎
鍾崇斌
Chiu, Guan-Ying
Chung, Chung-Ping
Institute of Computer Science and Engineering
Keywords: branch prediction; return address stack; return stack
Issue Date: 2010
Abstract: Modern processors exploit the behavior of call and return instructions by using a return-address stack (RAS) to hold, for each call, the target PC of its matching return, which helps the BTB predict return targets. Dynamic branch prediction can be wrong, however. When a branch is mispredicted, the pipeline is steered down the wrong path and fetches incorrect instructions. These instructions are flushed once the misprediction is detected, but any calls and returns among them have already pushed to or popped from the RAS at the front of the pipeline. The RAS contents may therefore be corrupted, causing later returns to be mispredicted.
In this thesis, we propose a hardware method that correctly restores the RAS when a branch misprediction is detected. The design records the sequence of speculative call and return instructions currently in the pipeline. When a misprediction is detected, it walks the recorded sequence and reverses the RAS operation of each call and return in turn, which fully restores the RAS. To shorten recovery time, we further add a call-return pairing mechanism that marks matched call-return pairs in the recorded sequence, so that simple hardware can skip these pairs during recovery.
Our experiments use a 20-stage, in-order, dual-issue processor. With the recovery mechanism, RAS prediction accuracy stays at 100% on both the MiBench and SPEC2000 benchmark suites. On MiBench, the design achieves a speedup of up to 9.2%, with an average of 7.7%. On SPEC2000, it achieves a speedup of up to 9.5%, with an average of 7.5%.
The proposed design fully restores the RAS and has the following properties:
1. Very small hardware storage requirement
2. Simple management
3. Fast recovery
4. Simple, low-cost extension to deeper pipelines
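The recovery idea in the abstract can be illustrated with a small software sketch. This is not the thesis's hardware design, whose actual structures are not given in this record. It assumes an unbounded stack (a real RAS is a small fixed-size buffer), never retires history entries, and tracks pairs by history index. The names `RecoverableRAS`, `Op`, `mark`, `recover`, and `undo_steps` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    kind: str                      # "call" or "ret"
    addr: Optional[int]            # address pushed (call) or popped (ret)
    partner: Optional[int] = None  # history index of the matched call/ret


class RecoverableRAS:
    """Illustrative model only: a RAS plus a log of in-flight calls/returns."""

    def __init__(self) -> None:
        self.stack: List[int] = []
        self.history: List[Op] = []      # oldest first
        self.open_calls: List[int] = []  # history indices of unpaired calls
        self.undo_steps = 0              # counts reversed operations

    def mark(self) -> int:
        """Take a recovery point when a branch is predicted."""
        return len(self.history)

    def call(self, return_addr: int) -> None:
        self.stack.append(return_addr)
        self.open_calls.append(len(self.history))
        self.history.append(Op("call", return_addr))

    def ret(self) -> Optional[int]:
        predicted = self.stack.pop() if self.stack else None
        op = Op("ret", predicted)
        if self.open_calls:              # pair with the most recent open call
            j = self.open_calls.pop()
            op.partner = j
            self.history[j].partner = len(self.history)
        self.history.append(op)
        return predicted

    def recover(self, mark: int) -> None:
        """Undo every call/return younger than `mark`, youngest first."""
        for i in range(len(self.history) - 1, mark - 1, -1):
            op = self.history[i]
            if op.partner is not None and op.partner >= mark:
                continue                 # both halves are wrong-path: net zero
            if op.kind == "call":
                self.stack.pop()         # reverse the push
            elif op.addr is not None:
                self.stack.append(op.addr)  # reverse the pop
            self.undo_steps += 1
        del self.history[mark:]
        # Calls whose matching return was squashed become open again.
        self.open_calls = []
        for i, op in enumerate(self.history):
            if op.partner is not None and op.partner >= mark:
                op.partner = None
            if op.kind == "call" and op.partner is None:
                self.open_calls.append(i)


ras = RecoverableRAS()
ras.call(0x100)
ras.call(0x200)
m = ras.mark()      # a branch is predicted here
ras.ret()           # wrong path: pops 0x200
ras.call(0x300)     # wrong path: this call and the next return form a pair
ras.ret()
ras.call(0x400)     # wrong path: unpaired call
ras.recover(m)      # misprediction detected
```

In this run, four wrong-path operations are logged but only two are reversed, because the `call(0x300)` and its return are skipped as a pair. The first wrong-path return is still undone even though it matched an older call, since that call lies before the recovery point.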
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT079555619
http://hdl.handle.net/11536/139834
Appears in Collections:Thesis