Title: 軌跡快取記憶體上之啟發式軌跡選擇機制
The Heuristic Trace Selection for Trace Cache
Authors: 吳宗翰
Zung-Hang Wu
陳昌居
Chang-Jiu Chen
Institute of Computer Science and Engineering
Keywords: 軌跡快取記憶體;軌跡選擇機制;trace cache;trace selection
Issue Date: 1999
Abstract: To keep improving the performance of future processors, both instruction fetch bandwidth and execution bandwidth must increase. However, a fetch mechanism built on a conventional instruction cache cannot fetch more than one non-contiguous basic block per cycle. The trace cache was recently proposed to overcome this limitation by caching traces of the dynamic instruction stream, so that non-contiguous blocks are stored together. The performance of the trace cache nevertheless depends strongly on trace selection, the algorithm that divides the dynamic instruction stream into traces that can each be fetched at once. Trace selection primarily affects average trace length and trace cache hit rate, both of which in turn affect fetch bandwidth. In this thesis, we introduce two trace selection heuristics to improve trace cache performance. The first treats the basic block as an atomic unit: a block cannot be split across two trace segments unless it contains more than 16 instructions. The second ends a trace at the re-convergent point, the point where the two paths of a branch rejoin. This heuristic identifies the instructions that are control independent of the branch; whether the branch prediction turns out to be right or wrong, these instructions must be fetched, decoded, and executed. Depending on the trace selection, the associativity, and the collection method used, the first heuristic improves performance by 3% to 30%. The average improvement is 3.5% with the first collection method and 7.5% with the second. The second heuristic improves performance on only some benchmarks, and only slightly, but it can enable deeper speculation.
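The first heuristic can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `select_traces`, the 16-instruction trace length, the decision to ignore any limit on branches per trace, and the way an oversized block is spread over consecutive traces are all assumptions made for illustration. It compares a baseline that fills each trace to capacity with a variant that keeps every basic block of at most 16 instructions whole.

```python
def select_traces(block_sizes, max_len=16, atomic=True):
    """Split a dynamic sequence of basic blocks into traces.

    block_sizes: instruction count of each basic block, in execution order.
    max_len:     assumed trace capacity in instructions (hypothetical value).
    Returns a list of traces; each trace is a list of
    (block_index, instruction_count) pieces.
    """
    traces, current, used = [], [], 0
    for idx, size in enumerate(block_sizes):
        # Atomic rule: a block that would fit in an empty trace but not in
        # the space left in this one starts a new trace instead of splitting.
        if atomic and current and size <= max_len and size > max_len - used:
            traces.append(current)
            current, used = [], 0
        remaining = size
        while remaining > 0:
            take = min(remaining, max_len - used)
            current.append((idx, take))
            used += take
            remaining -= take
            if used == max_len:  # trace is full, close it
                traces.append(current)
                current, used = [], 0
    if current:
        traces.append(current)
    return traces


blocks = [6, 7, 5, 20, 3]  # made-up block sizes, not data from the thesis
baseline = select_traces(blocks, atomic=False)
atomic = select_traces(blocks, atomic=True)
```

In this made-up input the baseline splits the 5-instruction block across two traces, while the atomic variant ends the first trace early and keeps that block whole. Only the 20-instruction block is still split, because it exceeds the assumed trace length.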
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT880392060
http://hdl.handle.net/11536/65460
Appears in Collections: Thesis