Title: High Bandwidth Instruction Fetch Mechanism in Superscalar Microprocessors (在超純量微處理器上的高頻寬指令抓取機制)
High Bandwidth Instruction Fetch Mechanism
Authors: 廖宜道
Yi-Tao Liao
陳昌居
Dr. Chang-Jiu Chen
Institute of Computer Science and Engineering (資訊科學與工程研究所)
Keywords: cache memory; superscalar microprocessor; Trace Cache; Superscalar Processor
Date issued: 1999
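Abstract: High Bandwidth Instruction Fetch Mechanism in Superscalar Microprocessors. Student: Yi-Tao Liao (廖宜道). Advisor: Chang-Jiu Chen (陳昌居). Institute of Computer Science and Information Engineering, National Chiao Tung University. Abstract. Caches rest on two principles: temporal locality and spatial locality. We expect that a recently executed instruction is very likely to be executed again, and that neighboring instructions are very likely to be executed consecutively. This assumption holds when instruction fetch bandwidth is low, but it breaks down once the bandwidth grows to the point where the number of instructions fetched per cycle exceeds one basic block. Because the last instruction of a basic block is a branch, consecutively executed basic blocks are not necessarily at contiguous locations in memory. Many methods have been developed to address this problem; they fall mainly into two categories, Enhanced Cache and Trace Cache. The main problem with the Enhanced Cache is that instructions must be realigned and merged at fetch time. Moreover, to read several blocks at once, it often requires a multi-port cache or several copies of the cache. The Trace Cache architecture was developed to address these problems. The main problem with the Trace Cache is that it stores the same instructions redundantly. This happens in two ways: 1. instructions are duplicated between the Instruction Cache and the Trace Cache; 2. instructions are duplicated among different entries of the Trace Cache. The block-based trace cache removes the duplication among different entries. However, the blocks must be reassembled at fetch time, and this step increases instruction fetch latency. Another problem with the block-based trace cache is that its original authors replicated the Block Cache so that several blocks could be read at once, which wastes a great deal of hardware. In this thesis we solve both problems. Our design uses only 66.67% of the original hardware resources, and with the same resources it delivers better performance.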
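The fetch limit described in this abstract can be illustrated with a small sketch. It is not taken from the thesis: the stream layout, the function names, and the rule that every basic block ends in a taken branch are all assumptions chosen for illustration. The sketch models a conventional fetch unit that reads sequential addresses and must stop at the first taken branch, so widening the fetch beyond one basic block yields no extra instructions.

```python
def make_stream(num_blocks, block_len):
    """Build a hypothetical dynamic instruction stream.

    Each entry is (address, taken_branch). Every basic block ends in a
    taken branch (an assumption), and blocks start at unrelated addresses.
    """
    stream = []
    for b in range(num_blocks):
        base = b * 0x100
        for i in range(block_len):
            stream.append((base + 4 * i, i == block_len - 1))
    return stream


def fetch_one_cycle(stream, start, width):
    """Count the instructions one contiguous fetch delivers in a cycle.

    The fetch stops after a taken branch, because the next instruction
    is not at the next sequential address.
    """
    fetched = 0
    for _address, taken_branch in stream[start:start + width]:
        fetched += 1
        if taken_branch:
            break
    return fetched


def average_fetch(stream, width):
    """Average instructions delivered per cycle over the whole stream."""
    cycles = 0
    pos = 0
    while pos < len(stream):
        pos += fetch_one_cycle(stream, pos, width)
        cycles += 1
    return len(stream) / cycles


stream = make_stream(num_blocks=3, block_len=4)
for width in (2, 4, 16):
    print(width, average_fetch(stream, width))
```

Under these assumptions the average stops improving once the fetch width reaches the basic block length, which is the situation that Enhanced Cache and Trace Cache designs aim to overcome.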
High Bandwidth Instruction Fetch Mechanism in Superscalar Processor Student: Yi-Tao Liao Advisor: Dr. Chang-Jiu Chen Institute of Computer Science and Information Engineering National Chiao Tung University ABSTRACT Cache architecture rests on two principles: temporal locality and spatial locality. We expect that once an instruction is executed, it will be executed again in the near future, and that instructions at nearby addresses tend to be executed soon afterward. However, spatial locality becomes much less useful as instruction fetch bandwidth increases. When the number of instructions fetched per cycle exceeds the size of a basic block, the blocks to be fetched may not be contiguous in the cache. This discontinuity is caused by branch instructions. As a result, we cannot fetch enough instructions to fill the available fetch bandwidth. Two major architectures address this problem: the enhanced instruction cache and the trace cache. The major problem with the trace cache is redundancy: an instruction may have copies in both the instruction cache and the trace cache, and it may have copies in different lines of the trace cache. We find that the block-based trace cache can reduce this redundancy. However, the replicated block cache and the collapse delay in the fetch stage are the major problems of the block-based trace cache. In this thesis, we eliminate the collapse delay in the instruction fetch stage and reduce the hardware cost of the block-based trace cache. Our design needs only 66.67% of the hardware cost of the block-based trace cache. At the same hardware cost, it performs better than the block-based trace cache.
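The redundancy between trace cache lines mentioned in the abstract can be made concrete with a short sketch. Everything in it is an assumption for illustration: the block names, the block sizes, and the three traces are hypothetical, and the count ignores the block pointers that a block-based design also stores. It does not model the thesis design or reproduce its 66.67% figure. It only compares how many instruction slots are needed when each trace stores its own copy of every basic block with the number needed when each distinct block is stored once.

```python
# Hypothetical basic blocks and their sizes in instructions (assumed values).
BLOCK_SIZES = {"A": 4, "B": 3, "C": 5, "D": 2}

# Hypothetical traces, each a sequence of basic blocks in execution order.
TRACES = [("A", "B", "C"), ("B", "C", "D"), ("A", "B", "D")]


def trace_cache_slots(traces, sizes):
    """Instruction slots used when every trace stores its own copy of
    each block, so shared blocks are stored more than once."""
    return sum(sizes[block] for trace in traces for block in trace)


def block_based_slots(traces, sizes):
    """Instruction slots used when each distinct block is stored once
    and traces only refer to blocks (pointer storage is ignored here)."""
    distinct = {block for trace in traces for block in trace}
    return sum(sizes[block] for block in distinct)


print(trace_cache_slots(TRACES, BLOCK_SIZES))
print(block_based_slots(TRACES, BLOCK_SIZES))
```

The gap between the two counts grows with the number of traces that share blocks. The block-based organization pays for its savings with the fetch-time reassembly step that the abstract calls the collapse delay.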
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT880392054
http://hdl.handle.net/11536/65453
Appears in Collections: Thesis