Title: Semantic-Based Load/Store Scheduling for x86 Superscalar Microprocessors
Authors: Kuen-Cheng Chiang (蔣昆成)
Jean Jyh-Jiun Shann (單智君)
Institute of Computer Science and Engineering
Keywords: superscalar; microprocessor; semantic-based; scheduling; load/store
Issue Date: 1999
Abstract: The x86-compatible processor is the most widely used general-purpose architecture, yet its performance is severely limited by the latency of load/store operations. Many techniques, such as address-prediction and dependency-prediction load/store scheduling, have therefore been proposed to overcome the memory-access bottleneck. However, these load/store scheduling strategies are still constrained by address calculation and time-consuming memory accesses. In this thesis, we propose a new mechanism, called semantic-based load/store scheduling, to alleviate these limitations. In the x86 architecture, most local variables and parameters of a function are stored in stack memory. We observe that stack-accessing instructions reference the same memory address whenever the displacements encoded in those instructions are the same; therefore, we can track the dependencies and forwarding paths between stack-accessing operations according to their displacement values. Simulation results show that semantic-based load/store scheduling alone achieves a speedup of 1.47 over the strategy of load bypassing stores with forwarding, and reaches a speedup of 1.70 when combined with selective address/dependency prediction.
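To make the displacement-matching idea concrete, the following C sketch shows how a simplified store queue could match a later stack load against an older stack store by comparing displacements alone, before the full effective address is available. This is only an illustration under stated assumptions, not the thesis's actual hardware design: the names (StackStore, issue_stack_store, schedule_stack_load) are hypothetical, and the model assumes EBP-relative accesses within a single stack frame, so equal displacements imply equal addresses.

```c
/*
 * Minimal sketch of displacement-based matching for stack accesses.
 * Hypothetical names; assumes all accesses are [EBP+disp] within one
 * stack frame, so equal displacements imply equal addresses.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SQ_SIZE 8                 /* entries in the simplified store queue  */

typedef struct {
    bool     valid;               /* entry holds an in-flight stack store   */
    int32_t  disp;                /* displacement of the [EBP+disp] operand */
    uint32_t data;                /* value waiting to be written to memory  */
} StackStore;

static StackStore store_queue[SQ_SIZE];

/* Record a not-yet-retired store to [EBP+disp]. */
static void issue_stack_store(int slot, int32_t disp, uint32_t data)
{
    store_queue[slot] = (StackStore){ .valid = true, .disp = disp, .data = data };
}

/*
 * Try to schedule a later load from [EBP+disp] early: the displacement is
 * available directly from the instruction encoding, so the dependency on an
 * older store (and the forwarding path) can be found before the effective
 * address is computed.  A real scheduler would pick the youngest older
 * matching store; this sketch simply takes the first match it finds.
 */
static bool schedule_stack_load(int32_t disp, uint32_t *out)
{
    for (int i = 0; i < SQ_SIZE; i++) {
        if (store_queue[i].valid && store_queue[i].disp == disp) {
            *out = store_queue[i].data;            /* store-to-load forwarding */
            return true;
        }
    }
    return false;                                  /* no match: go to memory   */
}

int main(void)
{
    uint32_t v;
    issue_stack_store(0, -8, 0x1234);              /* e.g. mov [ebp-8], 0x1234 */
    if (schedule_stack_load(-8, &v))               /* e.g. mov eax, [ebp-8]    */
        printf("forwarded 0x%x without waiting for address calculation\n", (unsigned)v);
    return 0;
}
```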
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT880392020
http://hdl.handle.net/11536/65416
Appears in Collections: Thesis