A New Bus Allocation Algorithm for Interconnection Networks of Multiprocessor Systems

Title:	適用於多處理機系統內接網路之新匯流排配置演算法 (A New Bus Allocation Algorithm for Interconnection Networks of Multiprocessor Systems)
Authors:	吳東賢 (Tony Wu); 王國禎 (Kuochen Wang); Institute of Computer Science and Engineering
Keywords:	arbiter; bus allocation; interconnection network; multiprocessor system
Issue Date:	1998
Abstract:	In shared-memory multiprocessor systems, the interconnection network is often the bottleneck of system performance. Memory references usually exhibit two kinds of locality: temporal locality and spatial locality. If a processor references a memory module, it tends to reference the same memory module again. If we do not immediately release the bus that connects the processor and the memory module, we can use the same bus directly, without reconfiguration, when the processor references that memory module again. We therefore propose a new bus allocation algorithm for interconnection networks of multiprocessor systems. A pipelined one-sided crossbar switch, which is essentially a multiple-bus architecture, is used to illustrate our design approach. We use connection tables to record the state of each processor, each memory module, and each bus. When a transaction is selected to be issued, if the processor and the memory module are not connected, we select a bus and reconfigure the pipelined switch to process the transaction. If the processor and the memory module are already connected, we use the same bus directly without reconfiguration. Experimental results show that the new algorithm uses buses more efficiently and reduces the number of reconfigurations. The performance (throughput) with the new bus allocation algorithm is 1.5 to 3 times higher than with the original algorithm.
In addition, we have described and realized a 2×2 pipelined one-sided crossbar switch using Verilog HDL and Xilinx FPGAs, respectively. Verilog simulation results have validated the functionality of the pipelined design. We also use an Aptix MP3A FPCB (Field Programmable Circuit Board) and some electronic components to emulate the pipelined switch and the behavior of the processors and memory modules, respectively. The contribution of this work is the design of a high-throughput interconnection network that matches high-performance processors and thereby eliminates the performance bottleneck.
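The connection-table idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class name `BusAllocator`, its method and field names, and above all the release and eviction policies (drop stale links held by the requesting processor or memory module, then take the lowest-numbered free bus, else evict the first bus) are assumptions made only for this example. The sketch also treats each transaction as completing instantly, which the real pipelined switch does not.

```python
class BusAllocator:
    """Toy model of lazy bus release with connection tables (hypothetical names)."""

    def __init__(self, num_buses):
        self.bus_of_proc = {}   # processor -> bus it currently holds
        self.bus_of_mem = {}    # memory module -> bus it currently holds
        # bus -> (processor, memory module) link, or None if the bus is free
        self.link_on_bus = {b: None for b in range(num_buses)}
        self.reconfigurations = 0

    def _release(self, bus):
        """Tear down the link on `bus` and clear all three tables."""
        link = self.link_on_bus[bus]
        if link is not None:
            proc, mem = link
            del self.bus_of_proc[proc]
            del self.bus_of_mem[mem]
            self.link_on_bus[bus] = None

    def issue(self, proc, mem):
        """Return (bus, reconfigured) for a transaction from proc to mem."""
        bus = self.bus_of_proc.get(proc)
        if bus is not None and self.link_on_bus[bus] == (proc, mem):
            return bus, False   # already connected: reuse the bus as-is

        # Assumption: stale links held by this processor or this memory
        # module are released before a new link is set up.
        for table, key in ((self.bus_of_proc, proc), (self.bus_of_mem, mem)):
            if key in table:
                self._release(table[key])

        # Assumption: pick the lowest-numbered free bus; if none is free,
        # evict the link on the first bus. The source does not state a policy.
        free = [b for b, link in self.link_on_bus.items() if link is None]
        if free:
            bus = free[0]
        else:
            bus = next(iter(self.link_on_bus))
            self._release(bus)

        self.link_on_bus[bus] = (proc, mem)
        self.bus_of_proc[proc] = bus
        self.bus_of_mem[mem] = bus
        self.reconfigurations += 1
        return bus, True
```

Issuing the same processor and memory module pair twice in a row reconfigures only on the first call, so the `reconfigurations` counter grows more slowly as references become more local, which is the effect the abstract attributes to the new algorithm.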
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT870394041 http://hdl.handle.net/11536/64182
Appears in Collections:	Thesis