Full metadata record
DC Field | Value | Language
dc.contributor.author | 王逸民 | en_US
dc.contributor.author | Wang, Yi-Min | en_US
dc.contributor.author | 張瑞川 | en_US
dc.contributor.author | Ruei-Chuan Chang | en_US
dc.date.accessioned | 2014-12-12T02:15:21Z | -
dc.date.available | 2014-12-12T02:15:21Z | -
dc.date.issued | 1995 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#NT840394082 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/60530 | -
dc.description.abstract | Loop scheduling algorithms are an important topic for parallelizing compilers, and this thesis proposes loop scheduling algorithms suited to current and future machines. We build a multiprocessor simulator that realistically models a variety of architectures and use it to study loop scheduling techniques on multiprocessors. Traditional loop scheduling algorithms become inefficient on the new generation of large-scale shared-memory multiprocessors because communication costs rise; affinity scheduling (AFS) addresses this by increasing the chance that memory accesses hit in the cache, thereby lowering communication cost. AFS, however, has drawbacks: it wastes too much time on synchronization and does not exploit affinity fully. We therefore first propose a modified affinity scheduling algorithm (MAFS) that remedies these two shortcomings, and we run many applications on the simulator. The results show that MAFS executes in less time than AFS and, in most cases, halves the number of migration operations. We then propose a clustered affinity scheduling algorithm (CAFS) for large NUMA multiprocessors. CAFS migrates work between processors mainly at the granularity of clusters, which reduces synchronization overhead and lowers the fraction of data not found in the cache. Running many applications on the simulator to compare the loop scheduling algorithms, we find that in most cases CAFS cuts synchronization operations by at least one third, raises the fraction of data found in the cache, and balances the load across processors. Finally, we propose a hierarchical affinity scheduling policy for clustered NUMA multiprocessors. It improves earlier affinity policies such as AFS and MAFS, yielding HAFS and HMAFS, by carrying out work migration between processors hierarchically. Experiments confirm that this strategy reduces inter-cluster data reads, relieves synchronization pressure on the work queues, and shows HMAFS to be the best of the loop scheduling algorithms. Loop scheduling is one of the most important issues in the development of parallelizing compilers for shared-memory multiprocessors. In this thesis, we propose loop scheduling algorithms for shared-memory multiprocessors. We build a realistic multiprocessor simulator that models many modern multiprocessor architectures and use it to evaluate the performance of various loop scheduling algorithms. First, we propose a modified affinity scheduling algorithm (MAFS) to improve the affinity scheduling algorithm (AFS). When imbalance occurs, MAFS migrates the expected workload of each processor to the idle processor; it minimizes synchronization overhead and enhances the affinity effect. Compared with AFS, MAFS eliminates at least half of the migrations in most cases. Second, we propose a clustered affinity scheduling algorithm (CAFS) for large-scale NUMA machines. CAFS distributes the processors into several clusters, each processor belonging to a dedicated cluster. When imbalance occurs, the idle processor migrates only a fraction of the iterations from the most loaded processor in its own cluster. Our results show that CAFS reduces both remote reads and synchronous writes to the queues by at least one third for most applications. CAFS also improves the cache hit ratios and balances the workload. Finally, we propose a hierarchical affinity scheduling policy to improve various affinity algorithms for clustered NUMA machines. The basic idea of our policy is that when imbalance occurs, migration is carried out hierarchically. The results show that the hierarchical algorithms reduce inter-cluster remote memory accesses, decrease locking on the queues, and effectively balance the workload. (An illustrative sketch of the clustered migration step follows this record.) | zh_TW
dc.language.iso | en_US | en_US
dc.subject | Shared memory; Loop scheduling; Memory latency reduction; Affinity scheduling | zh_TW
dc.subject | Shared memory; Loop scheduling; Memory latency reduction; Affinity scheduling | en_US
dc.title | Memory latency reduction techniques for loop scheduling on multiprocessors | zh_TW
dc.title | Memory latency reduction in loop scheduling on modern multiprocessors | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Computer Science and Engineering (資訊科學與工程研究所) | zh_TW
Appears in Collections: Thesis
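The abstract describes the clustered migration step of CAFS only at a high level, so the following is a minimal, hypothetical C sketch of that idea: an idle processor searches only its own cluster for the most loaded peer and takes a fraction (here, half) of that peer's remaining iterations. All names and constants (queue_t, partition, migrate, NPROC, CLUSTER) are illustrative and not taken from the thesis; the per-queue locking and the parallel loop body that a real scheduler needs are omitted.

/* Hypothetical sketch of clustered affinity scheduling:
 * each processor owns a range of loop iterations; an idle processor
 * migrates a fraction of the remaining iterations from the most
 * loaded processor in its own cluster. Single-threaded illustration;
 * synchronization is deliberately left out. */
#include <stdio.h>

#define NPROC   8      /* processors                  */
#define CLUSTER 4      /* processors per cluster      */
#define NITER   1024   /* loop iterations to schedule */

typedef struct {
    int lo, hi;        /* remaining iteration range [lo, hi) */
} queue_t;

static queue_t q[NPROC];

/* Initial affinity partition: each processor repeatedly executes the
 * same iteration range, so its cache is reused across loop instances. */
static void partition(void)
{
    for (int p = 0; p < NPROC; p++) {
        q[p].lo = p * NITER / NPROC;
        q[p].hi = (p + 1) * NITER / NPROC;
    }
}

/* An idle processor looks only inside its own cluster for the most
 * loaded peer and migrates half of that peer's remaining iterations. */
static int migrate(int idle)
{
    int base = (idle / CLUSTER) * CLUSTER;
    int victim = -1, max_left = 0;

    for (int p = base; p < base + CLUSTER; p++) {
        int left = q[p].hi - q[p].lo;
        if (p != idle && left > max_left) {
            max_left = left;
            victim = p;
        }
    }
    if (victim < 0)
        return 0;                     /* whole cluster is idle */

    int take = max_left / 2;          /* fraction of the victim's work */
    q[idle].lo = q[victim].hi - take; /* move the tail of the victim's range */
    q[idle].hi = q[victim].hi;
    q[victim].hi -= take;
    return take;
}

int main(void)
{
    partition();
    q[0].lo = q[0].hi;                /* pretend processor 0 finished early */
    int moved = migrate(0);
    printf("processor 0 migrated %d iterations from within its cluster\n", moved);
    return 0;
}

Restricting the victim search to the idle processor's own cluster is what keeps migrations local, which is consistent with the reductions in inter-cluster remote reads and queue locking reported in the abstract.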