Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 呂勁甫 | en_US |
dc.contributor.author | Lu, Chin-Fu | en_US |
dc.contributor.author | 周景揚 | en_US |
dc.contributor.author | 賴伯承 | en_US |
dc.contributor.author | Jou, Jing-Yang | en_US |
dc.contributor.author | Lai, Bo-Cheng | en_US |
dc.date.accessioned | 2014-12-12T02:43:22Z | - |
dc.date.available | 2014-12-12T02:43:22Z | - |
dc.date.issued | 2013 | en_US |
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT070150254 | en_US |
dc.identifier.uri | http://hdl.handle.net/11536/75463 | - |
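dc.description.abstract | Thread-level parallelism and cache utilization efficiency are important factors in the performance of throughput processors, but the two conflict during performance optimization, which makes scheduling algorithm design increasingly complex. For example, raising thread-level parallelism aggravates cache contention, whereas relieving cache contention limits thread-level parallelism, so finding the balance point between the two is an important issue. When the application itself has irregular memory access patterns, the system becomes even more sensitive to changes in these two factors and harder to control. Many existing scheduling algorithms improve only one of the two. This thesis demonstrates that system performance improves substantially when both factors are considered together and properly balanced. To analyze the performance impact of the two factors, this thesis formulates scheduling problems that describe this situation and applies a series of solutions to applications with irregular memory accesses. The experimental environment is built on NVIDIA's Fermi architecture, and scheduling algorithms that account for various optimization factors are implemented for performance comparison. The experimental results show that the proposed scheduling algorithm reduces cache misses by 56% on average, which is reflected in overall execution time as an average speedup of 51%. | zh_TW |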
dc.description.abstract | Thread-Level-Parallelism (TLP) and cache utilization are two significant performance factors of modern throughput processors. The conflict between the two factors makes scheduler design a non-trivial task. Increasing TLP aggravates cache contention, while avoiding cache contention can limit TLP. The trade-off becomes even more intricate and sensitive when dealing with applications that have irregular data access patterns. Many existing thread scheduling algorithms address only one of these factors at a time. This thesis demonstrates that there is a significant performance gain when the two factors are considered together and properly traded off. To conduct a comprehensive analysis of the performance impact of the two factors, this thesis formulates two thread scheduling problems to characterize the design concerns. A series of solutions is integrated to resolve the scheduling on a set of applications with irregular memory accesses. The experimental results on NVIDIA’s Fermi architecture show the performance differences of the proposed thread scheduling under various combinations of constraints. Compared to a widely used thread scheduling scheme, the average improvement in execution time can reach up to 51%. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Thread scheduling | zh_TW |
dc.subject | thread scheduling | en_US |
dc.title | Thread Scheduling Algorithms Considering Thread-Level Parallelism and Cache Resources for General-Purpose Graphics Processing Units | zh_TW |
dc.title | Scheduling Algorithms of Co-optimizing Thread-Level-Parallelism and Cache Utilization for GPGPUs | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Electronics Engineering and Institute of Electronics | zh_TW |
Appears in Collections: | Thesis |