Full metadata record
DC Field | Value | Language
dc.contributor.author | 呂勁甫 | en_US
dc.contributor.author | Lu, Chin-Fu | en_US
dc.contributor.author | 周景揚 | en_US
dc.contributor.author | 賴伯承 | en_US
dc.contributor.author | Jou, Jing-Yang | en_US
dc.contributor.author | Lai, Bo-Cheng | en_US
dc.date.accessioned | 2014-12-12T02:43:22Z | -
dc.date.available | 2014-12-12T02:43:22Z | -
dc.date.issued | 2013 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT070150254 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/75463 | -
dc.description.abstract | Thread-level parallelism (TLP) and cache utilization are two key factors in the performance of throughput processors, but their conflicting behavior under performance optimization makes scheduling-algorithm design increasingly complex. For example, raising TLP aggravates cache contention, while mitigating cache contention limits the achievable TLP, so striking a balance between the two is an important issue. The system becomes even more sensitive and harder to control when the application exhibits irregular memory access patterns. Many existing scheduling algorithms improve only one of these two factors. This thesis demonstrates that system performance improves substantially when both factors are considered together and properly balanced. To analyze the performance impact of the two factors, this thesis formulates a thread scheduling problem that captures this situation and applies a series of solutions to applications with irregular memory accesses. The experiments are conducted on NVIDIA's Fermi architecture, where scheduling algorithms considering various optimization factors are implemented and compared. The results show that the proposed scheduling algorithm achieves an average 56% reduction in cache misses, which translates into an average 51% speedup in overall execution time. | zh_TW
dc.description.abstract | Thread-Level Parallelism (TLP) and cache utilization are two significant performance factors of modern throughput processors. The conflicting correlation between the two factors makes scheduler design a non-trivial task: increasing TLP aggravates cache contention, while avoiding cache contention can limit TLP. The trade-off becomes even more intricate and sensitive when dealing with applications with irregular data access patterns. Many existing thread scheduling algorithms address only one of these factors at a time. This thesis demonstrates that a significant performance gain exists when the two factors are considered together and properly traded off. To conduct a comprehensive analysis of the performance impact of the two factors, this thesis formulates a thread scheduling problem that characterizes the design concerns. A series of solutions are integrated to resolve the scheduling of a set of applications with irregular memory accesses. Experimental results on NVIDIA's Fermi architecture show the performance differences among the proposed thread schedulers addressing various combinations of constraints. Compared to a widely used thread scheduling scheme, the average improvement in execution time reaches up to 51%. | en_US
dc.language.iso | en_US | en_US
dc.subject | thread scheduling | zh_TW
dc.subject | thread scheduling | en_US
dc.title | Thread Scheduling Algorithms Considering Thread-Level Parallelism and Cache Resources for General-Purpose GPUs | zh_TW
dc.title | Scheduling Algorithms Co-optimizing Thread-Level Parallelism and Cache Utilization for GPGPUs | en_US
dc.type | Thesis | en_US
dc.contributor.department | Department of Electronics Engineering and Institute of Electronics | zh_TW
Appears in Collections: Thesis