Capability-Aware Workload Partition on Multi-GPU Systems

Full metadata record

DC Field	Value	Language
dc.contributor.author	趙硯廷	zh_TW
dc.contributor.author	游逸平	zh_TW
dc.contributor.author	Chao, Yen-Ting	en_US
dc.contributor.author	You, Yi-Ping	en_US
dc.date.accessioned	2018-01-24T07:38:56Z	-
dc.date.available	2018-01-24T07:38:56Z	-
dc.date.issued	2016	en_US
dc.identifier.uri	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356029	en_US
dc.identifier.uri	http://hdl.handle.net/11536/140124	-
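dc.description.abstract	With the help of multi-GPU resource management, using multiple graphics processing units (GPUs) to accelerate applications has become increasingly popular in recent years. However, applications whose kernels depend on one another gain no benefit from multiple GPUs, because the kernels cannot run on those GPUs simultaneously, which lowers GPU utilization. Applications with big-kernel behavior typically launch a large number of threads to process large amounts of data; because a kernel can execute on only one GPU under the OpenCL computation model, such applications also lower the overall performance of a multi-GPU system. A big-kernel application requires the user to manually split the kernel into several small kernels and dispatch them to different GPUs in order to use multiple GPU resources, which places an extra burden on the user. In this work, we propose XVirtCL, which automatically balances the workload of a kernel among multiple GPUs while considering the capability levels of the GPUs and minimizing the data transferred among GPUs. XVirtCL involves (1) a kernel analyzer for determining whether a kernel's workload is suitable for partitioning, (2) a workload scheduling algorithm for distributing a kernel among multiple GPUs in a balanced way while considering the GPUs' various compute capability levels, and (3) a workload partitioner for splitting a kernel into multiple sub-kernels. The experimental results show that the proposed system architecture maximizes multi-GPU utilization and speedup for big kernels.	zh_TW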
dc.description.abstract	Using multiple graphics processing units (GPUs) to accelerate applications has become increasingly popular in recent years, aided by multi-GPU abstraction techniques. However, an application that has only dependent kernels derives no benefit from multiple GPUs, since its kernels cannot run simultaneously on those GPUs, which lowers GPU utilization. Applications that have a ‘big’ kernel, which launches a huge number of threads to process massively parallel data, can also lower the overall throughput of a multi-GPU system. Such an application requires programmers to manually divide the kernel into several ‘small’ kernels and dispatch them to different GPUs in order to utilize multiple GPU resources, which imposes an extra burden on programmers. In this paper, we present XVirtCL, an extension of VirtCL (a GPU abstraction framework) that automatically balances the workload of a kernel among multiple GPUs while accounting for the GPUs' differing compute capability levels and minimizing the data transferred among GPUs. XVirtCL involves (1) a kernel analyzer for determining whether the workload of a kernel is suitable for partitioning, (2) a workload scheduling algorithm for balancing the workload of a kernel among multiple GPUs while accounting for their differing compute capability levels, and (3) a workload partitioner for partitioning a kernel into multiple sub-kernels that have disjoint sub-NDRange spaces. Preliminary experimental results indicate that the proposed framework maximized the throughput of multiple GPUs for applications with big, regular kernels.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	graphics processing unit	zh_TW
dc.subject	OpenCL	zh_TW
dc.subject	GPU abstraction techniques	zh_TW
dc.subject	workload distribution	zh_TW
dc.subject	load balance	zh_TW
dc.subject	GPGPU	en_US
dc.subject	OpenCL	en_US
dc.subject	device abstraction	en_US
dc.subject	workload distribution	en_US
dc.subject	load balance	en_US
dc.title	在多圖形處理器架構下考量裝置能力進行工作量分散運算	zh_TW
dc.title	Capability-Aware Workload Partition on Multi-GPU Systems	en_US
dc.type	Thesis	en_US
dc.contributor.department	Institute of Computer Science and Engineering	zh_TW
Appears in Collections:	Thesis
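
Illustrative sketch (not taken from the thesis): the abstract describes splitting one kernel into sub-kernels with disjoint sub-NDRange spaces, sized according to each GPU's capability. The Python below is a minimal sketch of that idea for a one-dimensional NDRange. The function name `partition_ndrange`, the numeric capability weights, and the largest-remainder rounding rule are all assumptions made for illustration; the record does not say how XVirtCL measures capability or rounds the shares, and the sketch ignores the data-transfer minimization that the abstract also mentions.

```python
from typing import List, Tuple


def partition_ndrange(global_size: int, local_size: int,
                      capabilities: List[float]) -> List[Tuple[int, int]]:
    """Split a 1-D NDRange into disjoint (offset, size) sub-ranges.

    Hypothetical helper: each sub-range size is a multiple of local_size
    and roughly proportional to the matching device's capability weight.
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    if not capabilities or any(c <= 0 for c in capabilities):
        raise ValueError("capabilities must be a non-empty list of positive weights")

    total_groups = global_size // local_size
    total_cap = sum(capabilities)

    # Ideal (fractional) number of work-groups per device, rounded down first.
    ideal = [total_groups * c / total_cap for c in capabilities]
    groups = [int(x) for x in ideal]

    # Give the leftover work-groups to the devices with the largest remainders.
    leftover = total_groups - sum(groups)
    by_remainder = sorted(range(len(ideal)),
                          key=lambda i: ideal[i] - groups[i], reverse=True)
    for i in by_remainder[:leftover]:
        groups[i] += 1

    # Turn work-group counts into contiguous, non-overlapping ranges.
    ranges = []
    offset = 0
    for g in groups:
        ranges.append((offset, g * local_size))
        offset += g * local_size
    return ranges


# Assumed weights: the first device is treated as twice as capable as the others.
print(partition_ndrange(1024, 64, [2.0, 1.0, 1.0]))
```

In an OpenCL host program, each `(offset, size)` pair could plausibly be passed as the `global_work_offset` and `global_work_size` arguments of `clEnqueueNDRangeKernel` on the corresponding device's command queue. A device with a very small weight may receive an empty range, which a caller would need to skip.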