Title: | Translating OpenACC to LLVM IR with SPIR Kernels on Clang (利用Clang將OpenACC轉換成含SPIR核心函式之LLVM中間表示碼) |
Authors: | Peng, Hao-Wei (彭皓偉); Shann, Jyh-Jiun (單智君); Institute of Computer Science and Engineering |
Keywords: | Heterogeneous multiprocessor platforms; OpenCL; SPIR; OpenACC; Clang; LLVM; kernel function |
Publication date: | 2015 |
Abstract: | Multiprocessor platforms have become the mainstream in computer architecture, and parallel processing is one of the main ways to improve program performance. Multiprocessor platforms fall into two classes: homogeneous and heterogeneous. In general, compute-intensive programs can achieve better performance on heterogeneous multiprocessor platforms than on homogeneous ones, but they are more complex and difficult to write. In recent years, a variety of parallel programming standards have been proposed. The OpenCL (Open Computing Language) standard, released by the Khronos Group, can be applied to a variety of heterogeneous multiprocessor platforms. However, writing parallel programs in OpenCL remains complex and error-prone. Further standards have therefore been proposed to simplify programming on heterogeneous multiprocessor platforms; one of them is OpenACC (Open Accelerators), developed by Cray, CAPS, Nvidia, and PGI.
In this thesis, we use OpenACC pragmas to mark the parallel regions of a sequential program and then automatically translate the annotated OpenACC programs into OpenCL programs, reducing the difficulty of programming heterogeneous multiprocessor platforms. We implement our design in the Clang front end: it builds an AST containing OpenACC nodes and uses that AST to generate LLVM IR, split into host LLVM IR and SPIR kernels. The LLVM IR can then optionally be optimized by the LLVM optimizer, and the host LLVM IR is executed by the LLVM JIT compiler to obtain the result. We select eight mathematical benchmarks annotated with OpenACC and compare the translated versions with the corresponding sequential C programs; programs with a higher degree of parallelism show larger performance improvements. In addition, our translated programs achieve about 1.07x the performance of hand-written OpenCL versions of the benchmarks and about 0.93x the performance of the same programs compiled by the PGI OpenACC compiler. Our approach therefore reduces the difficulty of writing programs for heterogeneous multiprocessor platforms, and the translated programs are portable and perform well. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT070256068 http://hdl.handle.net/11536/127407 |
Appears in Collections: | Thesis |
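
The abstract describes marking the parallel regions of a sequential C program with OpenACC pragmas so that a translator can generate the host code and the kernel. As a minimal sketch of what such an annotated input might look like, the following uses a hypothetical `vec_add` function; the function name, array names, and data clauses are illustrative assumptions and are not taken from the thesis or its benchmarks.

```c
/* Hypothetical example: element-wise vector addition annotated with OpenACC.
   Names and clauses are illustrative, not taken from the thesis. */
void vec_add(const float *a, const float *b, float *c, int n)
{
    /* The directive marks the loop as a parallel region:
       copyin  = arrays the accelerator only reads,
       copyout = array the accelerator writes back to the host.
       A compiler without OpenACC support ignores the pragma and
       runs the loop sequentially, so the program stays valid C. */
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

Calling `vec_add(a, b, c, n)` from ordinary host code gives the same result with or without an OpenACC-aware compiler. A translator of the kind the abstract describes would presumably outline the loop body into a kernel in which each work-item handles one value of `i`, while the data clauses would drive the buffer transfers in the host code; that mapping is an assumption here, not a description of the thesis implementation.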