Title: Kernel Function Extraction from Annotated LLVM IR for OpenCL on the LLVM Compiler Framework
Authors: 紀佳佑
Chi, Cha-You
單智君
Shann, Jyh-Jiun
Institute of Computer Science and Engineering
Keywords: OpenCL; LLVM; Translator; LLVM IR; annotated LLVM IR
Issue Date: 2013
Abstract: Multiprocessor platforms have become the mainstream in computer architecture, and parallel processing is one of the main ways to improve program performance. Multiprocessor platforms can be divided into homogeneous and heterogeneous ones. In general, highly parallel programs have the opportunity to perform better on heterogeneous multiprocessor platforms than on homogeneous ones, but they are more difficult and complex to write. In recent years, a variety of parallel programming standards have been proposed. One of them, OpenCL (Open Computing Language), released by the Khronos Group, can be applied to a variety of heterogeneous multiprocessor platforms. However, writing programs for heterogeneous multiprocessor platforms in OpenCL is still complex and error-prone. In this thesis, we propose an annotation format that expresses the parallelism-related information of a sequential program, and we design a module that automatically translates annotated sequential programs into OpenCL programs, reducing the difficulty of programming heterogeneous multiprocessor platforms. We implement our design as a module in the LLVM compiler framework, called the OpenCL kernel function extraction (KFE) module. The KFE module translates LLVM intermediate representation (IR) carrying annotations, called annotated LLVM IR, into kernel LLVM IR and host LLVM IR, which are then optimized by the LLVM optimizer. After that, the optimized kernel LLVM IR is translated into OpenCL kernel functions by the LLVM OpenCL backend developed in our laboratory's previous research, and the host LLVM IR is executed by the LLVM just-in-time (JIT) compiler. We select nine benchmarks from the OpenCL and CUDA SDKs, C versions of mathematical programs such as matrix multiplication and vector addition, and attach annotations to them. We then compare the translated benchmarks with the corresponding sequential C programs and find that the higher a program's degree of parallelism, the greater its performance improvement. In addition, the translated benchmarks perform almost the same as the hand-written OpenCL versions of the benchmarks. Therefore, our proposed module and annotation format reduce the difficulty of writing OpenCL programs, and the translated OpenCL programs achieve good performance.
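As a rough illustration of what kernel function extraction means for one of the benchmarks named in the abstract (vector addition), the following plain C sketch splits a sequential loop into a per-work-item "kernel" function and a "host" function that launches it. It is only a sketch under stated assumptions: the thesis's annotation syntax is not given in this record, so the annotation appears only as a comment, and all function names here are hypothetical. The actual KFE module operates on LLVM IR and emits real OpenCL code; here an ordinary loop stands in for the OpenCL kernel launch, and the `gid` parameter stands in for the value that `get_global_id(0)` would return in OpenCL C.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential original. The comment below marks where a parallel-loop
 * annotation would be attached; the real annotation format is defined
 * in the thesis and is NOT reproduced here. */
void vec_add_sequential(const float *a, const float *b, float *c, size_t n) {
    /* hypothetical annotation: iterations of this loop are independent */
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

/* "Kernel" side (hypothetical name): the loop body outlined into a
 * function executed once per work-item. In OpenCL C, gid would be
 * obtained from get_global_id(0) instead of being a parameter. */
static void vec_add_kernel(const float *a, const float *b, float *c, size_t gid) {
    c[gid] = a[gid] + b[gid];
}

/* "Host" side (hypothetical name): stands in for the kernel launch.
 * A plain loop emulates a one-dimensional index space of n work-items. */
void vec_add_host(const float *a, const float *b, float *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < n; ++gid) {
        vec_add_kernel(a, b, c, gid);
    }
}

/* Returns the number of elements on which the sequential and the
 * extracted versions disagree (0 means they are equivalent here). */
size_t count_mismatches(void) {
    enum { N = 8 };
    float a[N], b[N], seq[N], par[N];
    for (size_t i = 0; i < N; ++i) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }
    vec_add_sequential(a, b, seq, N);
    vec_add_host(a, b, par, N);

    size_t bad = 0;
    for (size_t i = 0; i < N; ++i) {
        if (seq[i] != par[i]) {
            ++bad;
        }
    }
    return bad;
}
```

Calling `count_mismatches()` from a `main` function checks that the outlined version computes the same result as the sequential loop. The split works because each iteration touches only its own index, which is the kind of parallelism information the annotations are meant to convey.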
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079955620
http://hdl.handle.net/11536/73994
Appears in Collections: Thesis