Title: Design of Application Specific Throughput Processor for Matrix Operations (針對矩陣運算應用之高通量處理器設計)
Authors: 吳秉儒
Wu, Ping-Ju
賴伯承
Lai, Bo-Cheng
Department of Electronics Engineering and Institute of Electronics
Keywords: embedded system; matrix operations; system design framework; OpenCL
Publication Date: 2013
Abstract: Matrix operations are among the most frequently used routines in modern scientific computing, with applications including signal processing, machine learning, and numerical analysis. Because of this versatility, they are performed on a wide spectrum of computing platforms, ranging from high-performance supercomputers to resource-constrained embedded devices. The popularity of mobile devices, together with the need for energy efficiency, now calls for low-power execution of matrix operations. Beyond their grid-like computation behavior, data accesses have become one of the main overheads of matrix operations, so a system needs a customized memory access mechanism to efficiently support their specific data access patterns. In this thesis, we introduce an integrated system, comprising software stacks and hardware modules, that accelerates matrix operations and reduces data access overhead. With the proposed hardware module, the performance of our multicore embedded platform improves by up to 24.09%. Besides the hardware design, we also develop a framework that facilitates the prototyping of embedded system designs; it includes a simulation environment for functional verification of hardware modules and an OpenCL environment that communicates with the hardware platform and supports co-simulation with high-level OpenCL code.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070150217
http://hdl.handle.net/11536/74759
Appears in Collections: Thesis