A Study of Application-Specific Compute Accelerator (ASCA) Design

Title:	特殊應用計算加速器設計之研究 (A Study of Application-Specific Compute Accelerator (ASCA) Design)
Authors:	Wu, I-Wei (吳奕緯); Shann, Jyh-Jiun (單智君); Chung, Chung-Ping (鍾崇斌); Institute of Computer Science and Engineering
Keywords:	application-specific compute accelerator; application-specific operation pattern; computer architecture; compiler
Issue Date:	2013
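Abstract:	To meet the performance requirements of embedded devices, many studies have proposed improvements to processor architecture; the application-specific compute accelerator (ASCA) is one common approach. An ASCA adds extra functional units (FUs) to the base processor to accelerate particular operation patterns, referred to here as application-specific operation patterns (ASOPs). In other words, an ASCA is an FU built from one or more ASOPs, and the ASOPs are explored from one or more application programs. Because an ASCA increases the hardware cost of the processor, reducing that cost without sacrificing too much performance is an important research issue. In general, an ASCA has several hardware implementation options to meet different performance and hardware cost requirements, and some of these options deliver the same performance improvement at different hardware costs. The first part of this thesis therefore proposes an ASOP exploration algorithm that integrates hardware design space exploration. Besides finding suitable ASOPs in a given program, the algorithm also finds a suitable hardware implementation option for each ASOP. Compared with previous work, it shows a significant improvement in area efficiency.
Besides ASCA, another common way to improve processor performance is multiple-issue execution. However, most existing ASOP exploration algorithms were developed for single-issue processor architectures. The second part of this thesis proposes a new ASOP exploration algorithm for multiple-issue processor architectures. Unlike the algorithms developed for single-issue architectures, an algorithm for multiple-issue architectures cannot ignore two phenomena: (1) only certain operations are suited to inclusion in an ASOP, such as operations on the critical path and operations whose execution would be delayed by insufficient hardware resources; (2) once some operations have been placed in an ASOP, the characteristics of the operations not yet placed may change: an operation that was off the critical path may move onto it, or an operation that would have been delayed by insufficient hardware resources may no longer be delayed. The proposed algorithm takes both phenomena into account. Simulation results show that it improves performance over methods that do not consider them.
The third part of this thesis constructs the ASCA architecture from the ASOPs produced in the second part and examines the ASCA scheduling mechanism. To let more operations execute on the ASCA at the same time, the proposed ASCA construction algorithm merges several parallel ASOPs into one and builds the ASCA from the merged ASOPs. In addition, to raise the utilization of all compute units in the processor, the third part also presents a new ASCA exploitation algorithm that lets the operations of a program execute on the ASCA and on the processor's original FUs at the same time.
To satisfy the growing demand for high-performance computing in modern embedded devices, several architectural and micro-architectural enhancements have been implemented in processor architectures. An application-specific compute accelerator (ASCA) is an effective way to improve processor performance without extensive modification of the core architecture. An ASCA is an extra, special-purpose functional unit within the base processor, used to accelerate one or several specified applications.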
An ASCA is usually generated from a set of frequently executed operation patterns, called application-specific operation patterns (ASOPs), explored from one or several target applications. Since an ASCA increases the implementation cost of the processor core, minimizing its area cost with little or no performance degradation is an important research issue. Because of differing requirements in area and speed, an ASCA usually has multiple hardware implementation options. Under the pipeline-stage timing constraint, some options achieve the same speedup at different implementation costs. We therefore propose an ASOP exploration algorithm with integrated hardware design space exploration that explores not only the ASOPs but also their hardware implementation options. Compared with previous research, our approach yields a significant improvement in area efficiency. Besides ASCA, issuing multiple instructions per cycle is a common approach to improving the performance of a processor core. Nevertheless, the impact of combining the two approaches in the same design is not well understood. While previous studies have shown that an ASCA can potentially improve performance for some applications on certain multiple-issue architectures, the algorithms used to identify ASOPs for multiple-issue architectures yield only limited performance improvement. This is because not every arithmetic operation is suited to inclusion in an ASOP on a multiple-issue architecture. To explore the full potential of ASCA for multiple-issue architectures, two important factors need to be considered: (1) the execution performance of an application is dominated by operations that are critical (located on the critical path) or highly resource contentious (having a high probability of being delayed during execution due to hardware resource limitations), and (2) an operation may become critical and/or highly resource contentious after some other operations are added to an ASOP.
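The two factors can be illustrated with a small sketch. The Python below is not the algorithm from the thesis: the data-flow graph, the unit latencies, and the one-cycle latency of the merged node `abc` are all hypothetical, and resource contention is ignored. It only shows how zero-slack (critical) operations can be found in a data-flow graph, and how the critical set shifts once some operations are collapsed into an ASOP.

```python
from collections import defaultdict

def critical_ops(latency, edges):
    """Return (schedule length, zero-slack ops) for a data-flow DAG.

    latency: {op: cycles}; edges: (producer, consumer) pairs.
    Resource limits are NOT modeled (simplifying assumption).
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)

    # Topological order (Kahn's algorithm).
    indeg = {n: len(preds[n]) for n in latency}
    ready = [n for n in latency if indeg[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    # Earliest finish time of each op.
    finish = {}
    for n in order:
        finish[n] = max((finish[p] for p in preds[n]), default=0) + latency[n]
    length = max(finish.values())

    # Latest finish time that does not lengthen the schedule.
    latest = {}
    for n in reversed(order):
        latest[n] = min((latest[s] - latency[s] for s in succs[n]), default=length)

    return length, {n for n in latency if finish[n] == latest[n]}

# Hypothetical graph: chain a -> b -> c -> e, and chain d -> f -> e.
latency = {n: 1 for n in "abcdfe"}
edges = [("a", "b"), ("b", "c"), ("c", "e"), ("d", "f"), ("f", "e")]
before = critical_ops(latency, edges)

# Assume a, b, c are collapsed into one ASOP node that takes one cycle.
merged_latency = {"abc": 1, "d": 1, "f": 1, "e": 1}
merged_edges = [("abc", "e"), ("d", "f"), ("f", "e")]
after = critical_ops(merged_latency, merged_edges)

print(before[0], sorted(before[1]))  # 4 ['a', 'b', 'c', 'e']
print(after[0], sorted(after[1]))    # 3 ['d', 'e', 'f']
```

In this toy graph, d and f are off the critical path at first and become critical after a, b, and c are merged, which is the situation described in factor (2).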
The second topic of this thesis presents an ASOP exploration algorithm for multiple-issue architectures that focuses on these two factors. Simulation results show that the proposed algorithm outperforms previously published algorithms. The third topic first addresses how to construct the ASCA architecture from the ASOPs generated in the second topic. To allow more operations to execute on the ASCA simultaneously, the proposed ASCA construction algorithm merges several data-independent ASOPs to construct the ASCA. After the ASCA architecture is generated, the final phase of ASCA design is ASCA exploitation. Because of the area cost limit, the ASCA generated in the previous phase may not support all ASOPs. ASCA exploitation therefore determines which operations should execute on the ASCA and schedules the execution cycle of each operation. Compared with previous work, the proposed algorithm achieves a further speedup by scheduling operations on the ASCA and on the FUs of the base processor simultaneously. This issue is also addressed in the third topic of this thesis.
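The idea of scheduling operations on the ASCA and the base FUs at the same time can be sketched with a simple greedy list scheduler. This is an illustrative simplification, not the exploitation algorithm proposed in the thesis: every operation is assumed to take one cycle, the unit names `asca` and `fu`, the slot counts, and the example graph are hypothetical, and single operations rather than whole ASOPs are mapped to the ASCA.

```python
def list_schedule(ops, edges, slots):
    """Greedy cycle-by-cycle scheduler over a data-flow DAG.

    ops:   {op: set of unit kinds able to execute it}
    edges: (producer, consumer) pairs; graph must be acyclic
    slots: {unit kind: issue slots per cycle}, tried in insertion order
    All ops take one cycle (simplifying assumption).
    Returns {op: (cycle, unit kind)}.
    """
    for n, kinds in ops.items():
        if not any(slots.get(k, 0) > 0 for k in kinds):
            raise ValueError(f"no available unit can execute {n!r}")

    preds = {n: set() for n in ops}
    for u, v in edges:
        preds[v].add(u)

    placed = {}
    cycle = 0
    while len(placed) < len(ops):
        free = dict(slots)
        # Least flexible ops first, so ASCA-capable ops leave base FUs free.
        for n in sorted(ops, key=lambda n: (len(ops[n]), n)):
            if n in placed:
                continue
            # Ready only if every producer ran in an earlier cycle.
            if any(p not in placed or placed[p][0] >= cycle for p in preds[n]):
                continue
            for unit in slots:
                if unit in ops[n] and free[unit] > 0:
                    free[unit] -= 1
                    placed[n] = (cycle, unit)
                    break
        cycle += 1
    return placed

def makespan(placed):
    """Number of cycles used by a schedule."""
    return 1 + max(cycle for cycle, _ in placed.values())

# Hypothetical example: a and b may run on either unit, c and d only on a base FU.
ops = {"a": {"asca", "fu"}, "b": {"asca", "fu"}, "c": {"fu"}, "d": {"fu"}}
edges = [("a", "d")]

both = list_schedule(ops, edges, {"asca": 1, "fu": 1})
fu_only = list_schedule(ops, edges, {"fu": 1})

print(makespan(both))     # 2
print(makespan(fu_only))  # 4
```

With one ASCA slot and one base FU slot, the four operations finish in two cycles instead of four, because both kinds of unit are kept busy in each cycle.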
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT079323808 http://hdl.handle.net/11536/75762
Appears in Collections:	Thesis