Title: | A Fast CUDA-Based Static Timing Analysis (STA) Engine and Its Application (以CUDA 為基礎快速靜態時序分析引擎以及其應用) |
Authors: | Wang, Hsuan-Wei (王鉉崴); Wen, Hung-Pin (溫宏斌); Department of Electrical Engineering |
Keywords: | Graphics processing unit; Static timing analysis; Parallel; GPU; STA |
Issue Date: | 2013 |
Abstract: | Graphics processing units (GPUs) make parallel computing practical for static timing analysis (STA). With hundreds of cores, a GPU can deliver more speedup than conventional parallel hardware. However, the many-core architecture makes memory access and synchronization between cores more difficult, which limits the achievable speedup, so the STA algorithm needs to be redesigned. In this work, we develop a fast CUDA-based STA engine that incorporates cell levelization and type sorting (CLTS), timing table restructuring (TTR), table indexing by texture (TIT), and hardware-accelerated rendering (HAR) to reduce the cost of long memory access times. CLTS levelizes the cells and sorts the cells within each level by type, so that cores processing cells of the same type only need to read the timing-library data for that one type. TTR restructures the signal data of a cell, packing several small signals into a unit that matches the size of a single GPU memory access, which raises memory throughput. TIT reorganizes the axes (indices) of each table so that more data can be accessed jointly through texture memory. HAR expands the look-up tables (LUTs) so that they cover the entire signal domain; only interpolation is then required, which avoids the execution branches that extrapolation would introduce. Experimental results show that the proposed engine achieves an average speedup of 12.85X over the CPU version on the experimental circuits, and 29.35X on the largest circuit, netcard. Compared with the commercial tool PrimeTime, it achieves a speedup of 3229X (three orders of magnitude), and 8117X on netcard. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT070050706 http://hdl.handle.net/11536/73384 |
Appears in Collections: | Thesis |
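
The cell levelization and type sorting (CLTS) step described in the abstract can be illustrated with a small host-side sketch. It is not the thesis implementation: the function name `levelize_and_sort`, the dictionary-based netlist format, and the cell and type names are all assumptions made for illustration. The sketch assigns each cell a level equal to its longest distance from a primary input, then orders the cells inside each level by type, so that neighbouring workers would read the same timing-library entry.

```python
from collections import defaultdict, deque


def levelize_and_sort(cells, fanin):
    """Group cells into topological levels, each sorted by cell type.

    cells: dict mapping cell name -> cell type (hypothetical netlist format).
    fanin: dict mapping cell name -> list of driving cell names.
    Returns a list of levels; each level is a list of cell names.
    """
    fanout = defaultdict(list)
    pending = {}  # number of drivers not yet processed, per cell
    for name in cells:
        drivers = [d for d in fanin.get(name, []) if d in cells]
        pending[name] = len(drivers)
        for d in drivers:
            fanout[d].append(name)

    # Cells without drivers form level 0.
    level = {name: 0 for name, count in pending.items() if count == 0}
    queue = deque(level)
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v in fanout[u]:
            # A cell sits one level above its deepest driver.
            level[v] = max(level.get(v, 0), level[u] + 1)
            pending[v] -= 1
            if pending[v] == 0:
                queue.append(v)
    if processed != len(cells):
        raise ValueError("netlist contains a combinational loop")

    levels = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for name, lv in level.items():
        levels[lv].append(name)
    for group in levels:
        # Same-type cells become adjacent; the name breaks ties deterministically.
        group.sort(key=lambda n: (cells[n], n))
    return levels


# Hypothetical five-cell netlist.
cells = {"g1": "NAND2", "g2": "INV", "g3": "NAND2", "g4": "INV", "g5": "NOR2"}
fanin = {"g3": ["g1", "g2"], "g4": ["g1"], "g5": ["g3", "g4"]}
print(levelize_and_sort(cells, fanin))
# prints [['g2', 'g1'], ['g4', 'g3'], ['g5']]
```

In a GPU setting, each level would typically be processed by one kernel launch (or one synchronized phase), with consecutive threads mapped to consecutive cells of the sorted level; that mapping is an assumption here and is not spelled out in the abstract.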