Title: 針對分散式暫存器架構之高階合成技術
High-Level Synthesis on Various Distributed Register Architectures
Authors: 陳嘉怡
Chen, Chia-I
黃俊達
Huang, Juinn-Dar
Institute of Electronics
Keywords: high-level synthesis; distributed-register architecture; performance-driven; location-aware; 3D FPGA; architectural exploration
Issue Date: 2012
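Abstract: As process technology enters the deep submicron era, wire delay can no longer be ignored and has become a key factor in determining system performance. Distributed register based architectures are one way to address the long-wire problem: the basic idea is to concentrate most connections within local clusters and to minimize the number of long wires. This dissertation discusses the various distributed register architectures and their corresponding synthesis methods from both the horizontal and the vertical perspective.
Horizontal distributed register architectures fall into three types according to the inter-cluster interconnect delay model they use: zero inter-cluster delay (zero), unit inter-cluster delay (unit), and location-aware inter-cluster delay (location-aware). The first part of this dissertation proposes a new resource-constrained communication synthesis algorithm for the first type of architecture, which optimizes inter-island connections and latency at the same time.
To bring the inter-cluster interconnect model closer to practical use, the second part proposes a distributed register architecture that adopts unit inter-cluster delay as its interconnect model. A performance-driven architectural synthesis framework is built alongside it. In this flow, the factors that determine quality of results are carefully considered during synthesis to obtain the best synthesis outcome; the number of inter-cluster transfers can also serve as an indicator of power consumption.
The third part targets distributed register architectures that adopt location-aware inter-cluster delay as the interconnect delay model. It details a synthesis flow that combines a hierarchical and an iterative strategy to optimize system performance and resource usage. Experimental results show a 13% improvement in system performance over prior work, together with a 33% saving in resource usage.
Beyond horizontal distributed register architectures, the last part of this dissertation discusses vertical distributed register architectures. It explores the distribution of vertical links under different 3D interconnect patterns to find an architecture with a better balance between area and efficiency. Because chip temperature is an important issue in 3D integrated circuits, the dissertation also analyzes and compares the temperature of these architectures. The resulting generic vertical-link distribution architecture for 3D FPGAs can save 52% of the area with almost no performance loss.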
In the deep submicron era, wire delay is no longer negligible and is becoming a dominant factor in system performance. Distributed register (DR) based architectures, which keep most interconnects local within a cluster and thus minimize the number of long interconnects, are one of the state-of-the-art solutions for coping with increasing wire delay. This dissertation thoroughly discusses various DR-based architectures and their synthesis frameworks from both the horizontal and the vertical point of view. For horizontal DR-based architectures, synthesis flows can be classified according to the interconnect delay model they adopt: zero, unit, or location-aware. First, a new resource-constrained communication synthesis algorithm is proposed that optimizes both inter-island connections (IICs) and latency for DR-based architectures assuming zero inter-cluster delay. Then, to be more practical, a DR-based architecture with unit inter-island delay is proposed, and a performance-driven architectural synthesis framework targeting this architecture is developed. Several factors for evaluating the quality of results (QoR) guide the architectural synthesis toward better optimization outcomes. The experimental results show that the latency and the number of inter-island transfers (IITs) can be reduced by 27% and 38% on average, respectively; the latter is commonly regarded as an indicator of the power consumption of on-chip communication. In the third part, a fully location-aware DR-based delay model is considered. Targeting such a platform, we propose an architectural synthesis flow that adopts a hierarchical scheme for performance optimization as well as an iterative strategy for hardware resource minimization. The experimental results show that this work achieves better synthesis outcomes than prior art, with 14% higher system performance and a 33% lower resource requirement.
In addition to the horizontal DR-based architectures, the vertical DR-based architecture is also covered. We explore architectures of vertical links to find ones that achieve a better balance between area and delay. Because thermal issues are considered one of the most critical challenges in 3D designs, a thermal analysis is also conducted to compare and evaluate the proposed architectures. Finally, we recommend several configurations as generic 3D FPGA architectures, which can save up to 52% area with virtually no delay penalty.
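The three inter-cluster delay models named in the abstract (zero, unit, location-aware) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the dissertation: the names `DelayModel` and `transfer_cycles`, the 2D grid coordinates for clusters, and the use of Manhattan distance with a `cycles_per_hop` factor for the location-aware case.

```python
from enum import Enum


class DelayModel(Enum):
    """Inter-cluster interconnect delay models (names are illustrative)."""
    ZERO = "zero"
    UNIT = "unit"
    LOCATION_AWARE = "location-aware"


def transfer_cycles(src, dst, model, cycles_per_hop=1):
    """Extra clock cycles charged for moving a value between clusters.

    src, dst: (row, col) grid positions of the clusters (assumed layout).
    cycles_per_hop: hypothetical cost per grid hop, used only by the
    location-aware model.
    """
    # A transfer inside one cluster is local and costs nothing in any model.
    if src == dst:
        return 0
    if model is DelayModel.ZERO:
        # Inter-cluster delay is ignored entirely.
        return 0
    if model is DelayModel.UNIT:
        # Every inter-cluster transfer costs the same single cycle.
        return 1
    # Location-aware: cost grows with distance. Manhattan distance is an
    # assumption here, not the dissertation's stated formula.
    hops = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    return hops * cycles_per_hop
```

For example, a transfer from cluster `(0, 0)` to cluster `(2, 1)` costs 0 cycles under the zero model, 1 under the unit model, and 3 under this sketch's location-aware model, which is why a scheduler that is aware of placement may produce a different latency than one that is not.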
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079411606
http://hdl.handle.net/11536/40708
Appears in Collections: Theses