標題: 針對FIR與FFT演算法於超大型積體電路實作上之解析式面積最佳化技術
Analytical Area Optimization for VLSI Implementations of FIR and FFT Algorithms
作者: 林步青
Lin, Bu-Ching
周景揚
黃俊達
Jou, Jing-Yang
Huang, Juinn-Dar
電子工程學系 電子研究所
關鍵字: 數位訊號處理;快速傅利葉轉換;有限脈衝響應濾波嗎;多常數乘法;定點數;Digital signal processing;fast Fourier transform;finite-impulse-response filter;multiple constant multiplication;fixed-point
公開日期: 2013
摘要: 在過去幾十年中,隨著通信系統的複雜度急劇增加,數位訊號處理演算法被廣泛地採用,例如有限脈衝響應(FIR)濾波器和快速傅利葉轉換(FFT)。其中,多常數乘法(MCM) 是在處理輸入資料與常數的乘法時,使用一組加法器取代常規乘法器,其概念更是廣泛地被應用在有限脈衝響應濾波器的設計中。在過去,雖然已有很多降低加法器用量之演算法被提出以達到面績縮小的目的,但是,它們並未考慮每個加法器的實際位元數,而這將會導致估計的硬體成本不夠精確。因此這篇論文中,我們提出了一個保證位元數的多個常數乘法最佳化演算法,著重於最大限度地減少加法器的總位元數,而不是僅考慮減少加法器總數。首先,構建基於給定係數的子表達式圖表,繼而導出一組針對最小化加法器位元數之條件,最後使用整數線性規畫得到最佳化的結果。實驗結果顯示,該演算法的確可以有效地減少所需的加法器位元數並且優於所有的現行技術。 此外,快速傅利葉轉換處理器在眾多以數位訊號處理為基礎的系統中是一個核心的元件;例如,現代無線通信中的正交分頻多工(OFDM)。許多關鍵的設計參數,如架構,位元長度,和數字格式,都必須非常仔細地考慮。在過去的幾十年,針對不同的設計目標,已經有很多最佳化的管線式快速傅利葉轉換架構被提出。雖然固定的管線式快速傅利葉轉換架構能在合理的硬體成本下提供不錯的處理能力,但是,在針對需要大量處理能力的應用上,它可能仍然無法滿足效能的需求。因此在這篇論文中,我們提出了一種可擴展的多路徑延遲累積器式之快速傅利葉轉換架構及其相應的硬體設計產生器,在給定的處理能力條件下,它能夠迅速地產生對應的快速傅利葉轉換核心。實驗結果顯示,此方法所產生之快速傅利葉轉換器比現有的可折疊式多路徑延遲累積器式快速傅利葉轉換架構,面積更小且功率效率更高。 除此之外,我們亦提出了一個快速傅利葉轉換器最佳化的設計流程。在固定位元長度的條件下,正確的調整每一個蝴蝶級的定點表示之數值,以最大化輸出級的信號量化雜訊比(SQNR)。所提出的流程採用機率分佈模型來模擬每個階段的輸出信號的機率行為。由於量化和飽和運算所導致的雜訊可以靜態分析,以了解在進行縮放決策時的影響。因此,不需耗費時間的模擬分析,我們所提出的方法即可有效地決定每一個蝴蝶級的最適當的數字格式,從而最佳化整個輸出級的信號量化雜訊比。此外,建議的流程能夠處理各種快速傅利葉轉換點數、快速傅利葉轉換演算法、字元長度、以及輸入信號的機率分佈。實驗結果顯示,我們的方法可以在8192點且以2為基數的快速傅利葉轉器處理器中節省3位元的字元長度,而且不會對輸出級的信號量化雜訊比造成影響。使用我提出的靜態尺規最佳化的技術所創建的快速傅利葉轉器處理器的信號量化雜訊比可以近似於一個配有額外的動態尺規化方法,但不需要其額外龐大的硬體成本。
As the complexity of communication systems has grown dramatically in the past decades, digital signal processing algorithms are extensively used, such as finite-impulse-response filter (FIR) and fast Fourier transform (FFT). Meanwhile, the notion of multiple constant multiplication (MCM) is extensively adopted in FIR designs. A set of adders are used to replace regular multipliers for the multiplications between input data and constant filter coefficients. Though many algorithms have been proposed to reduce the total number of adders in an MCM block for area minimization, they do not consider the actual bitwidth of each adder, which may not estimate the hardware cost well enough. Therefore, we propose a bitwidth-aware MCM optimization algorithm that focuses on minimizing the total number of adder bits rather than the adder count. It first builds a subexpression graph based on the given coefficients, derives a set of constraints for adder bitwidth minimization, and then optimally solves the problem through integer linear programming (ILP). Experimental results show that the proposed algorithm can effectively reduce the required adder bit count and outperforms the existing state-of-the-art techniques. The FFT processor serves as one of core components in numerous DSP-based systems, such as OFDM in modern wireless communication. The key design parameters, such as architecture, wordlength, and number format, must be all considered very carefully. Many pipelined FFT architectures optimized for different objectives have been proposed in past few decades. Though a fixed pipelined FFT architecture can generally provide good throughput at reasonable hardware cost, it may still fail to meet the performance demand for throughput-hungry design cases. In this dissertation, we propose an expandable MDC-based FFT architecture as well as the corresponding hardware design generator, which is capable of automatically producing an FFT core under a given throughput constraint. The experimental results show that the proposed methodology can generate smaller and power-efficient implementations than the existing foldable MDC-based FFT architecture. Besides, in this dissertation, we also propose an optimization flow that properly scales fixed-point numeric values at each butterfly stage to maximize the output SQNR under a fixed wordlength constraint. The proposed flow utilizes probability distribution to model the probabilistic behavior of the output signal at each stage. The computation errors due to quantization and saturation operations are statically analyzed before making scaling decisions. Therefore, without a need of time-consuming simulation, our method can efficiently determine the most appropriate number format for each stage and thus optimize the overall output SQNR. Besides, the proposed flow is capable of handling various FFT sizes, FFT algorithms, wordlengths, and input signal distributions. Experimental results indicate that the wordlength can be reduced about three bits for an 8K-point radix-2 memory-based FFT processor without compromise in the output SQNR. Furthermore, the FFT processor created using our static scaling optimization technique can produce a comparable output quality as the one equipped with an extra dynamic number scaling unit, which requires significantly more hardware logic.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079511825
http://hdl.handle.net/11536/74563
Appears in Collections:Thesis


Files in This Item:

  1. 182501.pdf

If it is a zip file, please download the file and unzip it, then open index.html in a browser to view the full text content.