Title: The Study of FFT Processors for OFDM Systems (應用於正交分頻多工系統的快速傅立葉轉換處理器之研究)
Authors: Yu-Wei Lin (林昱偉)
Chen-Yi Lee (李鎮宜)
Institute of Electronics (電子研究所)
Keywords: fast Fourier transform processor; orthogonal frequency division multiplexing; FFT; OFDM
Issue Date: 2005
Abstract: Orthogonal frequency division multiplexing (OFDM) provides an efficient way to combat frequency-selective fading in multipath channels. Because OFDM systems rely on many advanced digital signal processing techniques, their computational complexity is high, and the fast Fourier transform (FFT) processor is one of the most computationally demanding modules. FFT specifications vary from one OFDM system to another, and these differences must be taken into account when designing the processor. This dissertation focuses on three OFDM systems, Digital Video Broadcasting - Terrestrial (DVB-T), ultra-wideband (UWB), and IEEE 802.11n, and proposes a novel FFT architecture for each.
For DVB-T, we propose an 8192-point FFT processor that combines a 3-step radix-8 FFT algorithm, a new dynamic scaling approach, and a novel matrix prefetch buffer. The dynamic scaling approach saves about 64 Kbit of memory in the 8K-point FFT. Moreover, with data scheduling and the matrix prefetch buffer, single-port memory can be used without degrading the throughput rate. A test chip for the 8K mode of DVB-T has been designed and fabricated in a 0.18 µm single-poly six-metal (1P6M) CMOS process with a core area of 4.84 mm². Power dissipation is about 25.2 mW at 20 MHz.
For UWB, the proposed pipelined FFT architecture, called Mixed-Radix Multi-Path Delay Feedback (MRMDF), achieves a higher throughput rate by using multiple data paths. Furthermore, through delay feedback and data scheduling, the memory and complex-multiplier hardware costs of MRMDF are only 38.9% and 44.8% of those of the known FFT processor. A high-radix FFT algorithm is also implemented in the processor to reduce the number of complex multiplications. A test chip for UWB has been designed and fabricated in a 0.18 µm 1P6M CMOS process with a core area of 1.76 × 1.76 mm², including an FFT/IFFT processor and a test module. The fabricated processor reaches a throughput rate of up to 1 Gsample/s while consuming 175 mW. At the 409.6 Msample/s FFT throughput rate required by the UWB standard, power dissipation is 77.6 mW.
For IEEE 802.11n, the proposed processor, also based on a multi-data-path scheme, supports both 128-point and 64-point FFT/IFFT and can process one to four simultaneous data sequences at different throughput rates to meet IEEE 802.11n requirements. It also requires less hardware than the traditional approach of using four parallel processors. The processor is designed in a 0.13 µm single-poly eight-metal (1P8M) CMOS process. The core area is 660 × 2142 µm², including an FFT/IFFT processor and a test module. At a clock rate of 40 MHz, the processor computes the 128-point FFTs of four independent data sequences within 3.2 µs.
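As a rough sanity check on the IEEE 802.11n figures quoted in the abstract, the sketch below works out how many clock cycles fit into 3.2 µs at 40 MHz and how many samples per cycle the processor would have to sustain for four 128-point sequences. The function names are hypothetical, and the calculation assumes the 3.2 µs window covers the whole computation with no extra pipeline latency, which the abstract does not state.

```python
from fractions import Fraction


def cycles_available(clock_hz: int, window_ns: int) -> Fraction:
    """Clock cycles that fit into a time window (exact rational arithmetic)."""
    return Fraction(clock_hz * window_ns, 10**9)


def samples_per_cycle(n_points: int, n_streams: int,
                      clock_hz: int, window_ns: int) -> Fraction:
    """Average samples that must be processed per cycle (latency ignored)."""
    total_samples = n_points * n_streams
    return Fraction(total_samples) / cycles_available(clock_hz, window_ns)


# Figures taken from the abstract: 40 MHz clock, 3.2 us window,
# four independent 128-point sequences.
cycles = cycles_available(40_000_000, 3200)
rate = samples_per_cycle(128, 4, 40_000_000, 3200)
print(f"cycles in window: {cycles}, samples per cycle: {rate}")
```

Under these assumptions the window holds 128 cycles, so the four sequences together need four samples per cycle on average, which is consistent with one data path per sequence in a four-path design.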
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009211824
http://hdl.handle.net/11536/67924
Appears in Collections: Thesis