Low-power and High-performance FFT Processors for OFDM Communication Systems

Title:	Low-power and High-performance FFT Processors for OFDM Communication Systems
Authors:	Yuan Chen (陳元); Chen-Yi Lee (李鎮宜); Institute of Electronics
Keywords:	FFT; OFDM; block scaling; multi-pipelined cached-memory; DVFS
Publication date:	2008
Abstract:	Orthogonal frequency division multiplexing (OFDM) technology has been widely adopted in many wireless communication systems, such as DAB, DVB-T/H, WiMAX, IEEE 802.11a/g/n, and UWB. Its high bandwidth efficiency and excellent immunity to multipath interference have made OFDM one of the most promising technologies for advanced communication systems.
In an OFDM transceiver, the fast Fourier transform (FFT) processor is the key component for signal demodulation. Because of its inherently high computational complexity, the FFT processor often consumes a large share of the system power budget. Since many OFDM systems target wireless portable applications, power-efficient FFT processors are needed to extend battery life. In this dissertation, several new techniques spanning the algorithm level to the hardware level are proposed for low-power, high-performance FFT design. To demonstrate these ideas, three FFT designs for different applications are implemented and analyzed. The first design is a two-stream multiple-input multiple-output (MIMO) FFT/IFFT processor for WiMAX applications. A novel block scaling method and a new ping-pong cached-memory architecture are proposed to reduce power consumption and hardware cost; together they eliminate half of the memory accesses and save 64 Kbit of memory. Furthermore, by properly scheduling the two data streams, the design achieves better hardware utilization and can process two 2048-point FFTs/IFFTs consecutively within 2052 clock cycles. A test chip of the 2048-point FFT/IFFT processor was designed in a UMC 0.13 µm single-poly eight-metal (1P8M) CMOS process with a core area of 1332×1590 µm². Its SQNR exceeds 48 dB for QPSK and 16/64-QAM modulations. The power dissipation for two 2048-point FFT computations is about 25.6 mW at 1.2 V and 22.8 MHz, the clock rate that meets the maximum throughput of WiMAX applications. The second design is a novel multi-pipelined cached-memory FFT processor for high-throughput, low-power applications. With the proposed multi-pipelined architecture and data scheduling scheme, half of the memory accesses are eliminated to reduce power.
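Block scaling is closely related to block floating-point (BFP) arithmetic: every sample in a block shares one exponent, and the whole block is shifted right only when the next butterfly stage could overflow. The sketch below is a generic radix-2 BFP FFT in pure Python, not the thesis's novel block scaling method; the 16-bit word length, the one-third-of-full-scale headroom rule, the 256-point size, the QPSK test vector, and the helper names (`bfp_fft`, `sqnr_db`, and so on) are all hypothetical. It also shows one way to measure SQNR: compare the fixed-point output, rescaled by the shared exponent, against a double-precision DFT.

```python
import cmath
import math
import random

WORD_BITS = 16                     # hypothetical datapath word length
FRAC_BITS = WORD_BITS - 1          # Q1.15: integer v represents v / 2**15
LIMIT = 2 ** (WORD_BITS - 1) - 1   # largest positive word value
# A radix-2 butterfly grows a component by at most 1 + sqrt(2) < 3, so a block
# whose components all stay within LIMIT // 3 cannot overflow in the next stage.
HEADROOM = LIMIT // 3


def to_fixed(z):
    """Round a complex float to a pair of saturated Q1.15 integers."""
    def sat(v):
        return max(-LIMIT - 1, min(LIMIT, round(v * 2 ** FRAC_BITS)))
    return (sat(z.real), sat(z.imag))


def bit_reverse_permute(x):
    """Reorder a power-of-two-length list into bit-reversed index order."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [None] * n
    for i in range(n):
        r = 0
        for b in range(bits):
            if (i >> b) & 1:
                r |= 1 << (bits - 1 - b)
        out[r] = x[i]
    return out


def bfp_fft(samples):
    """Radix-2 DIT FFT on Q1.15 pairs with block floating-point scaling.

    Returns (values, exponent); the spectrum is values * 2**exponent / 2**FRAC_BITS.
    """
    n = len(samples)
    x = bit_reverse_permute([to_fixed(s) for s in samples])
    twiddles = [to_fixed(cmath.exp(-2j * math.pi * k / n)) for k in range(n // 2)]
    half_lsb = 1 << (FRAC_BITS - 1)
    exponent, span = 0, 1
    while span < n:
        # Block scaling: one shared right shift (with rounding) for the whole block.
        while max(max(abs(re), abs(im)) for re, im in x) > HEADROOM:
            x = [((re + 1) >> 1, (im + 1) >> 1) for re, im in x]
            exponent += 1
        step = n // (2 * span)
        for start in range(0, n, 2 * span):
            for k in range(span):
                wr, wi = twiddles[k * step]
                ar, ai = x[start + k]
                br, bi = x[start + k + span]
                # Complex multiply b * w, rounded back to Q1.15.
                tr = (br * wr - bi * wi + half_lsb) >> FRAC_BITS
                ti = (br * wi + bi * wr + half_lsb) >> FRAC_BITS
                x[start + k] = (ar + tr, ai + ti)
                x[start + k + span] = (ar - tr, ai - ti)
        span *= 2
    assert all(max(abs(re), abs(im)) <= LIMIT for re, im in x), "word overflow"
    return x, exponent


def sqnr_db(samples):
    """SQNR of bfp_fft against a double-precision DFT, in dB."""
    n = len(samples)
    values, exponent = bfp_fft(samples)
    scale = 2.0 ** exponent / 2 ** FRAC_BITS
    signal = noise = 0.0
    for k in range(n):
        ref = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        got = complex(*values[k]) * scale
        signal += abs(ref) ** 2
        noise += abs(ref - got) ** 2
    return 10 * math.log10(signal / noise)


if __name__ == "__main__":
    rng = random.Random(0)
    qpsk = [complex(rng.choice((-0.5, 0.5)), rng.choice((-0.5, 0.5))) for _ in range(256)]
    print(f"block exponent = {bfp_fft(qpsk)[1]}, SQNR = {sqnr_db(qpsk):.1f} dB")
```

Because the exponent is shared by the whole block, the hardware needs only one small exponent counter per FFT instead of one exponent per sample, which is why BFP sits between fixed-point and full floating-point cost.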
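The halving of memory accesses reported for the first and second designs can be illustrated with a simple counting model of a generic cached-memory FFT: main memory is read and written once per pass, while a small on-chip cache holds the data long enough to run several radix-2 stages. This is a hypothetical model, not the ping-pong or multi-pipelined schedule proposed in the thesis; `main_memory_accesses` and its `stages_per_pass` parameter are illustrative names.

```python
import math


def main_memory_accesses(n_points, stages_per_pass):
    """Word reads plus writes to main memory for one radix-2 FFT.

    Hypothetical model: each pass streams all n_points words out of main
    memory and back once, and the cache runs `stages_per_pass` radix-2
    stages on the data before it is written back.
    """
    stages = int(math.log2(n_points))
    if 2 ** stages != n_points:
        raise ValueError("n_points must be a power of two")
    passes = math.ceil(stages / stages_per_pass)
    return 2 * n_points * passes


if __name__ == "__main__":
    baseline = main_memory_accesses(4096, stages_per_pass=1)
    cached = main_memory_accesses(4096, stages_per_pass=2)
    print(baseline, cached, cached / baseline)  # prints 98304 49152 0.5
```

With 12 radix-2 stages (4096 points), two stages per pass halves the traffic exactly; with an odd stage count such as 11 (2048 points), the same grouping gives 6 passes instead of 11, so under this model the saving is slightly less than half.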
Compared with the traditional multipath delay feedback (MDF) structure, the multi-pipelined design also achieves higher butterfly-unit (BU) utilization. A test chip of the 4096-point FFT processor was designed in a UMC 90 nm single-poly nine-metal (1P9M) CMOS process to achieve an 8 Gsample/s processing rate. The core area of this chip is 1760×2650 µm². Its SQNR exceeds 37.2 dB for QPSK/16-QAM modulation. The power dissipation for 8 Gsample/s 4096-point FFT computation is about 1055 mW at 1.0 V. Compared with previous high-throughput FFT chips, the proposed design improves energy efficiency by at least 16%. The last design is a new dynamic voltage and frequency scaling (DVFS) FFT processor for MIMO OFDM applications. With the proposed multimode multipath delay feedback (MMDF) architecture, the processor can compute 256-point FFTs for one to eight streams, or a single high-speed 256-point FFT, at the minimum clock frequency, which enables DVFS operation. Furthermore, a novel open-loop voltage detection and scaling (OLVDS) mechanism is proposed for fast and robust voltage management. With these schemes, the processor operates at an adequate voltage and frequency in each configuration, supporting power-aware operation. A test chip of the 256-point FFT processor was fabricated in a UMC 90 nm 1P9M CMOS process with a core area of 1880×1880 µm². Its SQNR exceeds 35.8 dB for QPSK/16-QAM modulation. The power dissipation for 2.4 Gsample/s 256-point FFT computation is about 119.7 mW at 0.85 V. Depending on the operation mode, voltage scaling saves 18% to 43% of the power at the typical-typical (TT) process corner.
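The power and throughput figures in the abstract can be turned into energy per transform by dividing power by the FFT rate (sample rate divided by FFT size). The sketch below does that arithmetic for the three reported operating points. It assumes the quoted power is the average power while FFTs stream continuously, it does not normalize for the different processes (0.13 µm and 90 nm) or supply voltages, and it is not necessarily the energy-efficiency metric behind the 16% comparison; `energy_per_fft_nj` and `DESIGNS` are illustrative names.

```python
def energy_per_fft_nj(power_w, samples_per_s, fft_points):
    """Energy per FFT in nanojoules, assuming power is averaged over continuous streaming."""
    ffts_per_s = samples_per_s / fft_points
    return power_w / ffts_per_s * 1e9


# Design 1: two 2048-point FFTs every 2052 cycles at 22.8 MHz.
design1_rate = 2 * 2048 * 22.8e6 / 2052  # samples per second

# (power in W, sample rate in samples/s, FFT size), taken from the abstract.
DESIGNS = {
    "2048-point MIMO FFT/IFFT, 1.2 V": (25.6e-3, design1_rate, 2048),
    "4096-point multi-pipelined, 1.0 V": (1.055, 8e9, 4096),
    "256-point DVFS, 0.85 V": (119.7e-3, 2.4e9, 256),
}

if __name__ == "__main__":
    for name, (power, rate, points) in DESIGNS.items():
        print(f"{name}: {energy_per_fft_nj(power, rate, points):.1f} nJ per FFT")
```

Run as a script, this prints about 1152.0, 540.2, and 12.8 nJ per FFT; the large spread mostly reflects FFT size and process node, which is why cross-chip comparisons usually apply some normalization.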
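Voltage scaling saves power because a mode that needs a lower clock can also run at a lower supply. A first-order model is that dynamic power scales as alpha * C * V^2 * f, so at a fixed clock and switching activity, lowering the voltage alone saves 1 - (V / V_nom)^2. The sketch below picks the lowest-voltage operating point that still meets a mode's required clock and reports that first-order saving. The operating-point table and the names `OperatingPoint`, `pick_point`, and `dynamic_power_saving` are hypothetical; the table is not measured data from the test chip, leakage is ignored, and the OLVDS detection circuit is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    volts: float
    max_mhz: float  # highest clock that meets timing at this supply voltage


# Hypothetical voltage/frequency table for illustration only; it is NOT
# measured data from the thesis test chip.
OPERATING_POINTS = (
    OperatingPoint(0.70, 100.0),
    OperatingPoint(0.80, 200.0),
    OperatingPoint(0.90, 300.0),
    OperatingPoint(1.00, 400.0),
)


def pick_point(required_mhz, points=OPERATING_POINTS):
    """Return the lowest-voltage point whose maximum clock covers the demand."""
    for point in sorted(points, key=lambda p: p.volts):
        if point.max_mhz >= required_mhz:
            return point
    raise ValueError(f"{required_mhz} MHz exceeds every operating point")


def dynamic_power_saving(required_mhz, points=OPERATING_POINTS):
    """First-order dynamic-power saving from voltage scaling at a fixed clock.

    Dynamic power ~ alpha * C * V**2 * f. Comparing the chosen point with the
    highest-voltage point at the same clock and activity leaves only the V**2
    ratio; leakage power is ignored.
    """
    nominal = max(points, key=lambda p: p.volts)
    chosen = pick_point(required_mhz, points)
    return 1.0 - (chosen.volts / nominal.volts) ** 2


if __name__ == "__main__":
    # Hypothetical per-mode clock demands in MHz.
    for mhz in (80.0, 150.0, 250.0, 400.0):
        point = pick_point(mhz)
        print(f"{mhz:5.0f} MHz -> {point.volts:.2f} V, "
              f"saving {dynamic_power_saving(mhz):.0%}")
```

Running every mode at the minimum clock it needs, as the MMDF architecture is described as doing, is what creates the slack that a lower voltage point can then exploit.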
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT009311840 http://hdl.handle.net/11536/78181
Appears in Collections:	Thesis