Title: | The Matrix Prefetch Buffer Based FFT Processor |
Authors: | Yu-Wei Lin (林昱偉); Dr. Chen-Yi Lee (李鎮宜); Institute of Electronics |
Keywords: | fast Fourier transform; fast Fourier transform processor; FFT; FFT processor |
Publication date: | 2002 |
Abstract: | Orthogonal frequency division multiplexing (OFDM) provides an efficient way to overcome frequency-selective fading in multipath channels. Because OFDM systems rely on many advanced digital signal processing techniques, their computational complexity is high, and the FFT processor is one of the most computationally demanding modules. The computational complexity of an FFT increases with its length, and the 8K-point (8192-point) FFT required by the DVB standard is the longest in any wireless OFDM application and therefore the most complex. As a result, an FFT processor for DVB is costly in both power consumption and hardware. In this thesis, we propose an FFT processor that meets the DVB requirements while reducing power consumption and hardware cost.
After analyzing the power distribution of an FFT processor, we find that memory access and complex multiplication together account for almost 75% of the total power. We therefore propose a matrix prefetch buffer scheme that prefetches data and reduces memory accesses, and we adopt a higher-radix FFT algorithm to reduce the number of complex multiplications. To limit the hardware cost of the high-radix butterfly unit, we propose a modified butterfly unit and a rescheduling method; together these allow the radix-8 algorithm to be implemented more efficiently. In addition, by appropriately scheduling data transfers among the main memory, the matrix prefetch buffer, and the butterfly unit, the main memory can be implemented as single-port memory without any performance degradation, which substantially reduces the power consumption and hardware cost of the main memory.
We have implemented the design in a TSMC 0.35 µm 1P4M CMOS process. The die area of the FFT processor is 6.1 × 6.1 mm², including an 8192 × 24-bit memory. The processor consumes only 250 mW, about 40% of the power of other FFT processors, and its total gate count is about 260,000, roughly 56% less than that of other designs. At a clock frequency of 20 MHz, an 8192-point transform takes 718.35 µs, which meets the requirements of the DVB standard. |
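The timing and memory figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the thesis. It reads "8192 × 24-bit memory" as 8192 words of 24 bits each, and it assumes a useful symbol duration of 896 µs (the DVB-T 8K mode in an 8 MHz channel) as the real-time deadline; the abstract does not state which deadline the author used. All variable and function names are hypothetical.

```python
# Back-of-envelope checks on figures quoted in the abstract.
# Integer nanoseconds are used so the arithmetic is exact.

FFT_POINTS = 8192          # transform length (from the abstract)
WORD_BITS = 24             # memory word width (from the abstract)
CLOCK_HZ = 20_000_000      # 20 MHz clock (from the abstract)
EXEC_TIME_NS = 718_350     # 718.35 us execution time (from the abstract)

# ASSUMPTION, not stated in the abstract: useful symbol duration of the
# DVB-T 8K mode in an 8 MHz channel, taken here as the real-time deadline.
ASSUMED_SYMBOL_NS = 896_000


def clock_cycles(exec_time_ns: int, clock_hz: int) -> int:
    """Number of clock cycles that fit in the given execution time."""
    return exec_time_ns * clock_hz // 1_000_000_000


cycles = clock_cycles(EXEC_TIME_NS, CLOCK_HZ)
cycles_per_point = cycles / FFT_POINTS

# Memory size, reading "8192 x 24-bit" as 8192 words of 24 bits.
memory_bits = FFT_POINTS * WORD_BITS
memory_kib = memory_bits / 8 / 1024

# Slack against the assumed symbol duration.
margin_ns = ASSUMED_SYMBOL_NS - EXEC_TIME_NS

print(f"clock cycles per transform : {cycles}")
print(f"cycles per point           : {cycles_per_point:.3f}")
print(f"main memory                : {memory_bits} bits ({memory_kib:.0f} KiB)")
print(f"margin vs assumed deadline : {margin_ns / 1000:.2f} us")
```

Under these assumptions, one transform takes 14,367 clock cycles (under two cycles per point), the main memory holds 24 KiB, and the transform finishes before the assumed symbol deadline.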
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT910428154 http://hdl.handle.net/11536/70484 |
Appears in Collections: | Thesis |