# Chapter 6

# FFT Chip Design Simulation and Analysis

In the previous chapter, we introduced our new architecture, including the radix-8 butterfly operation, the variable-length twiddle factor ROM table, and the new address pointer generation for any FFT length. We designed a variable-length (512/1024/2048/4096-point) real-to-complex FFT processor chip. In this chapter, we first discuss and analyze the Matlab simulation (Section 6.1). We then describe the Verilog implementation of the FFT hardware design and its simulation (Section 6.2). Finally, the pad location, the floorplan, and the layout of our FFT chip are given in Section 6.3.

## 6.1 Matlab Simulation and Analysis

We simulate the fixed radix-8 and mixed-radix algorithms using the Matlab software package. We also use Matlab simulation to determine the number of bits for the data bus and the twiddle factors.
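Which of the two algorithms applies follows from factoring the FFT length. The following is a minimal Python sketch, assuming (this chapter does not state it) that a mixed-radix transform uses as many radix-8 stages as possible followed by a single radix-2 or radix-4 stage. The helper name `radix_plan` is hypothetical.

```python
def radix_plan(n_points):
    """Split a power-of-two FFT length into radix-8 stages plus at most one
    leftover radix-2 or radix-4 stage (assumed decomposition, see lead-in)."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("FFT length must be a power of two")
    log2_n = n_points.bit_length() - 1
    stages = [8] * (log2_n // 3)      # one radix-8 stage per 3 bits of log2(N)
    leftover_bits = log2_n % 3
    if leftover_bits:
        stages.append(2 ** leftover_bits)
    return stages


for length in (512, 1024, 2048, 4096):
    plan = radix_plan(length)
    kind = "fixed radix-8" if set(plan) == {8} else "mixed radix"
    print(length, plan, kind)
```

Under this assumption, 512 and 4096 are powers of 8 and need only radix-8 stages, while 1024 and 2048 need one extra smaller stage.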

#### 6.1.1 Simulation for Fixed and Mixed Radix FFT Algorithm

Figure 6.1 shows the flow chart of the fixed and mixed-radix FFT algorithms. The procedure is as follows:

- (1) Select the FFT length: 512, 1024, 2048, or 4096 points.
- (2) Input a sine waveform.
- (3) If the FFT length is a power of 8, the fixed radix-8 algorithm is used. If the length is not a power of 8, the mixed-radix algorithm is used.
- (4) The resulting FFT data are transformed back by the IFFT, which can be computed with the forward DFT by using Equation (6.1):

  $$DFT: X(k) = DFT\{x(n)\} = \sum_{n=0}^{N-1} x(n) W_N^{nk}$$

  $$IDFT: x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) W_N^{-nk} = \frac{1}{N} \times j \, [DFT\{jX^*(k)\}]^* \qquad (6.1)$$

- (5) If the output waveform equals the input waveform, the algorithm is verified to be correct. The simulation results for a 512-point FFT are given in Figure 6.2 (a) and (b).
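The round trip in steps (2) to (5) can be sketched in a few lines of Python. This is an illustration only, not the Matlab code used for the thesis: it uses a direct $O(N^2)$ DFT instead of the radix-8 algorithm and a short length of 64 points so that it runs quickly. The function names are our own.

```python
import cmath
import math


def dft(x):
    """Direct DFT: X(k) = sum_n x(n) * W_N^(n*k), with W_N = exp(-j*2*pi/N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def idft_via_dft(X):
    """IDFT built only from the forward DFT, following Equation (6.1):
    x(n) = (1/N) * j * conj(DFT{ j * conj(X(k)) })."""
    n_points = len(X)
    pre = [1j * value.conjugate() for value in X]
    return [1j * value.conjugate() / n_points for value in dft(pre)]


N = 64  # short length chosen only to keep the direct DFT fast
x = [math.sin(2 * math.pi * 3 * n / N) for n in range(N)]  # 3 cycles of a sine
X = dft(x)
x_rec = idft_via_dft(X)

# Step (5): the recovered waveform should match the input up to rounding.
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
print("max reconstruction error:", max_err)
```

Because the inverse transform reuses the forward one, the same hardware datapath can serve both directions with only conjugation and a swap by $j$ at the input and output.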





Figure 6.1 Fixed and mixed radix-8 FFT algorithm flowchart



(a) Input signal



(b) Output signal

Figure 6.2 Comparison of input and output data for 512-point FFT simulation with radix-8 algorithm

#### 6.1.2 Simulation of Twiddle Factor and Databus Bit Length

Before implementing the FFT IC design, we need to know the data bus size (in bits) and the wordlength of the twiddle factor coefficients. We therefore simulate the whole DMT modulation chain to determine the optimal data and coefficient wordlengths. The processing flow of this simulation, depicted in Figure 6.3, is as follows:

- (1) We generate the I and Q values of the 256-QAM constellation.
- (2) Input coding data is randomly generated.
- (3) We set initial databus and coefficient lengths.
- (4) The resulting input coding data are mapped to the complex input of the FFT.
- (5) The IFFT of the mapped data values is computed and a Guard Interval (GI) is added to the result.
- (6) After removing the Guard Interval, we perform the FFT operation to recover the complex data values.
- (7) These data values are demapped into binary data values.
- (8) Finally, we decode the data.
- (9) We compare the coded data and the decoded data.
- (10) If the decoded data match the coded data, the databus and coefficient wordlengths are adequate and the DMT simulation passes.
- (11) Otherwise, we adjust the bit lengths and run this processing flow again.
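The loop above can be sketched in Python as follows. This is a heavily simplified illustration under our own assumptions, not the thesis simulation: the 256-QAM mapping (high nibble to I, low nibble to Q) is hypothetical, the transform is a direct 64-point DFT on complex data, the guard interval is a cyclic prefix, and quantization is applied only to the received samples rather than to every internal node and twiddle factor.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Direct DFT or, with inverse=True, the IDFT including the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * m * k / n)
               for m in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def qam256_map(byte):
    """Hypothetical mapping: each nibble selects one of 16 levels -15..15."""
    return complex(2 * (byte >> 4) - 15, 2 * (byte & 0xF) - 15)


def qam256_demap(symbol):
    def level(v):
        return min(15, max(0, round((v + 15) / 2)))
    return (level(symbol.real) << 4) | level(symbol.imag)


def quantize(value, frac_bits):
    """Round real and imaginary parts to frac_bits fractional bits."""
    scale = 1 << frac_bits
    return complex(round(value.real * scale) / scale,
                   round(value.imag * scale) / scale)


def dmt_loopback(coded, frac_bits, guard_len=8):
    symbols = [qam256_map(b) for b in coded]           # steps (1), (4)
    time = dft(symbols, inverse=True)                  # step (5): IFFT
    tx = time[-guard_len:] + time                      # step (5): add GI
    rx = [quantize(v, frac_bits) for v in tx[guard_len:]]  # step (6): drop GI
    return [qam256_demap(s) for s in dft(rx)]          # steps (6), (7)


rng = random.Random(0)
coded = [rng.randrange(256) for _ in range(64)]        # step (2)
decoded = dmt_loopback(coded, frac_bits=10)            # step (3): trial length
errors = sum(c != d for c, d in zip(coded, decoded))   # step (9)
print("symbol errors with 10 fractional bits:", errors)
```

Rerunning `dmt_loopback` with smaller values of `frac_bits` corresponds to steps (10) and (11): the wordlength is adjusted until the decoded data stop matching.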

After simulation, we find that the optimal length of the input and output databus is 16 bits (1 sign bit, 5 integer bits, and 10 fractional bits). Likewise, the optimal length of the twiddle factor coefficient is 12 bits (1 sign bit, 1 integer bit, and 10 fractional bits). Figure 6.4 gives the SNR for different fractional wordlengths of the data and twiddle factor coefficients. This simulation shows that a 10-bit fractional portion is sufficient for the input and output data and for the twiddle factors.
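To make the two formats concrete, the sketch below quantizes values to a signed fixed-point format with a given number of integer and fractional bits, and estimates the quantization SNR of the twiddle factors alone. It is an assumption-laden illustration: the helper names are ours, round-to-nearest with saturation is assumed, and the SNR computed here is not the system-level SNR plotted in Figure 6.4.

```python
import math


def to_fixed(value, int_bits, frac_bits):
    """Quantize to 1 sign bit + int_bits + frac_bits, saturating at the limits."""
    scale = 1 << frac_bits
    max_code = (1 << (int_bits + frac_bits)) - 1
    min_code = -(1 << (int_bits + frac_bits))
    code = max(min_code, min(max_code, round(value * scale)))
    return code / scale


def twiddle_snr_db(n_points, frac_bits):
    """SNR of the quantized twiddle factors W_N^k in the 1.1.frac_bits format."""
    signal = 0.0
    noise = 0.0
    for k in range(n_points):
        angle = -2 * math.pi * k / n_points
        for exact in (math.cos(angle), math.sin(angle)):
            err = exact - to_fixed(exact, 1, frac_bits)
            signal += exact * exact
            noise += err * err
    return 10 * math.log10(signal / noise)


# Data format: 16 bits = 1 sign + 5 integer + 10 fractional.
print(to_fixed(3.14159, 5, 10))
# Twiddle format: 12 bits = 1 sign + 1 integer + 10 fractional.
for bits in (8, 10, 12):
    print(bits, "fractional bits:", round(twiddle_snr_db(512, bits), 1), "dB")
```

Each additional fractional bit improves the quantization SNR by roughly 6 dB, which is why the choice of fractional wordlength dominates the trade-off between accuracy and ROM size.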



Figure 6.3 Processing flow chart of simulation with twiddle factor and databus wordlength



Figure 6.4 Simulation of the optimal wordlength for data bus and twiddle factor

## 6.2 Verilog Simulation and Analysis

The design is implemented in the TSMC 0.25 $\mu$m CMOS process and synthesized with the Synopsys cell library. Figure 6.5 gives the design flow chart and the CAD tools used for our new FFT architecture. Figure 6.6 shows the simulation result for the 512-point FFT with read/write address generation. Figure 6.7 shows the simulation result for the 2048-point FFT with input/output data and the butterfly operation.

#### (1) FFT chip design flow and CAD tools used



Figure 6.5 Flow chart of FFT chip design

#### (2) Simulation results for FFT processor



Figure 6.6 Simulation waveform of 512-point with read/write address generation

[Waveform viewer screenshot. Signals shown: `fft_mode[2:0]`, `in_fft_real[15:0]`, `radix4_enable`, `fft_real*[15:0]`, `real_fin_sram*[15:0]`, `ou_fft_real[15:0]`, `ou_fft_img[15:0]`.]

Figure 6.7 Simulation of 2048-point with input/output data and butterfly operation

#### (3) Verification for FFT IC design

We must verify that the FFT IC design is correct. Figure 6.8 shows the system block used for this verification, and we carry out the procedure in the shaded region of Figure 6.9. First, we dump the FFT output data from the Matlab simulation to produce the input data for the FFT IC. Second, we implement the FFT IC according to the flow chart of Figure 6.5. Third, we dump the output data of the FFT IC operation. Fourth, we feed the resulting IC output data to the QAM demapping step of the Matlab simulation. Finally, we decode the data and compare the result with the input coding data, as shown in Figure 6.10.
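The data exchange and the final comparison can be sketched in Python. This is only an illustration under stated assumptions: the 16-bit two's complement hex encoding with 10 fractional bits follows the databus format of Section 6.1.2, but the thesis does not describe its dump file format, and the helper names are hypothetical. The log line layout imitates the one visible in Figure 6.10.

```python
def to_hex_word(value, frac_bits=10, width=16):
    """Encode a real value as a two's complement hex word (assumed format)."""
    code = round(value * (1 << frac_bits))
    limit = 1 << (width - 1)
    code = max(-limit, min(limit - 1, code))          # saturate on overflow
    return format(code & ((1 << width) - 1), f"0{width // 4}x")


def from_hex_word(text, frac_bits=10, width=16):
    """Decode a two's complement hex word back to a real value."""
    code = int(text, 16)
    if code >= 1 << (width - 1):
        code -= 1 << width
    return code / (1 << frac_bits)


def compare_streams(coded, decoded):
    """Return one log line per symbol and the number of mismatches."""
    if len(coded) != len(decoded):
        raise ValueError("streams must have the same length")
    lines = []
    mismatches = 0
    for number, (c, d) in enumerate(zip(coded, decoded), start=1):
        diff = c - d
        if diff != 0:
            mismatches += 1
        lines.append(f"number= {number}, coding_data= {c}, "
                     f"decoding_data= {d}, diff_data = {diff};")
    return lines, mismatches


# First four symbols listed in Figure 6.10.
coded = [0, 251, 206, 179]
decoded = [0, 251, 206, 179]
log_lines, mismatches = compare_streams(coded, decoded)
print("\n".join(log_lines))
print("mismatches:", mismatches)
```

A design passes this check when every `diff_data` entry is zero, that is, when `mismatches` is 0 over the whole test vector.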



Figure 6.8 Verification for FFT IC system block

• Verification for FFT IC flowchart



Figure 6.9 Verification for FFT IC flowchart

[Text editor screenshot of the comparison log. The visible entries are:]

| number | coding_data | decoding_data | diff_data |
|--------|-------------|---------------|-----------|
| 1      | 0           | 0             | 0         |
| 2      | 251         | 251           | 0         |
| 3      | 206         | 206           | 0         |
| 4      | 179         | 179           | 0         |
| 5      | 124         | 124           | 0         |
| 6      | 29          | 29            | 0         |
| 7      | 170         | 170           | 0         |
| 8      | 93          | 93            | 0         |
| 9      | 36          | 36            | 0         |
| 10     | 145         | 145           | 0         |
| 11     | 210         | 210           | 0         |
| 12     | 172         | 172           | 0         |
| 13     | 255         | 255           | 0         |
| 14     | 245         | 245           | 0         |
| 15     | 15          | 15            | 0         |
| 16     | 92          | 92            | 0         |
| 17     | 140         | 140           | 0         |
| 18     | 67          | 67            | 0         |
| 19     | 152         | 152           | 0         |
| 20     | 13          | 13            | 0         |
| 21     | 146         | 146           | 0         |
| 22     | 179         | 179           | 0         |
| 23     | 245         | 245           | 0         |
| 24     | 191         | 191           | 0         |
| 25     | 189         | 189           | 0         |

Figure 6.10 Compare coding and decoding data to verify FFT IC design

## 6.3 Pad Location and Floorplan

We sketch a floorplan for the FFT chip, depicted in Figure 6.11. Its purpose is to determine the overall structure for automatic placement and routing. Figure 6.12 shows a schematic of the QFP package for our FFT chip design; its pin description is given in Table 6.1. Table 6.2 lists the features of our FFT chip design. The die size of our FFT chip is $2600 \mu m \times 2600 \mu m$, excluding memory. The synthesized design has a gate count of 146576 and a critical path delay of 18 ns, as reported by Synopsys Design Analyzer. We perform automatic placement and routing with Apollo. The layout view of our 4096-point memory-based FFT chip is shown in Figure 6.13.
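As a quick consistency check on these figures (our own arithmetic, not a result reported by the tools, and ignoring clock uncertainty and I/O margins), the 18 ns critical path bounds the clock frequency as follows, which is compatible with the 50 MHz maximum clock speed listed in Table 6.2:

$$f_{max} = \frac{1}{t_{crit}} = \frac{1}{18\ \text{ns}} \approx 55.6\ \text{MHz}, \qquad T_{50\ \text{MHz}} = 20\ \text{ns} > 18\ \text{ns}$$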

#### (1) Floorplan of our FFT chip



Figure 6.11 A sketch of FFT chip floorplan



Figure 6.12 FFT package definition

#### (3) Pin configuration of our FFT chip

| Name           | Pin# | I/O    | Description           |
|----------------|------|--------|-----------------------|
| Vdd            | 1    | Power  | Power source (core)   |
| N_reset        | 2    | Input  | System reset          |
| Ext_clk        | 3    | Input  | Clock input           |
| Fft_mode0      | 4    | Input  | Select FFT mode       |
| Fft_mode1      | 5    | Input  | Select FFT mode       |
| Fft_mode2      | 6    | Input  | Select FFT mode       |
| Fft_start      | 7    | Input  | Start FFT system      |
| Fft_valid      | 8    | Input  | FFT data valid        |
| Vss            | 9    | Ground | Ground source (core)  |
| In_fft_real0   | 10   | Input  | FFT real0 input       |
| In_fft_real1   | 11   | Input  | FFT real1 input       |
| In_fft_real2   | 12   | Input  | FFT real2 input       |
| In_fft_real3   | 13   | Input  | FFT real3 input       |
| In_fft_real4   | 14   | Input  | FFT real4 input       |
| In_fft_real5   | 15   | Input  | FFT real5 input       |
| In_fft_real6   | 16   | Input  | FFT real6 input       |
| In_fft_real7   | 17   | Input  | FFT real7 input       |
| Vdd            | 18   | Power  | Power source (core)   |
| In_fft_real8   | 19   | Input  | FFT real8 input       |
| In_fft_real9   | 20   | Input  | FFT real9 input       |
| In_fft_real10  | 21   | Input  | FFT real10 input      |
| In_fft_real11  | 22   | Input  | FFT real11 input      |
| In_fft_real12  | 23   | Input  | FFT real12 input      |
| In_fft_real13  | 24   | Input  | FFT real13 input      |
| Vss            | 25   | Ground | Ground source (core)  |
| Vdd            | 26   | Power  | Power source (pad)    |
| In_fft_real14  | 27   | Input  | FFT real14 input      |
| In_fft_real15  | 28   | Input  | FFT real15 input      |
| In_fft_img0    | 29   | Input  | FFT img0 input        |
| In_fft_img1    | 30   | Input  | FFT img1 input        |
| In_fft_img2    | 31   | Input  | FFT img2 input        |
| In_fft_img3    | 32   | Input  | FFT img3 input        |
| In_fft_img4    | 33   | Input  | FFT img4 input        |
| In_fft_img5    | 34   | Input  | FFT img5 input        |
| In_fft_img6    | 35   | Input  | FFT img6 input        |
| In_fft_img7    | 36   | Input  | FFT img7 input        |
| Vss            | 37   | Ground | Ground source (pad)   |
| In_fft_img8    | 38   | Input  | FFT img8 input        |
| In_fft_img9    | 39   | Input  | FFT img9 input        |
| In_fft_img10   | 40   | Input  | FFT img10 input       |
| In_fft_img11   | 41   | Input  | FFT img11 input       |
| In_fft_img12   | 42   | Input  | FFT img12 input       |
| In_fft_img13   | 43   | Input  | FFT img13 input       |
| In_fft_img14   | 44   | Input  | FFT img14 input       |
| In_fft_img15   | 45   | Input  | FFT img15 input       |
| Vdd            | 46   | Power  | Power source (pad)    |
| Vss            | 47   | Ground | Ground source (pad)   |
| Nc             | 48   |        |                       |
| Nc             | 49   |        |                       |
| Nc             | 50   |        |                       |
| Vdd            | 51   | Power  | Power source (pad)    |
| Test 0         | 52   | Input  | Select test pattern   |
| Test 1         | 53   | Input  | Select test pattern   |
| Test 2         | 54   | Input  | Select test pattern   |
| Vss            | 55   | Ground | Ground source (pad)   |
| Vdd            | 56   | Power  | Power source (pad)    |
| Out_data_clk   | 57   | Output | FFT output data clock |
| Out_data_valid | 58   | Output | FFT output data valid |
| Vss            | 59   | Ground | Ground source (pad)   |
| Ou_fft_real0   | 60   | Output | FFT real0 output      |
| Ou_fft_real1   | 61   | Output | FFT real1 output      |
| Ou_fft_real2   | 62   | Output | FFT real2 output      |
| Ou_fft_real3   | 63   | Output | FFT real3 output      |
| Ou_fft_real4   | 64   | Output | FFT real4 output      |
| Ou_fft_real5   | 65   | Output | FFT real5 output      |
| Ou_fft_real6   | 66   | Output | FFT real6 output      |
| Ou_fft_real7   | 67   | Output | FFT real7 output      |
| Vdd            | 68   | Power  | Power source (pad)    |
| Ou_fft_real8   | 69   | Output | FFT real8 output      |
| Ou_fft_real9   | 70   | Output | FFT real9 output      |
| Ou_fft_real10  | 71   | Output | FFT real10 output     |
| Ou_fft_real11  | 72   | Output | FFT real11 output     |
| Ou_fft_real12  | 73   | Output | FFT real12 output     |
| Ou_fft_real13  | 74   | Output | FFT real13 output     |
| Vss            | 75   | Ground | Ground source (pad)   |
| Vdd            | 76   | Power  | Power source (pad)    |
| Out_fft_real14 | 77   | Output | FFT real14 output     |
| Ou_fft_real15  | 78   | Output | FFT real15 output     |
| Ou_fft_img0    | 79   | Output | FFT img0 output       |
| Ou_fft_img1    | 80   | Output | FFT img1 output       |
| Ou_fft_img2    | 81   | Output | FFT img2 output       |
| Ou_fft_img3    | 82   | Output | FFT img3 output       |
| Ou_fft_img4    | 83   | Output | FFT img4 output       |
| Ou_fft_img5    | 84   | Output | FFT img5 output       |
| Ou_fft_img6    | 85   | Output | FFT img6 output       |
| Ou_fft_img7    | 86   | Output | FFT img7 output       |
| Vss            | 87   | Ground | Ground source (pad)   |
| Ou_fft_img8    | 88   | Output | FFT img8 output       |
| Ou_fft_img9    | 89   | Output | FFT img9 output       |
| Ou_fft_img10   | 90   | Output | FFT img10 output      |
| Ou_fft_img11   | 91   | Output | FFT img11 output      |
| Ou_fft_img12   | 92   | Output | FFT img12 output      |
| Ou_fft_img13   | 93   | Output | FFT img13 output      |
| Ou_fft_img14   | 94   | Output | FFT img14 output      |
| Ou_fft_img15   | 95   | Output | FFT img15 output      |
| Vdd            | 96   | Power  | Power source (pad)    |
| Vss            | 97   | Ground | Ground source (pad)   |
| Nc             | 98   |        |                       |
| Nc             | 99   |        |                       |
| Nc             | 100  |        |                       |

Table 6.1 FFT pin configuration

#### (4) Features of our FFT chip design

| Feature                   | Value                          |
|---------------------------|--------------------------------|
| Process                   | TSMC 0.25 $\mu$m 1P4M          |
| Die size excluding memory | $2600 \mu m \times 2600 \mu m$ |
| Gate count                | 146576                         |
| FFT length                | 512, 1024, 2048, 4096 points   |
| Maximum clock speed       | 50 MHz                         |
| Package                   | 100-pin QFP                    |
| Power supply              | 2.5 V                          |
| Input/output data width   | 16 bits                        |

Table 6.2 Features of our FFT chip design

#### (5) Layout view of our variable length FFT chip excluding memory



Figure 6.13 Layout view of our variable length FFT chip