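National Science Council, Executive Yuan: Final Report of a Subsidized Research Project

Project title: Digital Implementation of a W-CDMA Base Station Receiver System and Optimization of DSP Computation (3)

Project type: \*Integrated project

Project number: NSC 90-2219-E-009-013

Execution period: August 1, 2001 to July 31, 2002

Principal investigator: 張文鐘

Institution: Department of Communication Engineering, National Chiao Tung University

August 2002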

# The Digital Implementation and Code Optimization of a W-CDMA Receiver on the TI C6x DSP

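Digital Implementation of a W-CDMA Base Station Receiver System and Optimization of DSP Computation (3)

Project number: NSC 90-2219-E-009-013

Execution period: August 1, 2001 to July 31, 2002

Principal investigator: 張文鐘, Department of Communication Engineering, National Chiao Tung University

E-mail: wtchang@cc.nctu.edu.tw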

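## 1. Chinese Abstract

(Keywords: cdma2000, spread spectrum communication, digital signal processing chip) Two TI fixed-point digital signal processing chips are used to implement a cdma2000 receiver. One chip is mainly responsible for code acquisition, while the other handles code tracking, channel estimation, complex descrambling, demodulation and RAKE combining, and data decision. The parallel computation that the compiler schedules across the multiple processing units is also analyzed.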

**Abstract** (Keywords: cdma2000, spread spectrum communication, TI C6x DSP) Two TI TMS320C6201 fixed-point DSP chips are used to implement a cdma2000 receiver. One chip performs code acquisition, and the other performs code tracking, channel estimation, complex de-spreading, demodulation, RAKE combining and data decision. The parallelism that the compiler achieves across the DSP's multiple processing units through loop unrolling is analyzed.

#### 2. Background and Purpose

In the reverse link, the transmitted baseband signal can be expressed as

$$\begin{aligned} s(t) &= \sum_{n=0}^{\infty} \left[ \left( d_n^I + j d_n^Q \right) \left( c_n^I + j c_n^Q \right) \right] g_T(t - n T_c) \\ &= \sum_{n=0}^{\infty} d_n c_n g_T(t - n T_c) \end{aligned}$$

where $g_T(t)$ is the SRRC pulse shaping filter with roll-off factor 0.22, $n$ is the chip index, $d_n^I$ and $d_n^Q$ are the data for the in-phase and quadrature-phase components, and $c_n^I$ and $c_n^Q$ are the scrambling sequences for the in-phase and quadrature-phase components, so that $d_n = d_n^I + j d_n^Q$ and $c_n = c_n^I + j c_n^Q$. The signal is transmitted from a mobile station through a Rayleigh fading channel with Doppler frequency spread and additive white Gaussian noise (AWGN). The received signal can be expressed as

$$r(t) = \sum_{l=1}^{L} r_{l}(t)\, s(t - lT - \tau_{d}) + z(t)$$

where $r_l(t)$ is the fading gain of the $l$-th path, $T = 2T_c = 1.6276\ \mu\text{s}$, $L = 6$, $\tau_d$ is the propagation delay, and $f_c$ is the carrier frequency. $z(t)$ is the band-limited additive white Gaussian noise (AWGN), which can be represented as

$$\begin{aligned} z(t) &= n_c(t) \cos \omega_0 t - n_s(t) \sin \omega_0 t \\ &= \operatorname{Re}\{n(t)e^{j2\pi f_c t}\} \end{aligned}$$

where $\omega_0 = 2\pi f_c$, $n(t) = n_c(t) + jn_s(t)$ is the equivalent lowpass form of the AWGN with $E\{n_c^2(t)\} = E\{n_s^2(t)\} = N_0$, and $N_0$ is the one-sided PSD of the AWGN. The functional block diagram is shown in Figure 1. The received complex signal, after passing through the chip matched filter that removes the out-of-band noise, can be described as

$$r_{CMF}(t) = \left[\sum_{l=1}^{L} r_{l}(t)\, s(t - \tau_{l}) + z(t)\right] \otimes g_{R}(t) = \sum_{l=1}^{L} \sum_{n=0}^{\infty} \left[r_{l}(t)\, d_{n} c_{n}\, g(t - nT_{c} - \tau_{l})\right] + z'(t)$$

where $\tau_l$ is the total delay of the $l$-th path (that is, $lT$ plus the propagation delay), $\otimes$ denotes the convolution operation, and

$$g(t) = g_T(t) \otimes g_R(t)$$
$$z'(t) = z(t) \otimes g_R(t).$$
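As a small worked illustration of the chip-level product $d_n c_n$ in the expression for $s(t)$, the sketch below multiplies complex data chips by a complex scrambling sequence and then recovers the data by multiplying with the conjugate code. The chip values and the helper names `complex_spread` and `complex_descramble` are illustrative assumptions, not taken from the report; pulse shaping, the channel and the noise are left out.

```python
def complex_spread(data, code):
    """Chip-level product d_n * c_n, with d_n = dI + j*dQ and c_n = cI + j*cQ."""
    return [d * c for d, c in zip(data, code)]


def complex_descramble(chips, code):
    """Multiply by conj(c_n) and normalise by |c_n|^2 to recover d_n."""
    return [x * c.conjugate() / (c.real ** 2 + c.imag ** 2)
            for x, c in zip(chips, code)]


# Hypothetical +/-1 chips on the I and Q branches (not values from the report).
data = [complex(1, -1), complex(-1, -1), complex(1, 1), complex(-1, 1)]
code = [complex(1, 1), complex(1, -1), complex(-1, 1), complex(-1, -1)]

tx = complex_spread(data, code)      # d_n * c_n for each chip
rx = complex_descramble(tx, code)    # data recovered at the receiver side
```

In this noise-free toy case `rx` reproduces `data`, which is the property the complex descrambling stage of the receiver relies on.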

### 3. Research Methods and Results

The functional block of the MRC RAKE combiner is shown in Figure 2. Each finger consists of a code tracking system and a complex channel amplitude estimator. The data flow is illustrated in Figure 3, with the following signals:

- I\_SRRC and Q\_SRRC: the waveform chip matched filter output samples for the I and Q branches.
- $\hat{\tau}$: the pseudo-random (PN) code delay estimates for the RAKE fingers.
- i\_M\_I\_P, i\_M\_Q\_P, i\_M\_I\_F, and i\_M\_Q\_F: the interpolation (middle) outputs of the I and Q branches for R-PICH and R-FCH, respectively.
- i\_EL\_I\_P and i\_EL\_Q\_P: the interpolation (early minus late) outputs of the I and Q branches for R-PICH.
- Track\_est: the output of the code tracking system; it indicates the estimated chip timing error and is fed back to the interpolation.
- ch\_est: the output of the channel estimation.
- output: the result of the data decision.

The block diagram of the complex multiply matched filter is illustrated in Figure 4. To reduce the interference noise when the matched filter is short, its output is averaged with the preceding correlation output.
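The averaging step of the complex multiply matched filter can be sketched as follows. This is a minimal illustration under stated assumptions: the two consecutive correlation outputs are averaged coherently before the energy is taken (the report does not say whether the average is formed before or after energy detection), and the 8-chip code, the path gain of 0.5 and the function names are hypothetical.

```python
def complex_correlate(samples, code):
    """Complex multiply matched filter output: sum of samples[k] * conj(code[k])."""
    return sum(s * c.conjugate() for s, c in zip(samples, code))


def averaged_energy(samples, code, taps):
    """Average the current short correlation with the preceding one, then take |.|^2."""
    preceding = complex_correlate(samples[:taps], code[:taps])
    current = complex_correlate(samples[taps:2 * taps], code[taps:2 * taps])
    averaged = (preceding + current) / 2
    return averaged.real ** 2 + averaged.imag ** 2


# Hypothetical 8-chip complex PN segment and a noise-free path with gain 0.5.
code = [complex(1, 1), complex(1, -1), complex(-1, 1), complex(-1, -1),
        complex(1, -1), complex(-1, -1), complex(1, 1), complex(-1, 1)]
aligned = [0.5 * c for c in code]
misaligned = aligned[-1:] + aligned[:-1]    # same samples, arriving one chip late

energy_aligned = averaged_energy(aligned, code, taps=4)
energy_misaligned = averaged_energy(misaligned, code, taps=4)
```

With the code phase aligned, the two short correlations add up and the energy is large; with a one-chip offset they largely cancel, which is what an energy detector followed by a path selector exploits.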

 Table 1: The simulation parameters of the channel

| Path | Attenuation (dB) | Relative Delay (μs) |
|------|------------------|---------------------|
| 1    | -4               | 0                   |
| 2    | -6               | 1.6276              |
| 3    | -8.24            | 3.2552              |
| 4    | -10              | 4.8828              |
| 5    | -11.55           | 6.5104              |
| 6    | -15.23           | 8.1380              |

The received signal is first digitized by the PMC-2MAI ADC at a 40 MHz sampling rate, and the digital samples are passed to the PEM-4WDC wideband down-converter through the LVDS interface. The PEM-4WDC then demodulates the signal, and the baseband I and Q samples are stored in its FIFO in the order shown in Figure 5. Synchronization between the PEM-4WDC wideband down-converter and the processing CPUs is done using a check bit stored in MPRAM.

For real-time implementation, the first step is the initial search. The acquisition consists of complex multiply matched filtering, an energy detector, and a path selector. For maximum parallelism, loop unrolling with a factor of 4 is used, so a 128-tap filter needs 32 loop iterations instead of 128. Each iteration takes 10 clock cycles. The execution pattern is shown in Table 3. The total cycle count for one correlation is thus 8 + 32\*10 = 328, where the 8 cycles are the prolog and epilog. However, the actual measured cycle count is 392. An improved version that can save half the computation time, called the Differential Delay Matched Filter, which exploits the run properties of the PN code, is now being considered. The MRC RAKE functions include interpolation, channel estimation, tracking, and demodulation.
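The combining and decision steps of the MRC RAKE can be illustrated with a short sketch. It assumes the textbook form of maximal-ratio combining (each finger output is weighted by the conjugate of its channel estimate) and a hard decision on the signs of the I and Q components; the channel gains, the transmitted symbol and the function names are hypothetical, and noise is omitted.

```python
def mrc_combine(finger_outputs, channel_estimates):
    """Maximal-ratio combining: weight each finger by conj(channel estimate) and sum."""
    return sum(y * h.conjugate() for y, h in zip(finger_outputs, channel_estimates))


def hard_decision(symbol):
    """Decide +1 or -1 separately on the in-phase and quadrature components."""
    return (1 if symbol.real >= 0 else -1, 1 if symbol.imag >= 0 else -1)


# Hypothetical example with 4 fingers, matching the finger count in Table 2.
sent = complex(1, -1)
channel_estimates = [complex(0.6, 0.2), complex(-0.3, 0.4),
                     complex(0.1, -0.2), complex(0.2, 0.1)]
finger_outputs = [h * sent for h in channel_estimates]   # noise-free finger outputs

combined = mrc_combine(finger_outputs, channel_estimates)
decision = hard_decision(combined)
```

With perfect channel estimates the combined value equals the transmitted symbol scaled by the sum of the squared path magnitudes (0.75 here), so the phase rotation of every path is removed before the decision.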

The scheduling policy is illustrated in Figure 6. Node A and Node B keep polling the check bit and start block processing only when the check bit changes to '0'. When Node A begins to operate, it first changes the check bit back to '1' and then moves the data in the FIFO buffer into its internal memory via DMA. After Node A finishes processing, it moves its data to MPRAM for Node B to access. The total clock counts are shown in Table 4.
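The check-bit handshake for Node A can be sketched as a single-threaded simulation. Everything here is a hypothetical stand-in: `SharedState` models the check bit, the FIFO and the MPRAM as plain Python attributes, a list copy stands in for the DMA transfer, and it is an assumption of this sketch that the down-converter side is what clears the check bit to '0'. De-interleaving the I and Q words serves as a placeholder for Node A's actual processing.

```python
class SharedState:
    """Hypothetical stand-in for the check bit, the PEM-4WDC FIFO and the MPRAM."""

    def __init__(self):
        self.check_bit = 1    # '1': nothing to do, '0': a new block is ready
        self.fifo = []        # interleaved I1, Q1, I2, Q2, ... words
        self.mpram = []       # data handed from Node A to Node B


def front_end_write(state, interleaved_iq):
    """Assumed front-end behaviour: fill the FIFO, then clear the check bit."""
    state.fifo = list(interleaved_iq)
    state.check_bit = 0


def deinterleave(words):
    """Pair up the I and Q words in the FIFO order of Figure 5."""
    return [complex(i, q) for i, q in zip(words[0::2], words[1::2])]


def node_a_poll(state):
    """One polling step of Node A; returns True if a block was processed."""
    if state.check_bit != 0:
        return False                        # keep polling
    state.check_bit = 1                     # set the check bit back to '1' first
    internal = list(state.fifo)             # stands in for the DMA transfer
    state.mpram = deinterleave(internal)    # placeholder for Node A processing
    return True


state = SharedState()
idle = node_a_poll(state)                   # no data yet
front_end_write(state, [1, 2, 3, 4])
ran = node_a_poll(state)                    # block processed, result in MPRAM
```

Resetting the check bit before the transfer, as the text describes, means a new block arriving during processing is not missed by the next poll.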

#### 4. Conclusion

The simulation parameters of the channel and the system are listed in Table 1 and Table 2, respectively. The computation load of the cdma2000 system lies primarily in code acquisition and channel estimation. In the next steps, we will therefore examine more efficient algorithms for these two functions.
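The dominance of these two functions can be read off the per-function cycle counts in Table 4. The sketch below is only a rough check: the approximate entries of the table are taken at their nominal values, and the 95000~99000 range for code acquisition is replaced by its midpoint, 97000, which is an assumption of this example.

```python
# Cycle counts per function, taken from Table 4 (approximate entries at nominal value).
NODE_A = {
    "Read Data": 3000,
    "PN Code Generation": 17781,
    "Code Acquisition": 97000,     # midpoint of 95000~99000 (assumption)
}
NODE_B = {
    "PN Code Generation": 17781,
    "Interpolation": 8986,
    "Code Tracking": 12767,
    "Channel Estimation": 70000,
    "RAKE Combining & Despreading & Data Decision": 6370,
    "Move Data": 1000,
}


def dominant(node):
    """Return the most expensive function and its share of the listed cycles."""
    name = max(node, key=node.get)
    return name, node[name] / sum(node.values())


a_name, a_share = dominant(NODE_A)
b_name, b_share = dominant(NODE_B)
```

Under these assumptions code acquisition takes roughly four fifths of the cycles listed for Node A, and channel estimation roughly three fifths of those listed for Node B.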

 Table 2: The simulation parameters of the system

| Parameter                           | Value                      |
|-------------------------------------|----------------------------|
| Chip Rate ($R_c$)                   | 1.2288 Mcps                |
| Spreading Factor (SF)               | 16                         |
| Bit Rate ($R_i$)                    | 76.8 kbps/code channel     |
| Spreading Codes                     | Walsh and Scrambling codes |
| Oversampling                        | 4 samples/chip             |
| Roll-off Factor ($\Gamma$)          | 0.22                       |
| Relative Gain for All Code Channels | 0 dB                       |
| Carrier Frequency ($f_c$)           | 2 GHz                      |
| Chip SNR ($\mathrm{SNR}_c$)         | -20 ~ 0 dB                 |
| Vehicular Speed (V)                 | 60 km/h                    |
| Rake Fingers                        | 4                          |



Figure 1: The functional block of the receiver architecture used in cdma2000 base station system.



Figure 2: MRC RAKE receiver



Figure 3: Data flow of cdma2000 receiver over two CPUs



Figure 4: Modified complex multiply matched filter for acquisition

| I1 | Q1 | I2 | Q2 | ... |
|----|----|----|----|-----|

Figure 5: I and Q signals are stored in the PEM-4WDC FIFO buffer in order.



Figure 6: Scheduling

| Unit/Cycle | 0   | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   |
|------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| .D1        |     | LDB | LDH | LDH | LDH | LDB | LDB | LDB | LDH | SUB |
| .D2        | SUB |     | LDB | LDB | LDH | LDH | LDB | LDH | LDH | LDB |
| .M1        | MPY | MPY |     | MPY | MPY | MPY |     | MPY | MPY | MPY |
| .M2        | MPY |     | MPY | MPY | MPY | MPY | MPY |     | MPY | MPY |
| .L1        |     |     |     |     |     |     |     |     |     |     |
| .L2        |     |     |     |     |     | SUB | ADD | ADD | ADD | ADD |
| .S1        |     | SUB | ADD | SUB | B   |     | SUB | SUB | ADD |     |
| .S2        |     | ADD | MV  | ADD | MV  | ADD | ADD | ADD | ADD | MV  |

Table 3: The kernel after loop unrolling of the correlation operation for code acquisition
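The structure of the unrolled correlation and the cycle estimate quoted in the text can be sketched as follows. This Python version only illustrates the loop shape; on the C6x the unrolled loop would be written in C and software-pipelined by the compiler. The function names and the 128-chip test code are hypothetical. The `KERNEL` strings copy the load and multiplier rows of Table 3, and reading the 16 `MPY` slots as four complex multiplications of four real multiplies each is an interpretation, not something the report states.

```python
def correlate_unrolled4(samples, code):
    """Complex correlation with the loop body unrolled by a factor of 4."""
    assert len(samples) == len(code) and len(code) % 4 == 0
    acc0 = acc1 = acc2 = acc3 = 0j
    for k in range(0, len(code), 4):    # 128 taps give 32 iterations
        acc0 += samples[k] * code[k].conjugate()
        acc1 += samples[k + 1] * code[k + 1].conjugate()
        acc2 += samples[k + 2] * code[k + 2].conjugate()
        acc3 += samples[k + 3] * code[k + 3].conjugate()
    return acc0 + acc1 + acc2 + acc3


def estimated_cycles(taps, unroll=4, cycles_per_iteration=10, prolog_epilog=8):
    """Cycle estimate from the text: prolog/epilog plus 10 cycles per iteration."""
    return prolog_epilog + (taps // unroll) * cycles_per_iteration


# Load (.D) and multiplier (.M) rows of Table 3; "-" marks an empty slot.
KERNEL = {
    ".D1": "- LDB LDH LDH LDH LDB LDB LDB LDH SUB",
    ".D2": "SUB - LDB LDB LDH LDH LDB LDH LDH LDB",
    ".M1": "MPY MPY - MPY MPY MPY - MPY MPY MPY",
    ".M2": "MPY - MPY MPY MPY MPY MPY - MPY MPY",
}
ops = [op for row in KERNEL.values() for op in row.split()]
multiplies = ops.count("MPY")
loads = ops.count("LDB") + ops.count("LDH")

# Hypothetical 128-chip code correlated against itself.
code = [complex(1, 1), complex(1, -1), complex(-1, 1), complex(-1, -1)] * 32
peak = correlate_unrolled4(list(code), code)
cycles = estimated_cycles(len(code))
```

For 128 taps the estimate reproduces the 328 cycles given in the text, and the kernel rows contain 16 multiplies and 16 loads per 10-cycle iteration.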

 Table 4: The total clocks spent by each node

| Node A             | Clocks      | Node B                                       | Clocks  |
|--------------------|-------------|----------------------------------------------|---------|
| Read Data          | ~3000       | PN Code Generation                           | 17781   |
| PN Code Generation | 17781       | Interpolation                                | 8986    |
| Code Acquisition   | 95000~99000 | Code Tracking                                | 12767   |
|                    |             | Channel Estimation                           | ~70000  |
|                    |             | RAKE Combining & Despreading & Data Decision | 6370    |
|                    |             | Move Data                                    | ~1000   |
| Total              | ~114781     | Total                                        | ~117000 |