# A Low-Complexity Synchronizer for OFDM-Based UWB System

Hsuan-Yu Liu, Student Member, IEEE, and Chen-Yi Lee, Member, IEEE

Abstract—Current ultra-wideband (UWB) baseband synchronizers rely on parallel architectures to meet the throughput requirement of over 500 Msamples/s, which makes low power and small area the main challenges of UWB baseband design. In this paper, a low-complexity synchronizer combining data-partition-based correlation algorithms and a dynamic-threshold design is proposed for orthogonal frequency division multiplexing (OFDM) based UWB systems. It provides a methodology to reduce design complexity with an acceptable performance loss. Based on the data-partition algorithms, both a single auto-correlator and a moving-average-free matched filter are developed with 528 Msamples/s throughput for the 480 Mb/s UWB design. Simulation results show that the synchronization loss can be limited to 0.8 dB in signal-to-noise ratio at 8% system packet-error rate.

Index Terms—Data-partition, dynamic-threshold, moving-average-free matched filter (MF), single auto-correlator.

#### I. INTRODUCTION

ORTHOGONAL frequency division multiplexing (OFDM) based ultra-wideband (UWB) technology has received attention owing to its high data rate of 480 Mb/s and its power requirement of below 323 mW [1]. In the baseband receiver, the timing and frequency synchronizer detects the incoming packet and resolves the carrier frequency offset (CFO), which is expected to be $\pm 20$ ppm for UWB [2]-[6], [10]. In WLAN systems, existing synchronizers use the matched filter (MF) and the fast-Fourier-transform (FFT) symbols for accurate timing detection and fine CFO estimation [3]-[7]. However, the moving-average circuit of the MF and the registers storing the FFT symbol consume large power, e.g., 110 mW in [3]. As the system migrates to UWB, parallel architectures are exploited. References [8] and [9] use 20 and 128 parallel MFs to detect the symbol timing at 10- and 2-GHz sampling rates, respectively. Thus, achieving low power becomes the main concern in designing a UWB baseband synchronizer [8].

To achieve a power-efficient synchronizer for OFDM-based UWB systems, a novel low-complexity scheme combining a data-partition method and a dynamic-threshold design is proposed. The data-partition method reduces the amount of data used for synchronization (Sync), and thus the number of register accesses and the moving-average complexity. The dynamic-threshold design adapts the threshold value of timing detection to the channel condition, thus enhancing the Sync performance. Simulation results show that the performance loss of the proposed design with 75% register reduction can be limited to 0.8 dB in signal-to-noise ratio (SNR) at 8% system packet-error rate (PER). This paper is organized as follows. The system block diagram of the UWB baseband receiver is described in Section II. The proposed low-complexity scheme is described in Section III. Simulation results are shown in Section IV. The proposed architecture and implementation results are described in Section V.

Manuscript received December 22, 2004; revised June 14, 2005. This work was supported by the National Science Council of Taiwan, R.O.C. under Grant NSC94-2215-E-009-044, and by the Ministry of Economic Affairs of Taiwan, R.O.C. under Grant 93-EC-17-A-03-S1-0005. This paper was recommended by Associate Editor T. S. Rosing.

The authors are with the Department of Electronics Engineering, National Chiao Tung University, Hsin-Chu 300, Taiwan, R.O.C. (e-mail: hyliu@si2lab.org; cylee@si2lab.org).

Digital Object Identifier 10.1109/TCSII.2006.882804

Fig. 1. System block diagram of OFDM-based baseband receiver.

#### II. SYSTEM BLOCK DIAGRAM

Fig. 1 shows the system block diagram of the UWB baseband receiver, and the system parameters are listed in Table I [12]. In the receiver, after the automatic gain control (AGC) adjusts the RF gain, the proposed synchronizer begins to detect the incoming packet. The physical layer convergence protocol (PLCP) preamble transmitted at the beginning of each packet can be used for Sync. The structure of the PLCP preamble defined in [10] is shown in Fig. 2. The preamble comprises 21 packet sequences (PS), three frame sequences (FS), and six channel-estimation sequences (CES). Within the preamble, the proposed design sequentially finishes packet detection (PD), CFO estimation, FFT-window detection (FWD), and preamble-timing detection (PTD). After the synchronizer, the received signal passes through the FFT, the channel equalizer, the quadrature phase-shift keying (QPSK) demapper, the forward error control (FEC) decoder, and the de-scrambler, and the data are then sent to the medium access control (MAC) layer.
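As a quick consistency check on these numbers (a back-of-envelope calculation, assuming each preamble sequence occupies one OFDM symbol duration as listed in Table I), the preamble length and duration work out to

$$21 + 3 + 6 = 30\ \text{symbols}, \qquad 30 \times 312.5\ \text{ns} = 9.375\ \mu\text{s}.$$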

#### III. ALGORITHM DESIGN

#### A. Data-Partition-Based Auto-Correlation

TABLE I BASEBAND SYSTEM PARAMETERS

| Data rate (Mb/s)              | 120 | 240   | 480 |
|-------------------------------|-----|-------|-----|
| Spreading gain                | 4   | 2     | 1   |
| Constellation                 |     | QPSK  |     |
| Coding rate                   |     | 3/4   |     |
| FFT size                      |     | 128   |     |
| Data carriers per OFDM symbol |     | 100   |     |
| OFDM symbol duration (ns)     |     | 312.5 |     |
| Signal bandwidth (MHz)        |     | 528   |     |

Fig. 2. Preamble structure of OFDM-based UWB system.

In order to detect the repeated PS of the incoming preamble and to estimate the CFO from the linear phase rotation that it causes, the auto-correlation (AC) can be used in the preamble-based OFDM system [2]–[6]. The algorithm used in the existing approaches can be derived as

$$\Lambda_{\rm AC}(m) = \sum_{n=0}^{N-1} r_{m \times N+n} \times r_{(m+1) \times N+n}^* \tag{1}$$

where N is the number of samples in a repeated symbol, and  $r_{m\times N+n}$  is the nth received sample of the mth repeated symbol. In the UWB system, the preamble comprises repeated OFDM symbols, each of which has 165 samples [10]. N is therefore equal to 165 for an OFDM system with a 128-point FFT symbol and a 37-sample guard interval, and all 165 samples  $r_{m\times N+n}$  are stored and multiplied in (1). To reduce the multiplications, a data-partition-based AC algorithm is proposed and derived as

$$\Lambda_{\rm AC}(m) \approx \omega \sum_{n=0}^{\left\lfloor \frac{N}{\omega} \right\rfloor - 1} r_{m \times N + \omega \times n} \times r_{(m+1) \times N + \omega \times n}^* \tag{2}$$

where  $\omega$  is the reduction factor, and  $r_{m\times N+\omega n}$  and  $r_{(m+1)\times N+\omega n}$  are the samples that are used. In the proposed algorithm, the input data are partitioned into  $\omega$  groups, and only one group of data is used. Thus the number of multiplications is reduced to  $1/\omega$  of the original, and the registers for storing the input samples are reduced as well. The AC output power can be used to detect a valid packet. The algorithm of PD can be derived as

$$\left|\Lambda_{\rm AC}(m)\right|^2 \ge \lambda_1 \times P_{m+1}^2 \tag{3}$$

where  $|\Lambda_{AC}(m)|^2$  is the AC output power,  $\lambda_1$  is a pre-defined threshold value, and  $P_{m+1}$  is the sum of the signal power of the (m+1)th OFDM symbol. Fig. 3 shows examples of the normalized AC power  $|\Lambda_{AC}(m)|^2/P_{m+1}^2$  of the received signal in a high-SNR AWGN channel (better channel) and in a low-SNR indoor multipath channel for the UWB system (worse channel) [11]. The correct preamble is set to begin at 0 ns; before 0 ns, only noise is received. The normalized AC power of the received noise may become higher as  $\omega$  increases, which means that a larger  $\omega$  makes a PD false alarm more likely. It is therefore important to find an  $\omega$  value that simultaneously keeps the Sync performance and reduces the design complexity.

Fig. 3. Normalized AC power in (a) better channel and (b) worse channel.

The AC can also be used for CFO estimation [3]–[7]. The CFO estimate can be derived as

$$\hat{\epsilon} = \frac{1}{2\pi NT} \tan^{-1} \left\{ \frac{\operatorname{Im} \left[ \Lambda_{AC}(m) \right]}{\operatorname{Re} \left[ \Lambda_{AC}(m) \right]} \right\} \tag{4}$$

where  $\hat{\epsilon}$  is the estimated CFO, N is the number of samples in an OFDM symbol, T is the sample period, and  $\Lambda_{\rm AC}(m)$  is the AC result. After CFO estimation, the phase rotation caused by the CFO can be compensated, and FWD can begin without CFO distortion.
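The three steps of this subsection, (2) through (4), can be sketched in a few lines of NumPy. This is only an illustrative floating-point model under stated assumptions, not the authors' implementation: the function names, the synthetic unit-magnitude symbol, the 100-kHz offset, and the threshold `lam1 = 0.5` are all hypothetical, and the symbol power is assumed to be estimated from the same partitioned samples as the correlation.

```python
import numpy as np

N = 165  # samples per repeated preamble symbol (128-point FFT + 37-sample guard)


def partition_indices(omega, n_sym=N):
    """Offsets omega*n, n = 0 .. floor(N/omega)-1, of the one data group that is kept."""
    return omega * np.arange(n_sym // omega)


def partitioned_ac(r, m, omega=1, n_sym=N):
    """Lambda_AC(m) of (2); omega = 1 reduces to the full correlation (1)."""
    idx = partition_indices(omega, n_sym)
    return omega * np.sum(r[m * n_sym + idx] * np.conj(r[(m + 1) * n_sym + idx]))


def symbol_power(r, m, omega=1, n_sym=N):
    """P_m, estimated from the same kept samples (an assumption of this sketch)."""
    idx = partition_indices(omega, n_sym)
    return omega * np.sum(np.abs(r[m * n_sym + idx]) ** 2)


def packet_detected(r, m, lam1, omega=1):
    """Packet-detection test (3): |Lambda_AC(m)|^2 >= lam1 * P_{m+1}^2."""
    ac = partitioned_ac(r, m, omega)
    return bool(np.abs(ac) ** 2 >= lam1 * symbol_power(r, m + 1, omega) ** 2)


def estimate_cfo(ac, T, n_sym=N):
    """CFO estimate (4) in Hz; arctan2 keeps the quadrant of the AC phase."""
    return np.arctan2(ac.imag, ac.real) / (2 * np.pi * n_sym * T)


# Synthetic check: three identical unit-magnitude symbols with a 100-kHz offset.
rng = np.random.default_rng(0)
T = 1 / 528e6                                    # sample period at 528 Msamples/s
cfo = 100e3                                      # hypothetical offset in Hz
sym = np.exp(2j * np.pi * rng.random(N))         # hypothetical preamble symbol
r = np.tile(sym, 3) * np.exp(2j * np.pi * cfo * np.arange(3 * N) * T)

ac_full = partitioned_ac(r, 0, omega=1)          # 165 complex multiplications
ac_quarter = partitioned_ac(r, 0, omega=4)       # 41 complex multiplications
# With this rotation convention the AC phase is -2*pi*cfo*N*T, so (4) returns -cfo.
print(estimate_cfo(ac_full, T), estimate_cfo(ac_quarter, T))
print(packet_detected(r, 0, lam1=0.5, omega=4))
```

In this noiseless case both correlations carry the same phase, so the estimate is unchanged by the partition. The sign of the result depends on the rotation convention chosen for the synthetic input, which the text does not specify.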

#### B. Moving-Average-Free MF

For correct FWD, the MF can be used [4], [5]. The algorithm used in existing approaches can be derived as

$$\Lambda_{\rm MF}(k) = \sum_{n=0}^{N-1} r_{k+n} \times C_n^* \tag{5}$$

where N is the number of samples in an OFDM symbol, k is the FWD timing from 0 to N-1,  $r_{k+n}$  is the received sample after CFO compensation, and  $C_n$  is the coefficient of the MF. The conventional MF in (5) needs to store the received samples  $r_{k+0} \sim r_{k+N-1}$  in the registers for each FWD timing k. We propose a moving-average-free MF which stores only  $r_0 \sim r_{N-1}$  regardless of the timing k, so the register power can be reduced. Since the OFDM symbol is repeated, the received samples have a period of N samples, and a received sample  $r_{k+n}$  with  $k+n \ge N$  can be approximated by  $r_{k+n-N}$. The received samples  $r_{k+n}$, where  $n=0 \sim N-1$, can then be approximated by  $r_L$, where  $L=k \sim N-1$  and  $L=0 \sim k-1$. That means only  $r_0 \sim r_{N-1}$  are needed for every FWD timing k. Equation (5) can be approximated as

$$\begin{aligned}
\Lambda_{\rm MF}(k) &= \sum_{n=0}^{N-1} r_{k+n} \times C_n^* \\
&\approx \sum_{n=0}^{N-k-1} r_{k+n} \times C_n^* + \sum_{n=N-k}^{N-1} r_{k+n-N} \times C_n^* \\
&= \sum_{L=0}^{N-1} r_L \times C_{L-k+\lceil (k-L)/N \rceil \times N}^*
\end{aligned} \tag{6}$$



Fig. 4. MF power in (a) better channel and (b) worse channel.

where the received samples  $r_L$  are fixed as  $r_0 \sim r_{N-1}$, and the MF coefficients  $C_{L-k+\lceil(k-L)/N\rceil\times N}$  are still  $C_0 \sim C_{N-1}$, in a cyclically shifted order. Since the proposed algorithm uses the same fixed N received samples to calculate all outputs of the MF, the moving-average design is not needed. Moreover, the computation of the moving-average-free MF can be further reduced by the data-partition method. Finally, the proposed MF algorithm can be derived as

$$\Lambda_{\rm MF}(k) \approx \omega \sum_{\ell=0}^{\lfloor N/\omega \rfloor - 1} r_{\omega \times \ell} \times C^*_{\omega \times \ell - k + \lceil (k - \omega \times \ell)/N \rceil \times N} \tag{7}$$

where  $\omega$  is the reduction factor as in (2). As in (2), the multiplications and stored samples of (7) are reduced to  $1/\omega$  of the original amounts, and the number of filter taps is reduced as well. The MF output power can be used for FWD. The timing at which the MF peak power appears can be derived as

$$K_{\rm peak} = \arg\max_{k} \left\{ \left| \Lambda_{\rm MF}(k) \right|^2 \right\} \tag{8}$$

where  $K_{\rm peak}$  is the timing with peak power and  $|\Lambda_{\rm MF}(k)|^2$  is the MF output power. Fig. 4 shows the MF power of the received preamble in the same channel conditions as in Fig. 3. In Fig. 4, the correct FFT-window (FW) boundary is set to 0 ns. As  $\omega$  increases, the highest peak of the MF output power no longer appears only at the FW boundary (0 ns). To solve this problem, the sub-optimal timing location algorithm can be used [5], and the FW boundary is detected as the timing of the earliest of the searched MF peaks. As shown in Fig. 4, when  $\omega$  is equal to 4, the correct FW boundary (0 ns) is the timing of the earlier of the two highest peaks. In this case, we can search for two MF peaks and place the FW boundary at the earlier one. The sub-optimal timing location algorithm thus helps adjust the FWD result according to the chosen  $\omega$  value.
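The index manipulation in (6) and (7) amounts to a cyclic shift of the coefficient index, which the following NumPy sketch makes explicit. It is a minimal model under assumptions that are not from the paper: the random sign sequence `c`, the true timing `k0`, and the noiseless, CFO-free periodic input are all hypothetical, and the function names are chosen only for illustration. With a perfectly periodic input the approximation in (6) becomes an equality, which the example checks.

```python
import numpy as np


def mf_conventional(r, c, k):
    """Sliding-window MF of (5): needs the window r[k] .. r[k+N-1] for each k."""
    n = len(c)
    return np.sum(r[k:k + n] * np.conj(c))


def mf_moving_average_free(r_stored, c, k, omega=1):
    """MF of (7) on the fixed samples r[0] .. r[N-1]; omega = 1 gives (6)."""
    n = len(c)
    ell = omega * np.arange(n // omega)       # indices of the samples that are used
    coeff = c[(ell - k) % n]                  # same index as L - k + ceil((k - L)/N)*N
    return omega * np.sum(r_stored[ell] * np.conj(coeff))


rng = np.random.default_rng(1)
N, k0 = 165, 37                               # k0: hypothetical true FFT-window timing
c = rng.choice([-1.0, 1.0], size=N)           # hypothetical constant-magnitude sequence
r = np.tile(np.roll(c, k0), 3).astype(complex)   # noiseless, CFO-free periodic input

full = np.array([mf_conventional(r, c, k) for k in range(N)])
fixed = np.array([mf_moving_average_free(r[:N], c, k) for k in range(N)])
quarter = np.array([mf_moving_average_free(r[:N], c, k, omega=4) for k in range(N)])

k_peak = int(np.argmax(np.abs(quarter) ** 2))    # peak search of (8)
print(np.allclose(full, fixed), k_peak)
```

The sketch uses a plain arg-max for (8); the earliest-peak rule of the sub-optimal timing location algorithm in [5] is not modeled here.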

#### C. Dynamic-Threshold Design

After FWD, the synchronizer can start the PTD to find the boundary between the PS and the FS of the preamble. Since the FS is the sign-inverted version of the PS [10], we can use the sum of two consecutive AC results to detect the timing. The algorithm of PTD can be derived as

$$\left|\Lambda_{\rm AC}(m) + \Lambda_{\rm AC}(m-1)\right|^2 \le \lambda_2 \times \left[P_m + P_{m-1}\right]^2 \tag{9}$$

where  $\Lambda_{\rm AC}(m)$  is the AC result of the mth and (m+1)th OFDM symbols,  $\lambda_2$  is a threshold value, and  $P_m$  is the sum of the signal power of the mth OFDM symbol. If the mth OFDM symbol belongs to the PS and the (m+1)th OFDM symbol belongs to the FS, the sign inversion makes  $\Lambda_{\rm AC}(m)$  the sign-inverted version of  $\Lambda_{\rm AC}(m-1)$. Thus  $|\Lambda_{\rm AC}(m)+\Lambda_{\rm AC}(m-1)|^2$  becomes smaller than the product of the threshold  $\lambda_2$  and the squared sum of the signal power. For accurate PTD, a dynamic-threshold design, which adapts the  $\lambda_2$  value to the channel condition, is proposed. The adapted threshold can be derived as

$$\lambda_2 = \frac{\left|\Lambda_{\rm AC}(m-1) + \Lambda_{\rm AC}(m-2)\right|^2}{\left[P_{m-1} + P_{m-2}\right]^2} \times \varepsilon \tag{10}$$

where  $\varepsilon$  is a fixed ratio that shifts the level of  $\lambda_2$  to perform accurate PTD, and the threshold value  $\lambda_2$  is updated according to the AC results  $\Lambda_{\rm AC}(m-1)$  and  $\Lambda_{\rm AC}(m-2)$  and the sums of signal power  $P_{m-1}$  and  $P_{m-2}$. Simulation results show that the proposed dynamic-threshold design achieves lower FER and PER than fixed-threshold designs.
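The interplay of (9) and (10) can be illustrated with a short NumPy sketch. It is a simplified model rather than the authors' design: the function name, the random sign sequence, the noiseless stream of six PS symbols followed by three FS symbols, and the ratio `eps = 0.5` are assumptions made only for this example, and full (unpartitioned) correlations are used.

```python
import numpy as np


def detect_ps_fs_boundary(ac, p, eps):
    """First m that satisfies (9), with lambda_2 refreshed by (10); None if not found.

    ac[m] is Lambda_AC(m) (symbols m and m+1) and p[m] is P_m.
    """
    for m in range(2, len(ac)):
        lam2 = eps * abs(ac[m - 1] + ac[m - 2]) ** 2 / (p[m - 1] + p[m - 2]) ** 2
        if abs(ac[m] + ac[m - 1]) ** 2 <= lam2 * (p[m] + p[m - 1]) ** 2:
            return m
    return None


rng = np.random.default_rng(2)
N = 165
ps = rng.choice([-1.0, 1.0], size=N).astype(complex)        # hypothetical PS symbol
stream = np.concatenate([np.tile(ps, 6), np.tile(-ps, 3)])  # 6 PS, then 3 FS
blocks = stream.reshape(-1, N)

ac = np.sum(blocks[:-1] * np.conj(blocks[1:]), axis=1)   # Lambda_AC(0) .. Lambda_AC(7)
p = np.sum(np.abs(blocks) ** 2, axis=1)                  # P_0 .. P_8

boundary = detect_ps_fs_boundary(ac, p, eps=0.5)         # eps = 0.5 is illustrative
print(boundary)
```

Here the test fires at m = 5, the last PS symbol (counting from 0) before the first FS symbol. Because the threshold is derived from the two preceding AC results, it scales with the correlation level actually observed in the channel instead of being fixed in advance.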

#### IV. SIMULATION ANALYSIS

The system PER and FER of the proposed design are shown in this section. The simulation environment mainly comprises additive white Gaussian noise (AWGN), the CFO effect, the sampling clock offset (SCO) effect, and the indoor multipath channel [11] with a typical 5-ns RMS delay spread for the 480 Mb/s UWB system. The CFO and SCO between the transmitter and the receiver are both set to 40 ppm (TX + RX).

#### A. PER Analysis of Data-Partition-Based Design

As shown in Fig. 5, the system PER of the proposed low-complexity scheme with different reduction factors  $\omega$  is simulated and compared with perfect Sync (FER = 0.0% and CFO estimation error = 0 Hz) in the 480 Mb/s data rate mode. Compared with perfect Sync, the SNR loss at the typical 8% PER is 0.14, 0.15, 0.3, and 3.1 dB for  $\omega=1$, 2, 4, and 8, respectively. The design with  $\omega=16$  fails to achieve 8% PER. The PER curves for  $\omega=1$, 2, and 4 are very close to each other, and the SNR loss becomes markedly higher when  $\omega \geq 8$.

#### B. FER and PER Analysis of Dynamic-Threshold Design

Fig. 6 shows the FER of the proposed dynamic-threshold design compared with fixed-threshold designs. The designs with fixed thresholds of 0.04 and 0.1 achieve low FER in the 0-dB and 2~6-dB SNR regions, respectively, but neither achieves the lowest FER in all SNR regions. The proposed dynamic-threshold design achieves the lowest FER in all SNR regions because of the adaptive threshold tuning. Fig. 7 shows the PER of the proposed dynamic-threshold design at the 120 Mb/s data rate. Since the proposed design achieves the lowest FER, it requires 0.13 dB to 2.33 dB lower SNR for 8% PER than the fixed-threshold designs.

Fig. 5. Floating-point PER with different  $\omega$  values in 480 Mb/s mode.

Fig. 6. FER with different threshold of PTD.

Fig. 7. PER with different threshold of PTD in 120 Mb/s data rate.



Fig. 8. Fixed-point PER of the proposed design.

#### C. Fixed-Point PER Performance

Fig. 8 shows the PER of the fixed-point baseband processor using the proposed design with  $\omega=4$  at data rates of 120 Mb/s~480 Mb/s. 5-bit digital-to-analog converters (DACs) and analog-to-digital converters are adopted. Compared with perfect Sync, the SNR loss caused by Sync error is only 0.15~0.8 dB at the typical 8% PER for data rates of 120 Mb/s~480 Mb/s. The proposed design with  $\omega=4$  can efficiently suppress the Sync error and enhance system performance.

#### V. ARCHITECTURE DESIGN

To efficiently achieve the 528 Msamples/s throughput required by the UWB specifications, the synchronizer is designed with four parallel signal paths at a 132-MHz clock frequency. The architecture of the proposed auto-correlator with  $\omega = 4$  is shown in Fig. 9. Since the AC computation is reduced to a quarter by (2), only one auto-correlator is needed instead of four parallel auto-correlators, and the number of samples stored for the auto-correlator is also reduced to  $\lfloor N/4 \rfloor = 41$. The architecture of the proposed MF is shown in Fig. 10. Based on (7), the number of MF taps is reduced from N to  $\lfloor N/4 \rfloor = 41$. In [10], the preamble has constant magnitude and varying sign, so the MF can be realized with adders/subtractors instead of signed multipliers. Like the auto-correlator, the proposed moving-average-free MF needs to store 41 samples, so the registers storing these 41 samples can be shared by the auto-correlator and the MF. Based on the proposed low-complexity scheme, the synchronizer can be realized with a single auto-correlator, the quarter-tap moving-average-free MF, and the quarter-size registers.
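The figures in this paragraph follow from simple arithmetic, sketched here with N = 165 from Section III and  $\omega = 4$:

$$4 \times 132\ \text{MHz} = 528\ \text{Msamples/s}, \qquad \left\lfloor \frac{N}{\omega} \right\rfloor = \left\lfloor \frac{165}{4} \right\rfloor = 41 .$$

Because only one of every four input samples is used, a single correlator running at the 132-MHz clock has at most one used sample to process per cycle.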

Table II lists the hardware comparison of the proposed design and a parallel approach with four parallel auto-correlators, four parallel 165-tap MFs, and 165-sample registers. The power comparison is based on post-layout simulation at 528 Msamples/s throughput in a standard  $0.18-\mu m$  CMOS process. Besides the auto-correlator, MF, and registers, both synchronizer designs contain CFO compensators, which are realized by complex multipliers to compensate the phasor error. With the reduced auto-correlator, MF, and registers, the proposed design needs only 37.6% of the gate count and 43.3% of the power of the parallel approach.

Fig. 9. Architecture of the proposed auto-correlator.

Fig. 10. Architecture of the proposed MF.

TABLE II HARDWARE COMPARISON

| Block            | Proposed design: gate count | Proposed design: power (mW) | 4-parallelism architecture: gate count | 4-parallelism architecture: power (mW) |
|------------------|-----------------------------|-----------------------------|----------------------------------------|----------------------------------------|
| Auto-correlator  | 4K                          | 3.2                         | 13K                                    | 12.8                                   |
| Matched filter   | 33K                         | 6.7                         | 106K                                   | 15.4                                   |
| Register         | 7.2K                        | 8.5                         | 28.8K                                  | 34.0                                   |
| CFO compensation | 14K                         | 13.1                        | 14K                                    | 13.1                                   |
| Others           | 4.2K                        | 1.9                         | 4.2K                                   | 1.9                                    |
| Total            | 62.4K                       | 33.4                        | 166K                                   | 77.2                                   |

Table III lists the chip testing summary, and Fig. 11 shows the chip microphoto. Designed in a 0.18-$\mu$m CMOS process, the proposed synchronizer consumes 33 mW at the 480 Mb/s data rate and 528 Msamples/s throughput, which is 20.4% of the OFDM receiver (RX) power. The proposed low-power scheme reduces the OFDM receiver power by 26.7% compared with the parallel approach.
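The percentages quoted from Table II and Table III can be reproduced from the listed totals. The register row also matches the 75% register reduction stated in Section I:

$$\frac{62.4\text{K}}{166\text{K}} \approx 37.6\%, \qquad \frac{33.4\ \text{mW}}{77.2\ \text{mW}} \approx 43.3\%, \qquad \frac{7.2\text{K}}{28.8\text{K}} = 25\%, \qquad \frac{33\ \text{mW}}{162\ \text{mW}} \approx 20.4\%.$$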

#### VI. CONCLUSION

A low-complexity synchronizer has been proposed for the OFDM-based UWB baseband processor, and its algorithms, performance analysis, and architecture design have been presented. Combining the data-partition and dynamic-threshold schemes, the proposed design achieves 528 Msamples/s throughput to meet the 120~480 Mb/s data rates in a 0.18-μm CMOS process. It needs 37.6% of the gate count and consumes only 43.3% of the power of the parallel approach.

TABLE III CHIP TESTING SUMMARY

| Item                                      | Value             |
|-------------------------------------------|-------------------|
| Process                                   | 0.18-μm CMOS 1P6M |
| Package                                   | 208 CQFP          |
| Die size                                  | 6.5 mm × 6.5 mm   |
| Maximum data rate                         | 480 Mb/s          |
| Maximum bandwidth                         | 528 MHz           |
| System clock                              | 264 MHz           |
| Synchronizer clock                        | 132 MHz           |
| RX core power (including OFDM) @ 480 Mb/s | 575 mW            |
| OFDM RX power (including Sync) @ 480 Mb/s | 162 mW            |
| Proposed Sync power @ 480 Mb/s            | 33 mW             |



Fig. 11. Chip microphoto of the OFDM-based UWB baseband transceiver.

#### REFERENCES

- [1] A. Batra, J. Balakrishnan, G. R. Aiello, J. R. Foerster, and A. Dabak, "Design of a multiband OFDM system for realistic UWB channel environments," *IEEE Trans. Microw. Theory Tech.*, vol., no. 9, pp. 2123–2138, Sep. 2004.
- [2] T. M. Schmidl and D. C. Cox, "Robust frequency and timing synchronization for OFDM," *IEEE Trans. Commun.*, vol. 45, no. 12, pp. 1613–1621, Dec. 1997.
- [3] M. Krstic, A. Troya, K. Maharatna, and E. Grass, "Optimized low-power synchronizer design for the IEEE 802.11a standard," in *Proc. ICASSP*, Apr. 2003, vol. 2, pp. II-333–II-336.
- [4] L. Schwoerer, "VLSI suitable synchronization algorithms and architecture for IEEE 802.11a physical layer," in *Proc. IEEE Int. Symp. Circuits Syst.*, May 2002, vol. 5, pp. 721–724.
- [5] C.-F. Hsu, Y.-H. Huang, and T.-D. Chiueh, "Design of an OFDM receiver for high-speed wireless LAN," in *Proc. IEEE Int. Symp. Circuits Syst.*, May 2001, vol. 4, pp. 558–561.
- [6] J. Liu and J. Li, "Parameter estimation and error reduction for OFDM-based WLANs," *IEEE Trans. Mobile Comput.*, vol. 3, no. 2, pp. 152–163, Apr. 2004.
- [7] C. S. Peng and K. A. Wen, "Synchronization for carrier frequency offset in wireless LAN 802.11a system," Wireless Personal Multimedia Communications, vol. 3, pp. 1083–1087, Oct. 2002.
- [8] M. Verhelst, W. Vereecken, M. Steyaert, and W. Dehaene, "Architecture for low ultra-wideband radio receivers in the 3.1–5-GHz band for data rates <10 Mbps," in *Proc. Int. Symp. Low Power Electron. Des.*, Aug. 2004, pp. 280–285.
- [9] I. D. O'Donnell, S. W. Chen, B. T. Wang, and R. W. Brodersen, "An integrated, low power, ultra-wideband transceiver architecture for low-rate indoor wireless system," in *Proc. IEEE CAS Workshop Wireless Commun. Network.*, Sep. 2002.
- [10] A. Batra *et al.*, "Multi-band OFDM physical layer proposal," IEEE P802.15-03/267r6-TG3a, Sep. 2003.
- [11] J. Foerster, "Channel modeling sub-committee report, final," IEEE P802.15-02/490r1-SG3a, Feb. 2003.
- [12] H.-Y. Liu et al., "A 480 Mb/s LDPC-COFDM-based UWB baseband transceiver," in Dig. Tech. Papers IEEE Int. Solid-State Circuits Conf., Feb. 2005, pp. 444–446.