# Cell Search in WCDMA Under Large-Frequency and Clock Errors: Algorithms to Hardware Implementation

Chi-Fang Li, Member, IEEE, Yuan-Sun Chu, Jan-Shin Ho, and Wern-Ho Sheen, Member, IEEE

Abstract—This paper proposes a complete cell-search solution for real-world handset operations in the wide-band code-division multiple-access system. Large-frequency and clock errors are induced at initial search due to an inaccuracy of crystal oscillators within handsets and could cause fatal performance degradation. In addition to a primary search algorithm for achieving essential pseudo-noise code acquisitions, an enhanced algorithm combining frequency offset compensation, random sample per frame, and sample point reordering is proposed to achieve fast initial search even while an inaccuracy is up to 12 ppm. After a series of trade-off between performance and complexity, two search bins in the first stage are adopted to mitigate possible errors in each bin and make the enhanced schemes become effective. In final, the complete solution is presented with alternative algorithms: a primary one and an enhanced one. The solution is implemented with the flexibility for switching algorithms in different searches and using low-power methods for the sake of handsets' working time. Its core area is  $3.8 \times 3.8$ -mm<sup>2</sup> in a 3.3-V 0.35-mm CMOS technology, and the power consumption is 35.6 and 67 mW for the primary and enhanced algorithms, respectively.

*Index Terms*—Cell search, clock error, frequency error, frequency offset compensation (FOC), low-power design, random sample per frame (RSPF), sample point reordering (SPR), wide-band code-division multiple-access (WCDMA).

## I. INTRODUCTION

THE third generation (3G) cellular systems have been standardized to provide higher data rates, better quality, and larger capacity than the second generation (2G) systems. The wide-band code-division multiple-access (WCDMA) system plays an important role in the 3G cellular systems because of its compatible networking architecture to the present popular GSM systems as well as the salient features of a CDMA system, including multi-path fading tolerance, high system capacity, low power consumption, good performance and coverage [1]–[4]. In a CDMA system, a procedure used by a mobile station (MS) to search for the best cell site and achieve code, time, and frequency synchronization with it is referred to as

J.-S. Ho is with the Department of Communication Engineering, National Penghu University, Penghu 880, Taiwan, R.O.C.. (e-mail: jsho@npu.edu.tw).

W.-H. Sheen is with the Department of Communication Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: whsheen@cm.nctu.edu.tw).

Digital Object Identifier 10.1109/TCSI.2008.916394

*cell search*. Fast cell search is essential to reduce switch-on delay (initial cell search), increase stand-by time (idle-mode search) and maintain good link quality (active-mode search) [5]. This is particularly true for the WCDMA system since it employs nonsynchronous base stations instead of synchronous ones of other CDMA systems to extend coverage from outdoor to indoor [6]–[8]. A three-stage search procedure has been adopted in the WCDMA system in order to facilitate fast cell search, including: 1) slot synchronization; 2) frame synchronization with code-group identification and 3) scrambling-code detection [8].

A great deal of researches has been contributed to design and analysis of the cell search in the WCDMA system. Pipelining three stages was proposed to achieve faster search than serial execution at the cost of higher complexity and power consumption [5], [9]. A fast method using adaptive filters was proposed to help code phase acquisition in additive white Gaussian noise (AWGN) channels [10]. Another scheme was proposed to enhance initial cell search against frequency errors by using correlation properties of synchronization codes [11]. A method of coherent slot detection was also presented to improve search performance under frequency offset [12]. A differential detection method was also proposed to make cell search robust to wide range of initial frequency offset [13]. Partial symbol de-spreading (PSD) with noncoherent combining was used to overcome large-frequency error [5], [14]. Such large-frequency error comes from nonsynchronized frequency between oscillators of base and mobile stations. In consumer electronics, it is common that oscillators of handsets may have an inaccuracy of up to 12 parts per million (ppm). This imperfection incurs yet clock error at the analog-to-digital converter (ADC) before accurate frequency estimation. The amount of clock error may exceed one-chip duration during the course of initial cell search and that results in serious search failure [15]. Oversampling and passing multiple time-code candidates between search stages were presented to attempt to enhance performance if the clock error is assumed to result in nonideal sampling [16]. In hardware aspect, there was a reconfigurable low-power cell search engine presented by using memory-based digital filter to deal with different code synchronization processes in three stages [17]. Another low-power application-specific integrated circut (ASIC) implementation of cell search was also presented by using pointer-based first-in first-out (FIFO) buffers and a low-complexity magnitude approach [18].

In real-world handsets, however, an inaccuracy of handsets' oscillators may be up to 12 ppm or even more. It could result in 24-kHz or more frequency offset, which may be more serious than those assumptions in previous research cases. In addition,

Manuscript received March 11, 2005; revised September 2, 2005. This paper was recommended by Associate Editor T. Arslan.

C.-F. Li is with the Industrial Technology Research Institute, Hsinchu 310, Taiwan, R.O.C. (e-mail: richard929@itri.org.tw).

Y.-S. Chu is with the Department of Electrical Engineering, National Chung Cheng University, Chiayi 621, Taiwan, R.O.C. (e-mail: chu@ee.ccu.edu.tw).



Fig. 1. Simplified frame structure for cell search in WCDMA.

both frequency and clock errors have to be dealt with at the same time and at the cost of allowable hardware complexity in order to extend working time of handsets. In previous researches, few works dealt with the problem of clock error.

This paper presents an enhanced algorithm combining frequency offset compensation (FOC), random sample per frame (RSPF), and sample point reordering (SPR) against large-frequency and clock errors. Their performance and complexities are both considered for practical applications in which an inaccuracy of handsets' oscillators is assumed within 12 ppm. A complete search solution is presented with two algorithms, a primary one and the enhanced one, for different search modes. Moreover, the solution is going to be implemented with flexibility for switching algorithms in different searches and lowpower concepts for the sake of handsets' working time. Entire design of the solution is detailed from algorithm to hardware implementation in the paper.

The paper is organized as follows. Section II describes cell search in the WCDMA system and the primary algorithm in this work. Section III presents algorithm design of the enhancements under large-frequency and clock errors and tradeoff between performance and complexity. Hardware architecture, low-power design and implementation results are addressed in Section IV. Section V concludes the paper with remarkable results.

# II. CELL SEARCH

## A. System Specifications

Fig. 1 is a simplified frame structure of the WCDMA system, where the primary synchronization channel (PSCH), secondary synchronization channel (SSCH), and common pilot channel (CPICH) are devised to achieve fast time, code, and frequency synchronization with asynchronous base stations [8]. The 10-ms frame consists of 15 slots, and for the chip rate of 3.84 M-chips/ sec, there are 38400 chips in a frame and 2560 chips in a slot. PSCH and SSCH are both 256-chip long and only transmitted in the beginning of the slot boundaries.

Several admirable features have been incorporated into the WCDMA system in order to facilitate fast cell search. First of all, 512 scrambling codes are used to differentiate base stations in the downlink and they are further divided into 64 groups to alleviate the complexity of base station identification. The scrambling code has 38400 chips and always starts at the frame boundary.

A scheme of combining slot and frame synchronization, code grouping and identification is devised for fast scrambling code

determination. By using the same PSCH for each cell and by transmitting PSCH at the slot boundaries only, slot synchronization (stage 1) can be easily achieved by synchronizing to PSCH. Besides, a generalized hierarchical Golay (GHG) sequence is employed as the primary synchronization code (PSC) to provide satisfactory autocorrelation property and low implementation complexity at the same time [19]. Furthermore, frame synchronization with code-group identification (stage 2) can be achieved by detecting SSCH that uses one of the 16 orthogonal sequences as the spread codes, called the secondary synchronization codes (SSCs). In addition, SSCH is encoded into one of 64 codewords by using a (15, 3) comma-free Reed-Solomon (CFRS) code, with each codeword representing a code group. Because of the nice property of "comma free," that is, any cyclic shift of a codeword is not a codeword [20], the frame synchronization and code-group identification can be achieved simultaneously. Finally, the scrambling code can be determined easily from one of the eight codes in the identified group by using CPICH (stage 3).

## B. A Primary Algorithm

By means of nice properties of PN codes, the cell search can be accomplished orderly in three stages: 1) slot synchronization; 2) frame synchronization with code-group identification; and 3) scrambling-code determination. Many researches have been contributed to improve cell search in various aspects [8]. In this subsection, a *Primary* algorithm is introduced to help understand essential de-spreading, decoding, and decision mechanisms in the cell search. In our design, the primary algorithm is intended for cell search in ordinary scenarios and an enhanced algorithm based on the primary one will be presented later to achieve fast synchronization under large-frequency and clock errors.

Fig. 2 illustrates main mechanisms of three stages in the primary algorithm. Firstly, slot synchronization is meant to search for the slot boundaries of the base station that appears to have the largest power to the mobile station. After analog to digital conversion and chip shaping, received signal r(i) is de-spread (correlated) with PSC at chip rate and noncoherently combined as follows:

$$y_h^{\text{PSC}}(s) = \left| \sum_{i=0}^{255} r(2560s + h + i) \cdot c_{\text{PSC}}(i) \right|,$$
  
$$h = 0, \cdots, 2559, s = 0, \cdots, N - 1$$
(1)



(2)

Fig. 2. Primary cell search algorithm.

where  $c_{\rm PSC}(i)$  is the PSC, h is an index of slot-boundary hypotheses, and s is an index of accumulation slots. There are total 2,560 hypotheses and de-spread results,  $y_h^{\rm PSC}(s)$ , is accumulated over N slots for each slot-boundary hypothesis. The accumulation is intended for the effect of interference in a low signal-to-interference ratio (SIR) environment. The hypothesis  $\hat{h}$  with the maximal accumulated result is chosen as the slot boundary.

Second, frame synchronization with code-group identification can be accomplished by detecting SSCs and CFRS codes. In the beginning, the received signal r(i) starting at the selected slot boundary  $\hat{h}$  is de-spread with 16 SSCs, respectively. Recall that the SSC and PSC are aligned to slot boundaries and therefore, SSCs can be correctly de-spread once the slot boundary becomes available. Subsequently, the outputs of I/Q de-spreaders are coherently combined by using the phase reference coming from PSC de-spread output,  $y_{(I,Q)}^{PSC}(\hat{h},s)$ , in order to improve detection performance [5]. The output of the *j*th coherent combiner is given by

$$y_{s,\hat{h}}^{\text{SSC}}(j) = y_{(I)}^{\text{PSC}}(\hat{h}, s) \cdot \sum_{i=0}^{255} [r_I(\hat{h} + 2560s + i) \cdot c_{\text{SSC}_j}(i)] + y_{(Q)}^{\text{PSC}}(\hat{h}, s) \cdot \sum_{i=0}^{255} [r_Q(\hat{h} + 2560s + i) \cdot c_{\text{SSC}_j}(i)]$$

where  $c_{\text{SSC}_j}(i)$  is the *j*th SSC. Hard-decision on the SSC symbol is made in every slot, and after a 15-slot duration, CFRS decoding determines the code group,  $\hat{g}$ , and the frame boundary,  $\hat{s}$ , simultaneously by means of Hamming distances [21].

Thirdly, one of eight scrambling codes in the  $\hat{g}$ th group is identified through a majority-vote mechanism [5]. The received CPICH is de-spread with eight possible scrambling codes, that is

$$y_{m,k}^{\text{SCR}} = \left| \sum_{i=0}^{255} r(\hat{h} - 2560 \cdot \hat{s} + 256m + i) \cdot c_{\text{SCR}_k}^* (256m + i) \right|,$$
  
$$k = 0, 1, \dots, 7, m = 0, 1, \dots, 149$$
(3)

where  $c_{\text{SCR}_k}$  is the *k*th scrambling code in the  $\hat{g}$ th group. The scrambling code with the maximal  $y_{m,k}^{\text{SCR}}$  acquires one vote. After processing up to one frame (150 votes), the maximal is tested against a threshold to guarantee a low false alarm probability. Once the threshold is exceeded, the  $\hat{k}$ th scrambling code in the  $\hat{g}$ th group is identified as the cell's identification. Otherwise, a new trial for searching the scrambling code continues.

After the three stages,  $(\hat{h}, \hat{s})$  provides correct timing for slot and frame boundaries, and  $(\hat{g}, \hat{k})$  provides the identification of the cell-specific scrambling code to read system broadcast information.

The default process of three stages is pipelined instead of serial in order to perform more trials for fast synchronization [5],



Fig. 3. Performance of the primary algorithm, where  $f_{\Delta}$  and  $f_{\Delta}$  are the frequency offset and doppler frequency, respectively.

[9]. According to the frame structure in Fig. 1, it is natural to select one frame as the processing time for each stage, i.e., the default dwell time of a stage is 15 slots (10 ms).

Cumulative density function (CDF) of detection is used for performance comparison. Fig. 3 illustrates that the primary algorithm can provide fast synchronization if the frequency offset is limited. Therefore, it is intended for the cell search in idle and connected modes in our design. When the frequency offset might be up to 12 kHz or more, serious performance degradation arises as shown in Fig. 4. However, an inaccurate oscillator of up to 12 ppm is possibly equipped in a handset due to limited prime cost of mass consumer electronics. Before an accurate frequency estimation and control mechanism starts off in the receiver, the inaccuracy incurs large-frequency offset in received signals and clock error in the analog-to-digital converter. Accordingly, the cell search during initialization of a handset requires more algorithmic schemes to deal with large-frequency and clock errors.

# III. ENHANCED ALGORITHM UNDER LARGE-FREQUENCY AND CLOCK ERRORS

#### A. Enhanced Schemes

Table I shows frequency and clock errors under various ranges of inaccuracy of handset oscillators. Obviously, initial



Fig. 4. Performance degradation of the primary algorithm under large-frequency errors.

TABLE I CARRIER—PHASE ROTATION AND CLOCK DRIFT

| OSC                 | FO    | Carrier I | Phase Rota | Clock Drift (chip) |       |       |       |
|---------------------|-------|-----------|------------|--------------------|-------|-------|-------|
| Inaccuracy<br>(ppm) | (kHz) | 256-chip  | 128-chip   | 64-chip            | 10-ms | 20-ms | 30-ms |
| 0                   | 0     | 0         | 0          | 0                  | 0     | 0     | 0     |
| 3                   | 6     | 0.8       | 0.4        | 0.2                | 0.115 | 0.230 | 0.347 |
| 6                   | 12    | 1.6       | 0.8        | 0.4                | 0.154 | 0.307 | 0.461 |
| 9                   | 18    | 2.4       | 1.2        | 0.6                | 0.230 | 0.461 | 0.691 |
| 12                  | 24    | 3.2       | 1.6        | 0.8                | 0.461 | 0.922 | 1.382 |

cell search has to deal with such large-frequency and clock errors.

During cell search, the frequency error (also called as frequency offset) will result in carrier phase rotation, which degrades the effectiveness of processing gain in spread-spectrum techniques. Table I lists possible phase rotations under different frequency offsets. The default spreading factor is 256 in (1)–(3) and then it suffers from acute phase rotation even though frequency offset is not too large. PSD was used to partition the spreading length into 128, 64, or fewer chips to restrain phase rotation within limits [6], [14]. An example of partitioning off into a 64-chip case can be written out as shown in (4)–(6) at the bottom of the page, where  $y_{(I,Q)}^{PSC}(\hat{h}, s) \triangleq \sum_{i=64l}^{64l+63} r_{(I,Q)}(\hat{h} + 2560s + i) \cdot c_{PSC}(i)$ .

In fact, the processing gain is also reduced while the phase rotation is portioned out. The PSD method cannot be applied to deal with large-frequency offsets. As shown in Table I, phase

$$y_{h}^{\text{PSC}}(s) = \sum_{l=0}^{3} \left| \sum_{i=64l}^{64l+63} r(2560s + h + i) \cdot c_{\text{PSC}}(i) \right|$$
(4)  
$$y_{s,\hat{h}}^{\text{SSC}}(j) = \sum_{l=0}^{3} \left\{ y_{(I)}^{\text{PSC}}(\hat{h}, s) \cdot \sum_{i=64l}^{64l+63} [r_{I}(\hat{h} + 2560s + i) \cdot c_{\text{SSC}_{j}}(i)] + y_{(Q)}^{\text{PSC}}(\hat{h}, s) \cdot \sum_{i=64l}^{64l+63} [r_{Q}(\hat{h} + 2560s + i) \cdot c_{\text{SSC}_{j}}(i)] \right\}$$
(5)  
and  $y_{m,k}^{\text{SCR}} = \sum_{l=0}^{3} \left| \sum_{i=64l}^{64l+63} r(\hat{h} - 2560 \cdot \hat{s} + 256m + i) \cdot c_{\text{SCR}_{k}}^{*}(256m + i) \right|$ (6)



Fig. 5. Concept of FOC.

rotation is still up to 0.8 at 24-kHz frequency offset even if spreading length is down to 64 chips. A more effective scheme should be elaborated to avoid phase rotation and conserve processing gain at the same time.

On the other hand, the clock error will affect the source clock of the ADC in a handset. The drifting source clock makes the ADC to sample at an uncertain rate, which is not an assumed rate synchronous to the chip rate or its over-sample rates. The clock error (drift) is accumulated during the course of a trial. Table I also lists possible amounts of clock errors after 10, 20, and 30 ms. In general, the tolerant timing error after code acquisition is  $\pm T_c/2v$ , where  $T_c$  the period of one chip and vis the over-sample rate of input data. In the primary algorithm, its input data is assumed at chip rate, i.e., v = 1. The tolerant timing error after slot synchronization is  $\pm T_c/2$ . However, the error will be extended up to more than one chip (>  $T_c$ ) due to the clock drift, and it is possible to render the three-stage cell search a total failure if the acquired sampling point (the slot boundary) at the first stage is used for all the three stages.

Oversampling in stage 1 and choosing multiple candidates for stage 2 and 3 were proposed to increase the timing resolution and the detection probability, respectively [16]. However, oversampling process (v = 2, 4, 8, ...) can only reduce the tolerant timing error, but it results in high complexity and cannot cure of the clock drift. Choosing multiple candidates helps trace the identified slot and frame boundaries, but it also results in high complexity in stage 2 and 3. More effective methods against the clock drift are necessary.

An enhanced algorithm combining FOC, RSPF, and sampling-point reordering (SPR) is proposed to provide fast search under large-frequency and clock errors. Since the frequency error is too large to compensate during de-spread, coarse frequency estimation and direct compensation before cell search are applied in the FOC method. The frequency offset is estimated and compensated explicitly by dividing offset range into



Fig. 6. Concept of SPR.

different "bins," and the stage-1 detection is performed for each bin in parallel. Fig. 5 shows the concept of FOC method in cell search. Coarse frequency offset is estimated according to results of all stage-1 bins. The estimated frequency offset,  $\Delta \hat{f}$ , in the selected bin is then used for the following stage-2 and stage-3 detections of the same trial. The received r(i) in (1)–(3) are replaced by using  $r_{\text{FOC}}(i)$  as follows:

$$r_{\rm FOC}(i) = r(i) \cdot e(i) e(i) = \cos(2\pi i \Delta f_b T_c^{(b)}) + j \sin(2\pi i \Delta f_b T_c^{(b)})$$
(7)

where  $\Delta f_b$  is the assumed frequency offset to the carrier frequency on the *b*th bin and its value is derived by dividing possible range of frequency offset into b sections (corresponding to b bins) and then  $\Delta f_b$  is the offset of a bin's central frequency from the carrier frequency.

Since each bin has to perform stage-1 search independently, multiple bins of stage 1 are performed in parallel. The number of bins has to be selected carefully in order to achieve good performance and reduce complexity at the same time.

The SPR is proposed to counteract the problem of clock drift. Fig. 6 shows the main idea of SPR method. First, the range of clock drift is divided into several bins, like that in the FOC method. Suitable number of bins and quantity of clock drift depend on the accuracy of crystal oscillators applied practically in the handsets. All bins need to cover the whole range of the inaccuracy of oscillator, which including from onward to backward cases. Each bin will have an assumed quantity of clock drift. Then, according to the assumed quantity of clock drift of a specific bin, a mechanism of dropping or stuffing one sample from or into incoming data sequence is employed to do the compensation when the drift is accumulated for being up to one sample. In the figure, for example in case of onward sampling, if the sampled point 4' can be used instead of 3', a correct timing can be acquired as that in the ideal case. Therefore, the sampled point 4' is reordered as 3'.

The above two methods divide frequency offset into several bins and compensate frequency and clock errors in each bin, respectively. More bins provide more precise estimation and compensation. However, the number of bins cannot be increased unlimited due to the high complexity of stage 1. In other words, certain frequency and clock errors still exist in each bin. The RSPF method is applied to mitigate these errors by using sampling points randomly in different frames (stages) [15]. The basic idea of RSPF is that different sampling points within one chip duration are used as the chip's sampled data in different frames so as to mitigate nonideal sampling and the clock drift



Fig. 7. Concept of the enhanced algorithm against large-frequency and clock errors.

accumulation. In order to use random sampling points, oversampling ( $v \ge 2$ ) is needed in the RSPF method.

The entire solution is illustrated in Fig. 7. The three schemes are combined as a pre-processing block and it is set in front of all stages and bins independently. The coarse frequency estimation is made decision according to the results of all stage-1 bins. The estimated frequency offset value is passed to the same trial's stage-2 and stage-3 processes.

## B. Performance Simulations and Comparisons

The same simulation model and parameters are used like the previous section. In addition, oversampling rate is assumed as 2 (v = 2), the inaccuracy of handset's oscillator is assumed as 12 ppm, the geometric factor is set as 6 dB (G = 6 dB), and the channel is a flat Rayleigh fading with maximum Doppler shift of 185.2 Hz (100 km/hr).

Fig. 8 illustrates CDF of different cell search algorithms. The algorithms combining SPR, RSPF, and FOC provide much better performance than others, such as those combining RSPF, SPR, and PSD, those combining only RSPF and FOC (traditional coarse frequency estimation cases), and the case using RSPF only. Obviously, the SPR and FOC methods are verified to be suitable and necessary to counteract frequency and clock errors at the same time.

Fig. 9 shows mean search times of different algorithms when different stage-1 bins are used. In the figure, the RSPF method is justified to be helpful to search performance in both kinds of algorithms, those combining RSPF, SPR, and PSD and those combing RSPF, SPR, and FOC. Besides, the RSPF method could be said as be helpful to reduce the number of stage-1 bins. In the algorithm combining RSPF, SPR, and FOC, 2-bin search can achieve synchronization in below 200 ms. However, the algorithm combining only SPR and FOC needs 3 to 4 bins to achieve the same performance. Accordingly, the



Fig. 8. CDF of search times using different search methods.

RSPF is involved into the enhanced algorithm. The number of bins could be selected as 2 or 3 after their complexities are calculated and compared.

Fig. 10 illustrates CDFs of the two main kinds of algorithms if multiple timing candidates (MTCs) are processed in stage 2 and 3. The MTC method was presented to enhance search performance at the cost of high complexities in stages 2 and 3 [16]. As shown in the figure, the MTC method provides only slight improvement on search performance at the cost of high complexity of multiple stage-2 and stage-3 processes. Detailed complexity calculations of concerned algorithms are illustrated in the following subsection.

According to the above performance comparisons, two kinds of search algorithms, RSPF+FOC+SPR and RSPF+PSD+SPR, are concerned in further work and their complexities are calculated and compared in the following subsection.



Fig. 9. Mean search time of different search methods.



Fig. 10. CDF of two main kinds of search algorithm after using MTC.

#### C. Complexity Calculations and Comparisons

According to functional properties, cell search can be divided into four main functional parts, including preprocessing, stage-1, stage-2, and stage-3. Their complexities are calculated in million operations per second (MOPS).

1) Preprocessing Complexity: Three functional schemes, the SPR, RSPF, and FOC are included in this part. Fig. 11 shows that the SPR compensates clock error first, then the RSPF chooses sample points randomly as the input data at chip rate, and finally the FOC compensates frequency error on input chip data.

The first two functions, SPR and RSPF, both need to count and adjust selections at special times. Their complexities can be simplified as one operation for each input sampled points. Therefore, both functions demand 3.84v MOPS, where v is the oversampling rate of ADC. The complexity of FOC can be calculated according to (7) where complex-valued multiplications are employed to compensate frequency. The FOC operations need 4 multiplications and two additions to perform these complex-valued multiplications at chip rate. Hence, a FOC function needs

$$(4+2) \times 3.84 \times 10^6 = 2.304 \times 10^7 = 23.04$$
 MOPS.

When the three functions are involved concurrently, each preprocessing block needs 23.04 + 7.68v MOPS.



Fig. 11. Performance of the robust complete search solution.

2) Stage-1 Complexity: In stage-1 process, there are four schemes available, including default, PSD, MTC, and multi-bin searches. The default detection scheme is as shown in (1) and the accumulation length is over 15 slots. In the default scheme, chip-rate operations include (1) and the accumulation shown in Fig. 2. Default PSC detection is assumed as using efficient Golay correlation (EGC) in [19] and , [23] instead of matched filter operations described in (1) since the PSC is a GHG code. It uses only 13 additions for matching a 256-chip PSC. Real and imaginary parts (I-branch and Q-branch) of input data both need the PSC detection. Non-coherent combination needs two squares and one addition for each slot boundary hypothesis. The accumulation needs one addition per chip time. Then 2559 comparisons are needed only per frame and they are coordinated as additions due to their similar complexities. Accordingly, the number of operations in stage 1 per second can be counted as follows:

Additions :  $(13 \times 2+1+1) \times 3.84 \times 10^{6}+2559 \times 10^{2}$ = 1.077759 × 10<sup>8</sup> Multiplications : 2 × 3.84 × 10<sup>6</sup> = 7.68 × 10<sup>6</sup> Total : 1.077759 × 10<sup>8</sup> + 7.68 × 10<sup>6</sup> = 115.4559 × 10<sup>6</sup> = 115.46 MOPS.

In the PSD scheme, it is assumed that a symbol is separated into four segments in (4), and the separation influences complexities of de-spreading and (non)coherent combining. For stage 1, the EGC is not suited to perform PSD. A hybrid EGC-HMF (hierarchical matched filter) detector needs only 18 additions for 4-segment PSC matching [24]. In addition, 4-segment PSD also results in eight multiplications and seven additions required in the noncoherent combination of stage 1. Stage-1 operations become

Additions : 
$$(18 \times 2+7+1) \times 3.84 \times 10^{6} + 2559 \times 10^{2}$$
  
= 1.692159 × 10<sup>8</sup>  
Multiplications :  $8 \times 3.84 \times 10^{6} = 3.072 \times 10^{7}$   
Total : 1.692159 × 10<sup>8</sup> + 3.072 × 10<sup>7</sup>  
= 1.999359 × 10<sup>8</sup> = 199.94 MOPS.

The MTC method only increases some comparison operations at the end of stage 1. It is assumed  $L_2$  candidates are chosen in the stage, then extra 0.00256 ( $L_2 - 1$ ) MOPS are needed for determining  $L_2$  timing candidates.

When multiple bins are used in stage 1, the complexity are multiplied by  $B_1$ , the number of stage-1 bins. The complexities of multiple bins are 115.46  $B_1$  and 199.94  $B_1$  MOPS for default and PSD cases.

3) Stage-2 Complexity: In stage 2, three optional schemes are possible to enhance performance, including default, PSD, and MTC. The default one is as described in (2) and Fig. 2. In the default scheme, SSC detection occurs only at the identified slot boundaries. According to (2), 255 additions are used to correlate with one of 16 SSCs in either branch, and 2 multiplications and one addition are used to combine coherently. Then 16 comparisons are categorized as additions similarly. CFRS decoding requires 960 15-symbol correlations (additions) and 959 comparisons to determine the maximum value. The operations of stage 2 per second is counted as

Additions : 
$$[(255 \times 2+1+1) \times 16 \times 15+960 \times 15+959]$$
  
  $\times 10^{2} = 1.38239 \times 10^{7}$   
Multiplications:  $2 \times 16 \times 15 \times 10^{2} = 4.8 \times 10^{4}$   
Total :  $1.38239 \times 10^{7} + 4.8 \times 10^{4}$   
 $= 1.3719 \times 10^{7} \doteq 13.72$  MOPS.

While the PSD in (5) is used in stage 2, 63 additions are needed for each segment de-spread. In addition, 4-segment PSD also results in eight multiplications and seven additions required in the coherent combination. Stage-2 operations become

Additions : 
$$[(63 \times 4 \times 2 + 7 + 1) \times 16 \times 15 + 960 \times 15 + 959] \times 10^2 = 1.38239 \times 10^7$$
  
Multiplications :  $8 \times 16 \times 15 \times 10^2 = 1.92 \times 10^5$   
Total :  $1.38239 \times 10^7 + 1.92 \times 10^5$   
 $= 1.40159 \times 10^7 \doteq 14.02$  MOPS.

When the MTC is activated in the previous stage-1 process, the complexities of stage-2 are multiplied by  $L_2$ , the number of stage-2 replicas. The complexities of MTC are  $13.72L_2$  and  $14.02L_2$  MOPS for default and PSD cases.

4) Stage-3 Complexity: In stage 3, default, PSD, and MTC schemes are possible to enhance its performance. The default one is as described in (3) to Fig. 2. In the default scheme, the chip-rate operations in (3) include the complex-value de-spreading and coherent accumulations, where the former needs two additions to reform I/Q-branches and the latter needs two additions for their accumulations. Noncoherent combining needs two squares and one addition for each symbol of de-spreading. Voting needs seven comparisons to make decision on each ballot. 150 additions are used to count ballots and seven comparisons are used to determine the candidate with major ballots

Additions:  $(2+2) \times 8 \times 3.84 \times 10^{6} + [(1 \times 8 + 7 + 1) \times 150 + 7] \times 10^{2} = 1.231207 \times 10^{8}$ Multiplications:  $2 \times 8 \times 150 \times 10^{2} = 2.4 \times 10^{5}$ Total:  $1.231207 \times 10^{8} + 2.4 \times 10^{5}$  $= 1.233607 \times 10^{8} \doteq 123.36$  MOPS. Note that in the above complexity calculations, to correlate with a PN-code is very simple and ignorable since a PN-code is a set of  $\{\pm 1\}$ .

In the PSD scheme, eight squares and seven additions are required in the noncoherent combination as shown in (6). Stage-3 operations become

Additions : 
$$(2+2) \times 8 \times 3.84 \times 10^{6} + [(7 \times 8 + 7 + 1) \times 150 + 7] \times 10^{2} = 1.238407 \times 10^{8}$$
  
Multiplications :  $8 \times 8 \times 150 \times 10^{2} = 9.6 \times 10^{5}$   
Total :  $1.238407 \times 10^{8} + 9.6 \times 10^{5}$   
 $= 1.248007 \times 10^{8} \doteq 124.8$  MOPS.

When the MTC is activated in the previous stage-2 process, the complexities of stage-3 are multiplied by  $L_3$ , the number of stage-3 replicas. The complexities of MTC are  $123.36L_3$  and  $124.8L_3$  MOPS for default and PSD cases.

5) Complexity Summary: Table II summarizes the complexities of the primary search methods. The *default* column is the required MOPS for each of functions listed in the first column. The next four columns show the MOPS when SPR, FOC, PSD and MTC are used in the three stages of detections. The remaining four columns show the total MOPS of different combined search algorithms. The complexities of the concerned two kinds of algorithms, RSPF+FOC+SPR and RSPF+PSD+SPR are listed in Table III.

#### D. Summaries

After performance comparisons, two kinds of search algorithms are concerned, i.e., RSPF+FOC+SPR and RSPF+PSD+SPR. Their complexities have been calculated and shown in Table III and their mean search times are available in Fig. 9. Accordingly, the RSPF+FOC+SPR algorithm uses lower complexity and provide faster search than the other. The algorithm with  $B_1 = 2$  or  $B_1 = 3$  is the most efficient one. Consequently, the enhanced algorithm is chosen as the "RSPF+FOC+SPR" with  $B_1 = 2$  because it provides faster synchronization and takes lower complexity than the other ones. A MTC option seems not worthy due to its high complexities as shown in Table II and only little improvement as shown in Fig. 10.

The complete solution for cell search in WCDMA is presented as a combination of the primary algorithm and the other one enhanced with "RSPF+FOC+SPR" ( $B_1 = 2$ ). The enhanced "RSPF+FOC+SPR" algorithm is applied to initial cell search to achieve fast synchronization under large-frequency and clock errors. Its complexity is 521.6 MOPS. On the other hand, the primary algorithm can be used for idle- and active-mode searches to provide fast search using only necessary complexity. Its complexity is 252.54 MOPS only. Alternative algorithms can be switched according to operation states of the WCDMA handsets. Fig. 11 illustrates the performance of the robust solution under different ranges of frequency offset, where the robust solution means the proposed solution of alternative search algorithms, and the traditional one means using an algorithm without any enhancement. The robust solution can provide 90% of successful searches within 70 ms using the low-complexity primary algorithm in the idle- and active-mode

| 1       | Defaul<br>t | SP<br>R         | FOC             | PSD        | MT<br>C         | RSPF+FOC+SP<br>R                                            | RSPF+FOC+MT<br>C                                                                                                                                               | RSPF+PSD+SP<br>R                                             | RSPF+PSD+MT<br>C                                             |
|---------|-------------|-----------------|-----------------|------------|-----------------|-------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------|--------------------------------------------------------------|
| SPR     | 3.84 U      | -               | -               | -          | -               | 3.84 U (2+ B <sub>1</sub> )                                 | -                                                                                                                                                              | 3.84 U (2+ B <sub>1</sub> )                                  | -                                                            |
| RSPF    | 3.84 U      | -               | -               | -          | -               | 3.84 U (2+ B <sub>1</sub> )                                 | 3.84 U (2+ B <sub>1</sub> )                                                                                                                                    | 3.84 U (2+ B <sub>1</sub> )                                  | 11 <b>.52</b> U                                              |
| FOC     | 23.04       | -               | -               | -          | -               | 23.04(2+ B <sub>1</sub> )                                   | 23.04(2+ B <sub>1</sub> )                                                                                                                                      | -                                                            | -                                                            |
| Stage 1 | 115.46      | ×B <sub>1</sub> | ×B <sub>1</sub> | 199.9<br>4 | -               | 115.46B <sub>1</sub>                                        | 115.46B <sub>1</sub>                                                                                                                                           | 199.94 <b>B</b> 1                                            | 199.94                                                       |
| Stage 2 | 13.72       | -               | -               | 14.02      | ×L <sub>2</sub> | 13.72                                                       | 13.72L <sub>2</sub>                                                                                                                                            | 14.02                                                        | 14.02L <sub>2</sub>                                          |
| Stage 3 | 123.36      | -               | -               | 124.8      | ×L <sub>3</sub> | 123.36                                                      | 123.36L <sub>3</sub>                                                                                                                                           | 124.8                                                        | 124.8L <sub>3</sub>                                          |
| Total   | -           | -               | -               | -          | -               | 7.68 U (2+ B <sub>1</sub> )+<br>138.5B <sub>1</sub> +183.16 | $\begin{array}{c} \textbf{3.84} \ \upsilon \ (\textbf{2+B_1}) + \\ \textbf{138.5B_1} + \textbf{13.72L_2} \\ + \textbf{123.36L_3} + \textbf{46.08} \end{array}$ | 7.68 U (2+ B <sub>1</sub> )+<br>199.94B <sub>1</sub> +138.82 | 11.52 U +199.94+<br>14.02L <sub>2</sub> +124.8L <sub>3</sub> |

 TABLE II

 COMPLEXITY COMPARISONS BETWEEN DIFFERENT SEARCHING METHODS

Note: U is the over-sample rate of ADC. B<sub>1</sub> is the number of bins in stage 1, and L<sub>2</sub> and L<sub>3</sub> are the number of candidates for stage 2 and 3, respectively. All complexity numbers are counted in MOPS.

TABLE III COMPLEXITY COMPARISONS OF CONCERNED ALGORITHMS USING DIFFERENT NUMBERS OF STAGE 1 BINS

|              | $B_1 = 2$ | $B_1 = 3$ | $B_1 = 4$ | $B_1 = 5$ |
|--------------|-----------|-----------|-----------|-----------|
| RSPF+FOC+SPR | 521.6     | 675.46    | 829.32    | 983.18    |
| RSPF+PSD+SPR | 600.1     | 815.44    | 1030.74   | 1244.92   |
| FOC+SPR      | 490.86    | 637.06    | 783.24    | 929.42    |

Note: All complexity numbers are counted in MOPS.

cell searches and within 400 ms using the enhanced algorithm in the initial cell search.

## IV. HARDWARE DESIGN AND IMPLEMENTATION

In recent years, system-on-a-chip (SoC) concept has received much attention, which benefits from state-of-the-art advancing technology of CMOS process. In a handset design, all functional circuits are integrated as a single system chip, and therefore a cell search solution is supposed to be designed and implemented as a macro-cell, which is applicable to a receiver SoC chip of WCDMA handsets. In this work, the cell search macro-cell is referred to as the cell search engine (CSE), which provides alternative cell search algorithms for different scenarios.

Before implementation of the robust search solution from algorithm into hardware, a series of fixed-point simulations are necessary to determine bit widths of input signals and internal data. Figs. 12 and 13 shows fixed-point simulation results of stage 1 and 2, respectively. Input signals to cell search can be quantized as 4-bit digital data with little performance loss in order to reduce circuit complexity. In addition, bit truncations are used after (non)coherent combinations, which can reduce memory size in stage 1 and circuit complexity in stage 2, respectively. Truncating off 12 and 10 bits is applied to (non)coherent combination results in stage 1 and 2.

#### A. Architecture Design

Two kinds of search algorithms have been designed and involved in the robust solution to achieve fast code synchronization in different channel conditions. The CSE has to



Fig. 12. Fixed-point simulation in stage 1.



Fig. 13. Fixed-point simulation in stage 2.

perform one of the two search algorithms alternatively on the same hardware platform. Fig. 14 illustrates architecture of the CSE, including two stage-1 modules, one stage-2 module and



Fig. 14. Architecture of the cell search engine.

one stage-3 module. Every stage module includes an individual preprocessing block.

When the CSE works as initial cell search, all stage modules are activated and their preprocessing blocks are also activated to perform the "RSPF+FOC+SPR" algorithm. On the other hand, in idle mode and active mode searches, only 'one' stage-1 module works with the stage-2 and stage-3 modules, where preprocessing blocks are all bypassed. After completing a successful cell search trial, the obtained chip timing, scrambling code, and/or estimated coarse frequency offset estimation from stage 1 are delivered to upper layers.

The word-length information of each block is also notified in the figure. Those word lengths with most significant bit (*MSB*) signs mean that only most significant word-length bits are used in the next blocks. In Stage-1 module, for example, de-spreading results after noncoherent combination are truncated into 11 bits only in order to reduce memory size of the next accumulator. Its performance loss due to quantization and truncation has been illustrated in Fig. 12. Similar truncation is also applied to the preprocessing block and the stage-2 module.

In the pre-processing block, three enhanced functions, SPR, RSPF and FOC, are designed to be capable of activated or bypassed respectively as shown in Fig. 15. The SPR consists of a tapped-delay line along with a multiplexer and a reordering controller. The initial selection of multiplexer is on the "0" position. According to accumulated amount of sampling error, the reordering controller adjusts the selection of multiplexer toward "+" or "-" to drop or stuff one sampling point from or into incoming data sequence. If bypass is required, the selection is always set at "0."

The RSPF always selects one of the sampling points within one chip duration as the input data to the three stages of detectors. It consists of a delay element to hold the first sample within one-chip duration, a 2-to-1 multiplexer to select the first or the second sample as the chip's sample value, and a controller to randomize the selection per frame. Its selection is changed randomly for every new frame if the RSPF is activated and the selection is fixed if the function is bypassed. Input data rate to the RSPF block is the real sample rate of ADC and its output data rate is the chip rate.

The FOC compensates frequency offsets of input data before three-stage processing at chip rate, 3.84 Mcps. It performs complex-valued multiplications using four multipliers and two adders. Corresponding "cos" and "sin" values of offset frequencies are quantized and pre-stored in a register file and can be refreshed for different channel conditions. When it is needed to bypass the FOC block, its inputs are bypassed to output directly.

In stage-1 module, since the PSC is a 256-chip GHG code, the well-known efficient Golay correlator (EGC) can be applied for the PSC de-spreading instead of traditional matched filter (MF) or hierarchical MF (HMF) [19], [23]. Using EGC architecture for the 256-chip PSC de-spreading results in only 13 additions instead of traditional 256 additions.

In addition, a register file of 2560 fields is required to accumulate over 15 slots for all slot boundary hypotheses. Partial results are bit-truncated before accumulation in order to reduce



Fig. 15. Preprocessing block.

the size of the register. Only the most significant 12 bits of partial results are accumulated, therefore each field needs 2 bytes for accumulation over 15 slots. That means a memory space of 5.12 kB is needed. In our design, a 5.12-kB SRAM is embedded as the register file for good chip integration.

The peak detector compares these 2560 accumulated values and the timing with the maximum value is identified as the slot boundary candidate. For the primary search algorithm, the candidate is treated as the slot boundary and used in the following stage-2 and stage-3 processes of the same trial. For the enhanced search algorithm, two stage-1 bins are activated, so their candidates are compared further and chosen the larger one as the slot boundary.

In stage-2 module, the CFRS decoding has to make a decision among 960 hypotheses before the next slot boundary comes. A  $1 \times 15$  systolic-array decoder is used for the CFRS decoding [21]. After CFRS decoding, the desired code group and frame boundary  $(\hat{g}, \hat{s})$  can be obtained and passed to the next stage 3.

In stage-3 module, one of eight scrambling codes in the identified  $\hat{g}$  th group needs to be determined. Eight de-scramblers are employed to de-spread the received CPICH in parallel. Each de-scrambler performs complex-valued de-spreading and noncoherent combining. After de-scrambling, a majority-vote mechanism is implemented by using a set of comparators and vote-counters (incrementors). The detected results, slot boundary, frame boundary, code group, scrambling code, are all sent to upper layers.

#### **B.** Implementation Results

The hardware implementation is carried out through a "topdown" cell-based ASIC design flow in a 3.3-V 0.35- $\mu$ m CMOS 1P4M technology. Power consumptions are obtained from their post-layout simulations by *EPIC's Powermill* tool. The sampling rate of ADC is assumed as 7.68 MHz (oversample by 2) and data rates of stage 1, 2, and 3 are 3.84 MHz (system chip rate).

Table IV lists out hardware areas and power dissipation of main blocks and the whole cell search ASIC. The cell search ASIC consumes 74.56 and 120.42 mW while the default algorithm and the enhanced algorithm are performed respectively.

TABLE IV HARDWARE AREA AND POWER DISSIPATION OF THE CELL SEARCH ASIC

| Functional    | Area               | Power            |  |  |
|---------------|--------------------|------------------|--|--|
| Blocks        | (mm <sup>2</sup> ) | (mW)             |  |  |
| Preprocessing | 0.4214             | 2.16             |  |  |
| Stage 1       | 4.6062             | 37.22            |  |  |
| Stage 2       | 2.7495             | 7.21             |  |  |
| Stage 3       | 3.3635             | 30.13            |  |  |
| Whale Chin    | 17.011             | Primary: 74.56   |  |  |
| whole Chip    | 17.011             | Enhanced: 120.42 |  |  |

TABLE V Summarized Results After Low-Power Design on Critical Components

|               |                    | 1       | r         | r         |  |
|---------------|--------------------|---------|-----------|-----------|--|
| Blocks        | Area               | Power   | Area      | Power     |  |
|               | (mm <sup>2</sup> ) | (mW)    | Reduction | Reduction |  |
| Stage 1       | 4.6062             | 37.22   | 2804      | 28 0.04   |  |
|               | 4.4782             | 22.75   | 2.0 /0    | 30.9 /0   |  |
| Stage 2       | 2.7495             | 7.21    | 2 2 0/    | 17 4 9/   |  |
|               | 2.6589             | 5.956   | 3.3 70    | 17.4 70   |  |
| Stage 3       | 3.3635             | 30.13   | 54 1 94   | 76 8 9/   |  |
|               | 1.5423             | 6.978   | 34.1 /0   | /0.8 %    |  |
| Whole<br>Chip | 17.011             | 74.56/  |           | 52.1 %/   |  |
|               | 17.011             | 120.42  | 127%      |           |  |
|               | 14 8422            | 35.684/ | 12.7 /0   | 44.3 %    |  |
|               | 14.0452            | 67.074  |           |           |  |

The difference of power dissipation is considerable and it proves that using two algorithms for cell search in different search modes is very suitable to the WCDMA system.

#### C. Low-Power Design and Results

Critical components of power consumption are observed through a complete power analysis of stages' processes, and then these critical components are re-designed with more consideration on power saving. A low-power design work has been presented in our previous work [18]. The two low-power methods, the pointer-based FIFO buffer used in de-spreaders and the new magnitude approach, are applied to the implementation of the CSE. Table V tabulates low-power implementation

| Low-Power Cell Search Engine |                                  |  |  |  |
|------------------------------|----------------------------------|--|--|--|
| Technology                   | 3.3 V, 0.35 μm CMOS 1P4M process |  |  |  |
| Transistor Count             | 655,342                          |  |  |  |
| Core area                    | 3.8527×3.8526 mm <sup>2</sup>    |  |  |  |
| Power Dissipation            | 35.684/67.074 mW @ 15.36 MHz     |  |  |  |
| Max. speed                   | 50 MHz                           |  |  |  |

TABLE VI HARDWARE CORE PROFILE OF THE CSE

results. Power reductions of stage-1, stage-2 and stage-3 blocks are 38.9%, 17.4%, and 76.8%, respectively. Area reductions of them are only 2.8%, 3.3%, and 54.1%, respectively. The entire power reduction of the CSE achieves up to 52.1% in the primary algorithm and 44.3% in the enhanced one, and its area reduction is total 12.7%.

The hardware profile of the CSE is summarized in Table VI. The core area is  $3.8527 \times 3.8526 \text{ mm}^2$  and transistor count is 655,342. The low-power cell search chip consumes 35.684 mW using the primary algorithm and 67.074 mW using the enhanced algorithm, operating at 15.36 MHz and complying with the WCDMA standards.

#### V. CONCLUSION

In the paper, large-frequency and clock errors during initial cell search are dealt with using the enhanced algorithm, which combines FOC, RSPF, and SPR schemes. In addition, two-bin search in stage 1 is adopted to cope with an inaccuracy of up to 12 ppm according to a series of trade off between performance and complexity. The complete solution for various searches of handsets is proposed with two alternatives: an enhanced one used for initial search and a primary one used for idle-mode and active-mode searches. Their complexities are 521.6 and 252.54 MOPS in the enhanced and the primary algorithms respectively. The hardware implementation is carried out with flexibility for switching algorithms and low-power methods for extending handsets' working time. De-spreaders and noncoherent/coherent combiners of three stages are found as power-critical components and redesigned using pointer-based FIFO buffers and a new magnitude approximation respectively. The power dissipation is then reduced 52.1% and 44.3% to be 35.68 and 67.07 mW for the primary and the enhanced algorithms, respectively, and its core area is reduced by 12.7% to be 14.8432 mm<sup>2</sup> in a 3.3-V 0.35- $\mu$ m CMOS 1P4M technology.

## REFERENCES

- Hompage, International Telecommunication Union, Geneva, Switzerland [Online]. Available: http://www.itu.int
- [2] H. Holma and A. Toskala, WCDMA for UMTS: Radio Access for Third Generation Mobile Communications. New York: Wiley, 2002.
- [3] A. J. Viterbi, CDMA: Principles of Spread Spectrum Communication. Reading, MA: Addison-Wesley, 1995.

- [4] R. L. Peterson, R. E. Ziemer, and D. E. Borth, *Introduction to Spread-Spectrum Communications*. Englewood Cliffs, NJ: Prentice-Hall, 1995.
- [5] Y.-P. E. Wang and T. Ottosson, "Cell search in W-CDMA," *IEEE J. Select. Areas Commun.*, vol. 18, no. 8, pp. 1470–1482, Aug. 2002.
- [6] Hompage, 3rd Generation Partnership Project, Sophia Antipolis, France [Online]. Available: http://www.3gpp.org
- [7] Hompage, 3rd Generation Partnership Project2, Arlington, VA [On-line]. Available: http://www.3gpp2.org
  [8] "Spreading and modulation (FDD)," 3rd Generation Partnership
- [8] "Spreading and modulation (FDD)," 3rd Generation Partnership Project, Sophia Antipolis, France, 3GPP Tech. Spec. TS 25.213, V3.7.0, 2001.
- [9] K. Higuchi, M. Sawahashi, and F. Adachi, "Fast cell search algorithm in inter-cell asynchronous DS-CDMA mobile radio," *IEICE Trans. Commun.*, vol. E81-B, no. 7, pp. 1527–1534, Jul. 1998.
- [10] F. L. Lo and A. U. H. Sheikh, "Fast cell search during PN code phase acquisition using adaptive filters in AWGN environment," in *Proc. IEEE Int. Conf. Globecom*, 1999, pp. 452–456.
- [11] J.-Y. Chun and K.-M. Lee, "An initial cell search scheme robust to frequency error in W-CDMA system," in *Proc. IEEE Int. Symp. Personal, Indoor Mobile Radio Commun.*, Sep. 2000, vol. 2, pp. 1400–1404.
- [12] T. Chulajata, H. M. Kwon, and K. Y. Min, "Coherent slot detection under frequency offset for W-CDMA," in *Proc. IEEE Veh. Technol. Conf.*, 2001, vol. 3, pp. 1719–1723.
- [13] J. Moon and Y.-H. Lee, "Cell search robust to initial frequency offset in WCDMA systems," in *Proc. IEEE Int. Symp. Personal, Indoor and Mobile Radio Commun.*, Sep. 2002, vol. 5, pp. 2039–2043.
- [14] K.-M. Lee and J.-Y. Chun, "An initial cell search scheme robust to frequency error in W-CDMA system," in *Proc. IEEE Int. Symp. Personal*, *Indoor and Mobile Radio Commun.*, Sep. 2000, vol. 2, pp. 1400–1404.
- [15] W.-H. Sheen and J.-S. Ho, "Cell search for 3GPP W-CDMA/FDD with chip clock shift and nonideal sampling," in *Proc. IEEE Veh. Technol. Conf.*, Oct. 2001, vol. 4, pp. 2369–2374.
- [16] M. Kiessling and S. A. Mujtaba, "Performance enhancements to the UMTS (W-CDMA) initial cell search algorithm," in *Proc. IEEE Int. Conf. Commun.*, 2002, vol. 1, pp. 590–594.
- [17] N. Darbel, Y. Rasse, O. Bastidas-Garcia, G. Faux, B. Jubelin, and M. Carrie, "Reconfigurable low power cell search engine for UMTS-FDD mobile terminals," in *Proc. IEEE Workshop Signal Processing Syst.*, 2002, pp. 171–176.
- [18] C.-F Li, Y.-S. Chu, W.-H. Sheen, F.-C. Tian, and J.-S. Ho, "A low-power ASIC design for cell search in W-CDMA system," *IEEE J. Solid-State Circuits*, vol. 39, no. 5, pp. 852–857, May 2004.
- [19] "Generalized hierarchical Golay sequence for PSC with low complexity correlation using pruned efficient Golay correlator," Siemens and Texas Instruments, Dallas, TX, 3GPP TSG RAN WG1 Tdoc 99/554, 1999.
- [20] J. J. Stiffler, "Comma-free error-correcting codes," *IEEE Trans. Inf. Theory*, vol. 1T-11, no. 1, pp. 107–112, Jan. 1965.
- [21] C.-F Li, W.-H. Sheen, C.-R. Wang, and Y.-S. Chu, "A fast multi-speed comma-free Reed–Solomon decoder for W-CDMA applications using foldable systolic array architecture," *IEEE J. Solid-State Circuits*, vol. 38, no. 4, pp. 677–682, Apr. 2003.
- [22] C.-F Li, "Design and Implementation of Pseudo-Noise Code Acquisitions in 3rd Generation WCDMA System," Ph.D. dissertation, National Chung Cheng Univ., Chiayi, Taiwan, R.O.C., 2004.
- [23] "A new hierarchical correlation sequence with good properties in presence of a frequency error," Siemens, New York, 3GPP TSG RAN WG1 Tdoc 99/146, 1999.
- [24] C.-F Li, Y.-S. Chu, and W.-H. Sheen, "An integrated multi-scheme cell search platform for W-CDMA applications," in *Proc. IEEE Int. Symp. Personal, Indoor Mobile Radio Commun.*, Sep. 2003, vol. 2, pp. 1197–1201.



**Chi-Fang Li** (S'04–M'07) received the B.S. degree from the Chinese Culture University, Taipei, Taiwan, R.O.C., in 1997, and the M.S. and the Ph.D. degrees from the National Chung Cheng University, Chiayi, Taiwan, R.O.C., in 1999 and 2004, respectively.

Since 2005, he has been a Research Engineer in the Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C.. His research interests include signal processing, application-specific integrated circuit design, and system design for wireless communication systems.



Yuan-Sun Chu received the B.S. degree in electrical from Feng-Chia University, Taiwan, R.O.C., in 1978, the M.S. and Ph.D. degrees in electrical engineering from Katholieke Universiteit Leuven, Leuven, Belgium, in 1986 and 1991, respectively.

He has been a professor in the department of electrical engineering, National Chung-Cheng University, Chiayi, Taiwan R.O.C. His research interest includes communication integrated circuit (IC) design, network system-on-a-chip (SoC) design, network security IC design, and low power IC design.



Wern-Ho Sheen (M'91) received the B.S. degree from the National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C., in 1982, the M.S. degree from the National Chiao Tung University, Hsinchu, Taiwan, R.O.C., in 1984, and the Ph.D. degree from Georgia Institute of Technology, Atlanta, in 1991.

From 1993 to 2001, he was with the National Chung Cheng University, Chiayi, Taiwan, R.O.C, where he held positions as Professor in the Department of Electrical Engineering and Managing

Director of the Center for Telecommunication Research. Since 2001, he has been a Professor in the Department of Communication Engineering, National Chiao Tung University. His research interests include general areas of communication theory, cellular mobile, and personal radio systems, adaptive signal processing for wireless communications, spread-spectrum communications, and VLSI design for wireless communications systems.



**Jan-Shin Ho** received the B.S. degree in communication engineering from the National Chiao Tung University, Hsinchu, Taiwan, R.O.C., in 1993, and the M.S.E.E. and Ph.D. degrees in the electrical engineering from National Chung Cheng University, Chiayi, Taiwan, R.O.C., in 1998 and 2004, respectively.

From 2004 to 2005, he was an Engineer with the Wireless Communication Division, Computer and Communication Laboratory (CCL), Industrial Technology Research Institute (ITRI), Hsinchu,

Taiwan, R.O.C. From 2005 to 2006, he was a Senior Engineer with Chip Development Division, Optical Storage BU, MediaTek Inc., Hsinchu, Taiwan, R.O.C. Since 2006, he has been an Assistant Professor with the Department of Communication Engineering, National Penghu University, Penghu, Taiwan, R.O.C. His research interest is in the inner receiver design of wireless communications, with emphasis on spread spectrum communications and orthogonal multiplexing frequency modulation based systems.