# National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

Doctoral Dissertation

Synchronization Design for DVB-T/H and Indoor Wireless Receiver

Student: Ting Chen Wei

Advisor: Prof. Shyh-Jye Jou

May 2011

# Synchronization Design for DVB-T/H and Indoor Wireless Receiver

Student: Ting Chen Wei

Advisor: Prof. Shyh-Jye Jou



#### A Dissertation

Submitted to Department of Electronics Engineering and Institute of Electronics

College of Electrical and Computer Engineering

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of

Doctor of Philosophy

in

**Electronics Engineering** 

May 2011

Hsinchu, Taiwan, Republic of China


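# Synchronization Design for DVB-T/H and Indoor Wireless Receiver

Student: Ting Chen Wei  Advisor: Prof. Shyh-Jye Jou

National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics, Ph.D. Program

#### Abstract

This thesis proposes synchronization designs for the terrestrial/handheld digital video broadcasting (DVB-T/H) and indoor wireless 60 GHz standards. Baseband digital signal processing algorithms and architectures are explored to meet the required system specifications. Moreover, low-power and efficient data-paths are integrated and implemented to verify the proposed baseband designs. To accomplish the synchronization designs, this thesis adopts and modifies several widely used data-paths, such as the moving sum architecture, the single-port-memory-based delay line, differential encoding, the loop filter, the complex multiplier, the CORDIC algorithm, and the DC error removal scheme, to implement the synchronization hardware efficiently.

The proposed DVB-T/H baseband receiver includes mode, symbol, and guard interval detection, carrier frequency and sampling clock synchronization, fast two-stage scattered pilot synchronization, and channel estimation. The guard interval detection adopts a division-free correlation method. The carrier synchronization and sampling clock synchronization use a memory-sharing architecture. The fast two-stage scattered pilot synchronization method accelerates the detection of pilot locations. To increase memory utilization, the channel estimation reuses the memory of the symbol detection. In addition, differential encoding reduces the storage needed to record pilot locations, and a phase prediction scheme reduces the number of phase accumulator operations. System simulation results show that this receiver meets the bit error rate requirement. Finally, the receiver chip was fabricated and verified in a 0.18 µm CMOS technology, with a core area of 12.96 mm².

For the indoor wireless 60 GHz standards, this thesis proposes a dual-mode architecture for OFDM and single carrier receivers. Preamble and symbol detection, carrier frequency and sampling clock synchronization, and part of the channel estimation are shared between the OFDM mode and the single carrier mode to reduce hardware complexity. In addition, a parallel architecture for a sampling frequency offset compensator is proposed. This parallel compensator resolves the irregular access of the interpolators and increases the processing speed. Synthesis results show that, in a 90 nm CMOS technology, this architecture operates at 400 MHz and reaches 3.2 GS/s with 8× parallelization, at about 204K equivalent gates.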



**Synchronization Design for DVB-T/H and Indoor Wireless Receiver**

Student: Ting Chen Wei Advisor: Prof. Shyh-Jye Jou

Department of Electronics Engineering

Institute of Electronics

National Chiao Tung University

**Abstract** 

In this thesis, synchronization designs for the DVB-T/H and indoor wireless 60 GHz standards are presented. Baseband digital signal processing algorithms and architectures are explored to achieve the required system specifications. Moreover, low-power and area-efficient data-paths are integrated and implemented to verify the proposed baseband designs. To accomplish the synchronization designs, several widely used data-paths, such as the moving sum architecture, the single-port-memory-based delay line, the differential encoding scheme, the loop filter, the complex multiplier, the CORDIC algorithm, and the DC error removal scheme, are adopted and modified to implement the synchronization hardware efficiently.

The proposed DVB-T/H baseband receiver contains mode/symbol/guard interval detection, carrier frequency and sampling clock synchronization, two-stage fast scattered pilot synchronization, and channel estimation. The guard interval detection adopts a division-free correlation method. The carrier synchronization and sampling clock synchronization use a memory-sharing architecture. The two-stage fast scattered pilot synchronization method speeds up the detection of the pilot location. To increase memory utilization, the channel estimation reuses the memory of the symbol detection. In addition, the differential encoding scheme reduces the storage required to record pilot locations, and the phase predictive scheme reduces the number of phase accumulator operations. System simulation results show that this receiver meets the BER requirement. Finally, the receiver chip was fabricated and verified in a 0.18 µm CMOS technology, with a core size of 12.96 mm².

For the indoor wireless 60 GHz standards, this thesis presents a dual-mode OFDM/single carrier (SC) receiver architecture. The preamble/symbol detection, the carrier and sampling clock synchronization, and part of the channel estimation are shared between the OFDM mode and the SC mode to reduce hardware complexity. In addition, a parallel architecture for a sampling clock offset (SCO) compensator is proposed. The parallel SCO compensator resolves the irregular access of the interpolators and increases the processing speed. Synthesis results in a 90 nm CMOS process show that it operates at 400 MHz and achieves 3.2 giga samples per second with 8X parallelization, using about 204K equivalent gates.

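### Acknowledgments

First, I would like to thank my advisor, Dr. Shyh-Jye Jou, for providing an excellent research environment and for his timely suggestions and help with research problems, which made it possible to complete this dissertation. I also thank the members of my oral defense committee, Dr. 陳紹基, Dr. 吳安宇, Dr. 劉志尉, Dr. 許騰尹, and Dr. 薛木添, whose suggestions and comments during the defense made this dissertation more complete. In addition, I thank the professors and students who took part in the digital TV broadcasting receiver project and the indoor wireless receiver project for their help and support, and my classmate 緒祥 for providing the fast Fourier transform module that made the chip implementation possible. Next, I thank my labmates for encouraging one another in our research, and my junior labmates 瑋昌 and 琪耀 for their help with my research. Finally, I thank my family for supporting me.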



## **Contents**

| Section |
|---|
| Chapter 1 Introduction and Motivation |
| 1.1 Motivation |
| 1.2 Thesis Organization |
| Chapter 2 Overview of Channel Model and the Effects of Frequency Offset |
| 2.1 Channel Model |
| 2.2 Effects of Frequency Offset |
| 2.2.1 Effects of Sampling Clock Offset |
| 2.2.2 Effects of Carrier Frequency Offset |
| 2.3 AWGN |
| 2.4 Link Budget |
| 2.5 Summary |
| Chapter 3 Data-path in a Baseband Receiver |
| 3.1 Moving Sum Architecture |
| 3.2 Delay Line |
| 3.3 Differential Encoding [13] [26] [27] |
| 3.4 Complex Multiplier |
| 3.5 Loop Filter |
| 3.6 CORDIC |
| 3.7 Removing DC error |
| 3.8 Fast Fourier Transform |
| 3.9 Summary |
| Chapter 4 OFDM Baseband Receiver for DVB-T/H |
| 4.1 Introduction of DVB-T/H |
| 4.2 Baseband Receiver Architecture |
| 4.3 GI Detection [10] [11] [12] |
| 4.4 CFO and SCO synchronization |
| 4.5 Scattered Pilots Synchronization [11] [13] [14] |
| 4.6 Equalizer [10] [11] |
| 4.7 Hardware Implementation |
| 4.8 Summary |
| Chapter 5 SC-FDE/OFDM Receiver for 60 GHz |
| 5.1 Introduction of standards for 60GHz |
| 5.2 Consideration of Dual Mode Architecture Design |
| 5.3 CFO/SCO Synchronization |
| 5.4 Behavior Simulation of OFDM (HSI) mode |
| 5.5 Simulation of SCO Compensation |
| 5.6 Proposed Parallelized SCO Compensator |
| 5.6.1 Time Interpolation |
| 5.6.2 Parallel Elastic Buffer |
| 5.6.3 Parallel NCO |
| 5.6.4 Proposed Parallelized Time Interpolation Architecture |
| 5.6.5 Parallelized Frequency Rotation |
| 5.6.6 Implementation Results of SCO Compensation |
| 5.7 Summary |
| Chapter 6 Conclusion |
| Chapter 7 Future Work |
| References |
| Curriculum Vitae |

## List of Tables

| Table |
|---|
| TABLE 1-1 Characteristic of DVB-T/H and 60GHz standards |
| TABLE 3-1 Comparisons of hardware complexity |
| TABLE 3-2 Comparisons of different implementations of delay line (8K × 12) [10] [11] [12] |
| TABLE 3-3 Comparisons of different implementations of memory (64 × 64) |
| TABLE 3-4 Comparisons of different complex multipliers |
| TABLE 3-5 Comparisons of implementations of CORDIC |
| TABLE 3-6 Synthesis comparisons of an interpolation with and without removing DC error |
| TABLE 3-7 General comparison of FFT architectures |
| TABLE 4-1 Specification of DVB-T/H [1] [2] |
| TABLE 4-2 Comparison of memory usage of ICFO, RCFO and SCO |
| TABLE 4-3 Hardware complexity of PB and CB Algorithm |
| TABLE 4-4 Required SNR for different modes, channels and modulations |
| TABLE 4-5 Synthesis results |
| TABLE 4-6 Comparison between previously reported DVB-T/H [51] receivers and this work |
| TABLE 5-1 Comparison of 802.15.3c [3] and 802.11ad [4] |
| TABLE 5-2 Comparison of OFDM and SC |
| TABLE 5-3 Comparison of Boundary Detection |
| TABLE 5-4 Comparison of CFO Estimations |
| TABLE 5-5 Synthesis results of a Cubic-Spline interpolator for different processes |
| TABLE 5-6 Synthesis results |



## List of Figures

| Figure |
|---|
| Fig. 2-1 DVB-T/H channel models provided by [1] [2] |
| Fig. 2-2 802.15.3c channel models provided by [18] (only the first 100 samples are shown) |
| Fig. 2-3 Eye diagram accumulated over 50000 samples (12X oversampling, raised cosine pulse shaping, and no AWGN) in an SC system (a) without SCO effect (b) with 50ppm SCO |
| Fig. 2-4 Constellation rotation and dispersion (QPSK and no AWGN) caused by SCO in an OFDM system (a) with 50ppm SCO (b) with 500ppm SCO |
| Fig. 2-5 Two methods of adding SCO (a) resample based method and (b) fractional delay filter based method |
| Fig. 2-6 QPSK constellation rotation caused by CFO (a) an SC data block in the time domain (b) an OFDM symbol in the frequency domain |
| Fig. 2-7 Illustration of SNR and $E_s/N_0$ for an over-sampling ratio of 4 ($U = 4$) |
| Fig. 2-8 Power spectrum density after passing a low-pass filter or a decimation filter |
| Fig. 2-9 Comparison of SNR and $E_b/N_0$ in an oversampling system |
| Fig. 2-10 Link budget of a receiver |
| Fig. 2-11 LDPC performance for MCS1, 2, 3 [23] |
| Fig. 3-1 Moving sum architecture [25] |
| Fig. 3-2 Single port memory based delay line [26] |
| Fig. 3-3 Differential encoding of continual pilot positions of DVB-T/H [13] [26] [27] |
| Fig. 3-4 Architectures of complex multiplier (a) 4 '×' and 2 '+' (b) 3 '×' and 5 '+' [28] [29] |
| Fig. 3-5 Architecture of a loop filter [25] [30] on the CFO loop |
| Fig. 3-6 Tracking curves of different filter coefficients for the CFO loop in an 802.15.3c baseband receiver |
| Fig. 3-7 Different combinations of $K_c$ and $K_d$ (a) $K_d = 1$, (b) $K_d = 4$, (c) $K_d = 8$, (d) $K_d = 16$, (e) $K_d = 32$, (f) $K_d = 64$ |
| Fig. 3-8 Implementation of CORDIC (a) folding architecture (b) unfolding architecture [34] [35] |
| Fig. 3-9 Hardware implementation of Multiply-Add (a) original (b) removing DC error version |
| Fig. 3-10 Simulation of removing DC error (a) original (b) removing DC error version |
| Fig. 3-11 Output order of FFT (a) normal order (b) reversed order |
| Fig. 3-12 2K/4K/8K FFT [52] with single path delay feedback (SDF) [38] [39] [54] and Radix-2/4/8 [53] |
| Fig. 4-1 The DVB-T/H transmitter block diagram [1] [2] |
| Fig. 4-2 The DVB-T/H receiver architecture |
| Fig. 4-3 GI detection error rate vs. threshold under 8K transmission mode, AWGN level = 5dB and Rayleigh channel [1] [2] |
| Fig. 4-4 GI detection error rate vs. threshold under 2K transmission mode, AWGN level = 5dB and Rayleigh channel [1] [2] |
| Fig. 4-5 Memory sharing architecture for ICFO, residual CFO (RCFO) and SCO estimation |
| Fig. 4-6 Architecture of ten-stage unfolding CORDIC [34] [35] |
| Fig. 4-7 Operation of the interpolation controller (a) normal operation (b) skipped operation and (c) duplicated operation |
| Fig. 4-8 Modified Farrow structure for cubic Lagrange interpolator [9] [47] |
| Fig. 4-9 Phase prediction of phase accumulator |
| Fig. 4-10 RCFO/SCO RTL tracking curve @ 8K mode, SNR = 20dB, 64QAM, Rayleigh channel, 200ppm SCO and 0.05 sub-carrier spacing RCFO |
| Fig. 4-11 Output SNR of different SCOs @ RCFO = 0.05 subcarrier spacing, 8K/2K mode, 64QAM and AWGN/Rayleigh channel |
| Fig. 4-12 Performance of the two-stage and three-stage SPS schemes |
| Fig. 4-13 Pilot arrangement of DVB [1] and 2-D predictive channel estimation [50] |
| Fig. 4-14 Channel estimation architecture modified from [50] [51] |
| Fig. 4-15 Ricean (F1) and Rayleigh (P1) channel [1] [2] |
| Fig. 4-16 Simulated RTL BER performance after soft Viterbi decoder at 2K mode, 1/4 GI |
| Fig. 4-17 Simulated RTL BER performance after soft Viterbi decoder at 8K mode, 1/4 GI |
| Fig. 4-18 Testing architecture |
| Fig. 4-19 Measured Shmoo plots (frequency vs. supply voltage) (a) 2K mode, 1/4 GI and (b) 8K mode, 1/4 GI (the axes are redrawn because the original picture is unclear) |
| Fig. 4-20 Die photo of the proposed DVB-T/H baseband receiver IC |
| Fig. 5-1 OFDM/SC dual mode receiver for 802.15.3c |
| Fig. 5-2 CFO tracking curve @ OFDM mode, SNR = 15dB and 50ppm CFO |
| Fig. 5-3 Channel model [18], RMS delay = 3.2ns |
| Fig. 5-4 Receiver performance @ OFDM mode, QPSK, 50ppm CFO, 50ppm SCO and code rate = 1/2 (BER is calculated at steady state, after 96 OFDM symbols) |
| Fig. 5-5 Simulation model of SCO compensation |
| Fig. 5-6 Simulation results of using different methods, filters, and sampling rates for OFDM mode |
| Fig. 5-7 Simulation results of different sampling rates and methods for SC mode |
| Fig. 5-8 Simulation results of different ADC bits (QPSK, OFDM mode); 'T' means time interpolation method, 'F' means frequency rotation method, 'No' means no quantization, '10_8' means that the ADC is 10 bits and the fractional part is 8 bits, and the others follow by analogy |
| Fig. 5-9 Simulation results of different ADC bits (64QAM, OFDM mode); 'T' means time interpolation method, 'F' means frequency rotation method, 'No' means no quantization, '10_8' means that the ADC is 10 bits and the fractional part is 8 bits, and the others follow by analogy |
| Fig. 5-10 Illustration of irregular access in parallel (a) normal access (b) access of successive duplication (c) access of successive skip |
| Fig. 5-11 Illustration of parallel elastic buffer access |
| Fig. 5-12 Architecture of parallelized NCO and phase prediction |
| Fig. 5-13 Proposed parallelized time interpolation |
| Fig. 5-14 Block diagram of the hardware implementation |
| Fig. 5-15 Performance comparison |



## Chapter 1

### **Introduction and Motivation**

#### 1.1 Motivation

Orthogonal frequency division multiplexing (OFDM) is widely used in modern digital communication. The orthogonal sub-carriers of OFDM provide high spectral efficiency and support high data rates. Digital Video Broadcasting-Terrestrial/Handheld (DVB-T/H) [1] [2] was released by the European Telecommunications Standards Institute (ETSI) to replace traditional analog TV broadcasting; it adopts OFDM to transmit high-quality audio and video. In addition, for high-bandwidth wireless applications, the 9 GHz of bandwidth between 57 GHz and 66 GHz has attracted wide interest. Several standards, such as 802.15.3c [3] and 802.11ad [4], have been proposed to achieve multi-gigabit per second (Gbps) transmission in indoor environments. Both 802.15.3c and 802.11ad use OFDM as well as single carrier (SC) modulation.

However, the OFDM modulation scheme is sensitive to synchronization errors. A frequency offset destroys the orthogonality of the OFDM sub-carriers and induces inter-carrier interference (ICI). In an OFDM receiver, synchronization includes symbol synchronization, carrier frequency synchronization, and sampling clock synchronization. The goal of symbol synchronization is to find the location, or the beginning, of a symbol; a wrong location corrupts the demodulation in the receiver. Carrier frequency synchronization compensates for the mismatch between the mixers of the transmitter and the receiver, i.e., the carrier frequency offset (CFO). CFO rotates the constellation and induces ICI in the frequency domain. Sampling clock synchronization compensates for the mismatch between the DAC of the transmitter and the ADC of the receiver, i.e., the sampling clock offset (SCO). SCO also rotates the constellation and induces ICI, and in addition causes the receiver to collect more or fewer samples than were transmitted. OFDM symbol synchronization [5] [6] usually exploits the repeated signal in the cyclic prefix, while CFO and SCO can be estimated in the frequency domain [5] [6] [7]. With advances in digital signal processing, SCO compensation has moved into the digital domain [8] [9], which relaxes the specifications of the analog devices.
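The cyclic-prefix idea can be illustrated with a short sketch. This is a minimal, noiseless Python/NumPy example of a generic CP correlation (it is not the division-free GI detection of Chapter 4). The function name `cp_correlation_sync`, the sizes N = 256 and L = 64, and the zero padding around the symbol are illustrative assumptions. The CFO follows the model $\exp(-j2\pi\Delta f_d n)$ used later in Eqn. (2-6), so $r(n) \cdot r^*(n+N)$ carries the phase $+2\pi\Delta f_d N$.

```python
import numpy as np

def cp_correlation_sync(r, N, L):
    """Coarse timing and fractional CFO from cyclic-prefix correlation.

    r: received complex samples, N: FFT size, L: cyclic prefix (GI) length.
    Returns (estimated CP start index, normalized CFO estimate).
    """
    num = len(r) - N - L + 1
    # corr[t] = sum_m r(t+m) * conj(r(t+m+N)); the CP repeats the symbol tail N samples later
    corr = np.array([np.vdot(r[t + N:t + N + L], r[t:t + L]) for t in range(num)])
    t_hat = int(np.argmax(np.abs(corr)))
    # Phase of the peak is +2*pi*delta_fd*N; only |delta_fd * N| < 0.5 is identifiable
    delta_fd = np.angle(corr[t_hat]) / (2 * np.pi * N)
    return t_hat, delta_fd

rng = np.random.default_rng(0)
N, L, d = 256, 64, 37
X = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)  # QPSK
x = np.fft.fft(X) / N                      # x(n) = (1/N) sum_p X(p) exp(-j2pi np/N)
sym = np.concatenate([x[-L:], x])          # prepend the cyclic prefix
stream = np.concatenate([np.zeros(d), sym, np.zeros(20)])
r = stream * np.exp(-2j * np.pi * (0.2 / N) * np.arange(len(stream)))  # 0.2 subcarrier CFO
t_hat, cfo = cp_correlation_sync(r, N, L)  # timing 37, CFO close to 0.2 / N
```

Because the stream is zero outside the symbol, every nonzero product at any other offset is a strict subset of the in-phase CP terms, so the peak lands exactly on the CP start. With noise and adjacent symbols, a practical detector would typically add energy normalization or averaging.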

This thesis first introduces the sources and the effects of frequency offsets. This background clarifies the purpose of the different synchronization algorithms. Then, several data-paths are surveyed. When designing a baseband receiver, these data-paths help translate the mathematics of the algorithms into hardware and keep the data flowing smoothly. Finally, this thesis takes DVB-T/H and 802.15.3c/802.11ad as examples to design baseband receiver architectures, focusing on synchronization design. The characteristics of these standards are shown in TABLE 1-1. The OFDM symbol length of DVB-T/H is up to 8192 samples, which makes the long symbol a critical issue in the design of the baseband receiver. The goals of the DVB-T/H receiver are therefore to adopt algorithms with fewer matrix operations and to share resources. In contrast, the 60 GHz standards are characterized by a high speed requirement, so algorithms that can be parallelized and pipelined are the main design considerations.

The proposed DVB-T/H baseband receiver includes a division-free symbol/mode/guard interval (GI) detection [10] [11] [12], carrier and sampling clock synchronization [5] [6] [7], and a two-stage fast scattered pilot synchronization [13] [14]. In addition, several novel schemes are adopted to reduce the hardware complexity. For the 802.15.3c/802.11ad baseband receiver, this thesis presents an OFDM/single carrier (SC) dual-mode receiver architecture, in which the OFDM mode and the SC mode share resources to reduce hardware complexity. Furthermore, we concentrate on SCO compensation and discuss the advantages and disadvantages of the time interpolation method [8] [9] and the frequency rotation method [15] [16]. To meet the high data rate requirement of 802.15.3c, a parallel architecture is proposed to solve the irregular access of the interpolators.

TABLE 1-1 Characteristic of DVB-T/H and 60GHz standards

| Characteristic | DVB-T/H                    | 802.15.3c/802.11ad (OFDM)   |
|----------------|----------------------------|-----------------------------|
| Sample rate    | 9.14 MHz                   | 2.64 GHz (**high speed**)   |
| Data rate      | 4.98 ~ 31.67 Mbit/s        | 0.032 ~ 5.775 Gbit/s        |
| OFDM length    | 2K, 4K, 8K (long length)   | 512                         |


#### 1.2 Thesis Organization

The organization of this thesis is as follows. Chapter 2 introduces the effects of frequency offsets and discusses how to estimate a reasonable performance loss for a baseband receiver. Chapter 3 presents several well-known and widely used data-paths of a baseband receiver. Chapter 4 is a design example of a DVB-T/H receiver. Chapter 5 presents the architecture of an OFDM/single carrier dual-mode receiver for 802.15.3c, together with a novel parallel SCO compensation architecture. Finally, Chapter 6 concludes this thesis and Chapter 7 discusses future work.

## Chapter 2

### **Overview of Channel Model and the Effects of Frequency Offset**

This chapter introduces several non-ideal effects in transmission, such as frequency offsets, channel models, and AWGN. In addition, it shows how to estimate the link budget and derive a reasonable implementation loss for the radio frequency front end, the baseband demodulation, and the channel coding. Finally, the 802.15.3c standard is taken as an example to calculate the link budget of each part of a receiver.

#### 2.1 Channel Model


The baseband channel is usually modeled as a tapped delay line [17]. Each path of a multi-path channel is characterized by its delay, its gain, and its angle. In a mobile environment, the fading of each path gain must also be modeled. Fig. 2-1 shows the terrestrial channel models specified by DVB-T/H [1] [2]. DVB-T/H provides two kinds of channel: the Rician channel and the Rayleigh channel. The Rician channel has a strong (dominant) path, whereas the Rayleigh channel does not. In addition, the Rician channel has a smaller delay spread. Hence, the Rician channel has a flatter spectrum and provides a better transmission environment.



Fig. 2-1 DVB-T/H channel models provided by [1] [2]
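A static tapped delay line, and the link between a dominant path and a flat spectrum, can be sketched as follows. This is a minimal Python/NumPy sketch under stated assumptions: it uses integer sample delays and fixed complex gains. The helper names `tapped_delay_channel` and `spectral_spread_db` and the tap values are hypothetical illustrations, not the F1/P1 profiles of [1] [2]. The time-varying fading of a mobile channel is not modelled.

```python
import numpy as np

def tapped_delay_channel(x, delays, gains):
    """Static tapped delay line: y(n) = sum_i gains[i] * x(n - delays[i]).

    delays: integer path delays in samples; gains: complex path gains
    (magnitude = path gain, angle = path phase). No time variation is modelled.
    """
    x = np.asarray(x, dtype=complex)
    y = np.zeros(len(x) + max(delays), dtype=complex)
    for d, g in zip(delays, gains):
        y[d:d + len(x)] += g * x
    return y

def spectral_spread_db(delays, gains, nfft=64):
    """Ratio of max to min |H(f)| in dB, a rough measure of frequency selectivity."""
    h = np.zeros(max(delays) + 1, dtype=complex)
    for d, g in zip(delays, gains):
        h[d] += g
    mag = np.abs(np.fft.fft(h, nfft))
    return 20 * np.log10(mag.max() / mag.min())

# Hypothetical taps: a Rician-like profile (one dominant path) and a
# Rayleigh-like profile (no dominant path) with the same delays.
delays = [0, 3, 7]
rician_like = [1.0, 0.3j, -0.1]
rayleigh_like = [0.6, 0.5j, -0.45]
print(spectral_spread_db(delays, rician_like))
print(spectral_spread_db(delays, rayleigh_like))
```

In the Rician-like profile, the dominant tap (1.0) exceeds the sum of the other magnitudes (0.4). As a result, |H(f)| stays between 0.6 and 1.4, a spread of at most about 7.4 dB. The Rayleigh-like profile has no such bound.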

Fig. 2-2 shows two indoor channel models provided by the 802.15.3c working group [18]. The channel model specified in [18] is a program that dynamically generates channel realizations under several configurable conditions. The two channels in Fig. 2-2 are a line-of-sight (LOS) channel and a non-LOS (NLOS) channel, which correspond respectively to the Rician channel and the Rayleigh channel of DVB-T/H.



Fig. 2-2 802.15.3c channel models provided by [18] (only the first 100 samples are shown)

#### 2.2 Effects of Frequency Offset

In wireless transmission, information about the reference clock that generates the radio frequency and the ADC/DAC clocks is usually not transmitted. The transmitter and the receiver each have their own reference clock, so there is always a mismatch between them due to process, voltage, and temperature variations. This section introduces the effects of this frequency offset.

#### 2.2.1 Effects of Sampling Clock Offset

The sampling clock offset (SCO) is caused by the clock frequency difference between the transmitter DAC and the receiver ADC. For an SC system, SCO narrows the eye opening of the eye diagram, as shown in Fig. 2-3. Fig. 2-3(a) is an eye diagram without SCO and Fig. 2-3(b) is one with 50 ppm SCO.

Fig. 2-3 Eye diagram accumulated over 50000 samples (12X oversampling, raised cosine pulse shaping, and no AWGN) in an SC system (a) without SCO effect (b) with 50ppm SCO

For an OFDM system, SCO rotates the constellation by an angle proportional to the subcarrier index and adds ICI, as shown in Eqn. (2-1).

$$\begin{aligned} Signal_{BB@RX}(k) &= \sum_{n=0}^{N-1} x(n(1-\delta) - \phi) \cdot \exp\left(\frac{j2\pi nk}{N}\right) \\ &= \frac{1}{N} \sum_{n=0}^{N-1} \sum_{p=0}^{N-1} X(p) \cdot \exp\left(\frac{-j2\pi(n(1-\delta) - \phi)p}{N}\right) \cdot \exp\left(\frac{j2\pi nk}{N}\right) \\ &= \frac{1}{N} \sum_{n=0}^{N-1} \sum_{p=0}^{N-1} X(p) \cdot \exp\left(\frac{j2\pi(n\delta + \phi)p}{N}\right) \cdot \exp\left(\frac{j2\pi n(k-p)}{N}\right) \\ &= \frac{1}{N} \cdot X(k) \cdot \exp\left(\frac{j2\pi \phi k}{N}\right) \cdot \sum_{n=0}^{N-1} \exp\left(\frac{j2\pi n\delta k}{N}\right) + ICI \\ &= \frac{1}{N} \cdot \frac{\sin(\pi \delta k)}{\sin\left(\frac{\pi \delta k}{N}\right)} \cdot X(k) \cdot \exp\left(\frac{j2\pi \phi k}{N}\right) \cdot \exp\left(\frac{j\pi \delta k(N-1)}{N}\right) + ICI \end{aligned} \tag{2-1}$$

The symbols in Eqn. (2-1) are defined as follows:

- $Signal_{BB@RX}(k)$ is the received baseband signal in the frequency domain at subcarrier $k$.
- $\delta$ is the SCO normalized by the sampling frequency.
- $\phi$ is the sampling phase offset in samples.
- $x(\cdot)$ is the transmitted time-domain signal, which the receiver samples at the instants $n(1-\delta) - \phi$.
- $X(p)$ is the transmitted signal on subcarrier $p$ in the frequency domain.
- $N$ is the length of an OFDM symbol.

The term with $p = k$ carries the desired signal, and its rotation angle $2\pi\phi k/N + \pi\delta k(N-1)/N$ grows linearly with the subcarrier index $k$. All terms with $p \neq k$ form the ICI. Fig. 2-4 shows simulations with 50 ppm and 500 ppm SCO without AWGN. The constellation with 500 ppm SCO disperses more than the one with 50 ppm, because a higher SCO produces more ICI. Moreover, over a long transmission, SCO causes the receiver to collect fewer or more samples than were transmitted.



Fig. 2-4 Constellation rotation and dispersion (QPSK and no AWGN) caused by SCO in an OFDM system (a) with 50ppm SCO (b) with 500ppm SCO
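The closed form of the desired term in Eqn. (2-1) can be checked numerically. The following is a minimal Python/NumPy sketch, assuming the thesis's IDFT/DFT sign conventions and evaluating the IDFT as a trigonometric polynomial at the non-integer sampling instants, exactly as the derivation does. The names `demod_with_sco` and `sco_desired_gain` are hypothetical. Exciting a single subcarrier isolates the coefficient of $X(k)$, because no other subcarrier can leak ICI into it.

```python
import numpy as np

def demod_with_sco(X, delta, phi):
    """Sample one OFDM symbol at t = n(1 - delta) - phi and demodulate it,
    using the IDFT/DFT sign conventions of Eqn. (2-1) and Eqn. (2-7)."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    idx = np.arange(N)
    t = idx * (1 - delta) - phi
    # x(t) = (1/N) sum_p X(p) exp(-j 2 pi t p / N), evaluated at non-integer t
    y = np.exp(-2j * np.pi * np.outer(t, idx) / N) @ X / N
    # Y(k) = sum_n y(n) exp(+j 2 pi n k / N)
    return np.exp(2j * np.pi * np.outer(idx, idx) / N) @ y

def sco_desired_gain(k, delta, phi, N):
    """Closed-form coefficient of X(k) in the last line of Eqn. (2-1), ICI excluded."""
    k = np.asarray(k, dtype=float)
    a = np.pi * delta * k
    safe = np.where(a == 0, 1.0, a)
    ratio = np.where(a == 0, float(N), np.sin(safe) / np.sin(safe / N))  # limit N as a -> 0
    return (ratio / N * np.exp(2j * np.pi * phi * k / N)
            * np.exp(1j * np.pi * delta * k * (N - 1) / N))

N = 64
X = np.zeros(N, dtype=complex)
X[10] = 1.0                      # a single active subcarrier isolates the X(k) term
Y = demod_with_sco(X, delta=500e-6, phi=0.3)
print(Y[10], sco_desired_gain(10, 500e-6, 0.3, N))
```

Sweeping `k` in `sco_desired_gain` shows that the phase grows linearly with the subcarrier index, which is the rotation visible in Fig. 2-4.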

There are two ways to model the effect of SCO: the resample-based method and the fractional-delay-filter-based method. Both methods are illustrated in Fig. 2-5. The resample-based method requires the added SCO to be expressed as an irreducible fraction. For example, if the added SCO is 50 ppm, the irreducible fraction is 20001/20000 (that is, 1 + 50 × 10⁻⁶). The input signal is up-sampled by 20001 and filtered by a low-pass filter. The filtered signal is then down-sampled by 20000, which generates a signal with 50 ppm SCO. The fractional-delay-filter-based method resembles the time-domain interpolation method of SCO compensation: it uses a fractional delay filter to insert the SCO.



(a) Resample based method



(b) Fractional delay filter based method

Fig. 2-5 Two methods of adding SCO (a) resample based method and (b) fractional delay filter based method
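Both ways of adding SCO can be sketched briefly. The following is a minimal Python/NumPy sketch under stated assumptions. `resample_ratio` only derives the irreducible up/down factors for the resample-based method and does not implement its low-pass filter. `add_sco_lagrange` uses a 4-tap cubic Lagrange interpolator as the fractional delay filter; this filter choice is an assumption for illustration. Both names are hypothetical. Output sample $m$ is taken at input time $m(1-\delta)$, matching the sampling instants of Eqn. (2-1) with $\phi = 0$.

```python
from fractions import Fraction
import numpy as np

def resample_ratio(sco_ppm):
    """Irreducible up/down factors for the resample-based method (integer ppm)."""
    return Fraction(10**6 + sco_ppm, 10**6)

def add_sco_lagrange(x, delta):
    """Fractional-delay-based SCO insertion with a 4-tap cubic Lagrange interpolator.
    Output sample m is interpolated at input time t = m(1 - delta)."""
    x = np.asarray(x)
    n_out = int(np.floor((len(x) - 1) / (1 - delta))) + 1   # keep t inside the input span
    t = np.arange(n_out) * (1 - delta)
    i = np.floor(t).astype(int)
    mu = t - i                                              # fractional interval in [0, 1)
    xp = np.pad(x, (1, 2), mode="edge")                     # xp[i + j + 1] == x[i + j]
    h = [-mu * (mu - 1) * (mu - 2) / 6,                     # weight of x[i-1]
         (mu + 1) * (mu - 1) * (mu - 2) / 2,                # weight of x[i]
         -(mu + 1) * mu * (mu - 2) / 2,                     # weight of x[i+1]
         (mu + 1) * mu * (mu - 1) / 6]                      # weight of x[i+2]
    return sum(hj * xp[i + j + 1] for j, hj in zip(range(-1, 3), h))

print(resample_ratio(50))                    # 20001/20000
x = np.exp(2j * np.pi * 0.01 * np.arange(1000))
y = add_sco_lagrange(x, 50e-6)               # tone re-sampled at t = m(1 - 50e-6)
```

A cubic Lagrange interpolator reproduces any cubic polynomial exactly, which makes it easy to check. Its accuracy for band-limited signals, however, depends on the oversampling ratio.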

#### 2.2.2 Effects of Carrier Frequency Offset

The carrier frequency offset (CFO) is caused by the mismatch between the transmitter mixer and the receiver mixer. Eqn. (2-2) shows how the baseband signal ($Signal_{BB@TX} = I + j \cdot Q$) is up-converted to the radio frequency signal ($Signal_{RF}$).

$$\begin{aligned} Signal_{RF} &= \text{Re}\left(Signal_{BB@TX} \cdot \exp(j \cdot 2\pi f_c t)\right) \\ &= I \cdot \cos(2\pi f_c t) - Q \cdot \sin(2\pi f_c t) \end{aligned} \tag{2-2}$$

where Re(·) takes the real part of a signal and $f_c$ is the carrier frequency. Eqn. (2-3) shows how the receiver generates the in-phase part of the baseband signal when its local oscillator runs at $f_c + \Delta f$.

$$\begin{aligned} I_{Signal_{BB@RX}} &= LP\left[(I \cdot \cos(2\pi f_c t) - Q \cdot \sin(2\pi f_c t)) \cdot \cos(2\pi (f_c + \Delta f)t)\right] \\ &= LP\left[I \cdot \cos(2\pi f_c t) \cdot \cos(2\pi (f_c + \Delta f)t) - Q \cdot \sin(2\pi f_c t) \cdot \cos(2\pi (f_c + \Delta f)t)\right] \\ &= LP\left[\frac{I}{2} \cdot \{\cos(-2\pi \Delta f t) + \cos(2\pi (2f_c + \Delta f)t)\} - \frac{Q}{2} \cdot \{\sin(-2\pi \Delta f t) + \sin(2\pi (2f_c + \Delta f)t)\}\right] \\ &= I \cdot \cos(-2\pi \Delta f t) - Q \cdot \sin(-2\pi \Delta f t) \end{aligned} \tag{2-3}$$

where LP[·] denotes low-pass filtering and $\Delta f$ is the CFO. The low-pass filter removes the components at $2f_c + \Delta f$, and its gain of 2 compensates for the factor 1/2 of the product-to-sum identities. In the same way, the quadrature-phase part is obtained as shown in Eqn. (2-4).

$$Q_{Signal_{BB \oplus PY}} = I \cdot \sin(-2\pi\Delta ft) + Q \cdot \cos(-2\pi\Delta ft)$$
 (2-4)

By combining Eqn.(2-3) and Eqn. (2-4), we can get the effect from CFO shown in Eqn.(2-5). This effect causes a rotated constellation in time domain

$$\begin{aligned} Signal_{BB@RX} &= I_{Signal_{BB@RX}} + j \cdot Q_{Signal_{BB@RX}} \\ &= (I + j \cdot Q) \cdot (\cos(-2\pi\Delta f t) + j \cdot \sin(-2\pi\Delta f t)) \\ &= Signal_{BB@TX} \cdot \exp(-j2\pi\Delta f t) \end{aligned} \tag{2-5}$$

For the digital simulation, a normalized form is shown in Eqn.(2-6).

$$\begin{aligned} Signal_{BB@RX} &= Signal_{BB@TX} \cdot \exp(-j2\pi\Delta f t) \\ &= Signal_{BB@TX} \cdot \exp(-j2\pi\Delta f n T_s) \\ &= Signal_{BB@TX} \cdot \exp\left(-j2\pi\left(\frac{\Delta f}{f_s}\right)n\right) \\ &= Signal_{BB@TX} \cdot \exp(-j2\pi(\Delta f_d)n) \end{aligned} \tag{2-6}$$

Where  $\Delta f_d$  is the digital CFO normalized by the sampling frequency.
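In a digital simulation, Eqn.(2-6) amounts to multiplying the n-th sample by a phasor. The following minimal NumPy sketch applies it to a block of QPSK symbols; the function name `add_cfo` is hypothetical.

```python
import numpy as np

def add_cfo(x, delta_fd):
    """Apply Eqn.(2-6): y(n) = x(n) * exp(-j*2*pi*delta_fd*n).

    delta_fd is the CFO normalized by the sampling frequency (cycles/sample).
    """
    x = np.asarray(x, dtype=complex)
    n = np.arange(len(x))
    return x * np.exp(-2j * np.pi * delta_fd * n)

# A block of QPSK symbols: symbol n is rotated by -2*pi*delta_fd*n, so the
# four constellation points smear along a circle as n grows (cf. Fig. 2-6(a)).
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
block = np.tile(qpsk, 256)
rotated = add_cfo(block, 1e-3)
```

The magnitude of every sample is unchanged; only the phase advances by $-2\pi\Delta f_d$ per sample.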

Next, we derive the effect of CFO in the frequency domain. An OFDM baseband signal is given by Eqn.(2-7).

$$Signal_{BB@TX} = x(n) = \frac{1}{N} \sum_{p=0}^{N-1} X(p) \cdot \exp\left(\frac{-j2\pi np}{N}\right) \qquad n = 0, 1, \ldots, N-1 \tag{2-7}$$

where X(p) is the transmitted data on the p-th subcarrier. Adding the CFO together with an initial phase offset $\phi$ gives Eqn.(2-8).

$$Signal_{BB@RX} = x(n) \cdot \exp(-j2\pi \cdot (\Delta f_d \cdot n + \phi)) \qquad n = 0, 1, \ldots, N-1 \tag{2-8}$$

After the FFT, the k-th received subcarrier Y(k) is given by Eqn.(2-9):

$$\begin{aligned} Y(k) &= \sum_{n=0}^{N-1} x(n) \cdot \exp(-j2\pi \cdot (\Delta f_d \cdot n + \phi)) \cdot \exp\left(\frac{j2\pi nk}{N}\right) \\ &= \frac{1}{N} \sum_{n=0}^{N-1} \sum_{p=0}^{N-1} X(p) \cdot \exp\left(\frac{-j2\pi n(p-k)}{N}\right) \cdot \exp(-j2\pi \cdot (\Delta f_d \cdot n + \phi)) \\ &= \frac{1}{N} \cdot X(k) \cdot \exp(-j2\pi \phi) \cdot \sum_{n=0}^{N-1} \exp(-j2\pi \cdot \Delta f_d \cdot n) + ICI \\ &= \frac{1}{N} \cdot \frac{\sin(\pi \Delta f_d N)}{\sin(\pi \Delta f_d)} \cdot X(k) \cdot \underbrace{\exp(-j2\pi \phi) \cdot \exp(-j\pi \cdot \Delta f_d \cdot (N - 1))}_{common\ phase\ rotation} + ICI \end{aligned} \tag{2-9}$$

In the frequency domain, the CFO therefore not only rotates the constellation but also attenuates each subcarrier and introduces ICI. A simulation of the effect of CFO is shown in Fig. 2-6. Fig. 2-6(a) is the result for an SC data block in the time domain and Fig. 2-6(b) is the result for an OFDM symbol in the frequency domain. The CFO affects the two systems very differently. In SC, each symbol suffers a different phase rotation that is proportional to its time index, so the QPSK constellation spreads into a circle. In OFDM, however, all subcarriers experience the same phase rotation, and the ICI appears as additional scattering of the constellation points.
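Eqn.(2-9) can be checked numerically with the sketch below. It is a hedged illustration that follows the document's transform pair: Eqn.(2-7) uses $\exp(-j2\pi np/N)$, which is NumPy's `fft` divided by N, and the receiver DFT uses $\exp(+j2\pi nk/N)$, which is N times NumPy's `ifft`. The function names `ofdm_with_cfo` and `common_gain` are hypothetical.

```python
import numpy as np

def ofdm_with_cfo(X, delta_fd, phi=0.0):
    """Return Y(k) of Eqn.(2-9) for transmitted subcarrier data X."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    x = np.fft.fft(X) / N                                  # Eqn.(2-7)
    n = np.arange(N)
    y = x * np.exp(-2j * np.pi * (delta_fd * n + phi))     # Eqn.(2-8)
    return N * np.fft.ifft(y)                              # sum_n y(n) exp(+j2*pi*n*k/N)

def common_gain(N, delta_fd, phi=0.0):
    """Attenuation and common phase rotation predicted by Eqn.(2-9)."""
    if delta_fd == 0:
        amp = 1.0
    else:
        amp = np.sin(np.pi * delta_fd * N) / (N * np.sin(np.pi * delta_fd))
    return amp * np.exp(-2j * np.pi * phi) * np.exp(-1j * np.pi * delta_fd * (N - 1))
```

Loading a single subcarrier isolates the terms: the loaded subcarrier equals `common_gain(...)` exactly, while every other subcarrier receives a nonzero leakage, which is the ICI term.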



Fig. 2-6 QPSK constellation rotation caused by CFO (a) a SC data block in time domain (b) an OFDM symbol in the frequency domain

#### **2.3 AWGN**

In a transmission system, thermal noise is the main source of noise, and its noise power spectral density ($N_0$) is given by Eqn.(2-10).

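$$\begin{aligned} N_0 &= kT \\ &= 1.38 \times 10^{-23}\ \text{J/K} \times 298\ \text{K} \quad (@25^{\circ}\text{C}) \\ &= 1.38 \times 10^{-23} \times 10^{3}\ \text{mW/(Hz}\cdot\text{K)} \times 298\ \text{K} \\ &= -173.86\ \text{dBm/Hz} \\ &= -113.86\ \text{dBm/MHz} \end{aligned} \tag{2-10}$$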

where k is the Boltzmann constant and T is the absolute temperature in kelvin. Thermal noise has an almost flat power spectral density over the frequency spectrum, so it is usually modeled as additive white Gaussian noise (AWGN). The signal-to-noise ratio (SNR) measures the quality of the received signal. For digital modulation, $E_b/N_0$ is usually adopted instead, where $E_b$ is the average energy per information bit; $E_b/N_0$ is the SNR normalized by the spectral efficiency (bit/s/Hz). The conversion between SNR and $E_b/N_0$ is shown in Eqn.(2-11) [20]:

$$\begin{aligned} SNR &= \frac{S}{N} = \frac{E_{s} \cdot f_{symbol}}{N_{0} \cdot B} \\ \frac{E_{s}}{N_{0}} &= \frac{E_{b}}{N_{0}} \cdot \log_{2}(M) \cdot C_{rate} \\ \Rightarrow SNR &= \frac{E_{b}}{N_{0}} \cdot \log_{2}(M) \cdot C_{rate} \cdot \frac{f_{symbol}}{B} \end{aligned} \tag{2-11}$$

where $f_{symbol}$ is the symbol rate, $E_s$ is the energy per symbol, B is the noise bandwidth, M is the order of the M-ary modulation, and $C_{rate}$ is the coding rate. A special case that must also be considered is an oversampling system, shown in Eqn.(2-12).

$$\begin{aligned} SNR_{oversample} &= \frac{E_{s} \cdot f_{symbol}}{N_{0} \cdot B} = \frac{E_{s} \cdot f_{symbol}}{N_{0} \cdot f_{sample}} = \frac{E_{s} \cdot f_{symbol}}{N_{0} \cdot U \cdot f_{symbol}} \\ \Rightarrow SNR_{oversample} &= \frac{E_{s}}{N_{0}} \cdot \frac{1}{U} \end{aligned} \tag{2-12}$$

where $f_{sample} = U \cdot f_{symbol}$ is the sampling rate and U is the oversampling ratio; before any filtering, the noise bandwidth B equals $f_{sample}$. According to Eqn.(2-12), in an oversampling system $E_s/N_0$ is U times $SNR_{oversample}$. The example in Fig. 2-7 illustrates this condition: when noise is added at an SNR of 10 dB (noise power of -10 dB relative to a 0 dB signal) in a 4X-oversampling system, $E_s/N_0$ is 16 dB while $SNR_{oversample}$ is 10 dB. The difference of 6 dB equals $10 \cdot \log_{10}(4)$. However, a transmission system usually contains a low-pass filter or a decimation filter. Fig. 2-8 shows the power spectral density after such a filter: the out-of-band noise is removed, so both $E_s/N_0$ and $SNR_{filtered}$ become 16 dB. Fig. 2-9 summarizes these relations. Hence, when AWGN is added to an oversampling system that contains a low-pass filter, the SNR used for adding the noise must be modified according to Eqn.(2-13).

$$SNR_{added} = SNR_{required} - 10 \cdot \log_{10}(U) \tag{2-13}$$
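The conversions of Eqn.(2-11) and Eqn.(2-13) can be sketched in Python as follows. This is a minimal illustration, assuming complex baseband samples and a noise bandwidth equal to the sampling rate before filtering; the function names are hypothetical.

```python
import math
import numpy as np

def ebn0_to_snr_db(ebn0_db, m_ary, code_rate, f_symbol, bandwidth):
    """Eqn.(2-11): SNR = Eb/N0 * log2(M) * C_rate * f_symbol / B, in dB."""
    return ebn0_db + 10 * math.log10(math.log2(m_ary) * code_rate * f_symbol / bandwidth)

def snr_added_db(snr_required_db, oversampling_ratio):
    """Eqn.(2-13): SNR used when adding AWGN in front of a low-pass/decimation filter."""
    return snr_required_db - 10 * math.log10(oversampling_ratio)

def add_awgn_oversampled(x, snr_required_db, oversampling_ratio, rng):
    """Add complex AWGN over the full sampling bandwidth so that, after an ideal
    filter passing 1/U of that bandwidth, the in-band SNR is snr_required_db."""
    x = np.asarray(x, dtype=complex)
    p_signal = np.mean(np.abs(x) ** 2)
    p_noise = p_signal / 10 ** (snr_added_db(snr_required_db, oversampling_ratio) / 10)
    noise = np.sqrt(p_noise / 2) * (rng.standard_normal(x.shape)
                                    + 1j * rng.standard_normal(x.shape))
    return x + noise, p_noise
```

For QPSK with code rate 1/2 and $f_{symbol} = B$, `ebn0_to_snr_db` returns the same value as $E_b/N_0$. For U = 4 and a required in-band SNR of 16 dB, noise is added at about 9.98 dB; after an ideal low-pass filter that keeps one quarter of the band, the measured SNR returns to about 16 dB, as in Fig. 2-8.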



Fig. 2-7 Illustration of SNR and  $E_s/N_0$  in over-sampling ratio equal to 4 (U = 4)



Fig. 2-8 Power spectrum density after passing a low-pass filter or a decimation filter



Fig. 2-9 Comparison of SNR and E<sub>b</sub>/N<sub>0</sub> in an oversampling system

#### 2.4 Link Budget

A link budget is used to roughly estimate the required SNR of a receiver system at a certain BER. Fig. 2-10 is a block diagram of the link budget of a receiver. As shown in Fig. 2-10, a wireless receiver can be divided into three parts: the radio frequency (RF) front end, the baseband, and the channel code. The RF front end and the baseband introduce additional noise into the system, while the channel code improves the system performance.



There is a path loss ( $P_{loss}$ ) between a transmitter and a receiver. The model of path loss depends on the environment that signals pass through. For example, Eqn.(2-14) is a path loss model for free-space transmission [21]:

$$P_r = P_t \frac{\lambda^2 G_t G_r}{(4\pi d)^2} \tag{2-14}$$

where $P_r$ is the received power, $P_t$ is the transmitted power, $G_t$ and $G_r$ are the antenna gains of the transmitter and the receiver, $\lambda$ is the wavelength of the carrier, and d is the distance between the transmitter and the receiver. To accommodate other path loss models, Eqn.(2-14) can be generalized into Eqn.(2-15).

$$R_{ss} = P_r = P_t \cdot P_{loss} \tag{2-15}$$

where $R_{ss}$ is the receiver sensitivity, i.e., the minimum received power at which the required error rate is achieved. When defining a specification, the required $R_{ss}$ is derived by the following steps:

- Define the required BER according to the requirement of the application.
- Define the required $E_b/N_0$ based on the performance of the channel code.
- Translate $E_b/N_0$ to SNR<sub>o</sub> according to the symbol rate, the bandwidth and the modulation.
- Consider the implementation loss ($IMP_{loss}$) of the baseband and the noise figure ($N_f$) of the RF front end.
- Calculate the thermal noise power based on Eqn.(2-10).

In short, the receiver sensitivity in dBm can be calculated by Eqn.(2-16), which is modified from [22]:

$$R_{ss}(\text{dBm}) = \underbrace{-113.86 + 10 \cdot \log_{10}(B)}_{Thermal\ noise\ power} + \underbrace{IMP_{loss} + N_f}_{Non\text{-}ideal\ effects} + SNR_o(\text{dB}) \tag{2-16}$$

where the noise bandwidth B is in MHz.
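Eqn.(2-10), Eqn.(2-14) and Eqn.(2-16) can be combined into a small link budget calculator. The sketch below is illustrative only: it uses the exact SI value of the Boltzmann constant (which still gives -173.86 dBm/Hz at 298 K), expresses the free-space relation of Eqn.(2-14) in dB, and all function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23        # J/K
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def thermal_noise_dbm_per_hz(temp_k=298.0):
    """N0 = kT of Eqn.(2-10) in dBm/Hz."""
    return 10 * math.log10(BOLTZMANN * temp_k * 1e3)   # W/Hz -> mW/Hz

def rx_sensitivity_dbm(bandwidth_mhz, imp_loss_db, noise_figure_db, snr_o_db):
    """Eqn.(2-16): thermal noise power in B (MHz) + IMP_loss + N_f + SNR_o."""
    thermal_noise_dbm = thermal_noise_dbm_per_hz() + 60 + 10 * math.log10(bandwidth_mhz)
    return thermal_noise_dbm + imp_loss_db + noise_figure_db + snr_o_db

def free_space_rx_power_dbm(pt_dbm, gt_dbi, gr_dbi, freq_hz, distance_m):
    """Eqn.(2-14) in dB form: Pt + Gt + Gr - 20*log10(4*pi*d/lambda)."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return pt_dbm + gt_dbi + gr_dbi - 20 * math.log10(4 * math.pi * distance_m / wavelength)
```

The link closes when the received power from `free_space_rx_power_dbm` exceeds `rx_sensitivity_dbm`; for example, at 60 GHz the free-space loss over 1 m is about 68 dB and grows by about 6 dB each time the distance doubles.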

From the viewpoint of a baseband designer, it is necessary to know what implementation loss is reasonable. Standards usually list the receiver sensitivity and the required BER or packet error rate (PER), from which a reasonable implementation loss can be inferred. In the following, we take 802.15.3c [3] as an example and derive the reasonable implementation loss.

According to 802.15.3c, the required PER is 0.08 for a payload length of 2<sup>14</sup> bits in the AWGN channel of the SC mode; hence, the required BER can be calculated by Eqn.(2-17).

$$\begin{aligned} (1 - BER)^n &\ge (1 - PER) \\ (1 - BER)^{2^{14}} &\ge (1 - 0.08) \\ (1 - BER)^{16384} &\ge 0.92 \\ BER &\le 1 - (0.92)^{1/16384} \\ BER &\le 5.09 \times 10^{-6} \end{aligned} \tag{2-17}$$

where n is the number of transmitted bits. Thus, the required BER is about $5.09 \times 10^{-6}$. On the other hand, the required BER for the OFDM mode is $1 \times 10^{-6}$ [3]. Hence, we choose the stricter value, $1 \times 10^{-6}$, as the required BER. According to Fig. 2-11, the required $E_b/N_0$ is about 3 dB [23] for MCS1 (QPSK (M = 4, $\log_2(M) = 2$), code rate $C_{rate}$ = 1/2). Then, by Eqn.(2-11), the required $SNR_o$ is also 3 dB, since $\log_2(M) \cdot C_{rate} = 1$ (assuming $f_{symbol}/B = 1$). The receiver sensitivity of MCS1 is -50 dBm [3]. Assuming a noise figure of 7 dB [24], the SNR after the RF front end at the sensitivity level, $SNR_r$, is about 24 dB. Therefore, the reasonable implementation loss is about 21 dB (24 dB - 3 dB).
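The arithmetic of this example can be reproduced with the sketch below. The noise bandwidth is not stated above; the value of 1760 MHz used here is an assumption (the 802.15.3c SC chip rate), and the function names are hypothetical.

```python
import math

def ber_from_per(per, n_bits):
    """Eqn.(2-17): largest BER satisfying (1 - BER)^n >= 1 - PER,
    assuming independent bit errors."""
    return 1 - (1 - per) ** (1 / n_bits)

def implied_implementation_loss_db(r_ss_dbm, bandwidth_mhz, noise_figure_db, snr_o_db):
    """Solve Eqn.(2-16) for IMP_loss. Returns (IMP_loss, SNR_r) in dB."""
    snr_r = r_ss_dbm - (-113.86 + 10 * math.log10(bandwidth_mhz)) - noise_figure_db
    return snr_r - snr_o_db, snr_r

print(f"{ber_from_per(0.08, 2 ** 14):.3g}")   # 5.09e-06

# Assumed noise bandwidth: 1760 MHz. Sensitivity -50 dBm, NF 7 dB, SNR_o 3 dB.
imp_loss, snr_r = implied_implementation_loss_db(-50, 1760, 7, 3)
```

Under this assumption, `snr_r` is about 24.4 dB and `imp_loss` is about 21.4 dB, consistent with the rounded values above.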



Fig. 2-11 LDPC performance for MCS1, 2, 3 [23]

#### 2.5 Summary

In this chapter, the channel models of 802.15.3c and DVB-T/H are introduced. These channels can be divided into NLOS channels and LOS channels. The channel spectrum of an LOS channel is usually flatter and thus provides a better environment. In addition, the effects of the frequency offsets are discussed: the frequency offsets cause the constellation to rotate and introduce ICI. Finally, the method to calculate the link budget of a communication system is shown, which provides a rough estimate of the performance requirement.



## **Chapter 3**

## Data-path in a Baseband Receiver

This chapter introduces several data-paths used in the design of a baseband receiver. Different implementations and hardware complexity of those data-paths are compared.

#### 3.1 Moving Sum Architecture

The maximum correlation and its recursive form, shown in Eqn.(3-1) and Eqn.(3-2), are commonly used in symbol detection.

$$k(i) = \sum_{n=0}^{N_g-1} x(n+i)^* \cdot x(n+i-N) \quad i = 0 \sim P \tag{3-1}$$

$$\begin{aligned} k(0) &= x(0)^* \cdot x(-N) + x(1)^* \cdot x(1-N) + x(2)^* \cdot x(2-N) + \dots + x(N_g - 1)^* \cdot x(N_g - 1 - N) \\ k(1) &= \underbrace{x(0)^* \cdot x(-N) + x(1)^* \cdot x(1-N) + x(2)^* \cdot x(2-N) + \dots + x(N_g - 1)^* \cdot x(N_g - 1 - N)}_{k(0)} + \underbrace{x(N_g)^* \cdot x(N_g - N) - x(0)^* \cdot x(-N)}_{\Delta k(0)} \\ &= k(0) + \Delta k(0) \\ \Rightarrow k(i+1) &= k(i) + x(N_g + i)^* \cdot x(N_g + i - N) - x(i)^* \cdot x(i-N) = k(i) + \Delta k(i) \end{aligned} \tag{3-2}$$

where x(n) is the received signal, N is the distance between the repeated signals, $N_g$ is the length of a repeated signal, and P is the length of the detection window. A moving sum architecture [25] that implements the recursive form of the maximum correlation is shown in Fig. 3-1. The direct implementation of Eqn.(3-1) and the moving sum architecture are compared in TABLE 3-1: the moving sum architecture requires fewer multipliers and adders, because each new output needs only one new product, one addition and one subtraction, whereas the direct form recomputes all $N_g$ products.
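The equivalence of Eqn.(3-1) and Eqn.(3-2) can be checked with the NumPy sketch below. To keep all array indices non-negative, it assumes the array element `r[m]` holds the document's sample x(m - N); the function names are hypothetical.

```python
import numpy as np

def direct_correlation(r, N, Ng, P):
    """Eqn.(3-1): k(i) = sum_{n=0}^{Ng-1} x(n+i)^* x(n+i-N), i = 0..P,
    with x(m) stored at r[m + N]. Costs Ng products per output."""
    return np.array([np.sum(np.conj(r[i + N:i + N + Ng]) * r[i:i + Ng])
                     for i in range(P + 1)])

def moving_sum_correlation(r, N, Ng, P):
    """Eqn.(3-2): k(i+1) = k(i) + newest product - oldest product."""
    prod = np.conj(r[N:]) * r[:-N]       # one new complex product per sample
    k = np.empty(P + 1, dtype=complex)
    k[0] = np.sum(prod[:Ng])             # initial window
    for i in range(P):
        k[i + 1] = k[i] + prod[i + Ng] - prod[i]   # add newest, subtract oldest
    return k
```

In hardware the subtracted product `prod[i]` is simply read back from a delay line of $N_g$ products, so no second multiplier is needed; the input must contain at least P + N + $N_g$ samples.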



Fig. 3-1 Moving sum architecture [25]

TABLE 3-1 Comparisons of hardware complexity

#### 3.2 Delay Line

There are two ways to implement a delay line: shift register based and memory based. Because a delay line must store new data and output old data at the same time, a dual port memory is a straightforward module to implement it. However, a dual port memory usually has a larger area than a single port memory of the same capacity. An architecture of a delay line using single port memories is shown in Fig. 3-2 [26]. This architecture uses two single port memories, each with half the capacity of the delay line. The two single port memories alternate between read and write accesses, so that together they provide the same function as a delay line.

Two design examples are shown in TABLE 3-2 and TABLE 3-3. In TABLE 3-2, the delay line is long (an 8K delay line for DVB-T/H); hence, the implementation with single port memories has a lower area than that with a dual port memory. However, in TABLE 3-3, which compares different memory implementations for 802.15.3c, the required memory is very short. Therefore, the implementation with single port memories has no advantage. Besides, the implementation with logic and registers has the largest area in both design cases.



Fig. 3-2 Single port memory based delay line [26]

TABLE 3-2 Comparisons of different implementations of delay line (8K × 12) [10] [11] [12]

| 0.18um Process | 8K shift register | 8K dual port | 8 × 1K single port |
|----------------|-------------------|--------------|--------------------|
| Area (mm^2)    | 5.56              | 1.56         | 1.04               |

TABLE 3-3 Comparisons of different implementations of memory  $(64 \times 64)$ 

| 65nm Process | 64 (logic + register) | 64 dual port | $2 \times 32$ single port |
|--------------|-----------------------|--------------|---------------------------|
| Area (um^2)  | 47285                 | 12280        | 17052                     |

# **3.3 Differential Encoding [13] [26] [27]**

Differential encoding is a method to reduce the required ROM capacity. It stores only the differences between successive values, at the cost of an additional accumulator to restore the original values. Fig. 3-3 is an example that records the continual pilot positions of DVB-T/H. The distribution pattern of the differences is periodic; hence, only one period of the pattern needs to be recorded. The storage cost of the original method is 2301 (177 $\times$ 13) bits; in contrast, the storage cost of the differential encoding method is 360 (45 $\times$ 8) bits. However, the depth of a ROM must be a power of two. Hence, the implemented storage size becomes 512 ($64 \times 8$) bits, a reduction of 77% [13] [26] [27].
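The encode and decode steps can be sketched as below. The pilot positions in the example are hypothetical, not the DVB-T/H table, and the periodic-pattern trick of Fig. 3-3 is not modeled; the sketch only shows how storing differences shortens the word length and why one accumulator is needed at decode time. The function names are hypothetical.

```python
def diff_encode(positions):
    """Return the first position and the differences between successive positions."""
    return positions[0], [b - a for a, b in zip(positions, positions[1:])]

def diff_decode(first, diffs):
    """Rebuild the positions with one accumulator (the extra hardware of the method)."""
    positions, acc = [first], first
    for d in diffs:
        acc += d
        positions.append(acc)
    return positions

def bits_needed(max_value):
    """Width of an unsigned word that can hold max_value."""
    return max(1, max_value.bit_length())

# Hypothetical pilot positions (not the DVB-T/H table)
positions = [3, 17, 30, 52, 61, 90]
first, diffs = diff_encode(positions)
direct_bits = len(positions) * bits_needed(max(positions))             # 6 x 7 = 42
diff_bits = bits_needed(first) + len(diffs) * bits_needed(max(diffs))  # 2 + 5 x 5 = 27
```

For the 8K mode, the largest carrier index is 6816, which needs 13 bits; this is the word width behind the 177 $\times$ 13 bits of the original method.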



Fig. 3-3 Differential encoding of continual pilot positions of DVB-T/H [13] [26] [27]

### 3.4 Complex Multiplier

The complex multiplier is a very common module in a baseband receiver. The traditional complex multiplier contains four multipliers and two adders. To reduce the area, another architecture, shown in Eqn.(3-3), is proposed by [28] [29]. These two architectures are illustrated in Fig. 3-4. The modified complex multiplier shares a common term, which saves one multiplier at the cost of three extra adders. The two architectures are compared in TABLE 3-4: the modified one has lower complexity, but its critical path is longer.

$$\begin{aligned} (a+bj)(c+dj) &= (ac-bd) + (ad+bc)j \\ &= (ac-bc+bc-bd) + (ad+bd+bc-bd)j \\ &= (c(a-b)+b(c-d)) + (d(a+b)+b(c-d))j \end{aligned} \tag{3-3}$$
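A bit-exact model of Eqn.(3-3) takes only a few lines. The sketch below uses integers, as a fixed-point datapath would, and the function name `complex_mult_3` is hypothetical.

```python
def complex_mult_3(a, b, c, d):
    """(a + bj)(c + dj) with 3 real multiplications and 5 additions (Eqn.(3-3))."""
    common = b * (c - d)          # shared term: 1 subtraction, 1 multiplication
    real = c * (a - b) + common   # 1 subtraction, 1 multiplication, 1 addition
    imag = d * (a + b) + common   # 1 addition, 1 multiplication, 1 addition
    return real, imag
```

The critical path passes through one pre-adder, one multiplier and one post-adder, which is the "1 Multiplier, 2 Adders" entry of TABLE 3-4.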



Fig. 3-4 Architectures of complex multiplier (a) 4 'x' and 2 '+' (b) 3 'x' and 5 '+' [28] [29]

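TABLE 3-4 Comparisons of different complex multipliers

|               | Original              | Modified [28] [29]     |
|---------------|-----------------------|------------------------|
| Multiplier    | 4                     | 3                      |
| Adder         | 2                     | 5                      |
| Critical path | 1 Multiplier, 1 Adder | 1 Multiplier, 2 Adders |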

# 3.5 Loop Filter

In a synchronization loop, a loop filter is used to suppress noise and to keep the feedback loop stable. The simple loop filter shown in Fig. 3-5 is commonly used in baseband receivers; its detailed description is reported by [25] [30]. $C_1$ and $C_2$ are adjustable and control the convergence speed and the stability in the steady state. Fig. 3-6 shows tracking curves for different loop filter coefficients for the CFO loop of an 802.15.3c baseband receiver. To reduce the hardware complexity, $C_1$ and $C_2$ can be set to powers of two, so that the multipliers in Fig. 3-5 can be replaced by wire shifting.



Fig. 3-5 Architecture of a loop filter [25] [30] on the CFO loop



Fig. 3-6 Tracking curves of different filter coefficients for the CFO loop in an 802.15.3c baseband receiver

The detailed analysis of this loop filter is reported by [30]. To quickly select a suitable set of coefficients, $C_1$ and $C_2$ can be decomposed into $K_c$ and $K_d$ as shown in Eqn.(3-4). Tracking curves for different combinations of $K_c$ and $K_d$ are shown in Fig. 3-7.

$$C_1 = \frac{1}{K_c}, \qquad C_2 = \frac{1}{K_c} \times \frac{1}{K_d} \tag{3-4}$$



Fig. 3-7 Different combinations of $K_c$ and $K_d$: (a) $K_d$ = 1, (b) $K_d$ = 4, (c) $K_d$ = 8, (d) $K_d$ = 16, (e) $K_d$ = 32, (f) $K_d$ = 64

According to these tracking curves, we can roughly see that $K_c$ controls the convergence speed and the stability in the steady state, while $K_d$ controls the level of damping. The applicable set of $K_d$ is {4, 8, 16, 32, 64}. When deciding the value of $K_d$, it is recommended to try the values of this set from the largest to the smallest. The value of $K_c$ relates to the average value of the open-loop estimate ($V_e$) and the compensation range ($C_r$): $K_c$ can roughly be set to 10~100 times $V_e/C_r$ (this is an empirical rule).
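The role of $K_c$ and $K_d$ can be explored with a behavioral loop model. The sketch below is a hypothetical illustration, not the exact structure of Fig. 3-5: it assumes a proportional-plus-integral loop filter with $C_1 = 1/K_c$ and $C_2 = 1/(K_c K_d)$ from Eqn.(3-4), and an idealized error detector that returns the true CFO minus the current estimate. The function name `simulate_cfo_loop` is hypothetical.

```python
def simulate_cfo_loop(true_cfo, kc, kd, n_iter=2000):
    """Hypothetical PI loop filter with C1 = 1/Kc and C2 = 1/(Kc*Kd).

    Each iteration: err = true_cfo - estimate (idealized detector),
    integral += C2*err, estimate = C1*err + integral.
    With power-of-two Kc and Kd, both products are wire shifts in hardware.
    Returns the trace of estimates.
    """
    c1 = 1.0 / kc
    c2 = 1.0 / (kc * kd)
    integ, est, trace = 0.0, 0.0, []
    for _ in range(n_iter):
        err = true_cfo - est        # idealized open-loop estimate of the residual
        integ += c2 * err           # integral path removes the steady-state error
        est = c1 * err + integ      # proportional path + integral path
        trace.append(est)
    return trace
```

In this model, a larger $K_c$ slows convergence (it would also average more noise in a real loop), which is the trade-off visible in Fig. 3-7.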

### 3.6 CORDIC

The COordinate Rotation DIgital Computer (CORDIC) [31] [32] [33] is an algorithm to compute trigonometric functions. The basic idea of this algorithm is as follows:

$$B = A \times \exp(j\theta) \tag{3-5}$$

where A is a complex number and B is A rotated by  $\theta$ . Then, we can translate Eqn. (3-5) into a matrix form:

$$\begin{bmatrix} B_R \\ B_I \end{bmatrix} = \underbrace{\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}}_{Rotation\ Matrix} \begin{bmatrix} A_R \\ A_I \end{bmatrix} \tag{3-6}$$

Then, Eqn.(3-6) can be modified to:

$$\begin{bmatrix} B_R \\ B_I \end{bmatrix} = \cos(\theta) \begin{bmatrix} 1 & -\tan(\theta) \\ \tan(\theta) & 1 \end{bmatrix} \begin{bmatrix} A_R \\ A_I \end{bmatrix} \tag{3-7}$$

Assume that $\theta$ is decomposed into a sum of micro angles $\pm\phi_i$ whose tangents are powers of two:

$$\theta \approx \sum_{i=0}^{N-1} \sigma_i \phi_i, \quad \sigma_i \in \{+1, -1\}, \quad \tan(\phi_i) = 2^{-i} \tag{3-8}$$

Since $\cos(\sigma_i \phi_i) = \cos(\phi_i)$ and $\tan(\sigma_i \phi_i) = \sigma_i 2^{-i}$, Eqn.(3-7) becomes

$$\begin{bmatrix} B_R \\ B_I \end{bmatrix} \approx \underbrace{\cos(\phi_0)\cos(\phi_1)\cdots\cos(\phi_{N-1})}_{Additional\ gain} \underbrace{\begin{bmatrix} 1 & -\sigma_0 2^{0} \\ \sigma_0 2^{0} & 1 \end{bmatrix}}_{Rotation\ Matrix} \underbrace{\begin{bmatrix} 1 & -\sigma_1 2^{-1} \\ \sigma_1 2^{-1} & 1 \end{bmatrix}}_{Rotation\ Matrix} \cdots \underbrace{\begin{bmatrix} 1 & -\sigma_{N-1} 2^{-(N-1)} \\ \sigma_{N-1} 2^{-(N-1)} & 1 \end{bmatrix}}_{Rotation\ Matrix} \begin{bmatrix} A_R \\ A_I \end{bmatrix} \tag{3-9}$$

Finally, the rotation by $\theta$ is replaced with a sequence of micro-rotations, and the implementation only requires adders and wire shifting. For baseband applications, the additional gain is not a problem because the equalizer compensates for it.
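A floating-point behavioral model of Eqn.(3-8) and Eqn.(3-9) is sketched below, covering rotation mode (rotate a vector, from which cosine and sine follow) and vectoring mode (drive the imaginary part to zero, which yields the arc-tangent). It is an illustrative sketch rather than a fixed-point design; the function names are hypothetical, and the quadrant pre-rotation in `cordic_atan2` is an added assumption needed because the micro-angles only cover about ±1.74 rad.

```python
import math

def cordic_rotate(x, y, theta, n_stages=16):
    """Rotation mode: rotate (x, y) by theta (|theta| <= ~1.74 rad) using
    micro-rotations with tan(phi_i) = 2^-i. The result still contains the
    CORDIC gain 1/prod(cos(phi_i)) (about 1.6468), as in hardware."""
    z = theta
    for i in range(n_stages):
        sigma = 1 if z >= 0 else -1
        # shift-and-add micro-rotation; 2**-i is a wire shift in hardware
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)
    return x, y

def cordic_gain(n_stages=16):
    """Additional gain prod(cos(phi_i)) of Eqn.(3-9); multiply by it to undo the growth."""
    return math.prod(math.cos(math.atan(2.0 ** -i)) for i in range(n_stages))

def cordic_atan2(y, x, n_stages=16):
    """Vectoring mode: rotate (x, y) onto the x axis and accumulate the angle."""
    offset = 0.0
    if x < 0:                        # pre-rotate by pi so that x >= 0
        x, y, offset = -x, -y, math.pi
    angle = 0.0
    for i in range(n_stages):
        sigma = -1 if y >= 0 else 1  # rotate toward the x axis
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        angle -= sigma * math.atan(2.0 ** -i)
    angle += offset
    return angle - 2 * math.pi if angle > math.pi else angle
```

The `math.atan(2.0 ** -i)` values correspond to the small micro-angle ROM of a hardware CORDIC, and each loop iteration corresponds to one stage of the unfolding architecture, or to one clock cycle of the folding architecture.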

Two architectures are used to implement CORDIC: the folding architecture and the unfolding architecture shown in Fig. 3-8 [34] [35]. They are compared in TABLE 3-5. In short, the folding architecture has lower complexity but requires a clock rate N times higher. The unfolding architecture has a longer critical path, but it can be pipelined to speed up the computation.



Fig. 3-8 Implementation of CORDIC (a) folding architecture (b) unfolding architecture [34] [35]

TABLE 3-5 Comparisons of implementations of CORDIC

|               | Folding                   | Unfolding      |
|---------------|---------------------------|----------------|
| Adders        | 3                         | 3 × N          |
| Shifter       | Barrel shifter            | Wire shifting  |
| Critical path | 1 Barrel shifter, 1 Adder | N Adders       |
| Clock rate    | N times the clock rate    | 1 × clock rate |

Note: N is the number of the CORDIC stages

### 3.7 Removing DC error

In hardware design, a suitable truncation scheme reduces the hardware complexity while keeping the performance. In a two's complement system, truncation makes both positive and negative numbers smaller (it rounds toward negative infinity). Hence, the truncated data carries a DC error (a negative bias). In an OFDM system, the received data is transformed into the frequency domain, where this bias accumulates on the DC subcarrier; therefore, the DC subcarrier suffers a large performance loss due to truncation. Fortunately, in most OFDM systems the DC subcarrier does not carry any information (it is a null subcarrier), to avoid problems caused by re-radiation or leakage of the local oscillator (LO). A simple method reported by [36] removes the DC error with little overhead.
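The size of the truncation bias can be computed directly. The sketch below drops k LSBs from two's complement integers; Python's `>>` on negative integers is an arithmetic (floor) shift, like the hardware. As a generic point of comparison, it also shows adding half an output LSB before the shift, a common bias reduction that is not necessarily the exact method of [36]. The function names are hypothetical.

```python
def truncation_error(x, k):
    """Error, in output LSBs, of dropping k LSBs of a two's complement integer x."""
    return (x >> k) - x / 2 ** k          # floor shift: error in (-1, 0]

def rounding_error(x, k):
    """Same, but adding half an output LSB before the shift (generic bias fix)."""
    return ((x + (1 << (k - 1))) >> k) - x / 2 ** k
```

Over uniformly distributed inputs with k = 4, truncation has a mean error of -15/32 LSB, while the half-LSB offset leaves only +1/32 LSB. Because an unnormalized N-point DFT sums all samples into the DC bin, a 2048-point FFT would accumulate the truncation bias to about -960 LSB on the DC subcarrier.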

Fig. 3-9 shows a hardware design example of removing the DC error. The example is a Multiply-Add function in an interpolator, and Fig. 3-9(b) is the version with the truncation DC error removed. According to the method, two constant adders are required. Besides, the Multiply-Add function is replaced by a DesignWare IP [37], 'DW\_02\_prod\_sum1'. Fig. 3-10 is a simulation result of removing the truncation DC error; it shows that the DC error in Fig. 3-10(a) is removed. Finally, the hardware complexity is compared in TABLE 3-6. The design example is a cubic Lagrange interpolator [8] [9]. The overhead of removing the DC error is an 11% increase in gate count.



Fig. 3-9 Hardware implementation of Multiply-Add (a) original (b) removing DC error version



Fig. 3-10 Simulation of removing DC error (a) original (b) removing DC error version

TABLE 3-6 Synthesis comparisons of an interpolator with and without DC error removal

|                                                               | Original      | Remove DC error |
|---------------------------------------------------------------|---------------|-----------------|
| Process                                                       | 90nm          | 90nm            |
| Max clock rate                                                | ≈333MHz (3ns) | ≈333MHz (3ns)   |
| Area (gate counts)                                            | 9K (100%)     | ≈10K (111%)     |
| Power @(0.9V, 333MHz), estimated by Synopsys Design Compiler  | ≈1.37mW       | ≈1.44mW         |

#### 3.8 Fast Fourier Transform

In an OFDM receiver, a Fast Fourier Transform (FFT) unit is required to transform data from the time domain to the frequency domain. There are two kinds of FFT output order, as shown in Fig. 3-11. The normal order is the general output order of an FFT. However, in the standards [1] [3], the pilot positions are recorded in the reversed order. It is therefore important to confirm which order a standard uses.
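Assuming that the reversed order of Fig. 3-11(b) is the usual bit-reversed order, the mapping between the two orders can be generated as below. The function name `bit_reversed_order` is hypothetical.

```python
def bit_reversed_order(n_fft):
    """Indices of an N-point FFT output in bit-reversed order (N a power of two)."""
    bits = n_fft.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n_fft)]
```

For an 8-point FFT the order is 0, 4, 2, 6, 1, 5, 3, 7. A pilot table written in one order can be converted to the other with this permutation, which is one way to make the pilot positions of a standard match the actual FFT output order.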



Fig. 3-11 Output order of FFT (a) normal order (b) reversed order

Pipeline-based [38] [39] [54] and memory-based [40] [41] architectures are widely used in FFT implementations. The two architectures are compared in TABLE 3-7. The memory-based architecture can be regarded as a folded form of the pipeline-based architecture. Hence, the memory-based one has lower complexity but requires more clock cycles to complete the processing, whereas the pipeline-based one can usually operate at a higher frequency through pipelining. For an OFDM system, continuous data input to the FFT unit is an important issue: the memory-based one needs an extra input buffer to temporarily store the input data, but the pipeline-based one does not. On the other hand, the output of the pipeline-based one is not in the normal order; hence, it requires a reorder buffer, whereas the memory-based one can reuse its internal memory.

TABLE 3-7 General comparison of FFT architectures

|                       | Pipeline-based (SDF) [38] [39] [54] | Memory-based [40] [41]     |
|-----------------------|-------------------------------------|----------------------------|
| Complexity            | Higher                              | Lower                      |
| Speed                 | Higher                              | Lower                      |
| Continuous data input | Supported                           | Requires an input buffer   |
| Output reorder        | Requires a reorder buffer           | Reuses the internal memory |

Fig. 3-12 2K/4K/8K FFT [52] with single path delay feedback (SDF) [38] [39] [54] and Radix-2/4/8 [53]

Fig. 3-12 is the 2K/4K/8K FFT (a work of Syu-Siang Long [52]) adopted in the DVB-T/H receiver. This FFT uses the single path delay feedback (SDF) architecture [38] [39] [54], which is a pipeline-based architecture, and combines Radix-2 and Radix-2/4/8 [53]. A reorder buffer is used to transform the output order. It can operate at a 40MHz clock and process 2K, 4K, and 8K FFT operations.

### 3.9 Summary

Several implementation methods of a delay line are discussed in this chapter. The length of the delay roughly decides how to implement a delay line. When the length is short, the complexity of a dual port memory implementation is comparable with that of a single port memory implementation. In contrast, when the length is long, the implementation with single port memories has lower complexity. Algorithms of a baseband receiver require trigonometric modules such as cosine, sine, and arc-tangent. The CORDIC algorithm can calculate all of these functions, so one module can be reused among them. Two architectures of CORDIC are compared, and the choice between them depends on the receiver architecture. Besides, truncation is usually used to reduce the hardware cost, but it generates a DC error in a two's complement system. This error has a large influence on the DC subcarrier of an OFDM system. A DC error removal technique [36] is introduced, which eliminates the DC error with small overhead. Two FFT architectures are also compared: the memory-based architecture has lower complexity but requires more clock cycles, whereas the pipeline-based architecture can operate at a high frequency but requires an extra reorder buffer.

# Chapter 4

# **OFDM Baseband Receiver for**

# **DVB-T/H**

This chapter presents the proposed DVB-T/H receiver. First, the DVB-T/H standard is introduced and the proposed architecture of a DVB-T/H receiver is shown. Then, several schemes are proposed to reduce the hardware complexity and the power consumption. Finally, the implementation results are shown.

#### 4.1 Introduction of DVB-T/H

Digital video broadcasting terrestrial and handheld (DVB-T/H) [1] [2] are standardized by the European Telecommunications Standards Institute (ETSI) to transmit digital TV signals. DVB-T/H defines three bandwidths, 6, 7 and 8 MHz, for different areas and countries. In Taiwan, the digital TV standard adopts 6 MHz DVB-T. Fig. 4-1 shows the DVB-T/H transmitter block diagram [1] [2]. DVB-T/H adopts two levels of channel coding: a Reed-Solomon code and a convolutional code. The Reed-Solomon code is better at correcting burst errors, while the convolutional code is more suitable for random errors; hence, the two codes cooperate well.

The DVB-T/H standard adopts orthogonal frequency division multiplexing (OFDM). In DVB-T/H, there are three symbol lengths, 2048 (2K mode), 4096 (4K mode) and 8192 (8K mode), and four guard interval (GI) lengths for different channel conditions. Besides, continual pilots, scattered pilots and transmission parameter signaling (TPS) pilots are inserted in the frequency domain. The continual pilots have fixed positions, the scattered pilots change their positions every OFDM symbol, and the TPS pilots are used to transmit system parameters. The data subcarriers can use several constellation schemes, such as quadrature phase-shift keying (QPSK), 16-point quadrature amplitude modulation (16QAM) and 64QAM. TABLE 4-1 summarizes the specification of DVB-T/H.



Fig. 4-1 The DVB-T/H transmitter block diagram [1] [2]

TABLE 4-1 Specification of DVB-T/H [1] [2]

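| Bandwidth (MHz)      | 6                    | 7                    | 8                    |
|----------------------|----------------------|----------------------|----------------------|
| Sampling period (µs) | 7/48                 | 1/8                  | 7/64                 |
| FFT length           | 2K, 4K, 8K           | 2K, 4K, 8K           | 2K, 4K, 8K           |
| Used subcarriers     | 1705, 3409, 6817     | 1705, 3409, 6817     | 1705, 3409, 6817     |
| Guard interval       | 1/4, 1/8, 1/16, 1/32 | 1/4, 1/8, 1/16, 1/32 | 1/4, 1/8, 1/16, 1/32 |
| Modulation           | QPSK, 16QAM, 64QAM   | QPSK, 16QAM, 64QAM   | QPSK, 16QAM, 64QAM   |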
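The symbol timing implied by TABLE 4-1 follows from the sampling period: the useful symbol time is the FFT length times the sampling period, and the guard time is the GI fraction of it. The sketch below uses exact fractions; the dictionary and function names are hypothetical.

```python
from fractions import Fraction

# Sampling period T (us) per channel bandwidth (MHz), from TABLE 4-1
SAMPLING_PERIOD_US = {6: Fraction(7, 48), 7: Fraction(1, 8), 8: Fraction(7, 64)}
FFT_SIZE = {"2K": 2048, "4K": 4096, "8K": 8192}

def symbol_duration_us(bandwidth_mhz, mode, gi_fraction):
    """Useful time Tu = N*T, guard time Tg = GI*Tu, total symbol time Tu + Tg (us)."""
    tu = FFT_SIZE[mode] * SAMPLING_PERIOD_US[bandwidth_mhz]
    tg = tu * Fraction(gi_fraction)
    return tu, tg, tu + tg
```

For example, `symbol_duration_us(8, "8K", "1/4")` gives 896 µs + 224 µs = 1120 µs, and in the 6 MHz channel used in Taiwan the useful 2K symbol lasts 896/3 ≈ 298.7 µs.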

#### 4.2 Baseband Receiver Architecture

Fig. 4-2 shows the block diagram of the DVB-T/H baseband receiver. In the receiver, the Mode/GI/Symbol detection, the carrier synchronization, the sampling clock synchronization and the channel estimation (inner receiver) are designed and implemented at the RTL level. The soft demapper, the deinterleaver, and the soft Viterbi decoder (outer receiver) are behavioral models used to measure the receiver performance. The hardware implementation contains two clock domains: a 4X clock rate and a 1X clock rate. The derotator, the interpolator and the FFT operate at the 4X clock rate, whereas the Mode/GI/Symbol detection, the channel estimation, the integer CFO (ICFO) estimation and the SCO and residual CFO (RCFO) estimation [5][6][7] work at the 1X clock rate.

The demodulation flow has two stages: the acquisition stage and the tracking stage. In the acquisition stage, the receiver detects the transmission mode and the GI length, finds the OFDM symbol boundary, compensates the fractional CFO (FCFO) and estimates the ICFO. Then, the demodulation flow enters the tracking stage, in which the receiver tracks the SCO and the RCFO. After reaching the steady state, the receiver detects the scattered pilot mode, performs channel estimation and equalization, and demaps the constellation into a bit stream.

The goal is to design a low power and low complexity baseband receiver. The following is the summary of the adopted schemes to reduce the power consumption or the hardware complexity:

- The phase prediction scheme reduces the operations of the phase accumulators during the GI period. (Low power)
- Distributed memory banks reduce the access power and improve the access capability. (Low power)
- The differential encoding scheme for the continual pilot positions reduces the storage cost. (Low complexity)
- The Mode/GI/Symbol detection and the channel estimation share the same memory bank. (Low complexity)
- The integer CFO (ICFO) estimation and the residual CFO (RCFO) and SCO estimation share the same memory module. (Low complexity)



Fig. 4-2 The DVB-T/H receiver architecture

# 4.3 GI Detection [10] [11] [12]

The Mode/GI detection algorithm [42] adopts the cyclic prefix (CP) based correlation algorithm to identify the symbol mode. Eqn.(4-1) is the maximum correlation (MC) [6]:

$$x_{MC}(n) = \left| \sum_{i=0}^{\frac{Nsc}{32} - 1} r^*(n-i) \times r(n-i - Nsc) \right| \tag{4-1}$$

where r(n) is the received signal, Nsc is the number of subcarriers (the FFT size), and Nsc/32 is the shortest guard interval length. The correlation result $x_{MC}(n)$ forms a peak or a plateau if the tested mode matches the transmitted symbol mode. However, defining the threshold and detecting the plateau are difficult because of glitches. Eqn.(4-2) is a modified form of Eqn.(4-1), called the normalized maximum correlation (NMC) [43] [44]:

$$x_{NMC}(n) = \frac{\left| \sum_{i=0}^{\frac{Nsc}{32}-1} r^{*}(n-i) \times r(n-i-Nsc) \right|}{\left| \sum_{i=0}^{\frac{Nsc}{32}-1} r^{*}(n-i) \times r(n-i) \right|} \tag{4-2}$$

The denominator is the power of the received signal r(n) and normalizes the correlation so that the plateau approaches "1". Compared with the MC method, the NMC method has a flatter plateau, which makes the GI length easier to detect; however, the NMC method requires a division operation.


To avoid the division operation of NMC, the plateau threshold $T_h$ is defined as in Eqn.(4-3). The GI detection condition can then be rewritten as Eqn.(4-4): the division is removed by moving the denominator to the right side and adopting a pre-determined threshold $T_h$. Moreover, squaring both sides removes the square-root operation required to compute the absolute value of a complex number.

$$x_{NMC}(n) \in plateau \quad if \quad x_{NMC}(n) \ge T_h \tag{4-3}$$

$$\left| \sum_{i=0}^{\frac{Nsc}{32}-1} r^*(n-i) \cdot r(n-i-Nsc) \right|^2 - (T_h)^2 \times \left| \sum_{i=0}^{\frac{Nsc}{32}-1} r^*(n-i) \cdot r(n-i) \right|^2 \ge 0 \tag{4-4}$$

Choosing an appropriate threshold is important for reducing the detection error. With a low threshold, non-plateau regions are regarded as plateau regions, which causes incorrect GI length detection. With a high threshold, glitches on the plateau shorten the estimated plateau length and also cause incorrect GI detection. Fig. 4-3 and Fig. 4-4 show the GI detection error rate simulation results for the 8K and 2K modes. In these results, the detection error rate in the 8K mode is lower than that in the 2K mode, because the 8K mode has the longest symbol length. Except for the 1/32 GI length, the detection error rate behaves similarly at low and high thresholds. For the 1/32 GI length, a low threshold miscalculates the plateau, which decreases the GI detection error rate; at a high threshold, owing to the decision boundary of the 1/32 GI case, the shortened plateau does not cause detection errors in these simulations. To keep the detection error rate lower than 0.01, the threshold is chosen to be 0.5; in the 1/32 GI and 2K mode case, the error rate is very close to 0.01. Since $(T_h)^2 = 0.25 = 2^{-2}$, the multiplication in Eqn.(4-4) can be replaced with wire shifting in the hardware implementation.
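The division-free test of Eqn.(4-4) can be evaluated for every window position with two moving sums, as in the NumPy sketch below. It is a behavioral illustration with a hypothetical function name, not the RTL design; the moving sums correspond to the moving sum architecture of Section 3.1.

```python
import numpy as np

def gi_plateau_mask(r, nsc, th=0.5):
    """Division-free plateau test of Eqn.(4-4) at every window position.

    Entry j of the returned boolean array corresponds to n = nsc + j + w - 1,
    where w = nsc // 32 is the correlation window length.
    """
    r = np.asarray(r, dtype=complex)
    w = nsc // 32
    corr = np.conj(r[nsc:]) * r[:-nsc]            # r*(m) r(m - Nsc)
    power = np.abs(r[nsc:]) ** 2                  # r*(m) r(m)
    ones = np.ones(w)
    num = np.convolve(corr, ones, mode="valid")   # moving sums of length w
    den = np.convolve(power, ones, mode="valid")
    # |num|^2 - Th^2 |den|^2 >= 0; with Th = 0.5, Th^2 = 1/4 is a 2-bit shift
    return np.abs(num) ** 2 - th ** 2 * den ** 2 >= 0
```

With a noiseless synthetic 2K symbol and a 1/4 guard interval, every window that lies entirely inside the cyclic prefix passes the test, while windows that correlate unrelated random samples almost never do.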



Fig. 4-3 GI detection error rate vs. threshold under 8K transmission mode, AWGN level = 5dB and Rayleigh channel [1] [2]



Fig. 4-4 GI detection error rate vs. threshold under 2K transmission mode, AWGN level = 5dB and Rayleigh channel [1] [2]

### 4.4 CFO and SCO synchronization

OFDM systems are sensitive to carrier and sampling frequency mismatches between the transmitter and the receiver. These mismatches cause two effects: phase rotation and intercarrier interference (ICI). CFO rotates the constellation of an OFDM symbol by a common phase, whereas the phase rotation caused by SCO is proportional to the subcarrier index [5][6]. In addition, a frequency offset breaks the orthogonality of the OFDM subcarriers; as a result, the data on each subcarrier suffer interference from the other subcarriers, which degrades performance.

To avoid ICI and keep the constellation phase fixed, the receiver must compensate the frequency offset. In an OFDM system, the CFO is composed of a fractional CFO (FCFO) and an integral CFO (ICFO). A three-step carrier frequency synchronization method (one pre-FFT step and two post-FFT steps) is reported in [5] [6] [7]. First, during symbol boundary detection, the result of the delay correlation is also used to estimate the FCFO [43]. Second, the ICFO is estimated in the frequency domain by using pilots. Because the FCFO cannot be estimated perfectly, a residual CFO (RCFO) remains. Hence, in the final step, an RCFO and SCO estimator [5] [6] [7] operating in the frequency domain keeps tracking the RCFO and SCO in every OFDM symbol.

This work adopts the carrier frequency and sampling clock synchronization of [5] [6] [7]:

$$f_{\Delta} = \frac{1}{2\pi(1+N_g/N)} \cdot \frac{1}{2} \cdot (\phi_{1,l} + \phi_{2,l})$$

$$t_{\Delta} = \frac{1}{2\pi(1+N_g/N)} \cdot \frac{1}{k/2} \cdot (\phi_{1,l} - \phi_{2,l})$$

$$\phi_{1,l} = \angle \left[\sum_{k \in C_1} Z_{l,k}\right] \qquad \phi_{2,l} = \angle \left[\sum_{k \in C_2} Z_{l,k}\right]$$

(4-5)

where $f_{\Delta}$ is the estimated CFO, $t_{\Delta}$ is the estimated SCO, k in the SCO expression is the number of subcarriers (in the sums, k indexes the continual pilots), N is the length of the OFDM symbol without GI, $N_g$ is the length of the guard interval, $C_1$ is the set of continual pilots with positive subcarrier indices, $C_2$ is the set with negative indices, and $Z_{l,k}$ is the product of the same subcarrier k in successive OFDM symbols (one of the two conjugated). The architecture of the RCFO and SCO estimation is shown in Fig. 4-5. The '$tan^{-1}$' module calculates the angle of a complex number using the CORDIC algorithm [31]. To smooth the RCFO and SCO estimates, loop filters [30] are added to the synchronization loops. The loop filter coefficients are powers of two, so the multipliers can be replaced with wire shifting.
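The estimator of Eqn.(4-5) can be sketched in floating-point Python as follows. This is an illustration, not the fixed-point CORDIC datapath: the function name and argument layout are hypothetical, and the sketch assumes the convention $Z_{l,k} = R_{l,k} \cdot R^{*}_{l-1,k}$. The divisor k/2 in $t_{\Delta}$ is taken as half the number of subcarriers, which equals twice the mean pilot index when the continual pilots are spread uniformly; the test places a single pilot pair at ±K/4 so that this holds exactly. The loop filters are omitted.

```python
import cmath
import math


def estimate_rcfo_sco(z_pos, z_neg, n_fft, n_gi, num_subcarriers):
    """Floating-point sketch of Eqn.(4-5).

    z_pos, z_neg: Z_{l,k} = R_{l,k} * conj(R_{l-1,k}) for the continual pilots in C1 and C2
    n_fft: N, n_gi: N_g, num_subcarriers: k in the SCO expression
    Returns (f_delta in subcarrier spacings, t_delta as a relative clock offset).
    """
    phi1 = cmath.phase(sum(z_pos))  # angle of the summed positive-side products
    phi2 = cmath.phase(sum(z_neg))  # angle of the summed negative-side products
    scale = 1.0 / (2 * math.pi * (1 + n_gi / n_fft))
    f_delta = scale * 0.5 * (phi1 + phi2)                    # common rotation -> CFO
    t_delta = scale * (phi1 - phi2) / (num_subcarriers / 2)  # index-dependent rotation -> SCO
    return f_delta, t_delta
```

The sum of the two angles isolates the common (CFO) rotation, while their difference isolates the rotation that grows with the subcarrier index (SCO).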



Fig. 4-5 Memory sharing architecture for ICFO, residual CFO (RCFO) and SCO estimation

In ICFO estimation, the memory reduction architecture [45] uses a serial-in-parallel-out (SIPO) register to temporarily store the sign bits of the FFT output samples, so the full word of each FFT output does not need to be stored. As a result, memory usage is reduced. In addition, because the multiplicand is either 1 or -1, the complex multiplier can be replaced with adders, inverters and MUXs. A differential encoding method [13] [26] [27] records the continual pilot positions, and the sequence of differences is periodic. Therefore, the storage required for the continual pilot positions is reduced by 77% in the implementation. The design overhead is an accumulator and a control unit that accumulate the difference values. Besides, the ICFO estimation and the RCFO/SCO estimation share the same memory to reduce hardware cost. The carrier synchronization can compensate an offset of ±50 subcarrier spacings (equivalent to ±220 kHz in 2K mode and ±55 kHz in 8K mode), and the clock synchronization can compensate a 200 ppm sampling clock offset.
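The differential encoding of pilot positions can be sketched as below. The position list is illustrative only, not the full continual-pilot table of the standard, and the helper names are hypothetical. The sketch shows the two ingredients described above: storing differences instead of absolute indices shrinks the word length, and a running accumulator restores the absolute positions. The additional saving from the periodicity of the differences (64 stored entries instead of 177 in TABLE 4-2) is not modeled.

```python
def diff_encode(positions):
    """Store the first position followed by the successive differences."""
    return positions[:1] + [b - a for a, b in zip(positions, positions[1:])]


def diff_decode(diffs):
    """Rebuild absolute positions with a running accumulator (the hardware overhead)."""
    positions, acc = [], 0
    for d in diffs:
        acc += d
        positions.append(acc)
    return positions


def word_bits(values):
    """Bits needed for an unsigned word that can hold every value."""
    return max(values).bit_length()


pilots = [0, 48, 54, 87, 141, 156, 192, 201, 255, 279]  # illustrative positions
deltas = diff_encode(pilots)
print(deltas, word_bits(pilots), word_bits(deltas))
# [0, 48, 6, 33, 54, 15, 36, 9, 54, 24] 9 6
```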


TABLE 4-2 Comparison of memory usage of ICFO, RCFO and SCO estimation

|                                 | Direct Implementation | Memory Sharing Architecture                             |
|---------------------------------|-----------------------|---------------------------------------------------------|
| ICFO Estimation [45]            | 512×24 + 2×64×38      | 512×24 + 2×64×38 (shared with RCFO/SCO estimation)      |
| RCFO/SCO Estimation [5] [6] [7] | 177×24                | (shared with ICFO estimation)                           |
| Recording Pilot Position        | 177×13 (ROM)          | 64×8 (Differential Encoding)                            |
| Total Required Bits             | 23701 (100%)          | 17664 (75%)                                             |

In short, the memory sharing architecture shown in Fig. 4-5 is designed to reduce memory usage. The comparison between the direct implementation and this architecture is shown in TABLE 4-2. The memory usage of this architecture is reduced by about 25%.
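The totals in TABLE 4-2 can be checked from its words × bits entries with a few lines of Python. Grouping the 2×64×38 term with the ICFO memory in both columns is an interpretation of the table layout; with that grouping, both printed totals and the quoted 25% and 77% reductions are reproduced.

```python
# Memory entries (words x bits) from TABLE 4-2.
direct = {
    "ICFO estimation": 512 * 24 + 2 * 64 * 38,
    "RCFO/SCO estimation": 177 * 24,
    "pilot positions (ROM)": 177 * 13,
}
shared = {
    "ICFO + RCFO/SCO estimation (shared memory)": 512 * 24 + 2 * 64 * 38,
    "pilot positions (differential encoding)": 64 * 8,
}
direct_total = sum(direct.values())
shared_total = sum(shared.values())
saving = 1 - shared_total / direct_total          # overall memory reduction
pilot_saving = 1 - (64 * 8) / (177 * 13)          # pilot-position storage reduction
print(direct_total, shared_total, round(100 * shared_total / direct_total))  # 23701 17664 75
```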

The CFO compensation is composed of a derotator and a sinusoidal value generator (the NCO). This work uses a derotator [46] based on the coordinate rotation digital computer (CORDIC) [31]. A conventional derotator needs a complex multiplier, and the sinusoidal value generator requires additional hardware. CORDIC can compute trigonometric functions such as sine, cosine and arctangent, so the same algorithm can be reused for different equations; a CORDIC-based derotator combines the derotator and the NCO to reduce hardware complexity. The derotator in this work adopts a ten-stage unfolded CORDIC structure [34] [35], as shown in Fig. 4-6.
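The CORDIC-based derotator can be sketched in floating-point Python as below: ten shift-and-add micro-rotations, preceded by a ±90° pre-rotation so that any phase in [-π, π) falls inside CORDIC's convergence range (about ±1.74 rad), and followed by a constant gain correction. The quadrant pre-rotation, the gain handling and the function names are assumptions of this sketch; the chip's fixed-point word lengths and the exact stage arrangement of Fig. 4-6 are not modeled. In the receiver, `phase` would come from the NCO phase accumulator.

```python
import math

STAGES = 10  # ten micro-rotation stages, as in the unfolded CORDIC of Fig. 4-6
ANGLES = [math.atan(2.0 ** -i) for i in range(STAGES)]  # arctangent table
GAIN = 1.0
for i in range(STAGES):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))  # undo the CORDIC magnitude growth


def cordic_rotate(x, y, theta):
    """Rotate the vector (x, y) by theta radians with shift-and-add iterations."""
    theta = (theta + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
    # Pre-rotate by +/-90 degrees (a swap and a negation) so |theta| <= pi/2.
    if theta > math.pi / 2:
        x, y = -y, x
        theta -= math.pi / 2
    elif theta < -math.pi / 2:
        x, y = y, -x
        theta += math.pi / 2
    for i in range(STAGES):
        d = 1.0 if theta >= 0 else -1.0  # rotation direction of this stage
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        theta -= d * ANGLES[i]
    return x * GAIN, y * GAIN


def derotate(sample, phase):
    """CFO compensation: rotate a complex sample by -phase."""
    x, y = cordic_rotate(sample.real, sample.imag, -phase)
    return complex(x, y)
```

After ten stages the residual angle error is bounded by atan(2^-9) ≈ 0.002 rad, which is the accuracy the test tolerates.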



Fig. 4-6 Architecture of ten-stage unfolded CORDIC [34] [35]

A detailed mathematical description of the digital timing (sampling clock) synchronization loop is reported in [8][9]. The synchronization consists of estimation and compensation. The compensation is also called a "digital fractional delay filter" or an "interpolator". The proposed receiver adopts the cubic Lagrange interpolator [9][47] to compensate the SCO with 4× oversampling.
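The following Python sketch evaluates a cubic Lagrange interpolator in Farrow form: four neighboring samples feed fixed FIR combinations c0 to c3, and the fractional interval `mu` enters only through a Horner-style polynomial evaluation, which keeps the structure cheap when `mu` changes at every output. The node placement (samples at offsets -1, 0, 1, 2 around the basepoint) and the function name are assumptions; the document's basepoint convention with the fractional interval in [-0.5, 0.5) may place the nodes differently.

```python
def farrow_cubic_lagrange(xm1, x0, x1, x2, mu):
    """Cubic Lagrange interpolation in Farrow form.

    xm1, x0, x1, x2: samples at offsets -1, 0, 1, 2 from the basepoint
    mu: fractional interval measured from the basepoint (x0)
    """
    # Fixed FIR branches, computed once per set of four samples.
    c0 = x0
    c1 = -xm1 / 3 - x0 / 2 + x1 - x2 / 6
    c2 = xm1 / 2 - x0 + x1 / 2
    c3 = (x2 - xm1) / 6 + (x0 - x1) / 2
    # Horner evaluation in mu: three multiplications by mu per output.
    return ((c3 * mu + c2) * mu + c1) * mu + c0
```

Because it is a third-order Lagrange interpolator, it reproduces any cubic polynomial exactly, which is a convenient sanity check.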

The operation of the interpolation controller is shown in Fig. 4-7, and Fig. 4-7 (a) shows the normal operation with cubic Lagrange interpolation and 4× oversampling. The cubic Lagrange interpolation requires four points to construct a new sample at a fractional interval from the basepoint. Besides, the system requires 4× decimation to recover the symbol rate; as a result, the interpolator generates a new sample every four samples. However, there are two exceptional situations. The Lagrange interpolation is valid only within the interval of the interpolation set; when the fractional interval exceeds this range, the interpolation controller must change the basepoint of the interpolation set. This work sets the valid fractional interval to ±0.5 sample. The valid fractional interval [-0.5, 0.5) has a very simple hardware implementation: in two's complement with range [-1, 1), the first two bits of a number that is less than -0.5 must be 10, and those of a number that is greater than or equal to 0.5 must be 01. Hence, the comparator only needs to examine the first two bits of a number, instead of all bits, to determine whether it lies within the fractional interval, which reduces power consumption.
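The two-bit range check can be verified exhaustively in Python. The sketch assumes a W-bit two's-complement word that represents values in [-1, 1) (one sign bit and W-1 fractional bits); `in_valid_interval` is a hypothetical name. A value lies in [-0.5, 0.5) exactly when its two most significant bits are equal, so the hardware needs only an XOR of those two bits to flag an out-of-range interval.

```python
def in_valid_interval(raw, width):
    """raw: W-bit two's-complement pattern (0 .. 2**width - 1) for a value in [-1, 1).

    The value lies in [-0.5, 0.5) iff the two MSBs are 00 or 11.
    """
    top2 = (raw >> (width - 2)) & 0b11
    return top2 in (0b00, 0b11)


def to_value(raw, width):
    """Reference conversion of the bit pattern to its real value."""
    signed = raw - (1 << width) if raw >= 1 << (width - 1) else raw
    return signed / (1 << (width - 1))
```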

Fig. 4-7(b) and Fig. 4-7(c) show the exceptional situations. In Fig. 4-7 (b), the sampling frequency of the receiver is higher than that of the transmitter, so the fractional interval increases. When it exceeds 0.5 sampling period, the interpolation controller moves the basepoint forward, and the new fractional interval becomes the complement of the original one (the original interval minus one sampling period). Therefore, a sample is skipped in this situation. Fig. 4-7(c) shows the other situation: when the sampling rate of the receiver is lower than that of the transmitter, the fractional interval decreases, and as a result, a sample is duplicated.



Fig. 4-7 Operation of the interpolation controller (a) normal operation (b) skipped operation and (c) duplicated operation

Fig. 4-8 shows the modified Farrow structure [47] for the cubic Lagrange interpolator [9] [47]. This work modifies the Farrow structure by adding data-holding registers that form a serial-in-parallel-out (SIPO) register controlled by the interpolation controller. The SIPO loads received samples into the data-holding registers every four clock cycles in normal operation, every five clock cycles in skipped operation, and every three clock cycles in duplicated operation. Hence, the interpolator does not need to process the samples discarded by the 4× decimation, which saves power.
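The skip/duplicate behavior of the interpolation controller can be sketched as a small simulation. The sketch is hypothetical: `mu_step` is the per-output change of the fractional interval (assumed here to be 4 times the relative sampling frequency offset, positive when the receiver clock is faster), and the returned `advance` is how many input samples the SIPO shifts in before the next output, i.e. 4 in normal operation, 5 when a sample is skipped and 3 when one is duplicated.

```python
def interpolation_controller(num_outputs, mu_step, mu0=0.0, decimation=4):
    """Return a list of (advance, mu) pairs, one per output sample.

    advance: input samples shifted into the SIPO before this output
             (4 = normal, 5 = skip, 3 = duplicate for 4x decimation)
    mu: fractional interval, kept inside [-0.5, 0.5)
    """
    mu = mu0
    events = []
    for _ in range(num_outputs):
        mu += mu_step
        advance = decimation
        if mu >= 0.5:       # receiver clock faster: move basepoint forward, skip a sample
            mu -= 1.0
            advance = decimation + 1
        elif mu < -0.5:     # receiver clock slower: reuse a sample (duplicate)
            mu += 1.0
            advance = decimation - 1
        events.append((advance, mu))
    return events
```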



Fig. 4-8 Modified Farrow structure for cubic Lagrange interpolator [9][47]

After the symbol boundary is located, the samples within the GI period can be dropped, so the interpolator and derotator can stop working to save power. However, the phase accumulators (ACC) of the NCO and the interpolation controller must keep phase continuity. A phase prediction scheme [13] [26] [27] is proposed in this work. First, the frequency offset is estimated once per symbol, so the estimated frequency offset is constant within an OFDM symbol. Second, the estimated frequency offset multiplied by the GI length gives the total phase offset over the GI, which is therefore also a constant. Thus, the proposed scheme disables the NCO and the interpolation controller within the GI, then predicts and compensates the GI phase at the beginning of the next OFDM symbol. Moreover, because the GI length of DVB-T/H is a power of two, the multiplication in the phase prediction can be replaced by shifting the connections, which saves complexity. Fig. 4-9 shows the simulation waveforms of the proposed phase prediction scheme. The scheme keeps the phase continuity and reduces the operations of the phase accumulators by 3%-20%, depending on the GI length.
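The phase prediction can be sketched with an integer phase accumulator that wraps modulo 2^16 (the width and function names are assumptions). Because the estimated offset `delta` is constant within a symbol and the GI length is 2^g samples, adding `delta << g` once at the start of the next symbol lands on the same phase as updating the accumulator at every GI sample. The fraction of accumulator updates that are skipped is G/(N+G), i.e. 1/33 ≈ 3% for a 1/32 GI and 1/5 = 20% for a 1/4 GI, which matches the 3%-20% range above.

```python
PHASE_BITS = 16  # hypothetical accumulator width
PHASE_MASK = (1 << PHASE_BITS) - 1


def accumulate_every_sample(phase, delta, num_samples):
    """Reference: update the phase accumulator once per sample."""
    for _ in range(num_samples):
        phase = (phase + delta) & PHASE_MASK  # wrap modulo 2**PHASE_BITS
    return phase


def predict_gi_phase(phase, delta, gi_log2):
    """Phase prediction: skip the GI and add delta * 2**gi_log2 in one step."""
    return (phase + (delta << gi_log2)) & PHASE_MASK  # multiply by GI length = left shift


def skipped_fraction(gi_ratio):
    """Fraction of accumulator updates avoided when the GI (G = gi_ratio * N) is skipped."""
    return gi_ratio / (1 + gi_ratio)
```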



Fig. 4-9 Phase prediction of phase accumulator

Fig. 4-10 shows an RTL simulation result of RCFO/SCO tracking, in which the FCFO estimation is disabled to test the RCFO tracking ability. The simulation shows that both the RCFO and SCO loops can track the offsets in 8K mode with SNR = 20 dB, 64QAM, a Rayleigh channel, 200 ppm SCO and an RCFO of 0.05 subcarrier spacing.



Fig. 4-10 RCFO/SCO RTL tracking curve @ 8K mode, SNR = 20dB, 64 QAM, Rayleigh channel, 200ppm SCO and 0.05 sub-carrier spacing RCFO

Fig. 4-11 shows the simulated RTL output SNR for different SCOs (in ppm), with an RCFO of 0.05 subcarrier spacing added. The simulation results show that the receiver keeps tracking under different frequency offsets in 8K/2K mode with 64QAM over AWGN and Rayleigh channels.



Fig. 4-11 Output SNR of different SCOs @ RCFO = 0.05 subcarrier spacing, 8K/2K mode, 64QAM and AWGN/Rayleigh channel

# 4.5 Scattered Pilots Synchronization [11] [13] [14]

The scattered pilot position is signaled by the TPS pilots. To decrease the detection latency, two fast scattered pilot synchronization (SPS) algorithms are reported in [48] [49]: the Power-Based (PB) algorithm shown in Eqn.(4-6) and the Correlation-Based (CB) algorithm shown in Eqn.(4-7) [48] [49]:

$$SP_{PB} = \arg\max_{k} \left\{ \sum_{p=0}^{p_{\max}} SC\big(n,12p + 3 \times (k+3)_{\mathrm{mod}\,4}\big) \times SC^{*}\big(n,12p + 3 \times (k+3)_{\mathrm{mod}\,4}\big) \right\}$$
(4-6)

$$SP_{CB} = \arg\max_{k} \left\{ \left| \sum_{p=0}^{p_{\max}} SC\big(n,12p + 3 \times (k+3)_{\mathrm{mod}\,4}\big) \times SC^{*}\big(n-4,12p + 3 \times (k+3)_{\mathrm{mod}\,4}\big) \right| \right\}$$
(4-7)

where SC(n,m) is the m<sup>th</sup> subcarrier of the n<sup>th</sup> symbol, k is the candidate scattered pilot mode, and SP is the estimated scattered pilot mode. Both algorithms exploit the boosted power [1] [2] of the transmitted scattered pilots: the accumulated correlation over the scattered pilot positions is usually larger than that over the data subcarriers, so the PB and CB algorithms can distinguish the scattered pilots from the data subcarriers.
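The two detectors can be sketched in Python as follows. The sketch indexes the four candidate pilot patterns by an offset j, with pilots on subcarriers 3j + 12p; the mapping from j to the mode k of Eqn.(4-6) and Eqn.(4-7) is not modeled, and the function names are hypothetical. The PB detector accumulates |SC|² (two real multiplications and one addition per term), whereas the CB detector accumulates the complex product with symbol n-4 and compares magnitudes. The toy test uses unit-amplitude data and pilots boosted to 4/3 amplitude.

```python
def sps_power_based(sym, p_max):
    """PB detector: return the offset j (pilots on 3j + 12p) with the largest power sum."""
    scores = []
    for j in range(4):
        s = 0.0
        for p in range(p_max + 1):
            x = sym[12 * p + 3 * j]
            s += x.real * x.real + x.imag * x.imag  # SC * conj(SC) = |SC|^2
        scores.append(s)
    return max(range(4), key=scores.__getitem__)


def sps_correlation_based(sym, sym_minus4, p_max):
    """CB detector: correlate with the symbol four symbols earlier (same pilot pattern)."""
    scores = []
    for j in range(4):
        acc = 0j
        for p in range(p_max + 1):
            m = 12 * p + 3 * j
            acc += sym[m] * sym_minus4[m].conjugate()  # complex multiplier
        scores.append(abs(acc) ** 2)  # |.|^2 avoids a square root
    return max(range(4), key=scores.__getitem__)
```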

The PB algorithm requires two real multipliers and one real adder for the correlation, one adder for the summation, and four register groups to store the correlation results of the candidate scattered pilot locations. In contrast, because the CB algorithm operates on complex numbers, it requires a complex multiplier (three real multipliers and five real adders [28] [29]) and twice as many register groups to record the real and imaginary parts of the correlation results. It also requires an absolute value unit (two real multipliers and one adder) and an extra storage element to record the data of the previous symbol at the candidate scattered pilot locations; for example, in 8K mode the CB algorithm requires a 2272-word memory to store the scattered pilots. The hardware complexities of the PB and CB algorithms are shown in TABLE 4-3.

TABLE 4-3 Hardware complexity of PB and CB Algorithm

|    | Real<br>multiplier | Real adder | Register<br>group | Memory                    | Latency (symbols) |
|----|--------------------|------------|-------------------|---------------------------|-------------------|
| PB | 2                  | 2          | 4                 | 0                         | 1                 |
| CB | 5                  | 8          | 8                 | 2272 words<br>(8K mode)   | 5                 |

The proposed baseband receiver adopts a two-stage SPS scheme [11] [13] [14] to improve reliability. This scheme performs SPS twice: the first SPS detects the scattered pilot mode of the current symbol, and the second SPS verifies the prediction made from the first one. If the scattered pilot mode detected by the second SPS differs from the mode predicted from the first one, the receiver declares an error and redoes the two-stage SPS scheme. Each stage can use either the PB algorithm or the CB algorithm. Because the CB algorithm requires the subcarriers of symbol n-4, its detection latency is five OFDM symbols. In contrast, when an error occurs, the latency of the two-stage PB-PB scheme, which is redone once, is four OFDM symbols, which is smaller than that of the CB algorithm.

Fig. 4-12 compares the performance of the original SPS and the two-stage SPS scheme; a three-stage PB-PB-PB SPS scheme is also simulated. The single-stage CB performs better than the single-stage PB and the two-stage PB-PB; however, its detection latency is much longer (five OFDM symbols). Moreover, the performance of the three-stage PB-PB-PB is close to that of the single-stage CB. Considering detection latency, error penalty and hardware complexity, this work adopts the two-stage PB-PB scheme for scattered pilot synchronization. Its latency is four OFDM symbols when an error occurs. In hardware, it requires two real multipliers, two real adders and four register groups, and it does not require memory to store the previous data.
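The decision logic of the two-stage scheme can be sketched as below, assuming, as in DVB-T, that the scattered pilot offset advances by one step (modulo 4) from one symbol to the next. `detect` stands for any single-symbol detector such as a PB detector; the function name and the return convention (index of the confirming symbol, detected offset) are hypothetical. On a mismatch, the sketch restarts on the next two symbols, so one redo gives the four-symbol latency stated above.

```python
def two_stage_sps(detect, symbols, start=0):
    """Two-stage SPS decision logic (hypothetical sketch).

    detect(symbol) -> pilot offset in {0, 1, 2, 3}; the offset is assumed to
    advance by one (mod 4) from one OFDM symbol to the next.
    Returns (index of the confirming symbol, its offset) or None.
    """
    n = start
    while n + 1 < len(symbols):
        first = detect(symbols[n])        # stage 1: detect the current pattern
        predicted = (first + 1) % 4       # expected pattern of the next symbol
        second = detect(symbols[n + 1])   # stage 2: verify the prediction
        if second == predicted:
            return n + 1, second
        n += 2                            # mismatch: redo both stages
    return None
```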



Fig. 4-12 Performance of the two-stage and three-stage SPS schemes

# 4.6 Equalizer [10] [11]

Eqn.(4-8) shows the basic concept of channel estimation for an OFDM system.

$$CR_{SP}(n,m) = \frac{SP_{rec}(n,m)}{SP_{\rm exp}(m)}$$
(4-8)

where n is the symbol index, m is the subcarrier index, $CR_{SP}(n,m)$ is the channel response at the scattered pilots, $SP_{rec}(n,m)$ is the received scattered pilot, and $SP_{exp}(m)$ is the expected (transmitted) scattered pilot.

The DVB-T receiver estimates the channel response by using pilots in the frequency domain of the transmitted data. Fig. 4-13 shows the pilot arrangement of DVB-T. Precise channel estimation needs to use the scattered pilots of several OFDM symbols, and to interpolate the channel responses for data subcarriers. In the proposed receiver, the channel estimation and symbol detection share the same memory bank to reduce hardware complexity.



Fig. 4-13 Pilot arrangement of DVB [1] and 2-D predictive channel estimation [50]

The 2-D Predictive channel estimation algorithm [50] adopts time domain extrapolation. Eqn.(4-9) [50] shows the mathematical description of time domain extrapolation and Eqn.(4-10) [50] shows the mathematical description of frequency domain interpolation.

$$CR(n,k) = CR_{SP}(n,k)$$

$$CR(n,k+3) = \frac{CR_{SP}(n-3,k+3) \times 7 - CR_{SP}(n-7,k+3) \times 3}{4}$$

$$CR(n,k+6) = \frac{CR_{SP}(n-2,k+6) \times 6 - CR_{SP}(n-6,k+6) \times 2}{4}$$

$$CR(n,k+9) = \frac{CR_{SP}(n-1,k+9) \times 5 - CR_{SP}(n-5,k+9) \times 1}{4}$$

$$k = \left(k_{\min} + 3 \times (n \bmod 4) + 12p\right) \bmod k_{\max} \mid p \in Z, p > 0$$

(4-9)

$$CR(n,3 \times (k-1)+1) = \frac{CR(n,3 \times (k-1)) \times 2 + CR(n,3 \times k) \times 1}{3}$$

$$CR(n,3 \times (k-1)+2) = \frac{CR(n,3 \times (k-1)) \times 1 + CR(n,3 \times k) \times 2}{3}$$

$$CR(n,m) = CR(n,m)$$
(4-10)

where CR(n,m) is the channel response at symbol n and subcarrier m. For each subcarrier on the pilot grid that does not carry a scattered pilot in the current symbol, the algorithm takes the two most recent scattered pilots on that subcarrier and linearly extrapolates them to predict the channel response of the current symbol, as shown in Fig. 4-13. The benefit of this algorithm is that it stores only the scattered pilots and not the data subcarriers, which saves a large amount of memory.
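The scaling in the time-domain extrapolation and the 2:1 linear frequency interpolation can be sketched in Python as below. The helper names and the dictionary `pilot_history` (pilot-based channel responses of one subcarrier, indexed by symbol) are hypothetical. Each extrapolation weight decomposes into at most two powers of two (7 = 8 - 1, 6 = 4 + 2, 5 = 4 + 1, 3 = 2 + 1), which is why MUXs and shifted connections suffice, and the division by 4 is a 2-bit shift. The test checks that both steps reproduce a channel that varies linearly in time or in frequency.

```python
# offset -> (lag of newer pilot, lag of older pilot, weight of newer, weight of older), Eqn.(4-9)
EXTRAPOLATION = {3: (3, 7, 7, 3), 6: (2, 6, 6, 2), 9: (1, 5, 5, 1)}


def time_extrapolate(pilot_history, n, offset):
    """Predict CR(n, k + offset) from two earlier pilot-based responses on that subcarrier.

    pilot_history[s] holds CR_SP(s, k + offset) for the symbols s that carried a pilot there.
    """
    lag_new, lag_old, w_new, w_old = EXTRAPOLATION[offset]
    # Weights 7, 6, 5, 3, 2, 1 are sums/differences of powers of two; "/ 4" is a 2-bit shift.
    return (w_new * pilot_history[n - lag_new] - w_old * pilot_history[n - lag_old]) / 4


def freq_interpolate(left, right):
    """Linear interpolation (weights 2/3 and 1/3) of the two data subcarriers between grid points."""
    return (2 * left + right) / 3, (left + 2 * right) / 3
```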

Fig. 4-14 shows the channel estimation architecture modified from [50] [51]. Multiplexers (MUXs) select the scaling factors in Eqn.(4-9) and Eqn.(4-10), and the power-of-two multiplications are realized by shifting connections. This architecture includes seven SRAM blocks, each composed of two 1K SRAM blocks. When channel estimation is performed at symbol (n), the scattered pilots of the previous seven symbols are already stored in the SRAM blocks. The newly arrived scattered pilots of symbol (n) are stored into the first SRAM block, those of symbol (n+1) into the second SRAM block, and so on. To prevent simultaneous read/write operations or overwriting of required data, the stored scattered pilots of the previous seven symbols are read every 12 cycles into data-holding registers, as shown in Fig. 4-14. Within these 12 cycles, only one SRAM block stores the newly arrived scattered pilots, and the other memory blocks are disabled to save power.



Fig. 4-14 Channel estimation architecture modified from [50] [51]

# 4.7 Hardware Implementation

Fig. 4-2 shows the architecture of the DVB-T/H receiver. The baseband receiver chip contains two clock domains. The CORDIC-based derotator, the cubic Lagrange interpolator and the FFT operate at 4× the sampling rate, whereas the Mode/GI/symbol detection, the channel estimation and the SCO and RCFO estimation work at 1× the sampling rate. For the 8 MHz channel bandwidth, the required sampling rate is 9.14 MHz, and 4× the sampling rate is 36.56 MHz. A pipelined FFT implemented by Syu-Siang Long [52] uses radix-2 and radix-2/4/8 [53] butterflies and a single-path delay feedback (SDF) [54] structure to realize 2K/4K/8K-point FFTs. The hardware implementation was carried out jointly with Wei-Chang Liu [11] and Chi-Yao Tseng [26].

A simulation environment is built to verify the performance and functionality of the receiver. The channel models are the Ricean (F1) and Rayleigh (P1) channels provided by the DVB-T/H standard [1] [2], as shown in Fig. 4-15. As shown in Fig. 4-2, the baseband receiver is designed at RTL in the Verilog hardware description language and synthesized into a gate-level implementation. According to the DVB-T/H standard, the required bit error rate (BER) after the Viterbi decoder is $2 \times 10^{-4}$. Fig. 4-16 and Fig. 4-17 show that the receiver achieves the required BER, and TABLE 4-4 summarizes the required SNR.



Fig. 4-15 Ricean (F1) and Rayleigh (P1) channel [1] [2]



Fig. 4-16 Simulated RTL BER performance after soft Viterbi decoder at 2K mode, 1/4 GI



Fig. 4-17 Simulated RTL BER performance after soft Viterbi decoder at 8K mode, 1/4 GI

| Modulation (Channel) | 2K Mode (dB) | 8K Mode (dB) |
|----------------------|--------------|--------------|
| 16QAM (AWGN)         | ≒ 11.1       | ≒ 10.9       |
| 16QAM (Ricean)       | ≒ 11.5       | ≒ 11.5       |
| 16QAM (Rayleigh)     | ≒ 13.5       | ≒ 13.4       |
| 64QAM (AWGN)         | ≒ 15.5       | ≒ 15.3       |
| 64QAM (Ricean)       | ≒ 16.0       | ≒ 16.1       |
| 64QAM (Rayleigh)     | ≒ 17.9       | ≒ 17.9       |

TABLE 4-4 Required SNR for different Modes, channels and modulations

The proposed receiver includes scan chains and memory built-in self-test (BIST) for testing. To improve observability, several observation points are inserted to expose the information of internal blocks; Fig. 4-18 shows the testing architecture of the proposed receiver. In addition, the proposed receiver can operate in different modes:

1. Normal mode: all functions of the receiver are active.
2. FFT mode: the FFT is tested.
3. CFO mode: the SCO loop is opened.
4. SCO mode: the CFO loop is opened.

These observation points and testing modes help to observe the internal signals.




Fig. 4-18 Testing architecture

The synthesis result shows that the equivalent gate count of the DVB-T/H receiver is about 810K gates (including memory). The total memory requirement of this receiver is 102.8 KB (99 KB SRAM and 3.8 KB ROM): 76 KB SRAM and 3.7 KB ROM are for the FFT, 21 KB SRAM is for the channel estimation and the Mode/GI/Symbol detection, and 2 KB SRAM and 0.1 KB ROM are for synchronization. The design results are summarized in TABLE 4-5.

TABLE 4-5 Synthesis results

|                               | Equivalent Gate Count | Required Memory |
|-------------------------------|-----------------------|-----------------|
| FFT                           | 500K (62%)            | 79.7KB          |
| Mode/GI/Symbol Detection & CE | 223K (28%)            | 21KB            |
| CFO/SCO<br>Estimation         | 57K (7%)              | 2.1KB           |
| CFO/SCO Compensation & others | 30K (3%)              | 0               |
| Total                         | 810K                  | 102.8KB         |

Fabricated in a 0.18 µm, 1.8 V CMOS process, the proposed DVB-T/H receiver has a core area of about 12.96 mm<sup>2</sup>. Fig. 4-19 shows the measured Shmoo plots: the chip operates from 1.44 V to 1.8 V, and its maximum operating frequency is 60 MHz at 1.8 V. In the 1/4 GI mode at a 40 MHz clock rate, the chip consumes 43 mW at 1.8 V and 28 mW at 1.45 V. The die photo of this receiver is shown in Fig. 4-20.
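As a first-order consistency check of the two power figures, assume that dynamic power dominates and scales with the square of the supply voltage at the fixed 40 MHz clock (leakage and activity differences are ignored, so this is only an estimate):

$$P_{1.45\,\mathrm{V}} \approx P_{1.8\,\mathrm{V}} \times \left(\frac{1.45\,\mathrm{V}}{1.8\,\mathrm{V}}\right)^2 = 43\,\mathrm{mW} \times 0.649 \approx 27.9\,\mathrm{mW}$$

which agrees with the measured 28 mW.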





Fig. 4-19 Measured Shmoo plots (frequency vs. supply voltage) (a) 2K mode, 1/4 GI and (b) 8K mode, 1/4 GI (the axes were redrawn because the original picture was unclear)



Fig. 4-20 Die photo of the proposed DVB-T/H baseband receiver IC

TABLE 4-6 Comparison between the previously reported DVB-T/H receiver [51] and this work

|                                            | Chen's [51]                                  | Ours [55]                                                                      |
|--------------------------------------------|----------------------------------------------|--------------------------------------------------------------------------------|
| Technology                                 | 0.18 µm                                      | 0.18 µm                                                                        |
| Supply Voltage                             | 1.8 V Core, 3.3 V I/O                        | 1.8 V Core, 3.3 V I/O                                                          |
| Core Size                                  | 6.5 mm × 5.4 mm = 35.10 mm<sup>2</sup>       | 3.6 mm × 3.6 mm = 12.96 mm<sup>2</sup>                                         |
| Input Clock Frequency                      | 109.71 MHz                                   | 40 MHz (36.56 MHz)                                                             |
| Power Consumption                          | 250 mW (1/32 GI Mode)                        | 43 mW (1/4 GI Mode)                                                            |
| Memory of OFDM Demodulation                | SRAM: 118 KB<br>(FFT: 50 KB, EQ: 68 KB)      | SRAM: 99 KB, ROM: 3.8 KB<br>(FFT: SRAM 76 KB & ROM 3.7 KB)                     |
| Area of OFDM Demodulation                  | ≒ 25 mm<sup>2</sup>                          | 12.96 mm<sup>2</sup>                                                           |
| Operation Frequency of OFDM Demodulation   | 36.56 MHz                                    | 40 MHz (36.56 MHz)                                                             |
| Power Consumption of OFDM Demodulation     | ≒ 200 mW (1/32 GI)                           | 43 mW (measured power at 1/4 GI)<br>55 mW (normalized power at 1/32 GI)        |
| Power Consumption @ Low Supply Voltage     | NA                                           | 28 mW (measured power at 1/4 GI, 40 MHz, 1.45 V)                               |

A comparison with the previously reported DVB receiver [51] is listed in TABLE 4-6. Chen's chip [51] contains both OFDM demodulation and channel decoding; for a fair comparison, the area and power consumption of Chen's OFDM demodulation are estimated from its die photo and power profiling. In summary, the proposed chip provides a low-power, low-area solution for the DVB-T/H baseband receiver.

### 4.8 Summary

This chapter presents the architecture of the proposed OFDM baseband receiver for DVB-T/H. The receiver integrates Mode/GI/Symbol detection, a multimode FFT, channel estimation, a carrier frequency synchronization loop, a sampling clock synchronization loop and a two-stage scattered pilot synchronization. This work adopts a modified, division-free NMC algorithm for GI detection; considering the detection error rate and implementation cost, 0.5 is chosen as the threshold of the proposed GI detection scheme. A novel phase prediction scheme reduces the phase-accumulator operations by 3%~20% for different GI lengths. The differential encoding scheme reduces the storage required for the continual pilot positions by 77%. The PB-PB two-stage scattered pilot synchronization scheme has lower latency and hardware complexity. The synthesis results show that the equivalent gate count of the DVB-T/H receiver is about 810K gates, including 102.8 KB of memory. The receiver chip was fabricated in a 0.18 μm 1P6M technology, and its core size is 12.96 mm<sup>2</sup>.


# Chapter 5

## SC-FDE/OFDM Receiver for 60 GHz

This chapter takes the indoor wireless receiver at 60 GHz as an example. First, the 60 GHz standards are introduced, and the advantages and disadvantages of the OFDM and SC modes are compared. Then, the architecture of the SC/OFDM dual-mode receiver is shown and the behavioral performance of the OFDM mode is presented. Finally, a parallel architecture of the SCO compensator is proposed to achieve multi-Gbps transmission.

### 5.1 Introduction of standards for 60GHz

The 60 GHz frequency band has recently become popular, and several standards, such as 802.15.3c [3] and 802.11ad [4], use this band. The goal of the 60 GHz standards is to achieve multi-gigabit-per-second (Gbps) transmission in indoor environments, or 1 Gbps over a range of at least 10 m. The advantages of 60 GHz are [56]:

- The available bandwidth is wide (57.0–66.0 GHz) and can support multi-Gbps transmission.
- The 60 GHz frequency band is license-free in most countries.
- Reflected signals are attenuated quickly; hence, the transmitter needs to aim at the receiver, and beamforming is required.

802.11ad is the 60 GHz version of the 802.11 series and seeks PHY compatibility with 802.15.3c. Both standards have OFDM and single carrier (SC) modes, and they are compared in TABLE 5-1. In 802.11ad, a new mode named the low power SC PHY is added. The low power SC mode uses an RS code instead of LDPC to reduce power consumption. Besides, the payload of this new mode differs from that of the other modes: the data block length is still 448 as in the SC mode, but it is divided into 7 sub-blocks, each composed of 56 data chips and an 8-chip known GI. If a single carrier frequency domain equalizer (SC-FDE) [57] is adopted in the receiver, the smaller sub-block length may reduce power consumption.

TABLE 5-1 Comparison of 802.15.3c [3] and 802.11ad [4]

|                         | 802.15.3c [3]                                                            | 802.11ad [4]                                                                                              |
|-------------------------|--------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| Frequency Band          | 57-66 GHz                                                                | 57-66 GHz                                                                                                 |
| Sample (Chip) Rate      | 2640 MHz (OFDM)<br>1760 MHz (SC)                                         | 2640 MHz (OFDM)<br>1760 MHz (SC)                                                                          |
| Modes                   | CMS (SC)                                                                 | Control PHY (SC)                                                                                          |
|                         | SC                                                                       | SC PHY                                                                                                    |
|                         | HSI (OFDM)                                                               | OFDM PHY                                                                                                  |
|                         | AV (OFDM)                                                                | No                                                                                                        |
|                         | No                                                                       | Low power SC PHY                                                                                          |
| Channel Code            | LDPC                                                                     | LDPC,<br>RS (for low power SC)                                                                            |
| Preamble Structure      | Almost the same as 802.11ad                                              | Almost the same as 802.15.3c                                                                              |
| Payload Structure       | OFDM: 512 + 64 (GI), 336 useful subcarriers<br>SC: 448 + 64 (known GI)   | OFDM: 512 + 64 (GI), 336 useful subcarriers<br>SC: 448 + 64 (known GI)<br>Low power SC: 56 + 8 (known GI) |
| Number of Pilots (OFDM) | 16 (locations differ from 802.11ad)                                      | 16 (locations differ from 802.15.3c)                                                                      |

The 60GHz standards use OFDM and SC for PHY. The comparisons of OFDM and SC are as follows:

- An OFDM signal is composed of many signals at different frequencies. Their constructive and destructive superposition gives OFDM a higher Peak-to-Average Power Ratio (PAPR), which makes the power amplifier more difficult to design [58] [59].
- Because OFDM subcarriers are orthogonal, their spectra can overlap; hence, OFDM has better spectral efficiency than SC.
- OFDM must keep orthogonality and is sensitive to the carrier frequency offset. For the same CFO, the degradation of OFDM is larger than that of SC [60].
- For long multipath channels, the complexity of a time-domain equalizer in a regular SC system is quite high. For OFDM, as long as the multipath delay spread is shorter than the GI, a one-tap frequency domain equalizer is sufficient. In the 60 GHz standards, each data block in the SC mode includes a period of known GI, so the receiver can use SC-FDE. However, SC-FDE requires both an FFT and an IFFT and has higher complexity than OFDM [57].
- Phase noise breaks the orthogonality of OFDM. Therefore, the performance of OFDM degrades more than that of SC under phase noise [61].
- In SC, the signal power of each symbol is spread over the entire bandwidth, whereas in OFDM each data symbol occupies only the narrow bandwidth of one subcarrier. Channel fading can therefore attenuate individual OFDM subcarriers deeply; without channel coding, SC performs better than OFDM [59] [60].

TABLE 5-2 summarizes the comparison between OFDM and SC.

TABLE 5-2 Comparison of OFDM and SC

|                          | OFDM                    | SC (π/2 Modulation)       |
|--------------------------|-------------------------|---------------------------|
| Power Amplifier Design   | × (higher PAPR)         | ○                         |
| Spectral Efficiency      | ○                       | ×                         |
| Phase Noise              | ×                       | ○                         |
| Carrier Frequency Offset | ×                       | ○                         |
| Equalizer Design         | ○ (One-Tap EQ in Freq.) | × (Long ISI), △ (SC-FDE)  |
| Channel Code             | × (Poor w/o FEC)        | ○                         |

# 5.2 Consideration of Dual Mode Architecture Design

Two kinds of methods can be used for symbol boundary detection: auto-correlation-based and cross-correlation-based methods. The auto-correlation-based method uses time-shifted, repeated signals and calculates the correlation between them. A maximum correlation method [6][43] with a moving-sum architecture [25] reduces the number of multipliers required by the auto-correlation. The cross-correlation-based method correlates the received signal with the known transmitted signal; a matched filter is an example. In the 60 GHz standards, the known preamble uses Golay sequences; hence, a Golay correlator [63] can be adopted. The Golay correlator is also a kind of matched filter, but with lower complexity. These methods are compared in TABLE 5-3. In summary, the Golay correlator has the lowest complexity; however, from the viewpoint of the whole system, the maximum correlation operation can also estimate the CFO while searching for the boundary. Therefore, the proposed receiver uses the maximum correlation algorithm.
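The moving-sum idea behind the maximum correlation method can be sketched as follows. This is a generic delay-correlation sketch, not the exact detector of [6][43][70]; the function names, the window length `W`, and the test signal are hypothetical. It shows the recursive window update that keeps one complex multiplication per sample, and how the phase at the correlation peak doubles as a CFO estimate.

```python
import numpy as np


def delay_correlate(r, L, W):
    """Moving-sum delay correlation c[n] = sum_{m=n-W+1}^{n} r[m] * conj(r[m-L]).

    The window sum is updated recursively (add the newest product, subtract
    the product leaving the window), so one complex multiplication per input
    sample suffices.
    """
    r = np.asarray(r, dtype=complex)
    prod = np.zeros(len(r), dtype=complex)
    c = np.zeros(len(r), dtype=complex)
    acc = 0j
    for n in range(L, len(r)):
        prod[n] = r[n] * np.conj(r[n - L])  # the only multiplication per sample
        acc += prod[n]
        if n - W >= L:
            acc -= prod[n - W]              # product that leaves the window
        c[n] = acc
    return c


def detect_boundary_and_cfo(r, L, W):
    """Return (peak index, CFO in cycles/sample) taken at the correlation peak.

    A pattern repeated with period L and rotated by exp(j*2*pi*eps*n) gives
    products with phase 2*pi*eps*L, so eps = angle / (2*pi*L); the estimate is
    unambiguous only for |eps * L| < 0.5.
    """
    c = delay_correlate(r, L, W)
    peak = int(np.argmax(np.abs(c)))
    return peak, float(np.angle(c[peak]) / (2 * np.pi * L))
```

Because the boundary search and the CFO estimate come from the same correlator output, no extra multiplier is needed for the coarse CFO, which is the system-level argument made above.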

TABLE 5-3 Comparison of Boundary Detection

|                                  | Multiplications per Sample                                                                                  | Storage Units (Samples) |
|----------------------------------|-------------------------------------------------------------------------------------------------------------|-------------------------|
| Maximum Correlation (Moving Sum) | 1                                                                                                           | 2N                      |
| Matched Filter                   | N (0 if the known pattern is +1 or −1, because the multiplications can then be replaced with additions)     | N                       |
| Golay Correlator                 | $\log_2(N)$ (0 if the Golay sequence is +1 or −1, because the multiplications can then be replaced with additions) | N                 |

(N is the length of the preamble or of the repeated signal.)

Moreover, three methods can be used for carrier synchronization: the maximum correlation (delay correlation) method [43], the pilot-based method [6], and the decision-directed method [64]. They are discussed below:

- The maximum correlation method reuses the hardware of the symbol boundary detection. When the maximum of the correlation output is found, its phase gives an estimate of the CFO. The required repeated information can come from the preamble or the guard interval (cyclic prefix or postfix). Both OFDM and SC systems can adopt this method by using the preamble. Furthermore, an SC system with a GI can also adopt this method.
- The pilot-based method uses the known pilots inserted in the frequency domain and calculates the phase difference between successive symbols. However, an SC system has no pilots in the frequency domain; hence, it cannot adopt this method.
- The decision-directed method uses the sliced data and calculates their phase difference from the unsliced data. This method suffers large performance degradation when decision errors occur. It does not require any known data; hence, both SC and OFDM receivers can use it.

TABLE 5-4 Comparison of CFO Estimations

|                     | Required Information       | System     |
|---------------------|----------------------------|------------|
| Maximum Correlation | Repeated signals           | SC<br>OFDM |
| Pilot-based         | Pilots in frequency domain | OFDM       |
| Decision-Directed   | None                       | SC<br>OFDM |
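The weakness of the decision-directed method can be illustrated with a QPSK sketch. The slicer, the function names, and the test rotations below are hypothetical; the point is only that the estimate is exact while decisions are correct and collapses once the rotation exceeds the π/4 decision margin of QPSK.

```python
import numpy as np


def qpsk_slice(y):
    """Hard decision to the nearest point of unit-energy QPSK."""
    return (np.sign(y.real) + 1j * np.sign(y.imag)) / np.sqrt(2)


def dd_phase_error(y):
    """Decision-directed phase estimate: angle of sum(y * conj(decision(y)))."""
    d = qpsk_slice(y)
    return float(np.angle(np.sum(y * np.conj(d))))
```

For a rotation of 0.1 rad the estimate is 0.1 rad; for a rotation of 1.0 rad, every decision jumps to the neighboring constellation point and the estimate becomes 1.0 − π/2, which is the decision-error degradation described above.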

For SCO estimation, the traditional methods in wireline systems are edge-detection methods, such as the early-late gate method, Gardner's method [65], and Mueller-Muller's method [66]. However, these methods are not suitable for wireless systems because multipath channels distort the edges of the transmitted signal. The most widely used method in wireless channels is based on the phase rotation caused by SCO in the frequency domain. In the time domain, the effect of SCO is not obvious over a short period, so it is difficult for a wireless SC system to estimate SCO in the time domain. Fortunately, a typical transmitter or receiver usually derives its mixer and ADC clocks from the same reference clock source, so it is reasonable to assume that CFO and SCO have the same ratio [29]. Hence, the CFO estimate can also serve as the SCO estimate.

For channel estimation, because the cyclic prefix turns the channel convolution into a circular convolution, an OFDM receiver can use a one-tap equalizer in the frequency domain. Channel estimation usually uses interpolation-based algorithms, which use the pilots inserted in the frequency domain to estimate the channel frequency response (CFR). In the 802.15.3c standard, the spacing between known pilots is 22 subcarriers. Thus, interpolation is difficult in an environment with long channel dispersion.

The 802.15.3c standard provides a time-domain preamble called the Pilot Channel Estimation Sequence (PCES). Hence, the PCES can be transformed into the frequency domain to estimate the CFR. Moreover, LMS tracking of the CFR can be adopted to improve the estimation performance [67] [68].
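A per-subcarrier least-squares (LS) estimate followed by LMS tracking can be sketched as below. This is a generic LS/LMS sketch under the assumption that the known PCES symbols `X` and the received FFT outputs `Y` are already available per subcarrier; it does not reproduce the specific LS-LMS scheme of [68], and the step size `mu` is a hypothetical choice.

```python
import numpy as np


def ls_cfr(Y, X):
    """Least-squares CFR estimate per subcarrier from known symbols: H = Y / X."""
    return Y / X


def lms_update(H, Y, X, mu):
    """One LMS step per subcarrier: H <- H + mu * (Y - H * X) * conj(X).

    For unit-modulus X the estimation error shrinks by (1 - mu) per update.
    """
    e = Y - H * X
    return H + mu * e * np.conj(X)
```

The LS step gives the initial CFR from the PCES, and subsequent LMS updates refine it as new known symbols arrive.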

A data block of the SC mode in the 802.15.3c standard is composed of 448 data symbols and a 64-symbol known GI. This data structure is also called single-carrier block transmission (SCBT), and an SC-FDE [57] can be adopted. The known GI is a Golay sequence; hence, a time-domain estimation method [69] that exploits the properties of Golay sequences can be adopted. After the channel impulse response (CIR) is obtained, a time-domain equalizer requires the inverse of the CIR, whereas a frequency-domain equalizer requires an FFT to transform the CIR into the CFR.
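The Golay property that time-domain channel estimation relies on can be checked with a small sketch: the aperiodic autocorrelations of a complementary pair (a, b) sum to 2N at zero lag and to zero elsewhere, so correlating the channel outputs of a and b with their own sequences and adding yields 2N times the CIR without sidelobes. The pair below uses the simplest concatenation recursion, not the specific sequences defined in 802.15.3c, and transmitting a and b separately is an idealization; all names are hypothetical.

```python
import numpy as np


def golay_pair(m):
    """Binary Golay complementary pair of length 2**m via a' = [a, b], b' = [a, -b]."""
    a, b = np.array([1]), np.array([1])
    for _ in range(m):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b


def aperiodic_acf(x):
    """Aperiodic autocorrelation; zero lag sits at index len(x) - 1."""
    return np.correlate(x, x, mode="full")


def estimate_cir(ya, yb, a, b, taps):
    """Correlate each channel output with its own sequence, add, and scale by 1/(2N)."""
    n = len(a)
    r = np.correlate(ya, a, mode="full") + np.correlate(yb, b, mode="full")
    return r[n - 1 : n - 1 + taps] / (2 * n)
```

With N = 64 and a three-tap channel, `estimate_cir` returns the taps exactly in this noise-free setting, because the complementary sidelobes cancel.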

In summary, the proposed architecture of the OFDM/SC dual-mode receiver for the 802.15.3c standard is as follows:

- The symbol/preamble detection uses auto-correlation, and the CFO is also estimated from the auto-correlation [70].
- The ADC and the mixer are assumed to use the same reference clock source [29]; hence, the CFO estimate also serves as the SCO estimate.
- The SC-mode equalizer uses SC-FDE and shares the FFT unit with the OFDM mode. In addition, an LMS tracking loop is adopted to improve system performance [68].

Finally, the block diagram of the proposed dual mode receiver is shown in Fig. 5-1.



Fig. 5-1 OFDM/SC dual-mode receiver for 802.15.3c

### 5.3 CFO/SCO Synchronization

The proposed receiver assumes that the mixer and the ADC use the same clock source. Hence, the SCO estimate can be derived from the CFO estimate; the conversion between them is given by Eqn. (5-1) [29]:

$$SCO = \frac{f_s \times CFO_{Estimated}}{f_c}$$
 (5-1)

where $f_s$ is the sampling frequency and $f_c$ is the carrier frequency. The CFO/SCO synchronization adopts a two-stage method. First, the CFO is estimated together with the symbol boundary. Second, a tracking loop compensates the CFO remaining after the first estimation. The remaining CFO is estimated by Eqn. (5-2).

$$CFO_{remainder} \propto \angle (\sum_{k=0}^{N_{Pilots}-1} Pilots(k) \times received \_Pilots(k)^*)$$
 (5-2)

where $N_{Pilots}$ is the number of pilots. Fig. 5-2 shows the CFO tracking curve in OFDM mode at SNR = 15 dB with a 50 ppm CFO (equal to $7.14 \times 10^{-3}$ normalized by the sampling frequency).
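Eqn. (5-1) and the 50 ppm operating point can be worked through numerically. The 60 GHz carrier below is an assumed round number, and reading the quoted $7.14 \times 10^{-3}$ as radians per sample (2π × CFO / $f_s$) is our interpretation; with those assumptions the numbers line up, and the derived SCO is also 50 ppm of $f_s$, as the shared-clock assumption implies.

```python
import math


def sco_from_cfo(cfo_hz, fs_hz, fc_hz):
    """Eqn. (5-1): SCO = fs * CFO / fc, valid when the mixer and ADC share one reference."""
    return fs_hz * cfo_hz / fc_hz


FC_HZ = 60e9    # assumed carrier frequency (hypothetical round number)
FS_HZ = 2.64e9  # HSI-mode sample rate quoted in Section 5.5
PPM = 50e-6     # clock offset

cfo_hz = PPM * FC_HZ                               # about 3 MHz
sco_hz = sco_from_cfo(cfo_hz, FS_HZ, FC_HZ)        # about 132 kHz, i.e. 50 ppm of fs
cfo_rad_per_sample = 2 * math.pi * cfo_hz / FS_HZ  # about 7.14e-3 rad/sample
```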



Fig. 5-2 CFO tracking curve @ OFDM mode, SNR = 15dB and 50ppm CFO
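Eqn. (5-2) reduces to one complex multiply-accumulate per pilot followed by an angle computation. The sketch below assumes the pilots have already been extracted from the FFT output; the ±1 pilot pattern is hypothetical (16 pilots, matching the OFDM pilot count in TABLE 5-1). Note the sign: with the conjugate on the received pilots, a common phase φ yields −φ, so the result can be applied directly as a derotation angle.

```python
import numpy as np


def residual_phase(pilots, rx_pilots):
    """Eqn. (5-2): angle of sum_k Pilots(k) * conj(received_Pilots(k)).

    If every received pilot carries the same extra phase phi, the result is
    -phi, so multiplying the symbol by exp(1j * result) removes that phase.
    """
    return float(np.angle(np.sum(pilots * np.conj(rx_pilots))))
```

In a tracking loop, the change of this angle between successive OFDM symbols would drive the CFO correction.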

### 5.4 Behavior Simulation of OFDM (HSI) mode

The behavioral simulation result is shown in Fig. 5-4. The simulation uses OFDM mode, QPSK modulation, 50 ppm CFO, and 50 ppm SCO. The channel code is a rate-1/2 (336,672) LDPC code [23]. The BER is measured at steady state (after 96 OFDM symbols). The adopted channel model [18] is shown in Fig. 5-3; its RMS delay spread is about 3.2 ns. The simulation shows that the receiver achieves the required BER of $10^{-6}$ at a received SNR of 11 dB.





Fig. 5-4 Receiver Performance@ OFDM mode, QPSK, 50ppm CFO, 50ppm SCO and Code rate =1/2 (BER is calculated at steady state (after 96 OFDM symbols))

### **5.5 Simulation of SCO Compensation**

There are three ways to compensate SCO: the analog method, the mixed-mode method, and the digital method [8][9]. In a high-data-rate system, the digital method becomes very difficult to implement, so parallel processing is generally needed to reach the required throughput. In 802.15.3c, the sample rate (chip rate) is 2640 MHz in HSI mode. Two methods are used for digital SCO compensation: the time interpolation method [8][9] and the frequency rotation method [15][16]. The time interpolation method uses an interpolator to generate the required samples; its detailed mathematical description is given in [8][9]. Cubic Lagrange filters [9], B-spline filters [71], or higher-order filters [72][73] are used depending on the conditions and requirements. The frequency rotation method compensates SCO in the frequency domain. It requires an FFT unit to transform the time-domain signal into the frequency domain; hence, it is well suited to an OFDM system. In the frequency domain, the main effects of SCO are a phase rotation term and an inter-carrier interference (ICI) term [15][16]. If the SCO allowed by the system specification is small, the ICI term can be ignored and only the phase rotation needs to be considered. The phase rotation in the frequency domain is proportional to the subcarrier index and to the SCO [15][16]. Hence, once the SCO is estimated, a derotator in the frequency domain can compensate it.
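As a concrete instance of the time interpolation method, the cubic Lagrange interpolator cited above computes a sample between x[0] and x[1] from four neighbors and the fractional phase μ produced by the NCO. The sketch below is the textbook direct form; the argument names are hypothetical, and a hardware version would typically use a Farrow structure instead.

```python
def cubic_lagrange(x_m1, x0, x1, x2, mu):
    """Cubic Lagrange interpolation at fractional position mu in [0, 1) after x0.

    The four weights are the Lagrange basis polynomials for nodes -1, 0, 1, 2,
    so any cubic polynomial sampled at those nodes is reproduced exactly.
    """
    c_m1 = -mu * (mu - 1) * (mu - 2) / 6
    c0 = (mu + 1) * (mu - 1) * (mu - 2) / 2
    c1 = -(mu + 1) * mu * (mu - 2) / 2
    c2 = (mu + 1) * mu * (mu - 1) / 6
    return c_m1 * x_m1 + c0 * x0 + c1 * x1 + c2 * x2
```

At μ = 0 the output equals x0 and at μ = 1 it equals x1, so the interpolator passes through the original samples.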



Fig. 5-5 Simulation model of SCO compensation

To simulate these two SCO compensation methods, the simulation model shown in Fig. 5-5 is built. For OFDM mode, the OFDM symbol length is 512, the guard interval (GI) is 64, and quadrature phase-shift keying (QPSK) modulation is adopted. For SC mode, a data block is composed of 448 data symbols and a 64-symbol known GI, and $\pi/2$ QPSK is adopted. In the transmitter, the expander inserts zeros for interpolation. The simulation model then uses a Raised Cosine (RC) filter as the interpolation filter to remove the image bands, and the resample block introduces SCO. To isolate the effect of SCO, this simulation model considers only an AWGN channel and sets the carrier frequency offset (CFO) to zero. After AWGN is added, the decimation filter and the decimator set the over-sampling ratio from 1X to 2X. In addition, a low-pass (LP) filter removes the out-of-band noise. In the receiver, the simulation model assumes that SCO is ideally estimated. Finally, the time interpolation method or the frequency rotation method compensates SCO, and an equalizer (EQ) compensates the remaining phase.



Fig. 5-6 Simulation results of using different methods, filters, and sampling rates for OFDM mode



Fig. 5-7 Simulation results of different sampling rates and methods for SC mode

A simulation result for the OFDM mode is shown in Fig. 5-6. It compares different combinations of over-sampling rates, interpolation filters, and SCO compensation methods. For the over-sampling rates, 2X oversampling performs better than 1X sampling at high SNR but the same at the targeted input SNR. For the interpolation filters at 1X sampling, Liu's fifteenth-order filter [72] performs better than the others, and the cubic B-spline filter performs better than the cubic Lagrange filter. However, the interpolation filters perform comparably in the low input SNR range. For the compensation methods at 1X sampling, the frequency rotation method performs better than time interpolation at low SCO (50 ppm); however, its performance degrades considerably at high SCO (500 ppm) because the ICI becomes serious. On the other hand, the time interpolation method performs comparably at both low and high SCO. The maximum SCO in 802.15.3c is 50 ppm [3].

Fig. 5-7 shows the simulation result for SC mode. In SC mode, the time interpolation method performs poorly at the 1X sampling rate (1760 MHz); in contrast, it performs better at the 1.5X oversampling rate (2640 MHz, the same as the OFDM 1X sampling rate). Furthermore, if the receiver uses an SC frequency-domain equalizer (SC-FDE) [57] in SC mode, the frequency rotation method could be adopted; however, the simulation shows that it performs poorly.

In summary, a high-order filter usually performs better but has high complexity. The frequency rotation method is suited to low SCO (50 ppm), and the time interpolation method has fair performance in OFDM mode. However, the frequency rotation method performs poorly in SC mode. Therefore, this work adopts the time interpolation method and uses 1X sampling for OFDM mode and 1.5X oversampling for SC mode. Hence, only a 2640 MHz clock is required.

To decide the required number of ADC bits, simulation results are shown in Fig. 5-8 and Fig. 5-9. The simulations show that a 10-bit ADC causes no performance loss compared with the unquantized case. In addition, as the number of ADC bits decreases, the frequency rotation method degrades more than the time interpolation method. Moreover, the performance loss of 64QAM modulation is larger than that of QPSK modulation.



Fig. 5-8 Simulation results of different ADC bits (QPSK, OFDM mode), 'T' means time interpolation method, 'F' means frequency rotation method, 'No' means no quantization, '10\_8' means that ADC is 10 bits and fractional part is 8 bits and others are by analogy.



Fig. 5-9 Simulation results of different ADC bits (64QAM, OFDM mode), 'T' means time interpolation method, 'F' means frequency rotation method, 'No' means no quantization, '10\_8' means that ADC is 10 bits and fractional part is 8 bits and others are by analogy.
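The '10\_8' labels in Fig. 5-8 and Fig. 5-9 can be read as a fixed-point format: 10 bits in total, 8 of them fractional. The quantizer below is a sketch under our own assumptions (signed two's complement, round-to-nearest, saturation at the rails, and independent quantization of I and Q); the text does not specify these details.

```python
import numpy as np


def quantize(x, total_bits=10, frac_bits=8):
    """Signed fixed-point quantizer: round to nearest, then saturate.

    '10_8' -> 10-bit two's complement word with 8 fractional bits,
    i.e. a step of 2**-8 and a range of [-2, 2 - 2**-8].
    """
    step = 2.0 ** -frac_bits
    lo = -(2 ** (total_bits - 1)) * step
    hi = (2 ** (total_bits - 1) - 1) * step
    q = np.round(np.asarray(x, dtype=float) / step) * step
    return np.clip(q, lo, hi)


def quantize_iq(z, total_bits=10, frac_bits=8):
    """Quantize the in-phase and quadrature parts independently (assumed ADC pair)."""
    z = np.asarray(z, dtype=complex)
    return quantize(z.real, total_bits, frac_bits) + 1j * quantize(z.imag, total_bits, frac_bits)
```

Under this reading, fewer fractional bits coarsen the step, while fewer integer bits lower the clipping level; both effects enter the curves labeled by analogy.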


### 5.6 Proposed Parallelized SCO Compensator

### **5.6.1 Time Interpolation**

The time interpolation method includes three basic components: an elastic buffer, a numerically controlled oscillator (NCO), and an interpolator. The clock frequency mismatch between the receiver and the transmitter causes the receiver to obtain more or fewer samples than needed. In general, an elastic buffer is adopted to 'skip' or 'duplicate' samples so that the sampling rates are equalized. In the 'skip' situation, the elastic buffer drops a sample; in the 'duplicate' situation, it repeats (inserts) a sample. In addition, a serial implementation of the 'skip' and 'duplication' operations using multiplexers (MUXs) has been proposed in [74]. This method contains a delay line and MUXs; the MUXs select the output of the delay line to form the 'skip' and 'duplication' operations. The 'skip' and 'duplicate' situations are controlled by the NCO, which is basically an accumulator. The NCO uses the estimated SCO (frequency offset) to generate the corresponding phase. When the accumulated phase exceeds one sampling period (positive SCO) or falls below zero (negative SCO), the elastic buffer must perform a 'skip' or a 'duplicate', respectively. The interpolator then uses the fractional phase generated by the NCO to correct the frequency offset. However, a residual phase offset may remain between the transmitter and the receiver; the EQ compensates this phase offset.
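The serial NCO control described above can be sketched with exact rational arithmetic. In this sketch, output m is assumed to sit at input position m × step, where step = 1 + SCO; a jump of two in the integer part is a 'skip' and no jump is a 'duplicate'. The function name, this sign convention, and the exaggerated offsets (±25%, so that events appear within a few samples rather than every 20,000 samples at 50 ppm) are all illustrative assumptions.

```python
import math
from fractions import Fraction


def serial_nco(step, n_out, phase0=Fraction(0)):
    """Serial NCO sketch returning (base_index, mu, event) for each output.

    base_index selects the interpolator's input samples, mu is the fractional
    phase passed to the interpolator, and event reports whether the elastic
    buffer has to skip or duplicate a sample.
    """
    out = []
    prev = None
    for m in range(n_out):
        pos = phase0 + m * step      # accumulated phase in input-sample units
        base = math.floor(pos)       # integer part (overflow/underflow tracking)
        mu = pos - base              # fractional part for the interpolator
        if prev is None or base - prev == 1:
            event = "normal"
        elif base - prev >= 2:
            event = "skip"           # accumulated phase crossed an extra sample
        else:
            event = "duplicate"      # integer part did not advance
        out.append((base, mu, event))
        prev = base
    return out
```

With step = 5/4 the fifth output jumps from base index 3 to 5 (a skip); with step = 3/4 the second output reuses base index 0 (a duplicate).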

The interpolator lies on the critical path of the time interpolation method. TABLE 5-5 lists the synthesis results of a cubic-spline interpolator. Even in a 65nm process, a parallelism of at least 2 is required to achieve 2.64 GS/s; hence, parallelism is necessary to meet the data rate. Moreover, when the additional delay after physical layout is considered, the required parallelism increases further. This work proposes a parallel SCO compensator architecture that supports any parallelism factor 'P'. This compensator will be integrated into a 60 GHz baseband receiver; to match the other modules, 'P' is set to 8 in the design example.

TABLE 5-5 Synthesis results of a cubic-spline interpolator in different processes

| Process                                   | 180nm             | 90nm              | 65nm               |
|-------------------------------------------|-------------------|-------------------|--------------------|
| Maximum Clock Rate                        | ≒285 MHz (3.5 ns) | ≒666 MHz (1.5 ns) | ≒1429 MHz (0.7 ns) |
| Equivalent Gate Count                     | ≒7.9K             | ≒8.0K             | ≒5.7K              |
| Required Parallelism to Achieve 2.64 GS/s | ≥10               | ≥4                | ≥2                 |
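The required-parallelism row of TABLE 5-5 is the ceiling of the target rate divided by each process's maximum clock rate. The short check below reproduces it from the table's own numbers; the variable names are illustrative.

```python
import math

TARGET_MSPS = 2640                                    # 802.15.3c HSI rate, 2.64 GS/s
FMAX_MHZ = {"180nm": 285, "90nm": 666, "65nm": 1429}  # TABLE 5-5 maximum clock rates

# Minimum number of parallel interpolators so that P * fmax >= target rate
required = {node: math.ceil(TARGET_MSPS / f) for node, f in FMAX_MHZ.items()}
# required == {'180nm': 10, '90nm': 4, '65nm': 2}

# The implemented 8X design at 400 MHz gives 8 * 400 = 3200 MS/s
implemented_msps = 8 * 400
```

The 8X, 400 MHz design example therefore leaves about 20% throughput margin over 2.64 GS/s, which absorbs part of the post-layout delay mentioned above.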

### **5.6.2** Parallel Elastic Buffer

When the input data are parallel, 'skip' and 'duplication' make the accesses from the interpolators irregular, as Fig. 5-10 illustrates. Fig. 5-10 (a) is an example of normal access, where the access range and the start point of each interpolator are regular. However, when 'skip' or 'duplication' occurs, as shown in Fig. 5-10 (b) and Fig. 5-10 (c), the accesses become irregular. Hence, this work proposes a parallel elastic buffer to solve this problem.



Fig. 5-10 Illustration of irregular access in parallel (a) normal access (b) access of successive duplication (c) access of successive skip

First, the maximum access range must be defined. The maximum access range of the interpolators is (P+O+S), where 'O' is the order of an interpolator and 'S' is the number of 'skips' the elastic buffer can tolerate in every 'P' parallel output samples (each 'skip' increases the access range). The proposed 'P'-times parallel elastic buffer is composed of several FIFOs. The number of FIFOs (M) must be greater than or equal to the range (P+O+S). To reduce the complexity of the input control, 'M' can be set to a multiple of 'P', so that a simple demultiplexer (DMUX) can be used for input control.
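The sizing rule for the elastic buffer width, M ≥ P + O + S with M a multiple of P, can be written as a one-line integer ceiling. The function name is hypothetical; the parameters follow the symbols in the text.

```python
def buffer_width(P, O, S):
    """Smallest number of FIFOs M with M >= P + O + S and M a multiple of P."""
    need = P + O + S
    return -(-need // P) * P  # integer ceiling of need / P, times P
```

For P = 8 and a cubic interpolator (O = 3), both S = 1 (range 12) and S = 5 (range 16) give M = 16, i.e. two groups of eight FIFOs fed by a 1-to-2 DMUX, which matches the design example in this section.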

An example of the proposed parallel elastic buffer is shown in Fig. 5-11. In this example, the parallelism (*P*) is 8 and the width of the parallel elastic buffer (*M*) is 16. The depth of the FIFOs is set to 4; this setting is explained in Section 5.6.4. The order of the interpolator (*O*) is 3. The parallel elastic buffer is divided into 2 groups, and a 1-to-2 DMUX stores the input samples into the groups in sequence. Fig. 5-11 also illustrates 'duplication' and 'skip'. In the case of 'duplication', an interpolating set is accessed twice. In the case of 'skip', an interpolating set is disregarded and the access range increases. The routing network, implemented by a MUX array, routes data to the corresponding interpolators. The parallel elastic buffer then drops the unnecessary data outside the range delimited by the head and the tail. With the proposed parallel elastic buffer, the data access is the same as in the conventional serial version; thus, parallelism causes no performance degradation.



Fig. 5-11 Illustration of parallel elastic buffer access

### **5.6.3 Parallel NCO**

The proposed parallel NCO is shown in Fig. 5-12. This architecture generates the control signals of the parallel elastic buffer shown in Fig. 5-11, the control signals of the MUX array shown in Fig. 5-13, and the fractional phases required by the interpolators. In Fig. 5-12, the multipliers that generate the 1X~8X phases can be replaced by combinations of adders and shifters to reduce the hardware complexity. An additional phase accumulator is used for phase prediction, which is explained in the next subsection. The generated phases are then divided into integer parts and fractional parts. The integer parts indicate overflows or underflows of the NCO (i.e., 'skip' or 'duplication' operations). The 'MUX\_id', represented in 2's complement, is used to control the elastic buffer. When 'MUX\_id' overflows, its value wraps around circularly, which matches the folded (circular) storage of data in the parallel elastic buffer. Furthermore, the recorded 'head' and 'tail' determine the sequence range required by the interpolators. Finally, the fractional parts are the fractional phases of the individual interpolators.
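One clock cycle of such a parallel NCO can be sketched in fixed point: lane i adds i × Δ to the accumulator, the multiples 1X~8X are formed with shifts and adds only, the integer part (taken modulo M) plays the role of 'MUX\_id', and the low bits give each interpolator's fractional phase. The 16-bit fractional word, the function names, and the exaggerated step value are hypothetical; the check confirms that the eight lanes reproduce a serial NCO sample by sample.

```python
FRAC_BITS = 16  # hypothetical fractional width of the phase word


def shift_add_multiple(delta, i):
    """i * delta for i in 1..8 using only shifts and additions/subtractions."""
    return {
        1: delta,
        2: delta << 1,
        3: (delta << 1) + delta,
        4: delta << 2,
        5: (delta << 2) + delta,
        6: (delta << 2) + (delta << 1),
        7: (delta << 3) - delta,
        8: delta << 3,
    }[i]


def parallel_nco_cycle(acc, delta, P=8, M=16):
    """One clock of a P-lane NCO sketch (P <= 8).

    acc is the phase of the last output of the previous cycle. Returns the
    per-lane (mux_id, frac) pairs and the updated accumulator.
    """
    lanes = []
    for i in range(1, P + 1):
        pos = acc + shift_add_multiple(delta, i)
        integer = pos >> FRAC_BITS                 # overflow count -> buffer index
        frac = pos & ((1 << FRAC_BITS) - 1)        # fractional phase for lane i
        lanes.append((integer % M, frac))          # circular wrap, like MUX_id
    return lanes, acc + shift_add_multiple(delta, P)
```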



Fig. 5-12 Architecture of parallelized NCO and phase prediction

### **5.6.4 Proposed Parallelized Time Interpolation Architecture**

Fig. 5-13 shows the architecture of the 8X parallel (P=8) time interpolation. The parallel time interpolation uses a cubic B-spline interpolator [71] (O=3) and tolerates one 'skip' due to SCO in OFDM mode (S=1). In SC mode, it tolerates four 'skips' caused by the 1.5X oversampling plus one 'skip' due to SCO (S=5). Hence, the maximum access range of the interpolators is twelve (8+3+1) in OFDM mode and sixteen (8+3+5) in SC mode. Therefore, the width of the parallel elastic buffer is set to sixteen, divided into two groups. The input samples are stored into these two groups in sequence by a 1-to-2 DMUX, as shown in Fig. 5-13.



Fig. 5-13 Proposed parallelized time interpolation

Because of the 'duplication' and 'skip' operations, the parallel elastic buffer can underflow or overflow. With 'P'-times parallelism, underflow means that the data remaining in the elastic buffer are fewer than (P+O+S), and overflow means that the remaining capacity is smaller than 'P'. Underflow can be handled by a mechanism that disables the interpolators' requests when it occurs. There are two ways to handle overflow. The first is to increase the capacity of the elastic buffer so that overflow cannot occur within a frame or a packet, but the overhead is large. The second is to reset the buffer during a period of unwanted data, such as a guard interval or an unused preamble. In this case, the phase accumulators shown in Fig. 5-12 must keep phase continuity across the reset period. Hence, an additional phase accumulator, shown at the bottom of Fig. 5-12, predicts the phase after a reset period ('$R_p$'), and the phase accumulators stop working during the reset period to lower the power consumption.

Eqn. (5-3) relates the maximum packet length to the buffer capacity:

$$Packet\_length_{max} = \left\lfloor \frac{FIFO_{depth} \times M - prestore - P}{SCO} \right\rfloor$$
 (5-3)

In the design example, the FIFO depth is 4 and 'M' is sixteen, so the total capacity is 64 samples. Initially, the elastic buffer pre-stores 24 samples to prevent underflow. According to Eqn. (5-3), this design can process about 640K samples without overflow at 50 ppm SCO. When a packet is longer than 640K samples, the buffer-reset mechanism can use an unused period in the PCES (Pilot Channel Estimation Sequence) of the 802.15.3c standard to prevent buffer overflow.
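The 640K figure follows directly from Eqn. (5-3). The sketch below reads the brackets in Eqn. (5-3) as a floor and takes the SCO in ppm so that the division stays in exact integer arithmetic; both choices are assumptions made for this illustration, and the function name is hypothetical.

```python
def max_packet_length(fifo_depth, M, prestore, P, sco_ppm):
    """Eqn. (5-3): floor((FIFO_depth * M - prestore - P) / SCO), SCO given in ppm.

    The numerator is the headroom (in samples) before overflow; each input
    sample adds SCO samples of drift, so overflow occurs after headroom / SCO samples.
    """
    headroom = fifo_depth * M - prestore - P
    return headroom * 1_000_000 // sco_ppm
```

With a FIFO depth of 4, M = 16, 24 pre-stored samples, and P = 8, the headroom is 32 samples, and 32 / (50 × 10⁻⁶) = 640,000 samples at 50 ppm.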

### **5.6.5** Parallelized Frequency Rotation

The frequency rotation method consists of three components: an $NCO_f$ in the time domain, a phase multiplier that generates phases proportional to the subcarrier index, and a derotator. The NCO<sub>f</sub> of the frequency rotation method differs from the NCO of the time interpolation method. The NCO of the time interpolation method generates the relative phase of each sample. In contrast, the NCO<sub>f</sub> of the frequency rotation method assumes that the phase is constant during an OFDM symbol and generates the relative phase of each OFDM symbol. An overflow or underflow of the NCO<sub>f</sub> means that the FFT window must move forward or backward. The FFT window is moved by reusing the proposed parallel elastic buffer and changing the position of the head. The phase multiplier generates $0 \sim (K-1)$ times the phase given by the NCO<sub>f</sub>, where K is the FFT length.
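The frequency rotation chain can be sketched under a simplified model: the NCO<sub>f</sub> accumulates a timing drift of SCO × N<sub>s</sub> samples per OFDM symbol (N<sub>s</sub> = FFT length plus GI), shifts the FFT window by one sample whenever the drift crosses a sample boundary, and the fractional remainder τ is removed by rotating subcarrier k by 2πkτ/K. The drift model, the phase sign, the window-shift convention, and the exaggerated SCO value are our assumptions, and the ICI term is ignored, as in the text.

```python
import numpy as np


def nco_f(sco, Ns, n_symbols):
    """Per-symbol NCO_f sketch yielding (window_shift, tau) with tau kept in [0, 1).

    window_shift = +1/-1 means moving the FFT window by one sample (assumed convention).
    """
    tau = 0.0
    for _ in range(n_symbols):
        shift = 0
        tau += sco * Ns          # drift accumulated over one OFDM symbol
        while tau >= 1.0:        # overflow: move the FFT window
            tau -= 1.0
            shift += 1
        while tau < 0.0:         # underflow: move the window the other way
            tau += 1.0
            shift -= 1
        yield shift, tau


def derotate(Y, tau):
    """Phase multiplier plus derotator: remove exp(-j*2*pi*k*tau/K) from FFT bin k."""
    K = len(Y)
    k = np.arange(K)                   # 0 ~ (K-1) times the base phase
    base = 2 * np.pi * tau / K         # phase step per subcarrier
    return Y * np.exp(1j * k * base)
```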

A derotator requires a sinusoid generation unit and a complex multiplier. This work uses a derotator [46] based on the coordinate rotation digital computer (CORDIC) algorithm [31]. CORDIC can compute trigonometric functions; in this phase rotation application, a CORDIC unit can replace both the sinusoid generation unit and the complex multiplier. However, a basic CORDIC unit has a gain of about 1.6. This gain does not cause a serious problem because the EQ can equalize it. In this work, a basic CORDIC algorithm is adopted. To lower the required clock rate, this work uses the unfolding architecture [35] [36] and adds pipeline stages to increase the operating frequency.
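A basic rotation-mode CORDIC can be sketched in floating point to show where the gain of about 1.6 comes from. The 16-iteration count and the ±π/2 pre-rotation (a swap plus a sign change, which adds no gain) are illustrative choices rather than the parameters of [46]; in hardware, each multiplication by 2⁻ⁱ is a right shift and the arctangent constants come from a small table.

```python
import math


def cordic_gain(n_iter=16):
    """Gain of an n_iter-stage CORDIC: prod sqrt(1 + 2**(-2i)), about 1.647."""
    return math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(n_iter))


def cordic_rotate(x, y, theta, n_iter=16):
    """Rotation-mode CORDIC: returns about cordic_gain() * (x, y) rotated by theta.

    theta is assumed to lie in [-pi, pi]; a +/- pi/2 pre-rotation brings it
    into the convergence range of about +/-1.74 rad.
    """
    if theta > math.pi / 2:
        x, y, theta = -y, x, theta - math.pi / 2
    elif theta < -math.pi / 2:
        x, y, theta = y, -x, theta + math.pi / 2
    z = theta
    for i in range(n_iter):
        d = 1.0 if z >= 0 else -1.0
        # micro-rotation by d * atan(2**-i): only shifts and adds in hardware
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)   # remaining angle
    return x, y
```

A derotation by θ is simply `cordic_rotate(re, im, -theta)`; the constant gain is left uncorrected, matching the choice above to let the EQ absorb it.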

### **5.6.6 Implementation Results of SCO Compensation**

Fig. 5-14 shows the block diagram of the hardware implementation. The blocks in the gray region are written in synthesizable Verilog hardware description language (HDL) and include both the time interpolation method and the frequency rotation method.

MUXs select between these two methods. The time interpolation method adopts the cubic B-spline filter [71], which has better performance and lower complexity. The B-spline filter uses the Farrow structure [47], which reduces the number of multipliers. With 8X parallelism, the time interpolation method requires sixteen interpolators in total, for the real and imaginary parts. The frequency rotation method uses eight unrolled CORDICs. The FFT unit is a fixed-point Verilog behavioral model. A phase FIFO is added to match the latency of the FFT unit. The EQ is a floating-point model identical to that shown in Fig. 5-5.
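The Farrow form of a cubic B-spline interpolator can be sketched as four fixed FIR branches (v0..v3) whose outputs are combined by Horner's rule in the fractional phase μ, so only three multiplications by μ are needed per output. The coefficients below come from expanding the standard cubic B-spline weights; this sketch omits any prefiltering that [71] may use, which matters because a plain B-spline smooths rather than passing exactly through the samples (at μ = 0 it returns (x₋₁ + 4x₀ + x₁)/6, not x₀). The function name is hypothetical.

```python
def bspline_farrow(x_m1, x0, x1, x2, mu):
    """Cubic B-spline interpolation in Farrow form (sketch).

    v0..v3 are fixed-coefficient FIR branches shared by every mu; the output
    is evaluated as ((v3 * mu + v2) * mu + v1) * mu + v0.
    """
    v0 = (x_m1 + 4 * x0 + x1) / 6
    v1 = (-3 * x_m1 + 3 * x1) / 6
    v2 = (3 * x_m1 - 6 * x0 + 3 * x1) / 6
    v3 = (-x_m1 + 3 * x0 - 3 * x1 + x2) / 6
    return ((v3 * mu + v2) * mu + v1) * mu + v0
```

Because the branch coefficients are small constants (multiples of 1/6), they can be realized with shifts and adds, leaving the three μ multiplications as the only general multipliers per interpolator.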

The simulation procedure follows Fig. 5-14. The transmitter shown in Fig. 5-5 generates the input samples, which are quantized to 10 bits and fed into the Verilog model. The output of the Verilog simulation is then dumped and fed into the EQ to evaluate the performance. Fig. 5-15 shows the performance of the hardware implementation. For time interpolation, the hardware performs almost the same as the floating-point model. For frequency rotation, the hardware loses about 1 dB in the high SNR region but shows almost no loss in the design range (9 to 30 dB SNR).



Fig. 5-14 Block diagram of the hardware implementation



Fig. 5-15 Performance Comparison

The synthesis results are shown in TABLE 5-6. The maximum operating frequency in gate-level simulation is 400 MHz, which yields 3.2 GS/s with 8X parallelism. This throughput meets the 802.15.3c requirement of 2.64 GS/s. The interpolators (B-spline filters) occupy the largest part of the time interpolation method; however, the time interpolation method performs better at high SCO (500 ppm) in OFDM mode. In contrast, the frequency rotation method has lower complexity and better performance at low SCO (50 ppm). In short, for the 802.15.3c SCO specification (50 ppm), the frequency rotation method has lower complexity and better performance in OFDM mode. However, for a dual-mode receiver, time interpolation is suitable for both the SC and the OFDM modes.

TABLE 5-6 Synthesis results

| Process                  |                  | 90nm        |
|--------------------------|------------------|-------------|
| Max. Operating Frequency |                  | 400MHz      |
| Time Interpolation       | B-Spline filters | 119K (58%)  |
| Freq. Rotation           | CORDICs          | 38K (19%)   |
| Shared                   | Elastic Buffers  | 15K (7%)    |
|                          | Others           | 32K (16%)   |
| Total Gate Count         |                  | 204K (100%) |

### **5.7 Summary**

In this chapter, an OFDM/SC dual-mode baseband receiver architecture is presented. It includes preamble/symbol detection, CFO/SCO synchronization loops, and an OFDM/SC dual-mode frequency-domain equalizer. The behavioral performance of the OFDM mode is also shown: at a received SNR of 11 dB, the BER reaches 10<sup>-6</sup>. In addition, an overview of the time interpolation and frequency rotation compensation methods is presented. A parallel architecture is then proposed to speed up SCO compensation. This architecture supports any parallelism factor 'P', and parallelism causes no performance degradation. Finally, a design example with 8X parallelism is implemented. It operates at a 400 MHz clock rate and achieves 3.2 GS/s. The gate count of the time interpolation method is about 166K and that of the frequency rotation method is about 85K, as shown in Fig. 5-16.



Fig. 5-16 Pie chart of synthesis results



# Chapter 6

## **Conclusion**

In this thesis, an overview of the effects of frequency offset, channel models, and the link budget is first introduced. Several commonly used data paths for a baseband receiver are also discussed. In addition, two baseband receivers are presented: an OFDM receiver for DVB-T/H and an OFDM/SC dual-mode receiver for 802.15.3c/802.11ad.

The proposed OFDM baseband receiver for DVB-T/H integrates Mode/GI/Symbol detection, a multimode FFT, channel estimation, a carrier frequency synchronization loop, and a sampling clock synchronization loop. An ICFO/RCFO/SCO memory-sharing architecture is proposed, in which ICFO estimation and RCFO/SCO estimation use the same memory block to reduce hardware complexity. In addition, a differential encoding method for recording the pilot locations is adopted to reduce the capacity of the storage unit. The synchronization loops can compensate a CFO of 50 subcarrier spacings and an SCO of 200 ppm. The equivalent gate count of the proposed DVB-T/H receiver is about 810K gates, including 102.8 KB of memory. This receiver is fabricated in a 0.18μm CMOS technology and its core size is 12.96 mm².

The architecture of the OFDM/SC dual-mode baseband receiver for 802.15.3c/802.11ad includes boundary detection, CFO/SCO synchronization loops, and an OFDM/SC dual-mode frequency-domain equalizer. The boundary detection uses a correlation-based method [70], which also estimates CFO. The baseband receiver assumes that the mixer and the ADC share the same reference clock; hence, CFO and SCO use the same estimate. The OFDM/SC dual-mode frequency-domain equalizer uses the LS-LMS method [68] to improve performance. At a received SNR of 11 dB, the coded BER of the OFDM mode reaches 10<sup>-6</sup> in floating-point behavioral simulation. In addition, a parallel architecture for SCO compensation is proposed to speed up the operation. This architecture supports any parallelism factor 'P' and resolves the irregular accesses from the interpolators in parallel operation. A design example with 8X parallelism is implemented in a 90nm CMOS process. It operates at a 400 MHz clock rate and achieves 3.2 GS/s with about 204K equivalent gates, meeting the data rate requirement of 802.15.3c/802.11ad.



# Chapter 7

## **Future Work**

A new application of DVB-T/H is reception with higher mobility, such as during transportation. Hence, the Doppler effect must be considered. The mobile environment makes the channel time-varying, which affects both frequency offset estimation and channel estimation. Moreover, more and more wireless broadband applications operate in this environment. Therefore, estimation algorithms that can cope with time-varying channels remain an open issue.

The SC mode of 802.15.3c/802.11ad periodically inserts a known GI, so the equalizer can use SC-FDE. However, SC-FDE requires two extra transforms (an FFT and an IFFT). In the 60 GHz channel, reflected paths suffer large path loss, so the channel is likely to be a LOS channel with one strong path and two to three weak paths. A simple time-domain equalizer (TEQ) designed for LOS channels may therefore be adopted to reduce the hardware cost.
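The hardware argument can be made concrete by counting complex multiplications per received sample. The sketch below is a rough model under stated assumptions, not a design:

- Each transform is radix-2 and costs (N/2)log2 N complex multiplications, with no savings from trivial twiddle factors.
- The FDE applies one equalizer coefficient per frequency bin.
- The TEQ is a direct-form FIR filter.
- The block size N = 512 and the 4-tap TEQ are hypothetical.

A real TEQ needs enough taps to span the channel delay spread, not one tap per path.

```python
import math


def fde_mults_per_sample(n_fft: int) -> float:
    """SC-FDE cost: FFT + per-bin equalization + IFFT, averaged over N samples."""
    # Radix-2 count per transform, without trivial-twiddle savings.
    per_transform = (n_fft // 2) * math.log2(n_fft)
    return (2 * per_transform + n_fft) / n_fft


def teq_mults_per_sample(n_taps: int) -> int:
    """Direct-form FIR TEQ cost: one complex multiplication per tap per sample."""
    return n_taps


N_FFT, N_TAPS = 512, 4  # hypothetical block size and tap count
print(fde_mults_per_sample(N_FFT), teq_mults_per_sample(N_TAPS))  # prints 10.0 4
```

Under these assumptions the FDE path costs log2 N + 1 multiplications per sample regardless of channel length. A TEQ therefore pays off only while its tap count stays below that figure. The FFT buffer memory, which a TEQ also avoids, is not counted here.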

# Reference

- [1] ETSI, "Digital Video Broadcasting: Framing Structure, Channel Coding and Modulation for Digital Terrestrial Television," *European Telecommunication Standard* EN 300 744 V1.5, Nov. 2004.
- [2] ETSI, "Transmission System for Handheld Terminals (DVB-H)," *European Telecommunication Standard* EN 302 304 V1.1.1, Nov. 2004.
- [3] IEEE Std 802.15.3c-2009, "IEEE Standard for Information technology Telecommunications and information exchange between systems Local and metropolitan area networks Specific requirements. Part 15.3: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for High Rate Wireless Personal Area Networks (WPANs) Amendment 2: Millimeter-wave-based Alternative Physical Layer Extension," IEEE, Oct., 2009.
- [4] IEEE Std. P802.11 TGad D0.1, "PHY/MAC Complete Proposal Specification," IEEE, May, 2010.
- [5] M. Speth, S. A. Fechtel, G. Fock and H. Meyr, "Optimum receiver design for wireless broadband systems using OFDM Part I," *IEEE Trans. Commun.*, vol. 47, no. 11, pp. 1668–1677, Nov. 1999.
- [6] M. Speth, S. Fechtel, G. Fock and H. Meyr, "Optimum receiver design for OFDM-Based broadband transmission Part II: A case study," *IEEE Trans. Commun.*, vol. 49, no. 4, pp. 571–578, Apr. 2001.
- [7] S. A. Fechtel, "OFDM carrier and sampling frequency synchronization and its performance on stationary and mobile channels," *IEEE Trans. Consumer Electronics*, vol. 46, no. 3, pp.438–441, Aug. 2000.
- [8] F. M. Gardner, "Interpolation in digital modems-part I: Fundamentals," *IEEE Trans. Commun.*, vol. 41, no. 3, pp. 501–507, Mar. 1993.
- [9] L. Erup, F. M. Gardner and R.A. Harris, "Interpolation in digital modems-part II: Implementation and performance," *IEEE Trans. Commun.*, vol. 41, no. 6, pp. 998–1008, June 1993.
- [10] T. C. Wei, W. C. Liu and S. J. Jou, "A jointed mode detection and symbol detection scheme for DVB-T," *IEEE Trans. Consumer Electronics*, vol. 54, no. 2, pp. 336–341, May 2008.
- [11] W.C. Liu, "Design of symbol boundary detection and scattered pilot synchronization for DVB-T/H," *Master Thesis*, Dept. of EE, National Chiao Tung University, Hsinchu, Taiwan, Sep. 2006.
- [12] W. C. Liu, T. C. Wei and S. J. Jou, "Blind Mode/GI detection and coarse symbol synchronization for DVB-T/H," in *Proc. ISCAS 2007*, New Orleans, May 2007, pp. 2092–2095.
- [13] T. C. Wei, W. C. Liu, C. Y. Tseng, and S. J. Jou, "Low Complexity Synchronization Design of an OFDM Receiver for DVB-T/H," *IEEE Trans. Consumer Electronics*, vol. 55, no. 2, pp.408–413, May 2009.
- [14] W. C. Liu, T. C. Wei, and S. J. Jou, "Two-stage scattered pilot synchronization with channel estimation scattered pilots pre-filling for DVB-T/H," in *Proc. IEEE VLSI-DAT 2007*, Apr. 2007, pp. 1–4.
- [15] T. Pollet, and M. Peeters, "Synchronization with DMT modulation," *IEEE Communications Magazine*, vol.37, no.4, pp.80–86, Apr. 1999.
- [16] T. Pollet, P. Spruyt, and M. Moeneclaey, "The BER performance of OFDM systems using non-synchronized sampling," In *Proc. IEEE GLOBECOM* 1994, vol.1, Dec. 1994, pp.253–257.
- [17] P. Y. Tsai, "Design and implementation of a MC-CDMA baseband transceiver for Next-Generation Mobile Communication system," *Ph.D. Thesis*, Dept. of EE, National Taiwan University, Taipei, Taiwan, June 2005.
- [18] channel-model-matlab-code-release [Online]. Available: http://www.ieee802.org/15/pub/TG3c\_contributions.html
- [19] Y. S. Cho, J. Kim, and W. Y. Yang, MIMO-OFDM Wireless Communications with MATLAB, John Wiley and Sons, 2010.
- [20] B. Sklar, *Digital Communications: Fundamentals and Applications*. Prentice-Hall, N.J. 2004.
- [21] J. G. Andrews, A. Ghosh, and R. Muhamed, *Fundamentals of WiMAX: Understanding Broadband Wireless Networking*. USA: Prentice Hall, 2007.
- [22] IEEE Std. 802.16e-2005 and IEEE Std. 802.16-2004/Cor 1-2005.
- [23] S. Y. Hung, "Design and Implementation of Multiple Code-rates LDPC Decoder and Encoder for IEEE 802.15.3c," *Master Thesis*, Dept. of EE, National Chiao Tung University, Hsinchu, Taiwan, Sep. 2010.
- [24] Jri Lee, Y. L. Huang, Y. T. Chen, H. C. Lu, and C. J. Chang, "A low-power fully integrated 60GHz transceiver system with OOK modulation and on-board antenna assembly," In Proc. ISSCC 2009, pp.316–317, 2009.
- [25] C. C. Chang, "Design and implementation of a baseband receiver for VDSL system," *Master Thesis*, Dept. of EE, National Taiwan University, Taipei, Taiwan, June 2002.
- [26] C. Y. Tseng, "Design and Implementation of DVB-T/H Synchronization Loop and Low Power Techniques," *Master Thesis*, Dept. of EE, National Central University, Taoyuan, Taiwan, Sep. 2006.

- [27] C. Y. Tseng, T.C. Wei, W.C. Liu and S. J. Jou, "Low power and power aware design for DVB-T/H baseband inner receiver," *in Proc. IEEE VLSI-DAT 2007*, Apr. 2007, pp. 1–4.
- [28] A. Wenzler and E. Luder, "New structures for complex multipliers and their noise analysis," in *Proc. ISCAS 1995*, vol. 2, May 1995, pp. 1432–1435.
- [29] P. Y. Tsai and T. D. Chiueh, "A Low-Power Multicarrier-CDMA Downlink Baseband Receiver for Future Cellular Communication Systems," *IEEE Trans. on Circuits and Systems I*, vol.54, no.10, pp.2229–2239, Oct. 2007.
- [30] J. S. Wu, M. L. Liou, H. P. Ma and T. D. Chiueh, "A 2.6-V, 44-MHz all-digital QPSK direct-sequence spread-spectrum transceiver IC," *IEEE JSSC*, vol. 32, no. 10, Oct 1997, pp.1499–1510.
- [31] J. E. Volder, "The CORDIC trigonometric computing technique," *IRE Trans. Electron. Computers*, vol. C-8, pp. 330–334, Sept. 1959.
- [32] C. S. Wu and A. Y. Wu, "Modified Vector Rotational CORDIC (MVR-CORDIC) Algorithm and Architecture," *IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing*, vol. 48, no. 6, pp. 548–561, June 2001.
- [33] C. S. Wu, "A Unified Design Framework of High-Speed/Low-Cost Rotation Engines: An Angle Quantization Respective," *Ph.D. Thesis*, Dept. of EE, National Central University, Taoyuan, Taiwan, July 2002.
- [34] R. Andraka, "A survey of CORDIC algorithms for FPGA based computers," in *Proc. ACM Symp. FPGA*, Feb. 1998, pp. 191–200.
- [35] S. Wang, V. Piuri, and E. E. Swartzlander, Jr., "A unified view of CORDIC processor design," in *Proc. IEEE MWSCAS 1996*, Aug. 1996, pp. 852–855.
- [36] L. Koskinen, M. Kosunen, S. Lindfors and K. Halonen, "Truncation DC-error elimination in FIR filters," in *Proc. IEEE Midwest Symposium on Circuits and Systems 2000*, vol. 3, pp. 1292–1295.
- [37] Synopsys DesignWare IP [Online]. Available: http://www.synopsys.com/ip/Pages/default.aspx
- [38] S. He, and M. Torkelson, "Designing pipeline FFT processor for OFDM (de)modulation," in *Proc. ISSSE 98*, Oct. 1998, pp.257–262.
- [39] T. H. Yu, I. W. Lai, and T. D. Chiueh, "Design of a DVB-T/H compliant baseband receiver," in *Proc. IEEE VLSI-DAT 2008*, Apr. 2008, pp. 279–282.
- [40] S. Magar, S. Shen, G. Luikuo, M. Fleming, and R. Aguilar, "An application specific DSP chip set for 100 MHz data rates," in *Proc. ICASSP-88*, Apr. 1988, pp. 1989–1992.
- [41] Y. W. Lin, H. Y. Liu, and C. Y. Lee, "A 1-GS/s FFT/IFFT processor for UWB applications," *IEEE JSSC*, vol.40, no.8, pp.1726–1735, Aug. 2005.

- [42] S. Chen, W. He, H. Chen and Y. Lee, "Mode detection, synchronization, and channel estimation for DVB-T OFDM receiver," in *Proc. IEEE GLOBECOM* 2003, vol. 5, Dec. 2003, pp.2416–2420.
- [43] J. Terry and J. Heiskala, *OFDM Wireless LANs: A Theoretical and Practical Guide*. USA, Sams, 2002.
- [44] C. K. Kuang, "Timing synchronization for DVB-T system," *Master Thesis*, Dept. of EE, National Chiao Tung University, Hsinchu, Taiwan, Sep. 2004.
- [45] T. Z. Wei, S. J. Jou and M. T. Shieu, "Memory reduction ICFO estimation architecture for DVB-T," in *Proc. ISCAS 2006*, Greece, May 2006, pp. 3406–3409.
- [46] Y. Ahn, S. Nahm and W. Sung, "VLSI design of a CORDIC-based derotator," in *Proc. ISCAS 1998*, May 1998, pp. 449–452.
- [47] C. W. Farrow, "A continuously variable digital delay element," in *Proc. ISCAS 1988*, June 1988, pp. 2641–2645.
- [48] L. Schwoerer and J. Vesma, "Fast scattered pilot synchronization for DVB-T and DVB-H," in *Proc. 8th International OFDM Workshop*, Hamburg, Germany, Sept. 2003.
- [49] L. Schwoerer, "Fast pilot synchronization schemes for DVB-H," in *Proc. 4th International Multi-Conference Wireless and Optical Communications*, Canada, July 2004, pp.420–424.
- [50] T. A. Lin and C. Y. Lee, "Predictive equalizer design for DVB-T system," in *Proc. IEEE ISCAS 2005*, vol. 2, May 2005, pp. 940–943.
- [51] L. F. Chen, Y. Chen, L. C. Chien, Y. H. Ma, C. H. Lee, Y. W. Lin, C. C. Lin, H. Y. Liu, T. Y. Hsu and C. Y. Lee, "A 1.8V 250mW COFDM baseband receiver for DVB-T/H applications," in *Proc. IEEE ISSCC 2006*, Feb. 2006, pp. 1002–1011.
- [52] Syu-Siang Long, "The Blind Frequency-Domain Equalizer in OFDM Communications," *Master Thesis*, Dept. of EE, National Central University, Taoyuan, Taiwan, July 2005.
- [53] L. Jia, Y. Gao, J. Isoaho, and H. Tenhunen, "A new VLSI oriented FFT algorithm and implementation," in *Proc. of ASIC Conference*, Sept. 1998, pp. 337–341.
- [54] E. H. Wold and A. M. Despain, "Pipeline and parallel-pipeline FFT processors for VLSI implementation," *IEEE Trans. Comput.*, vol. 33, no. 5, pp. 414–426, May 1984.
- [55] T. C. Wei, W. C. Liu, C. Y. Tseng, S. S. Long, S. J. Jou, and M. T. Shiue, "A 28mW OFDM Baseband Receiver Chip for DVB-T/H with All Digital Synchronization," in *Proc. IEEE CICC 2008*, Sep. 2008, pp. 351–354.
- [56] L. Caetano, and S. Li, "Benefits of 60 GHz," Sibeam Corp., Nov., 2005.

- [57] D. Falconer, S. L. Ariyavisitakul, A. Benyamin-Seeyar, and B. Eidson, "Frequency Domain Equalization for Single-Carrier Broadband Wireless Systems", *IEEE Communications Magazine*, vol. 40, no. 4, 2002, pp. 58–66.
- [58] M. Lei, C.S. Choi, R. Funada, H. Harada and S. Kato, "Throughput Comparison of Multi-Gbps WPAN (IEEE 802.15.3c) PHY Layer Designs under Non-Linear 60-GHz Power Amplifier," *in Proc. PIMRC* 2007, pp.1–5, Sept. 2007.
- [59] R. C. Daniels and R.W. Heath, "60 GHz Wireless Communications: Emerging Requirements and Design Recommendations," *IEEE Vehicular Technology Magazine*, vol. 2, no. 3, pp. 41–50, Sep., 2007.
- [60] H. Sari, G. Karam, and I. Jeanclaude, "Transmission Technique for Digital Terrestrial TV Broadcasting," *IEEE Communications Magazine*, vol.33, no.2, pp.100–109, Feb. 1995.
- [61] T. Pollet, M. Van Bladel and M. Moeneclaey, "BER Sensitivity of OFDM Systems to Carrier Frequency Offset and Wiener Phase Noise," *IEEE Trans. Commun.*, vol. 43, pp. 191–193, 1995.
- [62] T. Pollet, and M. Peeters, "Synchronization with DMT modulation," *IEEE Communications Magazine*, vol.37, no.4, pp.80–86, Apr. 1999.
- [63] B. M. Popovic, "Efficient Golay correlator," *Electronics Letters*, vol.35, no.17, pp.1427–1428, Aug. 1999.
- [64] A. Leclert and P. Vandamme, "Universal Carrier Recovery Loop for QASK and PSK Signal Sets," *IEEE Trans. on Communications*, vol. 31, no. 1, pp. 130–136, Jan. 1983.
- [65] F. M. Gardner, "A BPSK/QPSK Timing-Error Detector for Sampled Receivers," *IEEE Trans. on Communications*, vol. 34, no. 5, pp.423–429, May 1986.
- [66] K. H. Mueller and M. Muller, "Timing recovery in digital synchronous data receivers," *IEEE Trans. on Communications*, vol.14, pp.516–530, May 1976.
- [67] T.Y. Liu, "Design of Fast Convergent Adaptive Frequency-Domain Equalizer for Single Carrier Indoor Wireless Receiver," *Master Thesis*, Dept. of EE, National Chiao Tung University, Hsinchu, Taiwan, Oct. 2009.
- [68] F. C. Yeh, T. Y. Liu, T.C. Wei, W. C. Liu, and S. J. Jou, "A SC/OFDM Dual Mode Frequency-Domain Equalizer for 60GHz Multi-Gbps Wireless Transmission," *in Proc. IEEE VLSI-DAT 2011*, pp.406–409, Apr. 2011.
- [69] M. Lei, and Y. Huang, "CFR and SNR Estimation Based on Complementary Golay Sequences for Single-Carrier Block Transmission in 60-GHz WPAN," in Proc. *IEEE WCNC* 2009, Apr. 2009, pp.1–5.
- [70] Y. S. Huang, W. C. Liu and S. J. Jou, "Design and Implementation of Synchronization Detection for IEEE 802.15.3c," *in Proc. IEEE VLSI-DAT 2011*, pp.83–86, Apr. 2011.

- [71] C. de Boor, *A Practical Guide to Splines*. New York: Springer-Verlag, 2001.
- [72] G. S. Liu and C. H. Wei, "A new variable fractional sample delay filter with nonlinear interpolation," *IEEE Trans. Circuits and Systems II*, vol. 39, no. 3, pp. 123–126, Feb. 1992.
- [73] T. I. Laakso, V. Valimaki, M. Karjalainen and U.K. Laine, "Splitting the unit delay [FIR/all pass filters design]," *IEEE Signal Processing Magazine*, vol. 13, no.1, pp. 30–60, Jan. 1996.
- [74] W. H. Tseng, C. C. Chang, and C. K. Wang, "Digital VLSI OFDM transceiver architecture for wireless SoC design," in *Proc. IEEE ISCAS 2005*, May 2005, pp. 5794–5797.



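# Vita

### Projects Participated In

1. Digital demodulation and synchronization design for digital television broadcasting receivers, with platform and chip implementation
2. Baseband transceiver and low-power design techniques for indoor wireless gigabit-rate transmission

### Publications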

- [1] <u>Ting Chen Wei</u>, W. C. Liu, C. Y. Tseng, and S. J. Jou, "Low Complexity Synchronization Design of an OFDM Receiver for DVB-T/H," *IEEE Trans. Consumer Electronics*, vol. 55, no. 2, pp.408–413, May 2009.[journal paper]
- [2] <u>Ting Chen Wei</u>, W. C. Liu and S. J. Jou, "A jointed mode detection and symbol detection scheme for DVB-T," *IEEE Trans. Consumer Electronics*, vol. 54, no. 2, pp. 336–341, May 2008. [journal paper]
- [3] <u>Ting Chen Wei</u>, W. C. Liu, C. Y. Tseng, S. S. Long, S. J. Jou, and M. T. Shiue, "A 28mW OFDM Baseband Receiver Chip for DVB-T/H with All Digital Synchronization," in *Proc. IEEE CICC 2008*, Sep. 2008, pp. 351–354. [conference paper]
- [4] F. C. Yeh, T. Y. Liu, <u>Ting-Chen Wei</u>, W. C. Liu, and S. J. Jou, "A SC/OFDM Dual Mode Frequency-Domain Equalizer for 60GHz Multi-Gbps Wireless Transmission," *in Proc. IEEE VLSI-DAT 2011*, pp.406–409, Apr. 2011. [conference paper]
- [5] J. N. Lin, H. Y. Chen, <u>Ting Chen Wei</u> and S. J. Jou, "Symbol and Carrier Frequency Offset Synchronization for IEEE802.16e," *in Proc. IEEE ISCAS* 2008, May 2008, pp. 3082–3085. [conference paper]
- [6] W. C. Liu, <u>Ting Chen Wei</u> and S. J. Jou, "Blind Mode/GI detection and coarse symbol synchronization for DVB-T/H," in *Proc. ISCAS 2007*, New Orleans, May 2007, pp. 2092–2095. [conference paper]
- [7] C. Y. Tseng, <u>Ting Chen Wei</u>, W.C. Liu and S. J. Jou, "Low power and power aware design for DVB-T/H baseband inner receiver," *in Proc. IEEE VLSI-DAT* 2007, Apr. 2007, pp. 1–4. [conference paper]
- [8] W. C. Liu, <u>Ting Chen Wei</u>, and S. J. Jou, "Two-stage scattered pilot synchronization with channel estimation scattered pilots pre-filling for DVB-T/H," in *Proc. IEEE VLSI-DAT 2007*, Apr. 2007, pp. 1–4. [conference paper]

