# 國立交通大學

電子工程學系 電子研究所碩士班

## 碩士論文

# 無線都會網路之下行邊界偵測和載波頻 率飄移同步設計

Design of Symbol Boundary Detection and Carrier Frequency Offset Synchronization for WMAN Downlink

### 研究生:林俊男

指導教授:周世傑 博士

## 中華民國九十六年十月

## 無線都會網路之下行邊界偵測和載波頻率飄移 同步設計

Design of Symbol Boundary Detection and Carrier Frequency

Offset Synchronization for WMAN Downlink

研究生:林俊男

Student : Juan-Nan Lin

指導教授:周世傑 博士

Advisor : Dr. Shyh-Jye Jou

## 國立交通大學 電子工程學系 電子研究所碩士班 碩士論文

A Thesis Submitted to Department of Electronics Engineering & Institute of Electronics College of Electrical and Computer Engineering National Chiao Tung University in Partial Fulfillment of the Requirements for the Degree of Master of Science

In

Electronics Engineering October 2007 Hsinchu, Taiwan, Republic of China

## 中華民國九十六年十月

## 無線都會網路之下行邊界偵測和載波頻率飄移 同步設計

研究生:林俊男

指導教授:周世傑 博士

#### 國立交通大學

電子工程學系 電子研究所碩士班

### 摘要

IEEE802.16e 標準是為了增加 IEEE802.16-2004 移動性而提出的修正規格。 正交多頻多工存取(OFDMA)技術被使用來支援多重存取的目的。在此論文中, 我們將專注於討論運用在時速高達 120 公里多重天線系統下的同步問題。以相關 性(correlation)為基礎的演算法利用前置碼(preamble)的特性被應用在符號邊界及 載波頻率飄移的同步估測。透過量化(quantization)的方法,乘法器的數量可以被 大幅度減少成只使用加法器。此外,我們提出乒乓(ping-pong)演算法來增加整數 載波頻率同步估計的準確性。使用龍捲風式的記憶體存取法,節省了 28%的記憶 體使用。而修正有號數的計算過程成無號數的運算,則節省了 28%的面積及 22% 的功率消耗。設計結果顯示整體相較於原架構將達到節省 56%面積及 64%的功 率消耗。整個同步電路一共使用 82017 個邏輯開數量(gate count)及 12.5mW 的功 率。

## Design of Symbol Boundary Detection and Carrier Frequency **Offset Synchronization for WMAN Downlink**

Student : Juan-Nan Lin Advisor : Dr. Shyh-Jye Jou

Department of Electronics Engineering & Institute of Electronics National Chiao Tung University

### Abstract

IEEE 802.16e standard is an amendment to IEEE 802.16-2004 for supporting mobility of WMAN. For the purpose of multiple access, Orthogonal Frequency Division Multiple Access (OFDMA) modulation scheme, is adopted. In this thesis, we focus on the synchronization problems of WMAN system which supports high mobility up to 120 km/hr and multiple transmitting antennas environment. The correlation based algorithms which use the characteristic of preamble are used for symbol timing synchronization and carrier frequency synchronization. By using quantization scheme, the use of multipliers can be substantially reduced and simplified to only adders. Moreover, a ping-pong algorithm for integer carrier frequency offset synchronization is proposed to increase the accuracy of estimation. By using the twister memory access operation, 26% memory cost is saved. Furthermore, a modified architecture of calculation process from signed to unsigned reduces 28% area and 22% power cost. Design results show that total 56 % area reduction and 64% power saving are achieved than the original architecture. The overall synchronization block uses 82017 gates count with power consumption of

12.5 mW.

### 誌 謝

在研究生活中的七百多個日子中,最要感謝的人莫過於我的指導教授-周世 傑老師,兩年前老師讓我能進入這個實驗室,心中對老師是滿滿的感謝,現在要 離開這個實驗室,對老師是無限的感恩。無論在研究或生活上,都受到老師許多 的啟迪和關懷,讓我能不斷改進自己的缺點,一步一步前進。

再者要感謝總是照顧我的 MOMO 學姐,不管是研究上或是生活上的大小事 情,學姐總是不吝惜地給予我意見及幫忙,能和學姊合作真的很開心也很愉快。 謝謝庭禎學長總是不厭其煩的解答我許多的問題,提供許多寶貴設計的經驗,讓 我獲益匪淺。謝謝古孟璘學長在演算法上給我許多想法,祝福學長能夠早日康 復。學長姐們紹維、小胖、儷蓉、晉欽、totoro、誠文、阿賢、彥欽、建欽、阿 龍都是一路上幫助我的貴人。

我很幸運地有一群很可貴的實驗室夥伴 spice、阿樸、俊誼、國光、企鵝、 建君、宜欣、JUJU、篤雄陪伴我走過許多研究生活的酸甜苦辣,在 316 實驗室 裡的許多漫漫長夜,研究不順的懊惱,突破瓶頸的喜悅,都因為有你們的分享、 參與而更珍貴。一起談天說地的快樂,一起蓋房子、蓋城堡、殺野豬、跑顯示卡 測試程式的日子很難忘,希望這段難得的友誼能永遠長存。

學妹舒蓉是扮演實驗室最辛苦的腳色,常要忍受學長們的開玩笑,因為有學 妹的犧牲讓我們的壓力能夠減輕許多,謝謝妳也希望妳能繼續往天才 van 的夢想 前進。實驗室優秀的學弟運翔、祥珄、喻喧,謝謝你們這些日子的協助與支持。

最後要感謝筱涵一直陪伴在我身邊加油打氣,給我前進的力量,以及我的父母在我求學生涯裡一路默默的支持、付出,提供我良好的環境,讓我可以全心完成學業,辛苦你們了,這份榮耀全歸你們。

俊男

謹誌於 新竹

2007年10月

# Contents

| Chapter 1 Introduction                                       |
|--------------------------------------------------------------|
| 1.1 Overview of Wireless System                              |
| 1.2 Introduction of IEEE 802.16e Standards                   |
| 1.2.1 The IEEE 802.16 Family                                 |
| 1.2.2 The Distinct Features of IEEE 802.16e-2005 Standard5   |
| 1.3 Motivation7                                              |
| 1.4 Thesis Organization                                      |
| Chapter 2 OFDMA and 802.16e Technology                       |
| 2.1 Concept of OFDMA                                         |
| 2.1.1 OFDM Technology Overview9                              |
| 2.1.2 Data Format and System Comparison of OFDM and OFDMA 11 |
| 2.2 802.16e Technology                                       |
| 2.2.1 Basic Specification of 802.16e14                       |
| 2.2.2 Downlink Subcarrier Allocation Scheme16                |
| 2.2.2.1 Downlink Full Usage of Subcarriers                   |
| 2.2.2.2 Downlink Partial Usage of Subcarriers                |
| 2.2.3 Packet Format                                          |
| 2.2.4 Channel Coding                                         |
| 2.2.4.1 Randomization                                        |
| 2.2.4.2 Coding                                               |
| 2.2.4.3 Interleaving                                         |
| 2.2.5 MIMO Technology                                        |
| 2.2.6 Reference Signal                                       |
| 2.2.7 Basic Specification of 802.16e26                       |

| Chapter 3 | Synchronization Subsystem Design                           |    |
|-----------|------------------------------------------------------------|----|
| 3.1 Sy    | mbol Synchronization                                       |    |
|           | 3.1.1 Effect of Symbol Timing Offset                       |    |
|           | 3.1.2 Traditional Symbol Synchronization Algorithm         | 32 |
|           | 3.1.3 Proposed Scheme for Hardware Implementation          | 33 |
|           | 3.1.4 Performance Simulation Results and Comparison        | 34 |
| 3.2 Ca    | rrier Frequency Synchronization                            | 37 |
|           | 3.2.1 Effect of Carrier Frequency Offset                   | 37 |
|           | 3.2.2 Integer Carrier Frequency Estimation                 |    |
|           | 3.2.3 Proposed Integer Carrier Frequency Offset Estimation | 40 |
|           | 3.2.4 Fractional Carrier Frequency Offset Estimation       | 43 |
|           | 3.2.5 Derotator and NCO                                    | 44 |
|           | 3.2.6 Simulation Result                                    | 45 |
| 3.3 SC    | O Estimation                                               | 47 |
|           | 3.3.1 Effect of Sampling Clock Offset                      | 47 |
|           | 3.3.2 Sampling Clock Offset Estimation                     | 48 |
|           | 3.3.3 Simulation Result                                    | 49 |
| Chapter 4 | Architecture and Hardware Design                           | 50 |
| 4.1 Ov    | erview of Baseband Receiver                                | 50 |
| 4.2 Arc   | chitecture of Symbol Synchronization                       | 51 |
|           | 4.2.1 Original Architecture of Symbol Synchronization      | 51 |
|           | 4.2.2 Modified Calculation Process Architecture            | 53 |
| 4.3Arc    | hitecture of Carrier Frequency Synchronization             | 55 |
| 4.4 CC    | Oordinate Rotational DIgital Computer                      | 58 |
| 4.5 Lo    | w power consideration                                      | 61 |

| 4.6 Implementation Results           |    |
|--------------------------------------|----|
| Chapter 5 Conclusion and Future Work | 65 |
| Reference                            | 66 |
| Biobibliography                      |    |

# **List of Figures**

| Fig. 1.1 Network area definitions [1]1                                           |
|----------------------------------------------------------------------------------|
| Fig. 1.2 Concept of AMC                                                          |
| Fig. 1.3 Project of M-Taiwan [11]7                                               |
| Fig. 2.1 Sub-channels in (a) FDM (b)OFDM10                                       |
| Fig. 2.2 Effect of multipath with filling zero signal in guard interval [13] .11 |
| Fig. 2.3 OFDM symbol with cyclic prefix                                          |
| Fig. 2.4 Comparison of OFDM and OFDMA subcarriers allocation12                   |
| Fig. 2.5 Data allocation schemes                                                 |
| Fig. 2.6 OFDMA frame data structure                                              |
| Fig. 2.7 Procedure of FUSC                                                       |
| Fig. 2.8 Procedure of PUSC                                                       |
| Fig. 2.9 Frame structure                                                         |
| Fig. 2.10 OFDMA frame with multiple zones                                        |
| Fig. 2.11 Channel coding process                                                 |
| Fig. 2.12 PRBS for data randomization                                            |
| Fig. 2.13 Convolutional encoder                                                  |
| Fig. 2.14 structure of beamforming                                               |
| Fig. 2.15 Transmit diversity                                                     |
| Fig. 2.16 Pilot structure of MIMO mode                                           |
| Fig. 3.1 Constellation of (a) earlier and (b) later case of symbol boundary      |
| offset for 64 QAM                                                                |
| Fig. 3.2 ISI free region for (a) one transmitting antenna (b) two transmitting   |
| antennas                                                                         |
| Fig. 3.3 Correlation of preamble                                                 |

| Fig. 3.4 Correlation result   33                                           |
|----------------------------------------------------------------------------|
| Fig. 3.5 Performance of different correlation length in symbol timing      |
| estimation (SNR=9.4dB)                                                     |
| Fig. 3.6 Performance for different quantization method                     |
| Fig. 3.7 Symbol miss error probability when multipath is very closed36     |
| Fig. 3.8 Symbol miss error probability of different vehicles               |
| Fig. 3.9 (a) without CFO effect (b) with CFO effect                        |
| Fig. 3.10 Flow chart of ICFO estimation algorithm40                        |
| Fig. 3.11 Correlation when the CFO is in the border with 0.49 $\Delta f$   |
| (SNR=9.4dB)41                                                              |
| Fig. 3.12 ICFO region for ping-pong algorithm                              |
| Fig. 3.13 Error probability of modified ping-pong and original algorithms  |
| for ICFO estimation (SNR=9.4dB)42                                          |
| Fig. 3.14 The architecture of derotator                                    |
| Fig. 3.15 Block diagram of NCO45                                           |
| Fig. 3.16 Correlation results of different accumulation length (SNR=9.4dB) |
|                                                                            |
| Fig. 3.17 Overall performance of different vehicles and quantization46     |
| Fig. 3.18 Sampling clock is slower in receiver than in transmitter47       |
| Fig. 3.19 Compensation of SCO effect                                       |
| Fig. 4.1 Overall block diagram of baseband                                 |
| Fig. 4.2 Using XOR gates to do the correlation function                    |
| Fig. 4.3 Block diagram of symbol synchronization                           |
| Fig. 4.4 One part of the correlation bank                                  |
| Fig. 4.5 Storage unit for storing previous correlation aresults            |

| Fig. 4.6 Structure of cross correlation for FCFO synchronization  | .56  |
|-------------------------------------------------------------------|------|
| Fig. 4.7 R/W operation of different memory modules                | .57  |
| Fig. 4.8 Complex multiplier reduction                             | .58  |
| Fig. 4.9 Iterative vector rotation                                | . 59 |
| Fig. 4.10 Architecture of CORDIC                                  | .61  |
| Fig. 4.11 Adding NAND gate for reducing power in correlation bank | .62  |
| Fig. 4.12 State diagram of overall block                          | .62  |
| Fig. 4.13 Area proportion of each block circuit                   | .63  |

## **List of Tables**

| Table 1.1 Comparisons of IEEE 802.11 standards [5]              | 3  |
|-----------------------------------------------------------------|----|
| Table 1.2 Basic comparisons of IEEE802.16 standards             | 5  |
| Table 2.1 Scalable OFDMA parameters                             | 13 |
| Table 2. 2 Puncture configuration                               | 22 |
| Table 2.3 Example of 802.16e system specifications              | 27 |
| Table 4.1 Area and power comparisons of five delay-line methods | 57 |
| Table 4.2 Synthesis results                                     | 63 |
| Table 4.3 Synthesis results of correlation bank                 | 64 |
| Table 4.4 Comparisons of design results                         | 64 |

## Chapter 1

## Introduction

### 1.1 Overview of Wireless System

In recent years, the fast development of wireless technology not only brings a great benefit but also revolutionizes people's lives. You can easily use a wireless device to receive the latest information of stock quotes, to send private email, even to watch a live baseball game anytime and anywhere. These things were impossible twenty years ago, but they become so easy now. In the future, the wireless technology supporting more high data rate and mobility will facilitate people's lives further.



Fig. 1.1 Network area definitions [1]

The IEEE (Institute of Electrical and Electronics Engineers) classifies the

wireless communication system into four layers and specifies the different standards to each layer according to different applications. Fig. 1.1 illustrates the four layers with different coverage: Wireless Personal Area Network (WPAN), Wireless Local Area Network (WLAN), Wireless Metropolitan Area Network (WMAN) and Wireless Wide Area Network (WWAN). We will briefly introduce these layers and their applications.

WPAN is a wireless technology which transmits with low power and short distance. It includes three main parts: IEEE 802.15.1 Bluetooth [2] \ IEEE 802.15.3 Ultra Wideband (UWB) [3] \ IEEE 802.15.4 Low-rate WPAN (LR/WPAN or ZigBee) [4]. Bluetooth is the earliest development of WPAN and supports the data rate up to 10 Mbps. It is generally used in the personal device such as cell phone, PDA, and note book. UWB uses the wide bandwidth to achieve high data rate up to 480Mbps and can be used to transmit the high quality video. Unfortunately, the development of UWB is slow due to the two incompatible standards; DS-UWB and MB-OFDM, which are specified by different groups. ZigBee exchanges data rate for power. It offers data rates only up to 250 Kb/s, but it is suitable to the battery-powered devices such as wireless sensor and medical equipment.

WLAN has broader coverage than WPAN and also works on the unlicensed band. The development of WLAN is very fast, and it has been generally provided in the metropolis. WLAN is typically used in the indoor environment such as home, office building, supermarket, and so forth. Several versions of WLAN are listed and compared in Table 1.1. The technology of WLAN is restricted by the small transmitting range and the lack of mobility.

|                           | 802.11a | 802.11b | 802.11g   | 802.11n          |
|---------------------------|---------|---------|-----------|------------------|
| Date issued               | 1999    | 1999    | 2003      | 2007(Draft 2.0)  |
| Frequency                 | 5 GHz   | 2.4 GHz | 2.4 GHz   | 2.4 GHz / 5 GHz  |
| Data rates                | 54 Mbps | 11 Mbps | 54 Mbps   | 600 Mbps         |
| Modulation                | OFDM    | DSSS    | DSSS、OFDM | DSSS、OFDM        |
| Number of spatial antenna | 1       | 1       | 1         | 1 • 2 • 4        |
| Channel width             | 20 MHz  | 20 MHz  | 20 MHz    | 20 MHz or 40 MHz |

Table 1.1 Comparisons of IEEE 802.11 standards [5]

WMAN is initially proposed to solve the problem of "last mile". In the past, a wired cable or optical fiber of the subscriber is necessary for connecting to the internet. However in the WMAN based system, users can connect to the wireless Access Point (AP) of WLAN, and then the AP communicates with Base Station through a WMAN device. Finally, the Base Station connects to the network backbone by wired cables. It saves the cost of wireless services and provides the higher data rate up to 70 Mbps.

WWAN supports the widest coverage and high vehicular mobility up to 250 kmph. The establishment of IEEE 802.20 standard is still in progress. This standard likely operates on the licensed band below 3.5 GHz and provides the data rates of 4 Mbps and 1.2 Mbps in the downlink and uplink respectively.

### 1.2 Introduction of IEEE 802.16e Standards

#### 1.2.1 The IEEE 802.16 Family

The IEEE 802.16 Working Group on Broadband Wireless Access (BWA) Standards was established in 1998, which is responsible to the development of IEEE 802.16 standard. The original 802.16 standard was completed in December 2001. It was based on the fixed light-of-sight (LOS) operation in the 10 GHz-66 GHz range. In the physical layer (PHY), only single carrier modulation is supported.

IEEE 802.16a standard is an amendment to 802.16 in 2003. It provides the capacities of point-to-multipoint connection and non-line-of-sight (NLOS) transmission in the 2GHz-11GHz frequency band. Three modulation schemes of PHY are supported: SC, Orthogonal Frequency Division Multiplexing (OFDM), and Orthogonal Frequency Division Multiple Access (OFDMA)

IEEE 802.16-2004, also known as IEEE 802.16d, is only for fixed broadband wireless and has upgraded to all prior versions in June 2004. An industry consortium, the Worldwide Interoperability for Microwave Access (WiMAX) adopted IEEE 802.16-2004 as the first solution of fixed applications.

IEEE 802.16e-2005 amends 802.16-2004 to add mobility support. It enables mobile speed up to 120km/h, but also be backward compatible to support the fixed mode in 802.16-2004. Operation in mobile mode is limited to the licensed bands between 2GHz-6GHz, on the other hand, operation in fixed mode is limited to 2GHz-11GHz. IEEE 802.16e-2005 is usually referred to as mobile WiMAX. In this thesis, we will focus on this standard.

The other five amendment projects are in progress: 802.16g, 802.16h, 802.16i, 802.16j, and 802.16m. The basic comparisons and characteristics of the various IEEE 802.16 standards are summarized in Table 1.2 [9].

|                | 802.16        | 802.16-2004    | 802.16e-2005           |
|----------------|---------------|----------------|------------------------|
| Date issued    | Dec. 2001     | Jun. 2004      | Dec. 2005              |
| Frequency band | 10 GHz-66 GHz | 2 GHz-11 GHz   | 2 GHz-11 GHz for fixed |
|                |               |                | 2 GHz-6 GHz for mobile |
| Application    | Fixed LOS     | Fixed NLOS     | Fixed and mobile NLOS  |
| Transmission   | Only SC       | SC, SCa,       | SC, SCa, OFDM, ODMA    |
| scheme         |               | OFDM ,ODMA     |                        |
| Data rate      | 32 Mbps-      | 1 Mbps-75 Mbps | 1 Mbps-75 Mbps         |
|                | 134.4 Mbps    |                |                        |
| Duplexing      | TDD and FDD   | TDD and FDD    | TDD and FDD            |

Table 1.2 Basic comparisons of IEEE802.16 standards

### 1.2.2 The Distinct Features of IEEE 802.16e-2005 Standard

Due to the silent features, the developments of IEEE 802.16e-2005 standard have become a very hot trend both in the industry and academic. Some of these distinct features are described as follows [9] [10]:

High data rates: The peak data rates can achieve to 75 Mbps when using 64 QAM modulation scheme with 5/6 coding rate in 20MHz bandwidth. The use of multiple-input multiple-output (MIMO) and spatial multiplexing enables the data rates to be higher under good transmitting conditions.

Adaptive modulation and coding (AMC): There are total 52 configurations of modulation and forward error correction coding schemes which can be adaptively selected according to the channel conditions. When the channel is good, the base station transmits as high a data rate as possible such as 64 QAM and high coding rate. When the channel is poor, the base station chooses the small constellation and lower coding rate to transmit. The adoption of AMC can increase the transmitting coverage and maximizes the throughput rate. This concept is illustrated in Fig. 1.2.



Fig. 1.2 Concept of AMC

Quality of service (QoS): The fundamental premise of IEEE 802.16 medium access control (MAC) architecture is QoS. It supports a large number of users which have respective QoS requirements. Additionally, the schemes of subchannelization and medium access protocol (MAP) let the scheduling become more flexible by using space, frequency, and time physical resources over the air interface on a frame-by-frame basis.

Mobility: The mechanism of handover is optimized to be no longer than 50 milliseconds. It can ensure the real-time operation without service degradation. This is especially important for the real-time applications such as Voice over Internet Protocol (VoIP). The system also supports the power management with sleep and idle modes to extend the battery life of handheld subscriber devices.

Security: The security specification is the most advanced in current wireless access systems. It offers Extensible Authentication Protocol (EAP) based authentication, Advanced Encryption Standard-Counter with CBC-MAC mode (AES-CCM) based authentication encryption, and Cipher-based Message Authentication Code (CMAC) and Hashed Message Authentication Code (HMAC) based control message protection scheme.

### 1.3 Motivation

In recent years, the research and development of WMAN is a very attractive and worldwide topic. The industry of WLAN in Taiwan is very successful to have more than 90% of world market share on Wi-Fi (Wireless Internet Fidelity Internet) products. Base on this good foundation and the huge market and applications of WMAN, the Taiwan government chose the latest WMAN technology as the next generation wireless industry. The "M (Mobile)-Taiwan Program" was proposed in 2005 as shown in Fig. 1.3.



Fig. 1.3 Project of M-Taiwan [11]

The technology of OFDM has been widely used in communication systems. The orthogonal property of subcarriers let the use of spectrum be more efficient and the insertion of guard interval with cyclic prefix can resist the multipath interference. In order to support multiple access, the multiple access scheme based on OFDM system, OFDMA, is adopted in the mobile mode of IEEE 802.16e. Additionally, MIMO system is also included to gain the diversity for increasing overall performance.

In the wireless environment, several problems will let the system fail and need to

be solved in the receiver. They include symbol timing offset, carrier frequency offset (CFO), and sampling clock offset (SCO). The wrong symbol boundary causes the subcarriers with phase rotation or loss of their orthogonality. CFO is divided into integer part and fractional part. Integer CFO causes the index shift of subcarriers, and fractional CFO causes inter-carrier-interference (ICI). SCO causes the phase rotation which is proportional to the subcarrier index. Therefore, the mechanisms of synchronization are necessary to overcome these problems.

In this thesis, we propose the synchronization architectures based on correlation algorithm, and the complexity of hardware can be reduced through using quantization scheme. In order to reduce the use of complex multipliers, we use the architecture of Coordinate Rotation Digital Computer (CORDIC) to compute the trigonometric value and rotation phase.

### 1.4 Thesis Organization

The remainder of this thesis is organized as follows. In chapter 2, the concepts of OFDM and OFDMA will be briefly introduced, and the technology of WMAN 802.16e standard will also be discussed. In chapter 3, we will introduce various imperfect effects, and propose the corresponding algorithms. The system performance simulation of the proposed solutions will be shown. In chapter 4, the architecture and hardware design will be presented. Finally, conclusions and future work are made in Chapter 5.

## Chapter 2

## **OFDMA and 802.16e Technology**

The physical (PHY) layer of 802.16e includes several specifications for different applications and frequency range. For example, single carrier (SC) operates from 10 GHz to 66 GHz in line-of-sight (LOS). SCa, OFDM and OFDMA operate below 11 GHz in non-line-of-sight (NLOS). In this thesis, we focus on the OFDMA specification for mobility. We will briefly introduce the concepts about OFDM and OFDMA. Then, we give an overview of the IEEE802.16e OFDMA standard.

# 2.1 Concept of OFDMA

### 2.1.1 OFDM Technology Overview

Orthogonal frequency division multiplexing (OFDM) has been widely adopted in high data rate communication systems such as Asynchronous Digital Subscriber Line (ADSL), Digital Video Broadcasting for Terrestrial (DVB-T) and 802.11b/g (Wi-Fi). One of the main reasons to use OFDM is to increase the robustness against frequency selective fading or narrowband interference. Due to the recent development of digital signal processing (DSP) and very large scale integrated circuit (VLSI), the initial obstacles of OFDM implementation, such as massive complex computation, do not exist now.

OFDM is based on the idea of frequency division multiplexing (FDM). The

concept of using parallel data transmission and was published in the mid 60s. In FDM, the total frequency bandwidth is divided into N non-overlapping sub-channels which are modulated with a separate symbol. In order to prevent from the adjacent channel interference, frequency spacing is allocated between sub-channels. However, this is inefficient to use the spectrum. In OFDM, the total frequency bandwidth is divided into N overlapping sub-channels which are mutual orthogonal. The orthogonality allows simultaneous transmission on a lot of subcarriers without interference from each other. The sub-channels in FDM and OFDM are shown in Fig. 2.1.



Fig. 2.1 Sub-channels in (a) FDM (b)OFDM

Due to the multipath effect, the inter-symbol interference (ISI) which destroys the orthogonality of subcarriers in OFDM is induced. In order to avoid this phenomenon, a guard interval is designed as longer than the delay spread and inserted to the head of each symbol. The guard interval can be implemented by filling with zero, but it will cause the problem of intercarrier interference (ICI). This effect can be illustrated in Fig. 2.2. The orthogonality between subcarrier #1 and delayed subcarrier #2 is damaged due to their difference is not a multiple of cycle. To eliminate ICI, the OFDM symbol is cyclically extended in the guard interval, which is also called cyclic prefix (CP) as shown in Fig. 2.3. CP makes the subcarrier signal has integral periods, so the orthogonality can be maintained.



Fig. 2.2 Effect of multipath with filling zero signal in guard interval [13]



### 2.1.2 Data Format and System Comparison of OFDM and OFDMA

The system of WMAN suffers a challenge from the requirement of multiuser communication: many users in the same cell require high data rates in a finite bandwidth with low latency. Therefore, multiple access techniques are necessary to be adopted. However, OFDM does not have this characteristic to support multiple access, all the subcarriers are simultaneously used by a single user. OFDMA is designed to improve this drawback. So, OFDMA is a multiuser version of OFDM. The all available subcarriers of uplink and downlink in OFDMA are divided into several groups of subcarriers termed as subchannels as shown in Fig. 2.4. The different



subchannels may be allocated to different users as a multiple access mechanism [9].

Fig. 2.4 Comparison of OFDM and OFDMA subcarriers allocation

The allocated subcarriers of user are dynamically spread in frequency domain like FDMA and in different time slots like TDMA. In other words, OFDMA is essentially a hybrid of FDMA and TDMA. Subchannelization in OFDMA has another significant advantage of allowing users to transmit only specific subchannels instead of entire frequency band in OFDM. This is useful to save the battery power in user devices. There are two types of data allocation for subchannelization can be chosen: contiguous and diversity. The contiguous permutation groups contiguous subcarriers to form a subchannel. On the contrary, the diversity permutation pseudo-randomly spread out the subcarriers of subchannel over the entire bandwidth and brings the benefit of frequency diversity and robustness against the frequency select fading channel. Fig. 2.5 illustrates the two allocation schemes.



Adjacent subcarrier allocation (AMC)

Distributed subcarrier allocation (FUSC, PUSC)

## Fig. 2.5 Data allocation schemes

Additionally, the scalable OFDMA (S-OFDMA) scheme is adopted in standard. It supports a wide range of bandwidth from 1.25MHz to 20MHz as shown in Table 2.1. And the number of subcarriers is decided according to bandwidth for keeping the fixed subcarrier spacing at 10.94 KHz which is derived as the optimum tradeoff in [12].

| Table 2.1 Scalable O | FDMA parameters |
|----------------------|-----------------|
|----------------------|-----------------|

| Parameters                     |      | Val       | ues      |      |
|--------------------------------|------|-----------|----------|------|
| System channel bandwidth (MHZ) | 1.25 | 5         | 10       | 20   |
| FFT size                       | 128  | 512       | 1024     | 2048 |
| Sampling factor                |      | 28/       | 25       |      |
| Sampling frequency             | 1.4  | 5.6       | 11.2     | 22.4 |
| CP ratio                       | 1/.  | 32 • 1/16 | · 1/8 ·  | 1/4  |
| Modulation mode                | QPS  | SK 🔹 16QA | AM 、 64Q | AM   |
| Subcarrier frequency spacing   |      | 10.94     | • kHz    |      |
| Frame duration                 |      | 5 r       | ns       |      |

### 2.2 802.16e Technology

#### 2.2.1 Basic Specification of 802.16e

In 802.16e standard, there are several processing units specified in the PHY layer of OFDMA. They include one-dimension domain units and two-dimension domain units because of data allocated on both time and frequency. An essential OFDMA symbol is based on the format of OFDM symbol and the most basic unit is "subcarrier". The "subcarrier" is the one-dimension unit and consists of three types:

- 1. Data subcarriers: for carrying data
- Pilot subcarriers: for the purpose of channel estimation, channel tracking and synchronization
- 3. Null subcarriers: no power is allocated to the DC subcarrier and guard subcarriers. The frequency of DC subcarrier is equal to RF, this will induce the local oscillator re-radiation effect to cause this subcarrier is not suitable to carry data. In guard band, the reason is for reducing the interference between adjacent channels.

A set of subcarriers in frequency domain are grouped as a "subchannel". Two-dimension units from large size to small size are "frame", "sub-frame", "zone", "segment", "burst", "slot" and "cluster". Brief definitions are described as follows respectively and the relationship between these units is shown in Fig. 2.6.

- 1. Frame: It is an essential packet format of transmitted data sequence.
- Sub-frame: It is a component to make up a frame and is identified as downlink and uplink.
- 3. Zone: A zone is the region of contiguous OFDMA symbols with the same





- 4. Segment: It is a subdivision of the set of subchannels for certain particular allocation zone. The content of the Medium Access Control (MAC) layer is the same in a segment.
- 5. Burst: It is a region which includes the contiguous subchannel and OFDMA symbol to transmit the broadcast or unique data for corresponding users.
- Slot: It is the minimum possible data allocation unit and described in both time and subchannel dimension. It contains 48 data subcarriers for all subchannelization schemes, but their arrangement is different in different schemes [15].
- Cluster: It contains 14 adjacent subcarriers over 2 contiguous symbols with
   4 pilot subcarriers in partial usage of subcarriers (PUSC) permutation scheme.

#### 2.2.2 Downlink Subcarrier Allocation Scheme

As mentioned in section 2.1.2, the constitution of subchannels is the most significant characteristic of OFDMA system. The data allocation scheme is made up of the subcarrier permutation mode. Several schemes in 802.16e will be discussed next.

#### 2.2.2.1 Downlink Full Usage of Subcarriers

In the DL full usage of subcarriers (FUSC) mode, each subchannel contains 48 data subcarriers. The pilot subcarriers in FUSC are divided to two constant set pilots and two variable set pilots. The index of variable set pilots changes from one symbol to the next and obeys the following rule

*PilotLocation* = *VariableSet*  $#x + 6 \cdot (FUSC \_ SymbolNumber \mod 2)$  (2.1) where VariableSet #x indicates the pilot allocations of two variable sets specified in standard and FUSC\_SymbolNumber is the FUSC symbol number of current zone. This procedure of subchannelization is described in Fig. 2.7. The pilot subcarriers are allocated first, the remaining subcarriers are contiguously portioned into groups. The number of groups is equal to the number of subcarriers per subchannel. Then, the subchannel extracts one subcarrier from each of these groups. The exact subcarrier allocation is according to the permutation rule [7][8] as *subcarrier*(k, s) = (2.2)

$$(2.2)$$

$$N_{subchannel} \cdot n_k + \{p_s[n_k \mod N_{subchannels}] + DL\_PermBase\} \mod N_{subchannels}$$

where subcarrier (k,s) is the subcarrier index k in subchannel s,  $n_k$  is equal to (k+13 · s) mod  $N_{subcarriers}$ ,  $N_{subchannel}$  is the number of subchannels,  $p_s[j]$  is the series obtained by rotating basic permutation sequence cyclically to the left s times and DL\_PermBase is an integer ranging from 0 to 31.



Fig. 2.7 Procedure of FUSC

### 2.2.2.2 Downlink Partial Usage of Subcarriers

Another diversity permutation scheme is PUSC which is similar to FUSC. The most significant difference between them is the mapping region of subcarriers. In FUSC, the subcarriers may be mapped to full band. In PUSC, the subcarriers are mapped in a specific interval. This procedure of permutation is illustrated in Fig. 2.8. First, the adjacent subcarriers are grouped as a physical cluster which consists of 24 data subcarriers over two contiguous symbols and 4 pilot subcarriers. These physical clusters are renumbered to the logical clusters according to a pseudorandom numbering scheme as shown in below,

$$LogicalCluster = RS(PC + 13 \cdot DL PermBase) \mod N_{clusters}$$
(2.3)

where RS is the renumbering sequence defined in the standard, PC is the index of the physical clusters and  $N_{clusters}$  means the number of clusters. Then, the clusters are partitioned into six major groups depending on the FFT size. The scbcarriers in each major group are allocated to subchannels using the same procedure for FUSC.



Fig. 2.8 Procedure of PUSC

#### 2.2.3 Packet Format



In the licensed bands, the duplexing method shall be either time division duplex (TDD) or frequency division duplex (FDD). The other license-exempt bands, the duplexing method shall be TDD. In the case of FDD, the uplink and downlink subframes are transmitted simultaneously on the different frequency. In the case of TDD, the uplink and downlink subframes are transmitted respectively on the same frequency but at different time. The OFDMA frame structure in TDD mode is illustrated in Fig. 2.9. In the beginning of each transmitted frame, a preamble symbol is transmitted, which is the known data in receiver for the use of synchronization and initial channel estimation. In the OFDMA symbol following the frame preamble, the initial subcahnnels are allocated for the frame control header (FCH). The FCH contains the system control information such as the subcarriers used, length of the downlink-MAP (DL-MAP) and the DL\_Frame\_Prefix. The FCH shall be always transmitted using QPSK rate 1/2 with four repetitions to ensure the robustness and

reliable performance. DL\_MAP and UL\_MAP following FCH specify the data region of the various users in the DL and UL subframes. The transmit transition gap (TTG) is used to give the base station (BS) and subscriber station (SS) enough time to change from downlink mode to uplink mode. For the same reason, the receive transition gap (RTG) is inserted at the end of each frame.



In an OFDMA frame, it can include several different permutation zones for allocating the subcarriers. But only one permutation scheme is allowed in an OFDMA symbol. Because there is not any information about the permutation scheme in the beginning, the first zone must be the PUSC zone to ensure the FCH and DL\_MAP can be received successfully. The information of zone transition is indicated in the DL\_MAP and UL\_MAP. Fig. 2.10 describes an OFDMA frame with multiple zones.



Fig. 2.10 OFDMA frame with multiple zones

### 2.2.4 Channel Coding

The purpose of channel coding is to help the transmitted data through noisy channel with lower error probability. It is composed of the following steps: (1) data randomization (2) encoding (3) interleaving (4) repetition (5) modulation as shown in Fig. 2.11. Because the control message of the system is very important, the repetition coding scheme is used to enhance the error correct performance of them. The repetition code factor, R, can be 2, 4, or 6 by utilizing the multiple subchannels. The other functional blocks will be discussed next.



Fig. 2.11 Channel coding process

#### 2.2.4.1 Randomization

All the transmitted data on both the downlink and uplink are randomized except the FCH and preamble. The purpose of randomization is to prevent the transmitted data with a long successive sequence of zeros or ones, because it will reduce the performance of coding and synchronization. A randomizer is made up of a pseudo-random binary sequence (PRBS) generator and XOR gates. The PRBS is generated by the shift register with the polynomial function,  $G(x)=1+x^{14}+x^{15}$ , as shown in Fig. 2.12. The bit stream sequentially enters the randomizer, started from MSB, to generate information bits.



Fig. 2.12 PRBS for data randomization

### 2.2.4.2 Coding



In the specification of 802.16e, it contains several channel coding schemes such as convolutional codes (CC), turbo codes, convolutional turbo codes (CTC), block turbo codes (BTC) and low density parity check codes (LDPC). The mandatory coding scheme is based on the tail-biting convolutional coding scheme, and the others are optional. Therefore, we will only briefly introduce the convolution codes. The convolutional encoder has a constraint length 7 and native rate of 1/2. The generator polynomials are used to derive its two coded bits X and Y as follows:



Fig. 2.13 Convolutional encoder

After encoder, these coded bits are needed to be punctured and serialized according to the different code rates. Table 2. 2 shows the patterns for puncture and serialization order. "1" means a transmitted bit and "0" means a removed bit.

| Code Rates        | 1/2                           | 2/3         | 3/4            |
|-------------------|-------------------------------|-------------|----------------|
| d <sub>free</sub> | 10                            | 6           | 5              |
| X                 |                               | 10          | 101            |
| Y                 | Ĩ                             | 11 Jun      | 110            |
| Output            | X <sub>1</sub> Y <sub>1</sub> | $X_1Y_1Y_2$ | $X_1Y_1Y_2X_3$ |

Table 2. 2 Puncture configuration

### 2.2.4.3 Interleaving

The operation of interleaving is divided into a two-step permutation. The first step ensures that the adjacent coded subcarriers are mapped onto the nonadjacent subcarriers. It provides the diversity gain and improves the performance of the decoder. The second step ensures that the adjacent coded bits are alternately mapped onto the less or more significant bits of constellation. The error probability of all bits in 16 QAM or 64 QAM is not the same. For example, the most significant bit (MSB) has the lower error probability than the least significant bit (LSB) has. Thus, the second step prevents the case which the coded bits are always mapped on the low

reliable bits from being carried out.

The interleaving rules of the 802.16 specification are described as Eqn. (2.4). Let  $N_c$  be the total number of coded bits, i.e. 2 for QPSK, 4 for 16 QAM, and 6 for 64 QAM. s is equal to the half of  $N_c$ . Let k be the original index of the coded bits before interleaving,  $m_k$  and  $j_k$  be the index after the first and second steps respectively, d be the modulo used for permutation.

$$m_{k} = \left(\frac{N_{c}}{d}\right) \cdot k_{\text{mod}(d)} + floor\left(\frac{k}{d}\right)$$

$$j_{k} = s \cdot floor\left(\frac{m_{k}}{s}\right) + \left(m_{k} + N_{c} - floor\left(\frac{d \cdot N_{c}}{N_{c}}\right)\right)_{\text{mod}(d)}$$

$$(2.4)$$

#### 2.2.5 MIMO Technology

The technology of multiple-input multiple-output (MIMO) is supported in WMAN, which provides several benefits as described below [9]:

a filling

- 1. Increase the system reliability 1896
- 2. Increase the achievable data rate and enhance system capacity
- 3. Increase the coverage area
- 4. Decrease the required transmit power

However, these advantages usually conflict with one another. For example, increasing the data rate will decrease the reliability or increase the transmitted power. For different purposes, different MIMO schemes are adopted such as beamforming, spatial multiplexing, and space-time code.

1. Beamforming:

When multiple antennas are used in the closed-loop mode, the transmitter will know the channel state information. Thus, the interference caused from directional noise signals can be reduced by the technology of adaptive antenna systems, beamforming [17][18]. A beamformer is similar to the spatial filter, which adjusts the strength of the transmitted and received signals based on direction as shown in Fig. 2.14. Two principles of beamforming, direction of arrival (DOA) based and eigenbeamforming based are generally used.



Fig. 2.14 structure of beamforming

ANILLER.

2. Spatial multiplexing:

Spatial multiplexing is a technique to increase the throughput rate by using multiple antennas at both ends. The transmitted data stream is divided into N<sub>t</sub> independent sub-streams. Multiple sub-streams are parallelly transmitted through multiple antennas. If these data sub-streams can be separated successfully in the receiver, the data rate and system capacity will be higher than single antenna system. 3. Transmit diversity:

Transmit spatial diversity means that the transmitted signals can pass through different transmit antenna to overcome the deep fading channel. The transmit diversity is attractive for subscriber stations, because the cost of multiple antennas is on the transmitter. The space time block code (STBC), a transmit diversity technique, proposed by Alamouti in 1998 [19] is supported in WMAN. The case of 2\*1 antenna system is described as shown in Fig. 2.15.


Fig. 2.15 Transmit diversity

The transmitted data are encoded by the following matrices:

Antenna 1  
Antenna 2  
$$\begin{bmatrix} S_1 & -S_2^* \\ S_2 & S_1^* \end{bmatrix}$$
  
Time

where  $S_1$  and  $S_2$  are the two consecutive symbol. The matrix is orthogonal in nature and can be detected by the maximum likelihood (ML) algorithm. The other matrices for three and four antennas are defined in [1][8] with different kinds of coding rate.

#### 2.2.6 Reference Signal



The reference signals in WMAN-OFDMA consist of a preamble and pilot tones. The preamble contains the specific pattern known to the receiver and occupies the duration of one symbol time. It is usually used for packet detection, timing offset synchronization, frequency offset synchronization and initial channel estimation. On the other hand, the pilot tones are scattered over all the transmitted signals and are used for sampling clock offset estimation and tracking the time-varying channel.

#### (1) Preamble structure:

The subcarriers of preamble are grouped into three carrier sets for different segments. Eqn. (2.6) specifies all subcarriers allocated to the specific preamble.

$$PreambleCarrierSet_n = n + 3 \cdot k \tag{2.6}$$

where n indicates the number of the preamble carrier set from 0 to 2, k is a variable index and ranges from 0 to  $(N_{used} - 3)/3$ . It contains the guard band subcarriers both

on the left and right side of the spectrum. The DC subcarrier will always be zeroed even if the segment is 0. These subcarriers of preamble carrier set are modulated using BPSK modulation by the specific pseudo-noise (PN) series defined in a Hexadecimal format in [8]. In each FFT size, there are total 114 PN series to be chosen by the ID cell parameter and the segment index. The power of the preamble subcarriers is boosted by a factor,  $2\sqrt{2}$ , to increase the reliability of preamble.

(2) Pilot structure:

In the PUSC permutation mode, the pilot subcarriers are allocated in the clusters. Each pilot is boosted 2.5 dB over the average non-boosted power of each data subcarriers. The cluster structure with pilot subcarrier is illustrated in Fig. 2.16. If the pilot subcarrier is transmitted from one antenna, the other antenna will not transmit data on the same location to avoid the interference. And the pilot locations periodically change every four OFDMA symbols.



Fig. 2.16 Pilot structure of MIMO mode

#### 2.2.7 Basic Specification of 802.16e

The system specifications are different according to different purposes and applications. Taking the system of our thesis as an example, the main parameters can be derived and described in Table 2.3.

| Parameters                                                           |                                               | Deriving formulas                        | Values    |
|----------------------------------------------------------------------|-----------------------------------------------|------------------------------------------|-----------|
| FFT siz                                                              | 1024                                          |                                          |           |
| System                                                               | 10 MHz                                        |                                          |           |
| Sampli                                                               | ng factor (n)                                 | n=28/25 if B is a multiple               | 28/25     |
|                                                                      |                                               | of 1.25 \cdot 1.5 \cdot 2 \cdot 2.75 MHz |           |
|                                                                      |                                               | n=8/7 for the other cases                |           |
| Sampli                                                               | ng frequency (F <sub>s</sub> )                | $floor(n \cdot BW/8000) \times 8000$     | 11.2 MHz  |
| Subcarr                                                              | tier spacing ( $\Delta$ f)                    | F <sub>s</sub> / N <sub>FFT</sub>        | 10.94 kHz |
| Useful                                                               | symbol time (T <sub>b</sub> )                 | $1/\Delta f$                             | 91.4 us   |
| Guard t                                                              | ime (T <sub>g</sub> )                         | $G \cdot T_b$                            | 11.4 us   |
| OFDM                                                                 | 102.9 us                                      |                                          |           |
| Frame                                                                | 5 ms                                          |                                          |           |
| Numbe                                                                | r of OFDMA symbols (N)                        | $floor(T_F/T_s)$                         | 48        |
| DL                                                                   | Number of null subcarriers (N <sub>n</sub> )  | 184                                      |           |
| PUSC                                                                 | Number of clusters (N <sub>c</sub> )          | (N <sub>FFT</sub> -N <sub>n</sub> )/14   | 60        |
|                                                                      | Number of subchannels (N <sub>sc</sub> )      | N <sub>c</sub> /2                        | 30        |
|                                                                      | Number of pilot subcarriers (N <sub>p</sub> ) | $N_c \times 2$                           | 120       |
|                                                                      | Number of data subcarriers $(N_d)$            | $N_c \times 12$                          | 720       |
| Modula                                                               | QPSK                                          |                                          |           |
| Raw da                                                               | 13.6 Mbps                                     |                                          |           |
| Coding                                                               | 3/4 CC                                        |                                          |           |
| Peak data rate * $N_d \times 2 \times 3/4 \times (N-2) \times 1/T_F$ |                                               |                                          | 9.93 Mbps |

Table 2.3 Example of 802.16e system specifications

\* assuming 46 data OFDMA symbol in a frame

# Chapter 3

# Synchronization Subsystem Design

### **3.1 Symbol Synchronization**

Because the receiver in the actual operation doesn't know when the symbol boundary will start, the main purpose of symbol synchronization is to find out the appropriate symbol index information of the transmitted OFDMA signals. Additionally, in the wireless transmitting environment, the transmitted signals pass through different path to arrive at the receiver. The multipath effect leads the transmitted symbols overlap with different delays. Timing offset will cause phase rotation of constellation which is proportional to the drift of the subcarrier index. If the estimated symbol boundary locates in the channel response region, the inter-symbol interference (ISI) effect will be introduced.

The requirement in WMAN receiver is different from the broadcasting system. In a WMAN receiver, the symbol timing synchronization should be finished before the end of the preamble symbol, but it can spend several symbol time to estimate in the broadcasting system. The detailed effect and synchronization method will be discussed in this section.

#### 3.1.1 Effect of Symbol Timing Offset

In order to introduce the symbol synchronization algorithm, the effect of symbol

timing offset is derived in advance. Symbol timing offset means the estimated symbol boundary does not locate on the accurate location. It consists of two possible cases, earlier or later than the accurate boundary index. If the estimated boundary is earlier than the ideal index but not locates at the channel response region, it induces the constellation of signals to rotate in the frequency domain, but can be compensated by channel estimation. In order to describe this phenomenon, the received signal with this timing offset in mathematics is derived in Eqn. (3.1)

$$\begin{aligned} \hat{X}(k) &= X(k-\varepsilon) \\ &= \sum_{n=0}^{N-1} x(n-\varepsilon) e^{j2\pi k \frac{n}{N}} \\ &= \sum_{n=0}^{\varepsilon-1} x(n-\varepsilon) e^{j2\pi k \frac{n}{N}} + \sum_{n=\varepsilon}^{N-1} x(n-\varepsilon) e^{j2\pi k \frac{n}{N}} \\ &= \sum_{n=0}^{\varepsilon-1} x(n+N-\varepsilon) e^{j2\pi k \frac{n+N-\varepsilon}{N}} e^{j2\pi k \frac{\varepsilon}{N}} + \sum_{n=\varepsilon}^{N-1} x(n-\varepsilon) e^{j2\pi k \frac{n-\varepsilon}{N}} e^{j2\pi k \frac{\varepsilon}{N}} \\ &= \sum_{n=0}^{N} x(m') e^{j2\pi k \frac{m'}{N}} e^{j2\pi k \frac{\varepsilon}{N}} + \sum_{m'=0}^{N-\varepsilon-1} x(m'') e^{j2\pi k \frac{m''}{N}} e^{j2\pi k \frac{\varepsilon}{N}} \\ &= \sum_{n=0}^{N-1} x(n) e^{j2\pi k \frac{n}{N}} e^{j2\pi k \frac{\varepsilon}{N}} \end{aligned}$$

$$(3.1)$$

where k is the sub-carriers index, n is the sample index in time domain, N is the number of subcarriers in an OFDMA symbol,  $\varepsilon$  is the drift of index and X(k) is the respected signal without timing offset in the frequency domain. As shown in Eqn. (3.1), the last term  $e^{j2\pi k \frac{\varepsilon}{N}}$  causes a phase rotation to the respected signal X(k) and the rotated phase is proportional to the drift of the subcarrier index  $\varepsilon$ . This phenomenon is presented in Fig. 3.1(a). On the contrary, the mutual orthogonal characteristic between different symbols will be destroyed if the estimated boundary is later than the accurate index. Because the data of the next symbol are covered into the FFT window, the constellation is shown in Fig. 3.1(b). The constellation cannot be

compensated by the equalizer. In order to prevent the latter case, the estimated symbol boundary is better than the exact symbol boundary but within cyclic prefix length.



Fig. 3.1 Constellation of (a) earlier and (b) later case of symbol boundary offset for 64 QAM

As mentioned above, the requirements for symbol timing offset estimation are

determined by the difference between the cyclic prefix and channel impulse response. This region is the part of cyclic prefix and is not affected by the previous symbol due to channel dispersion [16]. It is usually called as ISI free region and is described in Fig. 3.2(a) for using only one transmitting antenna case.



Fig. 3.2 ISI free region for (a) one transmitting antenna (b) two transmitting antennas

In the environment of two transmitting antennas, the two transmitted signals from different antennas will probably arrive at the receiver with different delays due to the multipath effect as shown in Fig. 3.2(b). So the estimated boundary must locate in the common safe boundary of both ISI free regions to prevent the respective ISI effect from the previous symbol.

#### 3.1.2 Traditional Symbol Synchronization Algorithm

In 802.16e transmitting format, it consists of a preamble symbol in the front of a packet. Because the preamble data are known in the receiver, these signals can be used to correlate the received signals in time domain. If the signals are matched with the known data, there will be a peak of the correlation values on the symbol boundary. This method is described in Fig. 3.3.



Fig. 3.3 Correlation of preamble

The peak of correlation means that the present sample index is matched to the present preamble index window. The correlation model in mathematics can be described below:

$$B_{est} = \max_{d} \left( \sum_{m=0}^{L-1} r_{d+m} P_m^* \right)$$
(3.2)

where d is the sample index, r is the received signals, P is the preamble value and L is the correlation length. Thus, when a peak appears on the correlation result, the symbol index is found. Furthermore, the mismatch of oscillator frequency in receiver and transmitter causes frequency offset to the received signals. This phenomenon destroys the characteristic of correlation result, and we will introduce this effect in Section 3.2. In order to overcome this problem, we propose to pre-shift the known preamble signals with several possible integer carrier frequency offsets (ICFO) in time domain to overcome this effect. The correlation peak will only appear in the correct sample index and the correct integer frequency offset. This characteristic brings an additional advantage which the value of ICFO can be detected by the correlation result simultaneously. According to the 802.16e specification, the frequency offset can only have  $\pm 14$ ppm variation. In other words, there will be only seven possible integer frequency offset values in the range of  $\pm 3 \Delta f$ . Fig. 3.4 shows the correlation results to the received signals which have 100 delays and 2.3  $\Delta f$  frequency drift. The peak locates at the symbol index 100 and indicates the ICFO value equal to 2.



Fig. 3.4 Correlation result

#### 3.1.3 Proposed Scheme for Hardware Implementation

As described above, the method of correlation can find out the correct symbol

index and ICFO value. But there are several important problems that should be solved for hardware implementation. The first one is long length of correlation, because the hardware complexity is proportional to the correlation length. Secondly, the algorithm uses mass of multipliers to do the calculation of correlation. This results in the hardware area being too huge and consuming a lot of power.

To solve the first problem, full symbol lengths are not used to correlate with. Instead, a suitable length is used to take both performance and hardware complexity into consideration. For the second problem, the coefficients of the correlation function are quantized into {-1, 1} values in both real and imaginary parts [13]. It works successfully by a well design of preamble. This improvement simplifies the multiplication as simple addition and subtraction in Eqn. (3.2). It substantially replaces the multipliers into adders in the matched filter.

Based on this algorithm, furthermore, the received signals are also quantized into {-1, 0, 1} that can be represented only by two bits in the hardware implementation. In other words, it means only to consider whether the signals and the coefficients are in phase or not. Because of this improvement, it has the advantage of reducing the numbers of registers for storing received signals and computing adders. Thus, increasing the correlation length to make up for the quantization loss does not pay a lot of cost. The simulation results are shown in Section 3.1.4. Because of MISO system, the other transmitter antenna will also produce more interference than noise dose. Thus, an expected performance is shown even in low SNR when both received signals and coefficients are quantized.

#### 3.1.4 Performance Simulation Results and Comparison

In this thesis, the simulation environment is based on a time-variant multipath

channel in discrete expression. The number of multipath is six and the power value of each path is defined in ITU Veh.A channel model as shown in Table 3.1. The excess delays of different paths and antennas are randomly generated from 0 to 50 subcarrier index. And Jake's model is used to simulate the vehicle environment.

 Table 3.1 Power value of multipath channel

| Path number | Path 1 | Path 2 | Path 3 | Path 4 | Path 5 | Path 6 |
|-------------|--------|--------|--------|--------|--------|--------|
| Power (dB)  | 0      | -1.0   | -9.0   | -10.0  | -15.0  | -20.0  |

Fig. 3.5 shows the performance of different correlation length under 120km/hr. In this case, the delays of multipath are randomly generated by C program to simulate various environments. By this simulation result and the ICFO estimation which will be mentioned later, the correlation length is chosen as 300. Fig. 3.6 shows the comparison of different quantization method. The performance of floating coefficient and signals is better, but it is impractical as mentioned above. The performances of the other two quantized methods are almost the same. Thus, the results can support the idea to quantize the preamble and received signals for implementation.



Fig. 3.5 Performance of different correlation length in symbol timing estimation (SNR=9.4dB)



Fig. 3.6 Performance for different quantization method

Fig. 3.7 describes the performance when the delay of multipath is much closed. The delay values of multipath are [7, 8, 12, 14, 15, 18]. Because the first two paths are very closed, this causes more significant interference to correlation results. To be compared with Fig. 3.6, the symbol miss error probability increases due to the closed paths.



Fig. 3.7 Symbol miss error probability when multipath is very closed Fig. 3.8 shows the performance of different vehicles. The vehicle effect will not cause



Fig. 3.8 Symbol miss error probability of different vehicles

## 3.2 Carrier Frequency Synchronization

In wireless system, there are crystal oscillators in the transmitter and the receiver which are responsible to generate respective carrier frequency and sampling clock. Ideally, they should work in the same frequency, but it is impossible to find out two oscillators which can oscillate in the same frequency. The difference of oscillator frequency will induce the CFO effect. Because the OFDM systems use smaller subcarrier bandwidth, they are sensitive to the CFO effect which destroys the orthogonal characteristic of subcarriers. Thus, the mismatch problem of oscillators should be overcame to maintain the properties of OFDM. Usually, the CFO effect is divided into two parts: an integer part and a fractional part. The value of fractional part is restricted between [-0.5~0.5] subcarrier spacing. They are separately estimated by using different algorithms as discussed in the following sections.

#### 3.2.1 Effect of Carrier Frequency Offset

The ICFO causes the subcarriers index shift with an integer value. This shift

causes a wrong index of pilots. So it is necessary to compensate the ICFO effect for the usage of correct pilots in the channel estimation. On the other hand, the fractional carrier frequency offset (FCFO) induces the phase rotation of constellation and inter carrier interference (ICI) as shown in Fig. 3.9.



The effect of CFO in mathematics expression is described as follows. Assuming the carrier frequency of transmitter is equal to  $f_c$ ,  $\Delta f$  is the frequency error as a fraction of the subcarrier spacing, which is composed of ICFO $\Delta f_I$  and FCFO $\Delta f_F$ , the transmitted signal is

$$y(n) = s(n) \cdot e^{j2\pi f_c n} \tag{3.3}$$

The received signal with CFO is

$$s(t)e^{j2\pi f_c t} \times e^{-j2\pi (f_c + \Delta f)t} = s(t)e^{j2\pi \Delta f t}$$
(3.4)

The received signal in frequency domain is derived as

$$\widehat{X}(k) = \sum_{n=0}^{N_{sc}-1} x(n) \cdot e^{j2\pi n \perp f} e^{j2\pi k \frac{n}{N_{sc}}}$$

$$= \sum_{n=0}^{N_{sc}-1} x(n) \cdot e^{\frac{j2\pi n \perp fN_{sc}}{N_{sc}}} e^{j2\pi k \frac{n}{N_{sc}}}$$

$$= \sum_{n=0}^{N_{sc}-1} x(n) \cdot e^{\frac{j2\pi n(n_{l}+n_{f})}{N_{sc}}} e^{j2\pi k \frac{n}{N_{sc}}}$$

$$= \sum_{n=0}^{N_{sc}-1} x(n) \cdot e^{\frac{j2\pi (k+n_{l})n_{sc}}{N_{sc}}} e^{j2\pi n \frac{n_{f}}{N_{sc}}}.$$
(3.5)

According to Eqn. (3.5),  $e^{j2\pi(k+n_I)\frac{n}{N_{sc}}}$  clearly expresses that the ICFO (n<sub>I</sub> term) causes subcarrier index to shift an integer value and  $e^{j2\pi n\frac{n_f}{N_{sc}}}$  describes that the FCFO (n<sub>F</sub> term) causes an additional phase rotation of constellation in frequency domain. The angle caused by FCFO is a constant value, it is different with the phase rotation caused by SCO effect which will be mentioned later.

### 3.2.2 Integer Carrier Frequency Estimation

The method used to estimate ICFO has been roughly mentioned in section 3.1.2. According to [7][8] there are only seven possible values of ICFO. Because the preamble values are known in the receiver, it can be separately pre-shifted with these possible ICFO values and then correlate them with the received signals. The peak of the correlation result appears when the pre-shifted preamble value is the same with the received signals. Therefore, the correlation algorithm can be used both in the symbol boundary synchronization and the ICFO synchronization. This brings a benefit that it can be implemented using only one correlation bank operating in seven times of the sampling frequency to compute the seven possible integer frequency offset correlation results. However, because the loss from reducing the correlation length and quantization of coefficients and signals for consideration of hardware implementation, we must pay attention to derive a scheme to indentify the maximum correlation values among the seven correlation results. In order to solve this problem and not to have extra hardware overhead, the accumulation of correlation can be separated into two parts to increase the accuracy and in the same time not to destroy the correlation characteristics as shown in Eqn. (3.6).

$$R(d) = \sum_{n=0}^{N} r_{d+n} \cdot P_n^*$$
  
=  $\sum_{n=0}^{\frac{N}{2}-1} r_{d+n} \cdot P_n^* + \sum_{n=\frac{N}{2}}^{N} r_{d+n} \cdot P_n^*$  (3.6)

Therefore, we can decide the symbol boundary first and then accumulate the second correlation again. It does not need to increase the hardware cost in implementation, but can have the more data to correlate for determining ICFO value more accurately. The flow of calculation is described in Fig. 3.10.



Fig. 3.10 Flow chart of ICFO estimation algorithm

#### 3.2.3 Proposed Integer Carrier Frequency Offset Estimation

The performance of the above algorithm substantially decreases when the CFO is in the border of two integer offset. This is because the distance between the border and the two integer offsets is very close. The ICFO effect of the two integer values to time domain preamble is almost the same, so there are two peaks in the results of correlation as shown in Fig. 3.11. The correct CFO value is  $0.49 \ \Delta f$  which the value of ICFO value should equal 0, but the ICFO will be detected to be 1 if only considering the peak. It may easily get the wrong value due to the ambiguous peak caused by the interference of another antenna and noise. However, the value of FCFO can be estimated very accurate by using the characteristic of cyclic prefix. Thus, the accurate estimation of FCFO can give feedback information to help to fix the ICFO estimation better.



Fig. 3.11 Correlation when the CFO is in the border with 0.49  $\Delta f$  (SNR=9.4dB)

This proposed modified algorithm (ping-pong) is described as follows. Entire frequency offset region is separated into two parts, strong region and weak region depending on the distance to the integer value. As shown in Fig. 3.12, the strong region means that it is close to a single integer frequency, so the correlation result has strong reliability to determine ICFO by detecting the peak value. On the other hand, the weak region means that it is in the border of two integer offsets and the correlation result has weak reliability to determine ICFO by detecting peak value.



estimated FCFO value can be used to determine integer offset in weak region.

Fig. 3.12 ICFO region for ping-pong algorithm

For example, if the correct CFO is 1.42  $\triangle$ f and locates in the weak region, the correlation results may have two peaks at index 1 and 2 with index 2 has the largest correlation results. However the FCFO can be correctly estimated as 0.42. Therefore, we choose the ICFO to be two, then the overall CFO will become 2.42. But this is unreasonable because another maximum peak is at index 1 but not index 3. It means that the correct ICFO should locate in the interval of 1 and 2. Therefore, according to the value of FCFO, the estimated ICFO should be index 1 that has second maximum peak. It improves the risk of choosing wrong ICFO in weak region successfully and the performance is shown in Fig. 3.13.



Fig. 3.13 Error probability of modified ping-pong and original algorithms for ICFO

estimation (SNR=9.4dB)

#### 3.2.4 Fractional Carrier Frequency Offset Estimation

The phenomenon which is induced by FCFO and causes the phase rotation of constellation has been presented in section 3.2.1. Therefore, the transmitted data should be compensated for a proper estimated value before entering the FFT window. Many algorithms have been proposed to estimate FCFO in OFDM system, such as [20]. Because symbol synchronization step has successfully obtained the boundary index, the information of cyclic prefix index is also known. According to [21], we can use the repeating characteristic of the cyclic prefix to estimate FCFO value by correlating the received signal data with the cyclic prefix. Assuming the value of FCFO is  $\varepsilon \Delta f$ , then, the result of correlation, P, is

$$P = \sum_{m=0}^{L-1} r_{d+m}^* r_{d+m+N}$$
  
=  $\sum_{m=0}^{L-1} |r_{d+m}|^2 e^{j2\pi\varepsilon \Delta f N T_s}$  (3.7)

where d is the index of cyclic prefix, N is the symbol length, L is cyclic prefix length and  $T_s$  is the sample period. From Eqn. (3.7), we can find that the correlation result includes the FCFO term  $\varepsilon$  in the part of phase. Therefore, the  $\varepsilon$  can be obtained by calculating the phase of P, as shown in Eqn. (3.8) and Eqn. (3.9)

$$\hat{\phi} = phase(P)$$

$$= 2\pi\varepsilon \vartriangle f \cdot N \cdot T_{s}$$

$$= \tan^{-1} \frac{\operatorname{Im}\left\{\sum_{m=0}^{L-1} r_{d+m}^{*} r_{d+m+N}\right\}}{\operatorname{Re}\left\{\sum_{m=0}^{L-1} r_{d+m}^{*} r_{d+m+N}\right\}}$$
(3.8)

$$\mathcal{E} \bigtriangleup f = \frac{\phi}{2\pi NT_s}$$

$$\mathcal{E} = \frac{1}{2\pi} \cdot \hat{\phi}$$
(3.9)

#### 3.2.5 Derotator and NCO

After estimating the value of CFO, the received signal is needed to compensate an angle before entering FFT. A derotator is used to remove this frequency error. Assuming the transmitted signal without CFO is  $r_t(t)e^{j\omega_t t}$  in passband, and there is a sampling frequency offset  $\theta_r(t)$  in the receiver. Then the received signal in baseband is shown as

$$r_r(t) = r_t(t)e^{j(w_c + \theta_r(t))}$$
(3.10)

If the estimated frequency offset is  $\theta_{e}$ , the compensated signal can be described as

$$r_q(t) = r_t(t)e^{j(\theta_r - \theta_e)t}$$
(3.11)

It can be rewritten as

$$\operatorname{Re}(r_{q}(t)) = \operatorname{Re}\{r_{t}(t)e^{j\theta_{t}t}\} \cdot \cos\theta_{e} - \operatorname{Im}\{r_{t}(t)e^{j\theta_{t}t}\} \cdot \sin\theta_{e}$$
$$\operatorname{Im}(r_{q}(t)) = \operatorname{Re}\{r_{t}(t)e^{j\theta_{t}t}\} \cdot \sin\theta_{e} + \operatorname{Im}\{r_{t}(t)e^{j\theta_{t}t}\} \cdot \cos\theta_{e} \quad (3.12)$$

From Eqn.(3.12), the structure of a derotator can be derived as shown in Fig. 3.14



Fig. 3.14 The architecture of derotator



Fig. 3.15 Block diagram of NCO

On the other hand, the derotator needs a numerical controller oscillator (NCO) which generates the sine and cosine function for compensating CFO effect. NCO is a computing block rendering digital word sequences in time, which thereafter provides the different values depending on the estimated CFO value [23]. Fig. 3.15 shows the block diagram of NCO [25]. It contains two parts: phase accumulator (PA) and sinusoidal function generator (FG). In each clock cycle, the frequency control word, f, is added in the PA. Then the PA output is sent to the FG, which may be implemented by using look-up table (LUT) to yield the sine and cosine value. The output word length of PA, W<sub>F</sub>, is usually truncated to W<sub>A</sub> with W<sub>A</sub> < W<sub>F</sub>. But the truncation must conform to Eqn. (3.13) [23].

$$W_{A} > W_{FG} + 1 + \log_{2}(\pi)$$
 (3.13)

where  $W_{FG}$  is the word length of the function generator output.

#### 3.2.6 Simulation Result

Fig. 3.16 shows the performance of ICFO estimation with different twice correlation length. The accuracy of estimation increases with the longer length. To trade off the accuracy and complexity, we choose the correlation 300 to implement.



Fig. 3.16 Correlation results of different accumulation length (SNR=9.4dB)



Fig. 3.17 Overall performance of different vehicles and quantization

Fig. 3.17 shows the overall performance after channel estimation. High vehicle induces more Doppler effect to reduce the performance. On the other hand, the data is quantized to 10 bits to enter FFT block for hardware implementation. The damage of performance caused by quantization is also shown in Fig. 3.17.

### 3.3 SCO Estimation

#### 3.3.1 Effect of Sampling Clock Offset

The sampling clock offset (SCO) is caused due to the inconsistent sampling clock period between Digital to Analog Converter (DAC) in the transmitter and Analog to Digital Converter (ADC) in the receiver. The SCO has two main effects: One is the slow drift of symbol index as shown in Fig. 3.18. The amount of drift gradually increases by time and will rotate the phase of subcarriers. The other one causes loss of the orthogonality of the subcarriers. This is due to the ICI generated by the incorrect sampling indexes [22].



Fig. 3.18 Sampling clock is slower in receiver than in transmitter

Assuming the sampling clock is T in the transmitter and  $T(1+\Delta)$  in the receiver where  $\Delta T$  is the term of SCO. The normalized sampling error is defined as

$$t_{\Delta} = \frac{\Delta T}{T} \tag{3.14}$$

Then, the received signals after FFT with SCO effect,  $R_{l,k}$ , can be shown as

$$R_{l,k} = e^{j2\pi k t_{\Delta} l \frac{T_s}{T_u}} X_{l,k} \sin c(\pi k t_{\Delta}) H_{l,k} + W_{l,k} + N_{t_{\Delta}}(l,k)$$
(3.15)

where l is the symbol index, k is the subcarrier index,  $T_s$  is the period of an OFDM symbol,  $T_u$  is the period of the useful data portion,  $W_{l,k}$  is AWGN and  $N_{t_{\Delta}}$  is the

induced ICI due to SCO [22]. The first term,  $e^{j2\pi kt_{\Delta}t_{T_u}^{T_s}}$ , shows the phase rotation caused by SCO error. The angle increases by both subcarrier index and symbol index and differs with FCFO effect which has constant rotation angle. The last ICI term in Eqn. (3.15) is usually ignored if the  $t_{\Delta}$  is small enough.

#### 3.3.2 Sampling Clock Offset Estimation

The approach of estimating SCO is based on measuring the pilot subcarriers between two symbols. There are two important assumptions in this algorithm. One is number of pilot subcarriers in both halves of the spectrum should be the same. The second one is the channel does not change much within four symbols time due to the transmitting format of pilots. As shown in Eqn. (3.15), the received pilot subcarriers can be described as a simplified form:

$$R_{l,k} = H_k P_{l,k} e^{j2\pi k t_A l \frac{T_s}{T_u}}$$
(3.16)

where  $P_{l,k}$  is the pilot subcarrier. Then, the temporal correlation of the pilot subcarriers is shown in Eqn.(3.17)

$$Z_{l,k} = R_{l,k} R_{l-4,k}^{*}$$

$$= H_{k} P_{l,k} e^{j2\pi k t_{\Delta} l \frac{T_{s}}{T_{u}}} \left( H_{k} P_{l-1} e^{j2\pi k t_{\Delta} (l-4) \frac{T_{s}}{T_{u}}} \right)^{*}$$

$$= |H_{k}|^{2} |P_{l,k}|^{2} e^{j2\pi k t_{\Delta} (l-(l-4)) \frac{T_{s}}{T_{u}}}$$

$$= |H_{k}|^{2} |P_{l,k}|^{2} e^{j2\pi k t_{\Delta} 4 \frac{T_{s}}{T_{u}}}$$
(3.17)

The pilot subcarriers are divided into two sets: C1 denotes the set of pilot indices in

the left half (on negative subcarriers), and C2 denotes the set of pilot indices in the right half (on positive subcarriers). The separate cumulative phase of the two sets is as follows:

$$\phi_{1,l} = \measuredangle \left[ \sum_{k \in C_1} Z_{l,k} \right] \qquad \phi_{2,l} = \measuredangle \left[ \sum_{k \in C_2} Z_{l,k} \right]$$
(3.18)

Then, the estimated SCO can be derived in Eqn. (3.19)

$$\hat{t}_{a} = \frac{1}{2\pi (1 + \frac{N_{g}}{N})} \cdot \frac{1}{\frac{K}{2}} \cdot (\phi_{2,l} - \phi_{1,l}) \quad .$$
(3.19)

#### 3.3.3 Simulation Result

Fig. 3.19 shows the performance of SCO compensation. In the maximum offset value, 20ppm, the overall performance decreases about two orders than without SCO effect. The compensation can improve about one order in the high mobility environment, 120km/hr.



Fig. 3.19 Compensation of SCO effect

# Chapter 4

# Architecture and Hardware Design

### 4.1 Overview of Baseband Receiver

The overall block diagram of 802.16e OFDMA baseband is shown in Fig. 4.1. It mainly contains three parts: synchronization, FFT and channel estimation. In the following, we will focus on the synchronization circuits. The synchronization algorithms used have been discussed in Chapter 3. The main issue now is hardware complexity and architecture. For example, many complicated mathematic operations which cost mass of hardware and power are used in these algorithms, such as angle computation. It may be implemented by using look-up table method which needs a lot of memory to store the values of angle. Therefore, we use coordinate rotational digital computer (CORDIC) scheme instead of the memory based circuit. It will only use adders, shifters and multiplexers to realize the angle calculation. Additionally, some mathematic and logical simplifications also successfully reduce the hardware cost and power. The detailed methods about implementation will be discussed in the following sections.



Fig. 4.1 Overall block diagram of baseband

# 4.2 Architecture of Symbol Synchronization

#### 4.2.1 Original Architecture of Symbol Synchronization

As shown in Section 3.1, the symbol synchronization algorithm is based on the calculation of correlation, and Eqn. (3.2) describes this mathematic model. Therefore, a large complicated circuit is better to compute the correlation result. For the purpose of saving hardware cost, it is needed to truncate the length of correlation and word as mentioned in section 3.1.3. The basic computing component of Eqn. (3.2) is the complex multiplying operation and can be expressed as shown in Eqn. (4.1)

$$(A+Bi) \times (C-Di)$$

$$= (AC+BD) + i(BC-AD)$$

$$(4.1)$$

where, A and B are the real and imaginary part of the received signal separately. Similarly, C and D describe the conjugate of the preamble. Because the preamble values in time domain are quantized to -1 or 1, in other words, this means both of C and D are also -1 or 1. From Eqn. (4.1), each term can be simplified to add or subtract the value of the received signal. Furthermore, the XOR gate can be used to compute additive or subtractive calculation instead of using multiplier. First, the quantized coefficients are encoded 1 to 0 and -1 to 1. If the coefficient is encoded to 0, the XOR gate will not change the input signal, on the other hand, if the coefficient is encoded to 1, it will reverse the signal for subtraction. Thus, Eqn. (4.1) can be implemented using only adders, subtractors and XOR gates. The block diagram is shown in Fig. 4.2:



Fig. 4.2 Using XOR gates to do the correlation function

Additionally, the functional bock needs to compensate 1 to each part of the subtraction based on two's complement number. For this purpose, a lot of adders will be used. However, the preamble value is known in advance, it means that the information of the numbers of subtractions is also known in the receiver. Therefore, to add the compensative factor in each component becomes awkward. Summing all the factors to one term and putting it in the last step can also compensate the functions, but reduces the complexity. Fig. 4.3 describes the overall block diagram of the symbol synchronization. The received signal is quantized to two bits and stored in the registers. Then, the data in the registers use the correlation unit as shown in Fig. 4.2 to correlate with the preamble value. The summation of the correlation result and compensative factor is implemented by the CSA tree. Then the summation result is compared to the threshold. If the summation result is larger than threshold, the maximum value is changed to current summation result. Until the correlation with maximum

correlation result.



Fig. 4.3 Block diagram of symbol synchronization

#### 4.2.2 Modified Calculation Process Architecture

In the above architecture, it substantially reduces the usage of registers and multipliers due to the quantization. But there is still a drawback which can be improved in the hardware implementation. The original correlation algorithm includes subtraction as shown in Eqn. (4.1). The subtractive computing operation makes the calculation of signed number become necessary. Clearly, the computation of signed bits needs more hardware effort to overcome the problem of sign extension. Therefore, if the signed computation can be avoided, the consumption of hardware and power will be saved.

The algorithm in our design has two features. First, the total numbers of summing components are fixed to the taps of shift registers. Secondly, if the stored data in the register is 0, the result of its correlation will always be zero. According to these features, our idea is that if we calculate the numbers of 0 which are stored in the

registers, we will only need to calculate the 1's components of summing series. Because if the information of 0's and 1's components is known, then the information of -1's components can also be obtained at the same time. The example is shown in below:

Total number for correlation: 300

Number of 0: 150

Number of 1:86

Number of -1: 300-150-86

 $\text{Result}=86 + (300 - 150 - 86)^*(-1)$ 

=86\*2-300+150

Initially, because all of the registers store value 0, the initial value is equal to 300. Next, a monitor circuit detects if there is value 0 that removes from registers and if there is value 0 that enters registers in the next cycle. Using the initial numbers of 0 to substrate the differential value will get the current numbers of 0. On the other hand, a logical circuit judges that the result of correlation numbers of 1 is 1 or not. The logical circuit is designed by the truth table, and only NAND gate and NOR gates are required. The block diagram is shown in Fig. 4.4.



Fig. 4.4 One part of the correlation bank

# 4.3Architecture of Carrier Frequency Synchronization

As mentioned above, the algorithm of CFO synchronization is partitioned into two parts: ICFO and FCFO. The ICFO synchronization can be implemented by the same calculating circuits with the symbol timing synchronization as shown in Fig. 4.4, because they both use autocorrelation algorithm. In order to increase the accuracy of ICFO estimation, we accumulate the correlation results twice. Therefore, an additional circuit is necessary for storing the previous correlation result. This operation can be realized by the combination of registers and multiplexers as shown in Fig. 4.5. When the correlation result exceeds the present threshold, it means that the symbol boundary may be found. The correlation results are needed to be stored at this time slot. The circuit of symbol synchronization will produce a control signal to decide the operation mode of this circuit. If the control signal is positive, the seven registers will respectively store the current correlation results of the seven possible ICFO values like memory. On the contrary, if the control signal is negative, the registers will only pass the data like shift-registers. The control signal will always be positive until the stored data are needed to accumulate the second correlation result. Base on the proposed ICFO algorithm, the accumulated results are sent to the comparator circuit to store the two maximum values.



Fig. 4.5 Storage unit for storing previous correlation aresults

On the other hand, the FCFO synchronization uses the repeating characteristic of guard interval to estimate FCFO as shown in Eqn. (3.8) using cross correlation based algorithm. It needs to accumulate the correlation value of CP period. Because the symbol synchronization has finished, the accumulating region can be known. Therefore, we can only use a control signal and a register to implement instead of 128 registers. The structure of accumulating circuit is illustrated in Fig. 4.6.



Fig. 4.6 Structure of cross correlation for FCFO synchronization

As shown in Fig. 4.6, a 1024 delay-line is necessary for storing received data to correlate with. If we use shift register to realize the delay-line, it will cost a lot of area and power consumptions. Therefore, the usage of register file or SRAM is a better

choice than using shift register. According to [26], the twister memory operation can be used to reduce the area cost. Therefore, we have several possible candidates include dual-port SRAM, single-port SRAM, two-port register file, single-port register file, which are supported by UMC. The access operations of these schemes are illustrated as shown in Fig. 4.7.



Fig. 4.7 R/W operation of different memory modules

Table 4.1 Area and power comparisons of five delay-line methods

|       | Shift                 | 1K Dual              | 1K Two Port          | 512 Single Port      | 512 Single Port       |
|-------|-----------------------|----------------------|----------------------|----------------------|-----------------------|
|       | Register              | Port SRAM            | Register File        | SRAM ×2              | Register File ×2      |
| Area  | 0.166 mm <sup>2</sup> | $0.053 \text{ mm}^2$ | $0.047 \text{ mm}^2$ | $0.064 \text{ mm}^2$ | 0.038 mm <sup>2</sup> |
|       |                       |                      |                      | (0.032×2)            | (0.019×2)             |
| Power | 720 uw                | 160 uw               | 127 uw               | 248 uw               | 119 uw                |

The area and power comparisons for five delay-line schemes are listed in Table 4.1. The area of two 512 single port register file modules reduces 77% and 28 % area than shift register and one 1K dual port SRAM module respectively. In power consumption, the two 512 single port register file modules save about 83% and 26% than the other two schemes. Therefore, we adopt two 512 single port register modules to realize the delay-line circuit by using twister memory access operation in this thesis.

As shown in Fig. 4.6, a complex multiplier is necessary for the calculation of

correlation. Based on the original equation, it totally needs four multipliers and two adders to implement the operation. According to Eqn. (4.2) [29], one multiplier can be replaced by using the common term. Therefore, four multipliers can be reduced to three multipliers by adding two adders and one subtractor as shown in Fig. 4.8. Finally, this improvement will save more than 14% (8233 mm<sup>2</sup>->7078 mm<sup>2</sup>) area and 10% (30.6 uw -> 27.5 uw) power consumption.

$$(A+Bi) \times (C-Di)$$

$$= (AC+BD) + i(BC-AD)$$

$$= (AC+BD) + i[(A+B) \times (C-D) - AC + BD]$$

$$(4.2)$$



4.4 COordinate Rotational DIgital Computer

For the purpose of compensating the phase rotation caused by CFO effect, an angle generator is needed to provide the values of sine and cosine. The scheme of looking-up table (LUT) can be used, but it costs large memory to store data. Therefore, the coordinate rotational digital computer (CORDIC) algorithm which only uses adders and shifters works better for the consideration of area. The concept of CORDIC is proposed by J. E. Volder in 1959 [27]. The idea is derived from the vector rotation, which a vector ( $x_{in}$ , $y_{in}$ ) is rotated through an angle  $\theta$  in Eqn. (4.3) and can be rewritten as Eqn. (4.4).

$$\begin{bmatrix} X_{out} \\ Y_{out} \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} X_{in} \\ Y_{in} \end{bmatrix}$$
(4.3)

$$\begin{bmatrix} X_{out} \\ Y_{out} \end{bmatrix} = \cos\theta \begin{bmatrix} 1 & -\tan\theta \\ \tan\theta & 1 \end{bmatrix} \begin{bmatrix} X_{in} \\ Y_{in} \end{bmatrix}$$
(4.4)

If the tangent term is restricted to the power of 2, the multiplication will be replaced by a simple shift operation. For this reason, the rotation angle is decomposed to N parts as shown in Eqn. (4.5), which each term is composed of  $\tan^{-1}(2^{-i})$ .

$$\theta = \sum_{n=0}^{N-1} d_n \cdot \tan^{-1}(2^{-i}) + \xi$$
(4.5)

where  $\xi$  is the residual angle beyond the resolution of CORDIC and N is the number of iterations. Therefore, the rotation is completed by several iterations such as a windscreen wiper to rotate to the wanted values as shown in Fig. 4.9. The operation in every iteration is expressed as Eqn. (4.6),

$$\begin{bmatrix} X_{n+1} \\ Y_{n+1} \end{bmatrix} = K_n \begin{bmatrix} 1 & -d_n 2^{-n} \\ d_n 2^{-n} & 1 \end{bmatrix} \begin{bmatrix} X_n \\ Y_n \end{bmatrix}$$
(4.6)



Fig. 4.9 Iterative vector rotation

where  $d_n$  is the rotation sequence that indicates the direction of rotation and equal to +1 or -1,  $K_n = \cos(\tan^{-1}(2^{-i}))$  which scales the length of vector. After N iterations, the

product of the K<sub>n</sub>,  $K = \prod_{n=0}^{N-1} K_n = \prod_{n=0}^{N-1} (\sqrt{1+2^{-i}})$  [28], is called as K factor and can be calculated in advance. The Eqn. (4.6) can be simplified to as follows:

$$x_{n+1} = x_n - y_n \cdot d_n \cdot 2^{-n}$$
  

$$y_{n+1} = y_n + x_n \cdot d_n \cdot 2^{-n}$$
  

$$z_{n+1} = z_n - d_n \tan^{-1}(2^{-n})$$
(4.7)

where  $z_{n+1}$  is the residual angle in the  $n_{th}$  iteration.

The CORDIC algorithm contains two operation modes: the rotation mode and the vectoring mode. The main difference of the two operations is determined by the direction of rotation. In the rotation mode, it rotates a vector to a new one by an angle  $\theta$ . It tries to minimize the residual angle zn and the rotation sequence, dn, is obtained by the sign of zn. On the contrary, the vectoring mode calculates the angle between vector and x-axis. It tries to minimize the magnitude of y component to obtain the angle value and the rotation sequence is obtained by the opposite sign of yn. Eqn. (4.8).

rotation mode 
$$d_n = \begin{cases} sign[z_m] &, n = 0 \\ sign[z_{n-1}] &, n > 0 \end{cases}$$
(4.8)

vectoring mode 
$$d_n = \begin{cases} -sign[y_{in}] & , n = 0 \\ -sign[y_{n-1}] & , n > 0 \end{cases}$$

The CORDIC algorithm can be implemented by two methods: iterative structure and unrolled structure. Fig. 4.10(a) describes the iterative structure which uses only one operating unit to calculate repeatedly. It consists of the adder-subtractors, shifters and registers for storing the output data to be the beginning data in the next iteration. Thus, the hardware cost is economical due to use the same resources, but it needs an additional clock which operates at N times the data rate. Too many clock domains will increase the complexity and difficulty of implementation. Fig. 4.10(b) shows the
unrolled structure which unrolls the iterative structure to N operating units. The constant value in each stage is fixed and can be hardwired without using ROM to store. It also can be pipelined to achieve the higher data throughput rate. After comparing the two structures, the unrolled structure is adopted in our design.



The issue of low power is very important in current IC design. In this thesis, we adopt two schemes to reduce the power consumption. The first scheme is to reduce the switching activity of circuit. As shown in Fig. 4.4, mass of logic gates are used to compute the correlation results. Seven possible coefficients are needed to calculate in one data cycle. Each change of the coefficient induces switching activities of three gates. It can be improved by using the differential calculation. An additional NAND gate is added as shown in Fig. 4.11, where c is the specific coefficient of one possible ICFO and d are the possible differential coefficients of the other possible ICFO compared with c. If the value of d is 1, the calculating operation only needs to pass through one gate to get the correct result instead of three gates. Thus, the switching

activity will be successfully reduced.



Fig. 4.11 Adding NAND gate for reducing power in correlation bank

The other scheme is to turn off the idle circuits, which is well-known the most efficient scheme to save power. In other words, the circuits will be in the sleep mode until an enable signal awakes them. These control signals can be produced by a central controller. The state diagram of the overall block is described in Fig. 4.12



Fig. 4.12 State diagram of overall block

#### 4.6 Implementation Results

The overall circuits include symbol boundary detection, ICFO estimation, FCFO estimation, NCO and derotator in this thesis are synthesized using UMC 90nm process. The synthesis tool is Synopsys Design Complier and the synthesis results are shown in Table 4.2. There are two clock domains to serve the input data rate and the operating frequency of ICFO estimation, 11.2 MHz and 83.3 MHz respectively. Fig. 4.13 illustrates the area proportion of each block circuit.

| Process                   | UMC 90 nm                            |
|---------------------------|--------------------------------------|
| System required Speed     | 11.2 MHz                             |
| Combinational gate counts | 50806 (248789 um <sup>2</sup> )      |
| Sequential gate counts    | <b>31211 (146694 um<sup>2</sup>)</b> |
| Overall gate counts       | 82017 (395483 um <sup>2</sup> )      |
| Power                     | 12.5 mW                              |

Table 4.2 Synthesis results

Area proportion of each block circuit



Fig. 4.13 Area proportion of each block circuit

The synthesis results of the improvement of correlation bank from signed calculation to unsigned calculation as mentioned in section 4.2 are listed in Table 4.3. The modified structure saves about 28% area and 22% power.

|                     | Signed calculation | Unsigned calculation |
|---------------------|--------------------|----------------------|
| Process             | UMC 90 nm          | UMC 90 nm            |
| Operating frequency | 83.3 MHz           | 83.3 MHz             |
| Gate counts         | 31971              | 23261                |
| Power               | 5.3 mW             | 4.1 mW               |

 Table 4.3 Synthesis results of correlation bank

The comparisons between the original and proposed architectures are shown in Table 4.4. The improvements of calculating process and delay-line operation save 56% area cost and 63% power consumption

Table 4.4 Comparisons of design results

| Synthesis result using UMC 90 nm |                                     |                                   |  |
|----------------------------------|-------------------------------------|-----------------------------------|--|
|                                  | Original architecture               | Proposed architecture             |  |
| Clock domain 1                   | 11.9 MHz                            | 11.9 MHz                          |  |
| Clock domain 2                   | 83.3 MHz(11.9 MHz×7)                | 83.3 MHz(11.9 MHz×7)              |  |
| Gate Counts                      | 188,861 ( 888,216 um <sup>2</sup> ) | 82,017 (395,483 um <sup>2</sup> ) |  |
| Power                            | 35.3 mW                             | 12.5 mW                           |  |

# Chapter 5

### **Conclusion and Future Work**

In this thesis, we implement the synchronization architecture for 802.16e OFDMA standard. This architecture can support high mobility up to 120 km/hr and multiple transmitting antennas environment. According to the quantization of preamble and signals, the mass of multipliers can be simplified to only adders, and we accumulate the correlation results twice to increase the accuracy of ICFO estimation by using the same hardware. Furthermore, a ping-pong algorithm is proposed to increase the accuracy of ICFO synchronization about two orders. The improvement of calculating process from signed to unsigned saves 28% area cost and 22% power consumption. Finally the CFO effect is compensated by CORDIC based derotator by only using shifters and multiplexers. The overall proposed architecture saves about 56% area and 64% power consumption than the original architecture.

In the future, we can continue to implement the residual synchronizations such as preamble match, guard interval mode detection, and improve the performance of SCO estimation. Of course we will simultaneously concentrate on the issue of low power and low cost.

# Reference

- [1] IEEE 802.15.1 WPAN Press Kit, Jan. 2001
- [2] Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Wireless Personal Area Networks, IEEE Std 802.15.1, 2002
- [3] Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Higher Speed Physical. Layer Extension for the High Rate Wireless Personal Area Networks, IEEE Std 802.15.3, Sep. 2003
- [4] Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low-Rate Wireless Personal Area Networks (LR-WPAN), IEEE Std 802.15.4, Oct. 2003
- [5] BROADCOM, 802.11n: Next-Generation Wireless LAN Technology, White Paper, Apr. 2006
- [6] Stallings, W., "IEEE 802.11: Wireless LANs from a to n. Technical Paper," Volume 6, Issue 5, Sept.-Oct. 2004 pp. 32 – 37
- [7] Part 16: Air interface for Fixed Broadband Wireless Access Systems, IEEE Std 802.16-2004, Oct. 2004
- [8] Part 16: Air interface for Fixed and Mobile Broadband Wireless Access Systems, IEEE Std 802.16e-2005, Dec. 2005
- [9] J. G. Andrews, A. Ghosh, R. Muhamed, Fundamental of WiMAX, 1st ed. Prentice Hill, 2007
- [10] WiMAX Forum, "Mobile Mobile WiMAX-Part 1: A Technical Overview and Performance Evaluation," White Paper, Feb. 2006
- [11] http://www.mtaiwan.org.tw
- [12] H. Yaghoobi, "Scalable OFDMA Physical Layer in IEEE 802.16 Wireless MAN," Intel Technology Journal, Volume 8, Issue3, 2004.
- [13] R.V. Nee, OFDM for wireless multimedia communication, 1<sup>st</sup> ed. Artech House, 2000
- [14] Intel, "Introduction to mobile WiMAX Radio Access Technology: PHY and MAC Architecture," Technical Report, Dec. 2006
- [15] S. Ahmadi, "Introduction to mobile WiMAX Radio Access Technology: PHY and MAC Architecture," Technical Report, Dec. 2006
- [16] J. J. van de Beek, M. Sandell, M. Isaksson, and P. O. B"orjesson, "Low complex frame synchronization in OFDM systems," in Proc. IEEE Int. Conf. Universal Personal Commun., Nov. 1995, pp. 982–986.
- [17] B. Widrow, P. E. Mantey, L. J. Griffiths and B. B. Goode, "Adaptive

Antenna Systems," in Proc. IEEE, pp. 2143-2159, Dec. 1967.

- [18] B. D. Van Veen, K. M. Buckley, "Beamforming: A Versatile Approach to Spatial Filtering, IEEE ASSP," IEEE ASSP Mag., Apr. 1988Apr
- [19] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE J. Sel. Areas Comm., vol. 16, Oct. 1998, pp. 1451-1458.
- [20] M.H. Hsieh, C.H. Wei, "A Low-Complexity Frame Synchronization and Frequency Offset Compensation Scheme for OFDM Systems over Fading Channels," IEEE Trans. on Vehicular Technology, vol. 48, no. 3, Sep. 1999, pp 1596-1609,
- [21] P. H. Moose, "A Technique for Orthogonal Frquency Division Multiplexing Frequency Offset Correction," IEEE Trans. Communication, vol. 42, no. 10, Oct. 1994, pp. 2908-2914
- [22] J. Terry and J. Heiskala, OFDM Wireless LANs: A Theoretical and Practical Guide, 1<sup>st</sup> ed. Sams, 2002
- [23] S. A. Fechtel, "OFDM Carrier and Sampling Frequency Synchronization and its Performance on Stationary and Mobile Channels," IEEE Trans. Consumer Electronics, vol. 46, no.3, Aug. 2000,pp.438-441
- [24] M. Dachroth, B. Hoppe, H. Meuth, U.H. Steiger, "High-speed Architecture and Hardware Implementation of a 16-bit 100-MHz Numerically Controlled Oscillator," in Proc. ESSCIRC'98, Sept. 1998, pp. 456-459
- [25] I. Janiszewski, B. Hoppe and H. Meuth, "VHDL-based Design and Design Methodology for Reusable High Performance Direct Digital Frequency Synthesizers," in Proc. Design Automation Conference, 2001, pp. 573 – 578
- [26] W. C. Liu, T. C. Wei, S. J. Jou, "Blind Mode/GI Detection and Coarse Symbol Synchronization for DVB-T/H," IEEE ISCAS, 2007, pp. 2092-2095.
- [27] J. E. Volder, "The CORDIC trigonometric computing technique," IRE Trans. Electron. Computers, vol. C-8, pp. 330–334, Sept. 1959.
- [28] J. Duprat and J.-M Muller, "The CORDIC Algorithm: New Results for Fast VLSI Implementation," IEEE Trans. Computers, vol. 42, no. 2, pp. 168-178 Feb. 1993.
- [29] K. W. Shin, B. S. Song, "A complex multiplier architecture based on redundant binary arithmetric," IEEE ISCAS, Vol. 3, June 1997, pp.1944-1947

## Biobibliography

姓名:林俊男

- 出生地:台灣省苗栗縣
- 出生日期:1982年10月12日
- 學歷: 1989.9~1995.6 苗栗縣苑裡鎮苑裡國小

1995.9~1998.6 苗栗縣苑裡鎮苑裡國中

1998.9~2001.6 國立台中一中

2001.9~2005.6 國立交通大學 電子工程學系 學士

2005.9~2007.10 國立交通大學 電子研究所 系統組 碩士

