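## National Chiao Tung University

## Department of Electronics Engineering, Institute of Electronics, Master's Program

## Master's Thesis

Study on Low Complexity Baseband Frame Synchronization for OFDM Applications

Student: Wei-Che Chang

Advisor: Prof. Chen-Yi Lee

July 2005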

## Study on Low Complexity Baseband Frame Synchronization for OFDM Applications

Student: Wei-Che Chang

Advisor: Chen-Yi Lee

National Chiao Tung University, Department of Electronics Engineering, Institute of Electronics, Master's Program

Submitted to the Institute of Electronics, College of Electrical Engineering and Computer Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electronics Engineering

July 2005

Hsinchu, Taiwan, Republic of China

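### Study on Low Complexity Baseband Frame Synchronization for OFDM Applications

Student: Wei-Che Chang

Advisor: Prof. Chen-Yi Lee

### 摘要 (Abstract)

In wireless communication systems, high-speed transmission and low power consumption have long been the two research topics of greatest concern. In the recently developed ultra-wideband (UWB) technology in particular, time-domain synchronization at the receiver requires a bandwidth of more than 500 MHz. A design at this speed must process data with a parallel architecture, which makes power consumption grow linearly, so low power consumption becomes the greatest challenge in the development of UWB technology. In this thesis, we propose a low-complexity frame synchronizer for UWB systems based on orthogonal frequency division multiplexing (OFDM), built on an improved matched-filter and a dynamic threshold. The design uses algorithms that reduce the complexity of the matched-filter and the number of register accesses, meeting the low-complexity and low-power requirements while keeping the error of the frame synchronizer within an acceptable range. In addition, conventional parallel designs use multiple sets of registers to hold the data of multiple data streams for multiple matched-filters. Based on the idea of register sharing, we instead rearrange the matched-filter data so that multiple matched-filters can share the data of a single set of registers at the same time, which reduces the number of registers needed in the parallel architecture. According to the simulation results, on the 802.11a system platform the proposed design causes a loss of less than 0.35 dB SNR at 10% PER, and on the UWB system platform it causes a loss of less than 0.45 dB SNR at 8% PER. For the hardware implementation we use a 0.18 µm process. Compared with a conventional frame synchronizer that uses a parallel architecture to reach 528 MSample/s, our design not only processes data at 528 MSample/s but also saves 58% of the power consumption and 65% of the hardware area.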


# Study on Low Complexity Baseband Frame Synchronization for OFDM Applications

Student : Wei-Che Chang

Advisor : Dr. Chen-Yi Lee

Department of Electronics Engineering Institute of Electronics National Chiao Tung University

### ABSTRACT

In wireless communication, high data rates and low power consumption are the main concerns, since they improve transmission speed and extend the operating time of the IC. In recent years, ultra-wideband (UWB) has received much attention as a technology for high-speed, low-power portable wireless devices. It requires a throughput of over 500 MSample/s in time-domain synchronization, which can be achieved with a parallel architecture, but power dissipation then increases linearly. Low power therefore becomes the main challenge of UWB baseband design. In this thesis, a low-complexity frame synchronizer combining an improved matched-filter and a dynamic-threshold design is proposed for OFDM-based UWB systems. It provides a methodology to reduce matched-filter complexity and redundant register-file accesses with an acceptable performance loss. Based on the register-sharing algorithm, a single set of register-files that shares received data among parallel matched-filters is developed to achieve 528 MSample/s throughput for the 480 Mb/s UWB design. Simulation results show that the synchronization loss of the proposed design can be limited to 0.35 dB SNR at 10% PER in the IEEE 802.11a WLAN system and to 0.45 dB SNR at 8% PER in the LDPC-COFDM and MB-OFDM UWB systems. In hardware implementation, the proposed design saves 58% of the power consumption and 65% of the area cost compared with the conventional design in a 0.18 µm CMOS process.

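## Acknowledgements

Before I knew it, six years had passed at NCTU. The last two years of research in particular, as a member of the SI2 laboratory, have been very fulfilling, and I have learned a great deal of professional knowledge and technique.

I would especially like to thank Professor Chen-Yi Lee for his kind guidance and advice, the senior members of the laboratory for their enthusiastic support and research discussions, and my classmates and junior labmates for their mutual encouragement and cooperation. Thanks to everyone's constant help, I was able to turn obstacles into momentum and complete this thesis.

I would also like to thank 軒字, 瑞元, 林宏 and 婉君, with whom I studied the UWB system. The effort and exchange of ideas in our teamwork let me keep growing in this research field. My thanks also go to 菁哲 and 建青, whose rich experience always carried me through when I ran into bottlenecks in the hardware implementation. The time spent with all of you will be a memory I never forget.

Finally, I thank my parents for their loving care, which saw me through these two years of master's study. I thank my sister and my friends for giving me the encouragement and drive to keep going when I was feeling low. I dedicate this thesis to you as an expression of my sincerest gratitude.

#### Wei-Che, July 2005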

# **Contents**

| Section | Title |
|---------|-------|
| | Abstract |
| | Contents |
| | List of Figures |
| | List of Tables |
| Chapter 1 | Introduction |
| 1.1 | Motivation |
| 1.2 | Reviews of the Frame Synchronizer Design |
| 1.3 | Introduction to OFDM System |
| 1.4 | Outline of This Thesis |
| Chapter 2 | System Platform |
| 2.1 | IEEE 802.11a PHY |
| 2.1.1 | System Platform |
| 2.1.2 | Frame Format |
| 2.2 | Ultra-Wideband System |
| 2.2.1 | System Platform |
| 2.2.2 | Frame Format |
| 2.3 | Simulated Channel Model |
| 2.3.1 | Multi-Path Fading Channel |
| 2.3.2 | AWGN Model |
| 2.3.3 | Carrier Frequency Offset Model |
| 2.3.4 | Sampling Clock Offset Model |
| Chapter 3 | A Low Complexity Frame Synchronizer for OFDM Application |
| 3.1 | Frame Synchronizer Data Flow |
| 3.1.1 | Packet Detection |
| 3.1.2 | FFT Window Detection |
| 3.2 | Proposed Algorithm |
| 3.2.1 | Most-Significant Taps Scheme |
| 3.2.2 | Quantization Approach |
| Chapter 4 | A Low Complexity and High Throughput Frame Synchronizer for OFDM-Based UWB System |
| 4.1 | Motivation |
| 4.2 | LDPC-COFDM Design |
| 4.2.1 | Frame Synchronizer Flow |
| 4.2.1.1 | Packet Detection |
| 4.2.1.2 | FFT Window Detection |
| 4.2.1.3 | Preamble Timing Detection |
| 4.2.2 | Proposed Algorithm |
| 4.2.2.1 | Tap-Reduction Scheme |
| 4.2.2.2 | Register-Sharing Algorithm |
| 4.2.2.3 | Dynamic Threshold Design |
| 4.3 | Multi-Band OFDM Design |
| 4.3.1 | Frame Synchronizer (MB-OFDM) Flow |
| 4.3.2 | Proposed Algorithm |
| 4.3.2.1 | Training AGC |
| 4.3.2.2 | Band Detection |
| 4.3.2.3 | Other Function Block |
| Chapter 5 | Simulation Result and Performance Analysis |
| 5.1 | Simulation of IEEE 802.11a System |
| 5.2 | Simulation Result of LDPC-COFDM System |
| 5.2.1 | Frame Error Rate of Tap-Reduction Scheme |
| 5.2.2 | Performance of Dynamic Threshold |
| 5.2.3 | System Performance |
| 5.3 | Simulation Result of MB-OFDM System |
| 5.3.1 | Boundary Variation Distribution |
| 5.3.2 | System Performance |
| Chapter 6 | Hardware Implementation and Measured Result |
| 6.1 | Design Architecture |
| 6.1.1 | Detail Architecture of Tap-Reduction Matched-Filter |
| 6.1.2 | Detail Architecture of Shared Auto-Correlator |
| 6.1.3 | Address-Based Register-Files |
| 6.2 | Hardware Measured Result |
| 6.3 | OFDM-Based UWB Baseband Transceiver |
| Chapter 7 | Conclusion and Future Work |
| | Bibliography |

# List of Figures

| Figure | Caption |
|--------|---------|
| FIG. 1.1 | Spectrum of single-carrier system |
| FIG. 1.2 | Spectrum of conventional multi-carrier system |
| FIG. 1.3 | Spectrum of OFDM system |
| FIG. 1.4 | Use IDFT/DFT for OFDM modulation/demodulation |
| FIG. 1.5 | Use CP as GI to prevent ISI and maintain circular convolution |
| FIG. 1.6 | Simplified block diagram of OFDM system |
| FIG. 2.1 | IEEE 802.11a system platform |
| FIG. 2.2 | PPDU frame format of IEEE 802.11a PHY |
| FIG. 2.3 | PLCP preamble format |
| FIG. 2.4 | An example of MB-OFDM system for TFC (1, 2, 3, 1, 2, 3) |
| FIG. 2.5 | System block diagram of OFDM based UWB system |
| FIG. 2.6 | Frame format of MB-OFDM UWB system |
| FIG. 2.7 | Channel model data flow of simulation platform |
| FIG. 2.8 | Multi-path interference and ISI effect |
| FIG. 2.9 | IEEE 802.11a channel impulse response |
| FIG. 2.10 | UWB channel impulse response |
| FIG. 2.11 | Linear phase shift caused by CFO |
| FIG. 2.12 | SCO effect in time domain and frequency domain |
| FIG. 3.1 | Frame synchronizer data flow |
| FIG. 3.2 | Example of packet detection in proposed design |
| FIG. 3.3 | FFT window detection in AWGN and multi-path channel |
| FIG. 3.4 | FFT window detection in AWGN and multi-path channel |
| FIG. 3.5 | Power distribution of $C_0 \sim C_{63}$ and $S_{[1]} \sim S_{[64]}$ |
| FIG. 3.6 | Analysis of most significant tap number versus power ratio |
| FIG. 3.7 | Tap power analysis of quantized approach |
| FIG. 3.8 | FER between conventional and quantization approach |
| FIG. 4.1 | Frame synchronizer flow |
| FIG. 4.2 | Packet detection flow |
| FIG. 4.3 | Preamble timing detect flow |
| FIG. 4.4 | Data flow of conventional design with 128 taps ($W=1$) |
| FIG. 4.5 | Data flow of tap-reduction scheme with 32 taps ($W=4$, $J=3$) |
| FIG. 4.6 | Example of tap-reduction scheme with parallelism |
| FIG. 4.7 | Data flow example of the proposed design |
| FIG. 4.8 | Data flow of register-sharing algorithm with 32 taps |
| FIG. 4.9 | Baseband received data of LDPC-COFDM system |
| FIG. 4.10 | Baseband received data of MB-OFDM system |
| FIG. 4.11 | Frame synchronizer flow of MB-OFDM UWB system |
| FIG. 4.12 | Detail data flow of training AGC |
| FIG. 4.13 | Accumulated power of continuous 128 samples |
| FIG. 5.1 | PER of perfect frame synchronization at 6 Mb/s data rate |
| FIG. 5.2 | FER of pure AWGN channel, CFO=0 kHz, RMS=0 ns |
| FIG. 5.3 | FER of IEEE fading channel: RMS=100 ns, CFO=0 kHz |
| FIG. 5.4 | FER of IEEE fading channel: RMS=150 ns, CFO=0 kHz |
| FIG. 5.5 | FER of AWGN channel with CFO=20 kHz |
| FIG. 5.6 | FER of AWGN channel with CFO=100 kHz |
| FIG. 5.7 | FER of AWGN channel with CFO=200 kHz |
| FIG. 5.8 | FER of IEEE fading channel: CFO=20 kHz, RMS=100 ns |
| FIG. 5.9 | FER of IEEE fading channel: CFO=20 kHz, RMS=150 ns |
| FIG. 5.10 | FER of IEEE fading channel: CFO=100 kHz, RMS=150 ns |
| FIG. 5.11 | PER of IEEE fading channel with RMS delay spread=100 ns |
| FIG. 5.12 | PER of IEEE fading channel with RMS delay spread=150 ns |
| FIG. 5.13 | Tap number versus FFT window detection |
| FIG. 5.14 | Tap number versus frame synchronizer |
| FIG. 5.15 | Tap number versus PER |
| FIG. 5.16 | Simulated threshold value of preamble timing detection |
| FIG. 5.17 | Performance of dynamic and fixed threshold design |
| FIG. 5.18 | AWGN channel, CFO=0 kHz, SCO=0 ppm |
| FIG. 5.19 | Multi-path channel RMS delay spread=5 ns [21], CFO=400 kHz, SCO=40 ppm |
| FIG. 5.20 | Boundary variation in AWGN channel with CFO=400 kHz, SNR=2 dB |
| FIG. 5.21 | Boundary variation in AWGN channel with CFO=400 kHz, SNR=20 dB |
| FIG. 5.22 | Boundary variation in original CM1 channel, CFO=400 kHz, SNR=2 dB |
| FIG. 5.24 | Boundary variation in original CM1 channel, CFO=400 kHz, SNR=20 dB |
| FIG. 5.25 | Boundary variation in best 90% CM1 channel, CFO=400 kHz, SNR=20 dB |
| FIG. 5.26 | Boundary variation in original CM2 channel, CFO=400 kHz, SNR=2 dB |
| FIG. 5.27 | Boundary variation in best 90% CM2 channel, CFO=400 kHz, SNR=2 dB |
| FIG. 5.28 | Boundary variation in original CM2 channel, CFO=400 kHz, SNR=20 dB |
| FIG. 5.29 | Boundary variation in best 90% CM2 channel, CFO=400 kHz, SNR=20 dB |
| FIG. 5.30 | Boundary variation in original CM3 channel, CFO=400 kHz, SNR=2 dB |
| FIG. 5.31 | Boundary variation in best 90% CM3 channel, CFO=400 kHz, SNR=2 dB |
| FIG. 5.32 | Boundary variation in original CM3 channel, CFO=400 kHz, SNR=20 dB |
| FIG. 5.33 | Boundary variation in best 90% CM3 channel, CFO=400 kHz, SNR=20 dB |
| FIG. 5.34 | Boundary variation in original CM4 channel, CFO=400 kHz, SNR=2 dB |
| FIG. 5.36 | Boundary variation in original CM4 channel, CFO=400 kHz, SNR=20 dB |
| FIG. 5.37 | Boundary variation in best 90% CM4 channel, CFO=400 kHz, SNR=20 dB |
| FIG. 5.38 | PER of CM1 channel at data rate=110 Mb/s, CFO=400 kHz, SCO=40 ppm |
| FIG. 5.39 | PER of CM2 channel at data rate=110 Mb/s, CFO=400 kHz, SCO=40 ppm |
| FIG. 5.40 | PER of CM3 channel at data rate=110 Mb/s, CFO=400 kHz, SCO=40 ppm |
| FIG. 5.41 | PER of CM4 channel at data rate=110 Mb/s, CFO=400 kHz, SCO=40 ppm |
| FIG. 5.42 | PER at 110~480 Mb/s data rate for required worst CM channel |
| FIG. 5.43 | PER versus transmission distance at CFO=400 kHz, SCO=40 ppm |
| FIG. 6.1 | Architecture of proposed frame synchronizer |
| FIG. 6.2 | Detail architecture of tap-reduction matched-filter |
| FIG. 6.3 | Detail architecture of shared auto-correlator |
| FIG. 6.4 | Architecture of address-based register-files |
| FIG. 6.5 | Microphoto of the UWB transceiver chip in 0.18 µm process |



# List of Tables

| Table | Caption |
|-------|---------|
| TABLE 2.1 | IEEE 802.11a PHY system parameters |
| TABLE 2.2 | IEEE 802.11a PHY data rate dependent parameters |
| TABLE 2.3 | Timing parameters of PLCP preamble |
| TABLE 2.4 | LDPC-COFDM system spec |
| TABLE 2.5 | Requirement for 8% PER of LDPC-COFDM system |
| TABLE 2.6 | MB-OFDM system spec |
| TABLE 2.7 | Requirement for 8% PER of MB-OFDM system |
| TABLE 5.1 | SNR loss from CFO=0 to 20 kHz at IEEE fading channel delay spread=100 ns |
| TABLE 5.2 | SNR loss from CFO=0 to 20 kHz at IEEE fading channel delay spread=150 ns |
| TABLE 5.3 | SNR loss from CFO=0 to 100 kHz at IEEE fading channel delay spread=150 ns |
| TABLE 5.4 | SNR loss of proposed design for 8% PER |
| TABLE 5.5 | Required SNR for PER=8% of CM1~CM4 at 110 Mb/s data rate |
| TABLE 5.6 | Performance of proposed design for 8% PER of 90th percentile CM channel realization |
| TABLE 5.7 | Transmission distance of proposed design |
| TABLE 6.1 | Register-files cost of the conventional and the proposed design |
| TABLE 6.2 | Gate-count cost of the proposed frame synchronizer |
| TABLE 6.3 | Area cost comparison (0.18 µm cell library) |
| TABLE 6.4 | Power consumption comparison (post-layout simulation) |
| TABLE 6.5 | UWB transceiver chip summary |



## **CHAPTER 1**

## Introduction

This chapter describes the motivation for studying low-complexity frame synchronization in OFDM systems, contrasts current approaches with the proposed design, and closes with an outline of the thesis.

### 1.1 Motivation

In recent years, orthogonal frequency division multiplexing (OFDM) has become an important digital multi-carrier transmission scheme [1~2]. Because of its high bandwidth efficiency and robustness in multi-path environments, OFDM is widely applied in high-speed wireless communication. However, it also has several disadvantages, such as high sensitivity to synchronization errors and high hardware complexity of the baseband transceiver. The objective of this thesis is to propose a low-complexity frame synchronizer for OFDM-based wireless communication that greatly reduces hardware implementation cost with an acceptable performance loss relative to a conventional frame synchronizer. In a conventional design, the matched-filter is the most expensive hardware block, accounting for over 60% of the gate count of the whole frame synchronizer [3~4]. In general, OFDM systems with a 64- or 128-point FFT, such as IEEE 802.11a [5], HiperLAN/2 [6], and the developing UWB systems [7~8], use a matched-filter with 64 or 128 taps to detect the FFT window boundary. We therefore propose efficient schemes that keep only as many matched-filter taps as are required and remove the remaining taps to reduce design complexity.

Moreover, UWB communication has received much attention as a high-speed, low-cost wireless LAN technology for short distances since the FCC allocated the spectrum from 3.1 GHz to 10.6 GHz, a total of 7.5 GHz, to UWB devices in 2002. An OFDM-based UWB system requires a synchronizer with a throughput of hundreds of MSample/s. Such high-throughput designs have been realized by parallel approaches with multiple matched-filters, which leads to high power consumption in frame synchronization [9~10]. By applying the proposed schemes to reduce the complexity of the matched-filters and the required size of the register-files, almost 58% of the power consumption and 65% of the area can be saved relative to the conventional design. To maintain system performance, the SNR loss in packet error rate (PER) simulation is restricted to less than 0.5 dB compared with a perfect frame synchronizer (frame error rate = 0).

### 1.2 Reviews of the Frame Synchronizer Design

In an OFDM system, the frame synchronizer is the first function block of the baseband receiver. It uses correlation algorithms and the training symbols of the PLCP preamble to find the timing of the OFDM symbols before the actual payload data arrive [11]. Since an OFDM system is very sensitive to synchronization errors, especially at high data rates, the frame synchronizer must be accurate enough that synchronization errors do not dominate system performance. In general, a frame synchronizer comprises packet detection and FFT window detection. Packet detection, also known as coarse timing synchronization, detects the arrival of a valid packet by evaluating the periodic training symbols [12]. To detect this periodicity, the incoming signal is compared with a delayed version of itself using an auto-correlation algorithm. Sometimes it also detects the last training symbol to determine the end of the PLCP preamble. FFT window detection, also known as fine timing synchronization, locates the start boundary of the FFT window in each OFDM symbol so that the GI can be removed, by comparing the incoming signal with the known values of the training symbols. It computes a timing metric with a matched-filter based on the cross-correlation algorithm over one OFDM symbol [13]. The index of the maximum timing metric is taken as the estimated FFT window boundary. Frame synchronization is thus accomplished by combining these two function blocks.
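The two stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design proposed in this thesis: the function names, the normalized metric, the threshold of 0.8 and the noiseless toy frame are all hypothetical choices made for the example.

```python
import cmath
import random


def unit_symbol(length, rng):
    """Hypothetical training symbol: unit-magnitude samples with random phase."""
    return [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(length)]


def autocorr_metric(rx, period, n):
    """Delay-and-correlate metric: compare rx[n:n+period] with the next period."""
    corr = sum(rx[n + k].conjugate() * rx[n + k + period] for k in range(period))
    power = sum(abs(rx[n + k + period]) ** 2 for k in range(period))
    return abs(corr) / power if power > 0 else 0.0


def detect_packet(rx, period, threshold):
    """Coarse timing: first index whose metric exceeds the threshold, else None."""
    for n in range(len(rx) - 2 * period + 1):
        if autocorr_metric(rx, period, n) > threshold:
            return n
    return None


def fft_window_boundary(rx, known, start, span):
    """Fine timing: index of the maximum cross-correlation with `known`."""
    last = min(start + span, len(rx) - len(known) + 1)
    best_n, best_val = start, -1.0
    for n in range(start, last):
        val = abs(sum(known[k].conjugate() * rx[n + k] for k in range(len(known))))
        if val > best_val:
            best_n, best_val = n, val
    return best_n


# Toy noiseless frame (assumption): silence, 4 repeated short symbols, one long symbol.
rng = random.Random(0)
short_sym, long_sym = unit_symbol(16, rng), unit_symbol(32, rng)
frame = [0j] * 40 + short_sym * 4 + long_sym + [0j] * 20
coarse = detect_packet(frame, period=16, threshold=0.8)
fine = fft_window_boundary(frame, long_sym, start=coarse, span=100)
```

In this toy frame the packet starts at index 40 and the long symbol at index 104. The coarse estimate should land a few samples before the true packet start, because the metric crosses the threshold while the first window is only partly filled, and the matched-filter search should then return the long-symbol boundary exactly. This is why the two stages are combined.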

### 1.3 Introduction to OFDM System

The OFDM concept was first introduced by Chang in 1966 for multi-channel transmission of band-limited signals [1]. The main idea of a multi-carrier system is to divide the original bandwidth into a large number of parallel sub-bands that transmit data simultaneously. To avoid interference between signals in adjacent bands, however, a conventional multi-carrier system requires sufficient guard bandwidth between the separated sub-bands, which decreases bandwidth efficiency. OFDM systems therefore use sub-carriers that overlap with each other but remain orthogonal, which improves bandwidth efficiency. Moreover, adding a guard interval that cyclically extends each OFDM symbol preserves orthogonality in a multi-path fading channel and resolves the influence of inter-symbol interference (ISI). With the advantages of high bandwidth efficiency and robustness in multi-path environments, OFDM has become increasingly attractive for new-generation communication systems, such as digital subscriber lines (DSL), digital audio broadcasting (DAB) [14], digital video broadcasting (DVB) [15], high-speed wireless local area networks (WLAN) like IEEE 802.11a and HiperLAN/2, and the developing ultra-wideband (UWB) systems.



FIG. 1.1 Spectrum of single-carrier system.



FIG. 1.2 Spectrum of conventional multi-carrier system.



FIG. 1.3 Spectrum of OFDM system

The basic idea of an OFDM system is as follows. FIG. 1.1 shows the spectrum of a serial system with one carrier. If it transfers data in the time interval $T_s$, its spectrum has bandwidth $2 \times f_s$, where $f_s = 1/T_s$. Similarly, FIG. 1.2 shows the spectrum of a conventional multi-carrier system with 5 subcarriers. To use the available spectrum more efficiently, an OFDM system overlaps the subcarriers and keeps them orthogonal, saving bandwidth without introducing inter-carrier interference (ICI), as shown in FIG. 1.3. A conventional multi-carrier system with $N$ sub-carriers requires a bandwidth of $(2N+1)f_s$, whereas an OFDM system requires only $(N+1)f_s$, a saving that approaches 50% as $N \to \infty$.
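As a quick check of the bandwidth saving, using only the two bandwidth expressions given above (the value $N = 64$ is chosen here purely for illustration; $N = 5$ matches the five-subcarrier example):

$$
1 - \frac{(N+1)f_s}{(2N+1)f_s} = \frac{N}{2N+1}, \qquad
\left.\frac{N}{2N+1}\right|_{N=5} = \frac{5}{11} \approx 45.5\%, \qquad
\left.\frac{N}{2N+1}\right|_{N=64} = \frac{64}{129} \approx 49.6\%, \qquad
\lim_{N \to \infty} \frac{N}{2N+1} = \frac{1}{2}.
$$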



FIG. 1.4 Use IDFT/DFT for OFDM modulation/demodulation

However, an OFDM system originally needed a large number of sinusoidal oscillators to perform the orthogonal transformation. Weinstein and Ebert then suggested using the discrete Fourier transform (DFT) in place of the oscillators, which reduces the implementation complexity of the OFDM modem [16]. OFDM modulation and demodulation using the inverse DFT (IDFT) and DFT are shown in FIG. 1.4. In practice, the IDFT/DFT is implemented as an inverse fast Fourier transform (IFFT)/fast Fourier transform (FFT) of suitable size to reduce hardware cost.
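To make the link between the oscillator bank and the DFT concrete, the following sketch writes the modulator as one complex oscillator per subcarrier and the demodulator as one correlator per subcarrier. Sampled at $N$ points per symbol, these are exactly the IDFT and DFT sums. The function names and the 8-point QPSK example are illustrative assumptions, not taken from the thesis.

```python
import cmath


def subcarrier(k, n_fft):
    """Samples of the k-th complex subcarrier over one symbol of n_fft points."""
    return [cmath.exp(2j * cmath.pi * k * t / n_fft) for t in range(n_fft)]


def inner(a, b):
    """Complex inner product <a, b> = sum of a[t] * conj(b[t])."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def modulate(symbols):
    """IDFT written as a bank of oscillators, one per subcarrier."""
    n = len(symbols)
    out = [0j] * n
    for k, s in enumerate(symbols):
        for t, c in enumerate(subcarrier(k, n)):
            out[t] += s * c / n
    return out


def demodulate(samples):
    """DFT written as one correlator per subcarrier."""
    n = len(samples)
    return [inner(samples, subcarrier(k, n)) for k in range(n)]


# Hypothetical QPSK symbols on 8 subcarriers.
tx_symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
rx_symbols = demodulate(modulate(tx_symbols))
```

Because distinct subcarriers have zero inner product over one symbol, each correlator recovers its own symbol without interference from the others, even though the subcarrier spectra overlap.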

After the IFFT, a cyclic prefix (CP) is added in front of the original FFT window as the guard interval (GI), as proposed by Peled and Ruiz. It makes the linear convolution with the channel impulse response behave like a circular convolution, as shown in FIG. 1.5. Since circular convolution in the time domain is equivalent to multiplication in the DFT domain, the orthogonality of the subcarriers distorted by the multi-path channel can easily be recovered with an equalizer to avoid ICI. The GI should be longer than the expected delay spread of the multi-path environment; otherwise ISI remains.
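The effect of the cyclic prefix can be checked numerically. The sketch below sends one OFDM symbol through a short multi-path channel and compares the DFT of the received FFT window with the per-subcarrier product $X[k]H[k]$. The 8-point symbol, the 3-tap channel and all function names are assumptions chosen for illustration; a naive $O(N^2)$ DFT is used to keep the example self-contained.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def add_cyclic_prefix(symbol, cp_len):
    """Copy the last cp_len samples to the front of the symbol."""
    return symbol[len(symbol) - cp_len:] + symbol


def linear_convolve(x, h):
    """Linear convolution, modelling the multi-path channel."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def receive(freq_symbols, channel, cp_len):
    """Modulate, add the CP, pass through the channel, drop the CP, demodulate."""
    n = len(freq_symbols)
    tx = add_cyclic_prefix(idft(freq_symbols), cp_len)
    rx = linear_convolve(tx, channel)
    return dft(rx[cp_len:cp_len + n])


X = [1, -1, 1j, -1j, 1, 1, -1, -1]        # hypothetical subcarrier symbols
h = [1.0, 0.5, 0.25]                      # hypothetical 3-tap channel
H = dft(h + [0.0] * (len(X) - len(h)))    # channel frequency response
Y = receive(X, h, cp_len=2)               # CP covers the channel memory (2 taps)
equalized = [y / hk for y, hk in zip(Y, H)]
```

With `cp_len=2` the channel memory fits inside the prefix, so `Y[k]` equals `X[k] * H[k]` and a one-tap division per subcarrier restores the transmitted symbols. Calling `receive(X, h, cp_len=0)` breaks the circular structure, and the same division no longer recovers `X`.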



FIG. 1.5 Use CP as GI to prevent ISI and maintain circular convolution

FIG. 1.6 shows the simplified block diagram of an OFDM transceiver. The first block is the FEC coder. It allows errors at weak subcarriers caused by frequency-selective fading to be corrected, reducing the error probability; the trade-off is a lower data rate, because additional encoded data must be transmitted. QAM mapping then increases the data rate of the system at the cost of a smaller noise margin for the transferred symbols. After QAM mapping, the IFFT and GI insertion introduced previously complete the OFDM modulation.



FIG. 1.6 Simplified block diagram of OFDM system

### 1.4 Outline of This Thesis

In this thesis, Chapter 2 introduces the simulation platforms and system specifications, including the IEEE 802.11a WLAN, the LDPC-COFDM UWB system, and the multi-band Viterbi COFDM (MB-OFDM) UWB system. The proposed algorithms for the different system requirements are described in Chapter 3 and Chapter 4. Chapter 3 focuses on a frame synchronizer with ordinary throughput (less than 100 MHz) and uses the IEEE 802.11a WLAN as a case study. Chapter 4 focuses on a high-throughput (greater than 500 MHz) frame synchronizer and uses the LDPC-COFDM and MB-OFDM UWB systems as case studies. The simulation results and performance analysis of the proposed design are discussed in Chapter 5 for each of the three system platforms introduced in Chapter 2. Chapter 6 presents the architecture of the proposed design and its hardware implementation results. Finally, the conclusion and future work are given in Chapter 7.



## **CHAPTER 2**

## System Platform

This chapter introduces the system platforms used in our case studies. The first is built according to the IEEE 802.11a physical layer (PHY), finalized by the IEEE 802.11 Wireless LAN committee in November 1999, which specifies indoor wireless local area network (LAN) data communication in the 5 GHz band. The others are OFDM-based UWB systems: the LDPC-COFDM system [8] and the MB-OFDM system [7]. The specifications of these platforms are introduced in turn.

### 2.1 IEEE 802.11a PHY

#### 2.1.1 System Platform



The system platform of our IEEE 802.11a transceiver PHY is shown in FIG. 2.1. The transmitter contains two main function blocks: OFDM modulation and forward error correction (FEC) coding. The OFDM modulation uses a 64-point DFT with the four modulation methods listed in TABLE 2.1. The FEC coding supports three coding rates: 1/2, 2/3 and 3/4. The receiver contains three main function blocks: synchronization, OFDM demodulation and FEC decoding. Synchronization compensates the received signals for degradation caused by channel effects, which are discussed in detail in Section 2.3. After synchronization, OFDM demodulation transforms the time-domain signal into frequency-domain sub-carriers, and FEC decoding corrects the data errors caused by channel effects.



The major system parameters of the IEEE 802.11a PHY are listed in TABLE 2.1. It requires a 20 MHz bandwidth to transfer data. With four modulation methods and three coding rates, the supported data rates range from 6 Mb/s to 54 Mb/s. The detailed modulation parameters of the supported data rates are listed in TABLE 2.2. Each transferred OFDM symbol has 48 data sub-carriers and 4 pilot sub-carriers, for a total of 52 used sub-carriers modulated by a 64-point FFT/IFFT. The last 16 points of the IFFT output are copied to the front of the OFDM symbol as the guard interval, which gives the symbol its cyclic prefix. The performance requirement is a packet error rate (PER) of less than 10% according to the IEEE 802.11a specification.

| Parameter                                  | Value                              |
|--------------------------------------------|------------------------------------|
| Required bandwidth                         | 20 MHz                             |
| Data rate (Mb/s)                           | 6, 9, 12, 18, 24, 36, 48, 54       |
| Modulation method                          | BPSK, QPSK, 16-QAM, 64-QAM         |
| Error correction code                      | K = 7 (64-state convolutional code) |
| FEC coding rate (R)                        | 1/2, 2/3, 3/4                      |
| FFT size (N)                               | 64                                 |
| Number of used sub-carriers ($N_{ST}$)     | 52                                 |
| Number of data carriers ($N_{SD}$)         | 48                                 |
| Number of pilot carriers ($N_{SP}$)        | 4                                  |
| OFDM symbol duration                       | 4.0 µs                             |
| IFFT/FFT period ($T_{FFT}$)                | 3.2 µs                             |
| GI duration ($T_{GI}$)                     | 0.8 µs ($T_{FFT}/4$)               |
| Packet error rate (PER) performance        | $\leq 10\%$                        |

TABLE 2.1 IEEE 802.11a PHY system parameters

| Data rate (Mb/s) | Modulation | Coding rate | Coded bits per subcarrier ($N_{BPSC}$) | Coded bits per OFDM symbol ($N_{CBPS}$) | Data bits per OFDM symbol ($N_{DBPS}$) | Required SNR (dB) |
|------------------|------------|-------------|----------------------------------------|-----------------------------------------|----------------------------------------|-------------------|
| 6                | BPSK       | 1/2         | 1                                      | 48                                      | 24                                     | 9.7               |
| 9                | BPSK       | 3/4         | 1                                      | 48                                      | 36                                     | 10.7              |
| 12               | QPSK       | 1/2         | 2                                      | 96                                      | 48                                     | 12.7              |
| 18               | QPSK       | 3/4         | 2                                      | 96                                      | 72                                     | 14.7              |
| 24               | 16-QAM     | 1/2         | 4                                      | 192                                     | 96                                     | 17.7              |
| 36               | 16-QAM     | 3/4         | 4                                      | 192                                     | 144                                    | 21.7              |
| 48               | 64-QAM     | 2/3         | 6                                      | 288                                     | 192                                    | 25.7              |
| 54               | 64-QAM     | 3/4         | 6                                      | 288                                     | 216                                    | 26.7              |

 TABLE 2.2 IEEE 802.11a PHY data rate dependent parameters
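The columns of TABLE 2.2 are related by simple arithmetic, which the following minimal sketch reproduces. It assumes the standard relations N_CBPS = N_SD × N_BPSC, N_DBPS = N_CBPS × R and data rate = N_DBPS / symbol duration; the names `MODES`, `rate_parameters` and `TABLE` are illustrative only.

```python
from fractions import Fraction

N_SD = 48          # data subcarriers per OFDM symbol (TABLE 2.1)
T_SYMBOL_US = 4    # OFDM symbol duration in microseconds (TABLE 2.1)

# (modulation, coded bits per subcarrier N_BPSC, coding rate R)
MODES = [
    ("BPSK", 1, Fraction(1, 2)),
    ("BPSK", 1, Fraction(3, 4)),
    ("QPSK", 2, Fraction(1, 2)),
    ("QPSK", 2, Fraction(3, 4)),
    ("16-QAM", 4, Fraction(1, 2)),
    ("16-QAM", 4, Fraction(3, 4)),
    ("64-QAM", 6, Fraction(2, 3)),
    ("64-QAM", 6, Fraction(3, 4)),
]


def rate_parameters(n_bpsc, coding_rate):
    """Return (N_CBPS, N_DBPS, data rate in Mb/s) for one mode."""
    n_cbps = N_SD * n_bpsc                  # coded bits per OFDM symbol
    n_dbps = int(n_cbps * coding_rate)      # data bits per OFDM symbol
    data_rate_mbps = Fraction(n_dbps, T_SYMBOL_US)  # bits per us equals Mb/s
    return n_cbps, n_dbps, data_rate_mbps


# One row per mode: (modulation, N_CBPS, N_DBPS, data rate)
TABLE = [(name,) + rate_parameters(bits, rate) for name, bits, rate in MODES]
```

Each data rate is N_DBPS divided by the 4.0 us symbol duration, so the eight modes map directly onto the eight rates of the standard.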

### 2.1.2 Frame Format

FIG 2.2 shows the format of the PLCP protocol data unit (PPDU) used in the IEEE 802.11a PHY. It comprises the PLCP preamble, the PLCP header and the data field. The PLCP preamble is used for synchronization and consists of 10 short symbols and 2 long symbols. The short symbols are used for automatic gain control (AGC), coarse timing detection and coarse frequency offset estimation. The long symbols are used for fine timing detection, fine frequency offset estimation and channel estimation. The detailed PLCP preamble format and its timing parameters are shown in FIG 2.3 and TABLE 2.3. The PLCP header follows the PLCP preamble and conveys the coding rate, the modulation type and the length of the PLCP service data unit (PSDU). The last component is the data field, which contains a variable number of OFDM symbols determined by the PSDU length.

|        | PLCP Preamble | SIGNAL                               | DATA FIELD                       |
|--------|---------------|--------------------------------------|----------------------------------|
| Fields | 12 symbols    | RATE, Reserved, LENGTH, Parity, Tail | SERVICE, PSDU, Tail, Pad         |
| Coding |               | Coded/OFDM (1/2, BPSK)               | Coded/OFDM (indicated in SIGNAL) |

The PLCP header spans the RATE, Reserved, LENGTH, Parity, Tail and SERVICE fields.

FIG. 2.2 PPDU frame format of IEEE 802.11a PHY



FIG. 2.3 PLCP preamble format

| Parameter                                             | Value                                   |
|-------------------------------------------------------|-----------------------------------------|
| T <sub>PREAMBLE</sub> : PLCP preamble duration        | 16 us $(T_{SHORT} + T_{LONG})$          |
| T <sub>SHORT</sub> : Short training sequence duration | 8 us (10 × $T_{FFT}/4$ )                |
| T <sub>LONG</sub> : Long training sequence duration   | 8 us $(T_{GI2} + 2 \times T_{FFT})$     |
| $t_1 \sim t_{10}$ : Short symbol duration             | 0.8 us ( $T_{FFT}/4$ )                  |
| T1~T2: Long symbol duration                           | 3.2 us ( $T_{FFT}$ )                    |
| T <sub>GI2</sub> : Training symbol GI duration        | 1.6 us (T <sub>FFT</sub> /2)            |

TABLE 2.3 Timing parameters of PLCP preamble

## 2.2 Ultra-Wideband System

### 2.2.1 System Platform

In recent years, UWB communication has received much attention as a high speed, low cost wireless LAN implementation for short distances. To promote UWB technology, in 2002 the FCC allocated the spectrum from 3.1GHz to 10.6GHz, a total of 7.5GHz, to UWB devices. Since the UWB system has not been standardized, two baseband systems have been proposed. One is impulse radio based, transmitting nanosecond time domain pulses over a wide bandwidth [17~18]. The other is OFDM based, dividing the spectrum into several sub-bands and using OFDM modulation to transfer data. In this thesis, we focus on two OFDM based UWB systems as case studies. The first is the LDPC-COFDM system, which has 528MHz bandwidth, a 128 point FFT, and a low density parity check (LDPC) codec with 120Mb/s~480Mb/s data rates. The detailed system specification and the system requirement for 8% PER are listed in TABLE 2.4 and TABLE 2.5. The second is the MB-OFDM system, which transmits OFDM symbols across three time-interleaved sub-bands. An example of time-frequency coding (TFC) for the MB-OFDM system is shown in FIG 2.4. TABLE 2.6 lists the specification of the MB-OFDM system and TABLE 2.7 lists the system requirement for 8% PER.

| Data Rate (Mb/s) | FFT       | Bandwidth (MHz) | FEC Coding Rate | Spreading Gain |
|------------------|-----------|-----------------|-----------------|----------------|
| 120              | 128-point | 528             | 3/4             | 4              |
| 240              | 128-point | 528             | 3/4             | 2              |
| 480              | 128-point | 528             | 3/4             | 1              |

TABLE 2.4 LDPC-COFDM system SPEC

| Data Rate (Mb/s) | Required Distance (m) | Required Eb/N0 (dB) | Required SNR (dB) |
|------------------|-----------------------|---------------------|-------------------|
| 120              | 10                    | 12.91               | 7.55              |
| 240              | 4                     | 18.35               | 16                |
| 480              | 2                     | 20.5                | 21.1              |

TABLE 2.5 Requirement for 8% PER of LDPC-COFDM system



FIG. 2.4 An example of MB-OFDM system for TFC (1, 2, 3, 1, 2, 3)

| Data Rate (Mb/s) | FFT       | Bandwidth (MHz) | FEC Coding Rate | Spreading Gain |
|------------------|-----------|-----------------|-----------------|----------------|
| 53.3             | 128-point | 528             | 1/3             | 4              |
| 80               | 128-point | 528             | 1/2             | 4              |
| 110              | 128-point | 528             | 11/32           | 2              |
| 160              | 128-point | 528             | 1/2             | 2              |
| 200              | 128-point | 528             | 5/8             | 2              |
| 320              | 128-point | 528             | 1/2             | 1              |
| 400              | 128-point | 528             | 5/8             | 1              |
| 480              | 128-point | 528             | 3/4             | 1              |

TABLE 2.6 MB-OFDM system SPEC

| Data Rate (Mb/s) | Required Distance (m) | Required Eb/N0 (dB) | Required SNR (dB) |
|------------------|-----------------------|---------------------|-------------------|
| 110              | 10                    | 12.9                | 7.1               |
| 200              | 4                     | 18.34               | 15.2              |
| 480              | 2                     | 20.5                | 21.1              |

TABLE 2.7 Requirement for 8% PER of MB-OFDM system

The system block diagram of the OFDM based UWB system is shown in FIG 2.5. It comprises the transmitter, the channel model and the receiver. The transmitter sends signals that meet the system specification. The channel model simulates channel interference and RF effects. At the receiver, the frame synchronizer detects the valid packet and the FFT-window boundary. The received signals are then passed to demodulation and the FEC decoder, and the decoded data is finally sent back to the MAC.



FIG. 2.5 System block diagram of OFDM based UWB system



### 2.2.2 Frame Format



FIG. 2.6 Frame format of MB-OFDM UWB system

The frame format of the OFDM based UWB system is shown in FIG 2.6 [19]. One packet is constructed from the PLCP preamble, the PLCP header, and the data field. The PLCP preamble duration is 9.375 us. It has 30 sync symbols, including 21 packet-sync symbols (PS), 3 frame-sync symbols (FS), and 6 channel estimation symbols (CES). One sync symbol can be divided into a pre guard interval, the sync sequence and a post guard interval. The sync sequence has 128 points with constant amplitude (1 or -1). The pre guard interval is the 32-point cyclic prefix of the sync sequence. The post guard interval is inserted so that the transmitter and receiver can switch the carrier frequency to the next sub-band.



FIG. 2.7 Channel model data flow of simulation platform

## 2.3 Simulated Channel Model

The channel effects applied in our simulation platforms are shown in FIG 2.7. They include the multipath fading channel, additive white Gaussian noise (AWGN), carrier frequency offset (CFO), and sampling clock offset (SCO). The following sections describe in detail how these channel effects distort the transferred data.

### 2.3.1 Multi-Path Fading Channel

In wireless communication, the transmitted signal may be reflected by obstacles, so that the antenna also receives delayed and attenuated copies of the signal over the reflected paths. This is called multi-path interference, as shown in FIG 2.8. In the time domain, multi-path interference causes inter-symbol interference (ISI) from preceding symbols; in the frequency domain, it causes frequency-selective fading when the delay spread is longer than the symbol period. In our platform, we model the multi-path interference as the linear convolution of the transmitted signal with the corresponding channel impulse response:

$$y(t) = h(t) \otimes x(t) = \sum_{n} h(n\Delta) \times x(t - n\Delta)$$

where $h(t)$ is the channel impulse response and $\Delta$ is the delay between adjacent channel taps.

In the IEEE 802.11a PHY, the channel impulse response is taken from the IEEE 802.11a channel model [20]. An example of the IEEE channel impulse response for 100 ns RMS delay spread is shown in FIG 2.9. In the UWB system, we use the Intel channel model [21] for the LDPC-COFDM system and the IEEE 802.15.3a channel environments CM1 to CM4 [22] for the MB-OFDM system. FIG 2.10 shows an example of the UWB channel impulse response of the Intel channel model for 9ns RMS delay spread.
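The convolution model can be sketched in a few lines. The block below is only an illustration: it assumes a tapped delay line with an exponentially decaying power profile and complex Gaussian taps, which is a common simplification and not necessarily the exact profile of the models in [20]~[22]. The function names and the `seed` parameter are hypothetical.

```python
import math
import random


def exponential_channel(num_taps, rms_delay_samples, seed=0):
    """Random complex taps whose average power decays exponentially (assumed profile)."""
    rng = random.Random(seed)
    taps = []
    for n in range(num_taps):
        sigma = math.sqrt(0.5 * math.exp(-n / rms_delay_samples))
        taps.append(complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
    # Normalize so the channel has unit energy and does not change the average signal power.
    norm = math.sqrt(sum(abs(h) ** 2 for h in taps))
    return [h / norm for h in taps]


def apply_multipath(x, h):
    """Linear convolution y[k] = sum_n h[n] * x[k - n]."""
    y = [0j] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for n, hn in enumerate(h):
            y[k + n] += hn * xk
    return y
```

For the 802.11a platform sampled at 20MHz, a 100 ns RMS delay spread corresponds to `rms_delay_samples = 2.0` in this sketch.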



FIG. 2.8 Multi-path interference and ISI effect



FIG. 2.10 UWB channel impulse response

### 2.3.2 AWGN Model

At the receiver antenna, the transferred signals are corrupted by unpredictable noise. In our platform we use an AWGN model to simulate this noise. The AWGN signal w(t) is generated in MATLAB as follows:

$$w(t) = randn(1, L) \times RMS + j \times randn(1, L) \times RMS$$

where randn(1, L) returns L zero-mean, unit-variance Gaussian samples, L is the length of the data signal, and RMS is the normalized root-mean-square amplitude of each noise component, defined as:

$$RMS = \frac{10^{(P_{data} - SNR)/20}}{\sqrt{2}}$$

where $P_{data}$ is the power of the data signal in dB and SNR is the ratio in dB between the data signal power and the AWGN power.
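As a cross-check of the RMS definition, the sketch below generates complex noise in Python rather than MATLAB. It assumes zero-mean Gaussian samples for the real and imaginary parts and that P_data and SNR are given in dB; the function name `awgn` and the `seed` argument are illustrative.

```python
import math
import random


def awgn(length, p_data_db, snr_db, seed=None):
    """Complex Gaussian noise whose total power is P_data - SNR (in dB)."""
    rng = random.Random(seed)
    # Per-component standard deviation: half of the noise power goes to each of I and Q.
    rms = 10 ** ((p_data_db - snr_db) / 20) / math.sqrt(2)
    return [complex(rng.gauss(0.0, rms), rng.gauss(0.0, rms)) for _ in range(length)]
```

With this scaling, the measured noise power approaches 10^((P_data - SNR)/10) as the number of samples grows, which is why the factor 1/sqrt(2) appears in the RMS definition.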

### 2.3.3 Carrier Frequency Offset Model

Carrier frequency offset (CFO) is caused by the difference in carrier frequency between the transmitter RF and the receiver RF. The CFO effect in the time domain can be represented as follows:

$$y(t) = x(t) \times e^{-j \times 2\pi (f_1 - f_2)T \times t}$$

where $f_1$ is the carrier frequency of the transmitter, $f_2$ is the carrier frequency of the receiver, t is the sample index, and T is the period of the sample clock. In the IEEE 802.11a PHY, the sample clock rate is 20MHz and T equals 50ns. In the OFDM-based UWB system, the sample clock rate is 528MHz and T equals 1.894ns. The CFO therefore causes a linear phase shift in the time domain, as shown in FIG 2.11.
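The time-domain CFO model is a per-sample phase rotation, as the following minimal sketch shows. It treats t as an integer sample index; the function name `apply_cfo` and its argument names are illustrative.

```python
import cmath
import math


def apply_cfo(x, f_tx_hz, f_rx_hz, sample_period_s):
    """y[t] = x[t] * exp(-j * 2*pi * (f1 - f2) * T * t), with t the sample index."""
    step = -2.0 * math.pi * (f_tx_hz - f_rx_hz) * sample_period_s
    return [xt * cmath.exp(1j * step * t) for t, xt in enumerate(x)]
```

For example, a 200KHz offset at T = 50ns rotates the phase by 2π × 0.01 radians per sample, so the phase error grows linearly along the packet while the amplitude is unchanged.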



FIG. 2.11 Linear phase shift caused by CFO

Because the linear phase shift in the time domain destroys the orthogonality of the subcarriers, CFO induces inter-carrier interference (ICI) in the frequency domain, as analyzed by Moose [23]. The ICI effect can be represented as follows:

$$Y[k] = H[k]X[k] \times \left[\frac{\sin(\pi\Delta f)}{N_{FFT} \cdot \sin(\pi\Delta f/N_{FFT})}\right] \cdot \exp\left(j\pi\Delta f\,\frac{N_{FFT}-1}{N_{FFT}}\right)$$

$$+\sum_{m \neq k} H[m]X[m] \times \left[ \frac{\sin(\pi\Delta f)}{N_{FFT} \cdot \sin\left(\frac{\pi(\Delta f + m - k)}{N_{FFT}}\right)} \right] \cdot \exp\left(j\pi\Delta f\,\frac{N_{FFT} - 1}{N_{FFT}}\right) \cdot \exp\left(-j\pi\,\frac{m - k}{N_{FFT}}\right)$$

where $\Delta f$ is the CFO normalized to the subcarrier spacing, $N_{FFT}$ is the FFT size, and the sum runs over all used subcarriers $m \neq k$. The first term is the attenuated and rotated desired subcarrier, and the second term is the ICI.
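The closed-form ICI expression can be checked numerically against a direct DFT. The sketch below assumes a flat channel (H[m] = 1), a CFO of `eps` subcarrier spacings applied as a positive phase ramp, and a final phase term written as exp(-jπ(m-k)/N), which is what direct evaluation of the DFT sum yields. All function names are illustrative.

```python
import cmath
import math


def dft_bin_with_cfo(X, k, eps):
    """Bin k of the FFT of an IFFT output rotated by a CFO of eps subcarrier spacings."""
    N = len(X)
    total = 0j
    for n in range(N):
        sample = sum(X[m] * cmath.exp(2j * math.pi * n * (m + eps) / N) for m in range(N)) / N
        total += sample * cmath.exp(-2j * math.pi * n * k / N)
    return total


def ici_coefficient(m, k, eps, N):
    """Closed-form leakage from subcarrier m into bin k (m == k gives the desired term)."""
    gain = math.sin(math.pi * eps) / (N * math.sin(math.pi * (eps + m - k) / N))
    common_phase = cmath.exp(1j * math.pi * eps * (N - 1) / N)
    return gain * common_phase * cmath.exp(-1j * math.pi * (m - k) / N)


def closed_form_bin(X, k, eps):
    """Sum of the desired term and all ICI terms for bin k."""
    N = len(X)
    return sum(X[m] * ici_coefficient(m, k, eps, N) for m in range(N))
```

Both routes give the same bin values for any non-integer `eps`, which confirms that the desired subcarrier is scaled by sin(πΔf)/(N sin(πΔf/N)) and that every other subcarrier leaks into it.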

### 2.3.4 Sampling Clock Offset Model

As shown in FIG 2.7, sampling clock offset (SCO) is caused by the mismatch in sampling frequency between the digital to analog converter (DAC) in the transmitter and the analog to digital converter (ADC) in the receiver. In the time domain, SCO produces a time shift between the actual sampling points and the ideal sampling points. If the SCO effect is not compensated, this time shift error accumulates, so the ADC samples the received signal at the wrong time and the receiver eventually fails. The SCO distortion also produces a linear phase error in the frequency domain, as shown in FIG 2.12. We therefore use the pilot sub-carriers to estimate the linear phase error caused by SCO and recover the transferred data.



FIG. 2.12 SCO effect in Time domain and frequency domain

# CHAPTER 3

# A Low Complexity Frame Synchronizer for OFDM Application

In this chapter, a low complexity frame synchronizer for OFDM systems is proposed. Its main idea is to use only the most-significant taps of the matched filter in FFT window detection, which reduces the correlation complexity of the frame synchronizer. To explain our study clearly, the IEEE 802.11a PHY introduced in chapter 2 is selected as our system platform. The detailed algorithm, analysis and simulation results are presented in the following sections.





FIG. 3.1 Frame synchronizer data flow

The data flow of the proposed frame synchronizer is shown in FIG 3.1. First, packet detection detects a valid packet by applying the normalized auto-correlation algorithm to the short preamble. The normalized auto-correlation value is compared with a decision threshold, and a valid packet is asserted when the value is greater than the threshold. Then, coarse frequency compensation uses the remaining short training symbols to reduce the CFO to within $\pm 4$ppm ($\pm 20$kHz). At the same time, frame synchronization detects the end of the short preamble using another decision threshold. Next, FFT window detection finds the start boundary of the FFT window by comparing the received data with one long training sequence (cross-correlation algorithm). After the FFT window boundary is decided, fine frequency compensation reduces the remaining CFO to within 0.8ppm (4kHz) and the channel equalizer estimates the channel response from the other long training sequence.

### 3.1.1 Packet Detection



In the 802.11a PHY, a valid packet can be detected from the periodic structure of the PLCP preamble. As mentioned in 2.1.2, the short preamble consists of ten repeated short symbols, and each short symbol has period ' $T_s$ ' (0.8us). We therefore compare the received signals R(t) and R(t+T<sub>s</sub>) using the normalized auto-correlation scheme [24-25], given as follows:

$$C_{k} = \sum_{m=0}^{N-1} r_{k+m} \times r_{(k+m)+N}^{*}$$

$$P_{k} = \sum_{m=0}^{N-1} \left| r_{(k+N+m)} \right|^{2}$$

$$\lambda_{k} = \frac{\left| C_{k} \right|^{2}}{P_{k}^{2}}$$
(Eq 3.1)

In the above equation, $C_k$ is the auto-correlation value and $P_k$ is the corresponding symbol power. The parameter 'N' is the number of sample points in one short symbol period 'T<sub>s</sub>', which equals 16. Normalizing the auto-correlation value $C_k$ by the symbol power $P_k$ gives a new decision value $\lambda_k$. Because $\lambda_k$ is normalized, it detects a valid packet independently of the received power level, so packet detection can start before AGC has tuned the RF receiver gain. In the IEEE 802.11a PHY, AGC, packet detection, diversity selection and coarse CFO estimation must all be completed within the short preamble duration, so the number of short symbols needed for packet detection should be as small as possible. In our design, since AGC and packet detection can work simultaneously, they share short symbols and obtain a longer estimation time, which increases performance. The proposed decision value $\Lambda_k$ uses three short symbol pairs in the normalized auto-correlation algorithm.
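The single-pair metric of Eq 3.1 can be sketched as follows. This covers only $\lambda_k$ and not the three-pair value $\Lambda_k$; the function name `normalized_autocorr` and its zero-power guard are illustrative choices.

```python
def normalized_autocorr(r, k, n=16):
    """lambda_k = |C_k|^2 / P_k^2 for samples r, start index k and period n (Eq 3.1)."""
    c = sum(r[k + m] * r[k + m + n].conjugate() for m in range(n))
    p = sum(abs(r[k + n + m]) ** 2 for m in range(n))
    return abs(c) ** 2 / (p * p) if p > 0 else 0.0
```

For a perfectly periodic input the metric equals 1 whatever the signal amplitude, which is the gain independence that lets packet detection run in parallel with AGC.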





FIG. 3.2 Example of Packet Detection in Proposed Design

FIG 3.2 shows an example of packet detection. A noise segment of 5us is added before the valid packet. The test channel condition is SNR=0dB, CFO=200KHz (40ppm) and multipath delay spread=150 ns. The vertical axis is the proposed normalized auto-correlation value $\Lambda_k$. To detect the valid packet, $\Lambda_k$ is compared with a pre-defined threshold. Once the normalized correlation value is greater than the threshold, packet detection is provisionally asserted. At low SNR, the normalized auto-correlation value of the noise clearly varies widely. To reduce the false alarm rate, a decision window is defined to confirm the assertion. When $\Lambda_k$ is greater than the pre-defined threshold, the decision window starts to check the following correlation values. Packet detection is announced only when all correlation values in the decision window are also greater than the pre-defined threshold. If not, the packet assertion is canceled and packet detection returns to the initial state, as shown in FIG 3.2.
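The decision-window check can be sketched as a simple scan over the metric. The window length and threshold used here are placeholders and not values from the design, and the function name `detect_packet` is illustrative.

```python
def detect_packet(metric, threshold, window):
    """Return the first index whose value and the next `window` values all exceed threshold."""
    for k in range(len(metric) - window):
        if metric[k] > threshold:
            # Provisional assertion: confirm it over the decision window.
            if all(v > threshold for v in metric[k + 1:k + 1 + window]):
                return k
            # Otherwise the assertion is canceled and the search continues.
    return None
```

An isolated noise spike above the threshold is rejected because the values that follow it fall back below the threshold, while the sustained plateau produced by the short preamble passes the check.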

### 3.1.2 FFT Window Detection

In our proposed design, FFT window detection finds the correct FFT window boundary by exploiting the known-data property [26]. It compares the received data with the ideal long training symbol within a pre-defined searching window. The comparison is based on the cross-correlation algorithm shown as follows:

$$\Delta(k) = \left| \sum_{n=0}^{L_n - 1} R_{(k+n)} \times C_n^* \right|^2$$
(Eq 3.3)

In the above equation, 'R' is the received data from the ADC, 'C' is the corresponding element of the long training symbol, and 'L<sub>n</sub>' is the total number of elements in one long training symbol. In the 802.11a standard, L<sub>n</sub> equals the FFT size, which is 64.  $\Delta(k)$ is the correlation value at the kth index of the pre-defined searching window. The index with the maximum cross-correlation value marks the received segment most similar to the ideal long training symbol and is declared the FFT window boundary.
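Eq 3.3 and the maximum search can be sketched as below. The function names and the `search_len` parameter are illustrative; the sketch assumes that `r` holds at least `search_len + len(coeffs) - 1` samples.

```python
def fft_window_metric(r, coeffs, k):
    """Delta(k) = |sum_n R[k+n] * conj(C[n])|^2 (Eq 3.3)."""
    acc = sum(r[k + n] * coeffs[n].conjugate() for n in range(len(coeffs)))
    return abs(acc) ** 2


def find_boundary(r, coeffs, search_len):
    """Return the index of the largest metric in the searching window, plus all metrics."""
    metrics = [fft_window_metric(r, coeffs, k) for k in range(search_len)]
    best = max(range(search_len), key=metrics.__getitem__)
    return best, metrics
```

In a noise-free, single-path case the peak falls exactly where the training symbol starts, since any other offset captures only part of the symbol energy.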



FIG. 3.3 FFT window detection in AWGN and multi-path channel

An example of FFT window detection in an AWGN channel and in a multi-path channel with 150 ns RMS delay spread is shown in FIG 3.3. In the AWGN channel, the index of the maximum cross-correlation value is clearly the start of the FFT window, as expected. In the multi-path channel, however, the delay spread of the later arrival paths moves the maximum cross-correlation value to samples after the ideal FFT window boundary, and the correct boundary becomes the 2nd or 3rd highest cross-correlation peak in the searching window. A common solution is to choose the index N points earlier than the index of the maximum cross-correlation value as the preferred FFT window boundary (N is an integer chosen by the designer). However, this early catching reduces the effective GI and degrades system performance in severe multi-path channels [27].

To solve this problem, we adopt the TOP 'M' pre-cursor searching scheme of [3]. It defines the indices of the 'M' largest cross-correlation values as boundary candidates, and the 'N' samples before the peak cross-correlation value as the pre-cursor window. If there is more than one boundary candidate in the pre-cursor window, the earliest index is chosen as the preferred FFT window boundary. Otherwise, the index of the peak cross-correlation value is chosen.

FIG 3.4 compares the FFT window boundary distribution of the pre-cursor searching scheme (M=5 and N=5 in our design) with that of the conventional design (without pre-cursor searching) in a multi-path channel with 150 ns RMS delay spread. For perfect boundary detection (index=0 in FIG 3.4), the pre-cursor searching scheme is correct about twice as often as the conventional design. Its boundary distribution is also more concentrated, so fewer early-catching points are needed and more of the effective GI is retained. Comparing the simulation curves at SNR=0dB and SNR=10dB, the boundary distribution of the conventional design, which chooses the maximum correlation value, is almost the same in both SNR regions, because increasing SNR cannot reduce multi-path interference. For the pre-cursor searching scheme, however, a higher SNR reduces the probability of erroneous boundary candidates caused by AWGN. A higher SNR therefore concentrates its boundary distribution more tightly at index=0 and produces fewer early catches (index from -4 to -1).
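One reading of the TOP 'M' pre-cursor search is sketched below. It is an interpretation and not the implementation of [3]: whenever any of the M candidates falls inside the N-sample pre-cursor window, the earliest of them is taken, otherwise the peak is kept. The function name is illustrative.

```python
def precursor_boundary(metrics, m=5, n=5):
    """Pick the FFT window boundary from the M largest cross-correlation values."""
    order = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    candidates = order[:m]                  # indices of the M largest values
    peak = candidates[0]                    # index of the maximum value
    # Candidates located in the N samples just before the peak.
    in_window = [i for i in candidates[1:] if peak - n <= i < peak]
    return min(in_window) if in_window else peak
```

A strong early path that is weaker than the main peak is therefore selected only when it lies close enough before the peak, which limits how far the boundary can move into the guard interval.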



FIG. 3.4 FFT window detection in AWGN and multi-path channel

## 3.2 Proposed Algorithm

### 3.2.1 Most-Significant Taps Scheme

In the 802.11a PHY, FFT window detection accounts for most of the hardware cost of the frame synchronizer. To implement the cross-correlation scheme (Eq 3.3), a matched filter with 64 taps is used to calculate the timing metric $\Delta(k)$, which requires 64 complex multipliers (each complex multiplier has four multipliers and two adders). The most efficient way to save hardware is therefore to reduce the number of taps compared in FFT window detection. However, the matched filter is based on ML estimation, and its accuracy is positively related to the input data power, so decreasing the number of taps may degrade performance. To reduce the required taps of the matched filter with the least performance loss, the most-significant taps scheme is proposed.

$$\Delta(k) = \left| \sum_{m=1}^{N} R_{(k+S[m])} \times C_{S[m]}^{*} \right|^{2}$$
(Eq. 3.4)

In Eq 3.4, the parameter C is the matched-filter coefficient from  $C_0$  to  $C_{63}$, corresponding to the 64 taps. S is the index-sorting matrix, ordered from the maximum-power element of C to the minimum. For example,  $S_{[1]}$  is the index of the largest element of C and  $S_{[2]}$  is the index of the second largest. The parameter N is the number of taps used, chosen by the user as required. FIG 3.5 shows the power distribution of the matched-filter coefficients in the time domain and the same coefficients reordered by power.



FIG. 3.5 Power distribution of  $C_0 \sim C_{63}$  and  $S_{[1]} \sim S_{[64]}$ 

The contents of the index-sorting matrix S are listed as follows:

$$\begin{aligned}
S = \{ & 15, 51, 1, 33, 25, 41, 30, 36, 46, 20, 54, 12, 35, 31, 39, 27, && (1\text{st} \sim 16\text{th}) \\
& 59, 7, 62, 4, 45, 21, 26, 40, 2, 64, 16, 50, 3, 63, 55, 11, && (17\text{th} \sim 32\text{nd}) \\
& 8, 58, 60, 6, 28, 38, 48, 18, 43, 23, 57, 9, 34, 32, 49, 17, && (33\text{rd} \sim 48\text{th}) \\
& 44, 22, 19, 47, 53, 13, 14, 52, 42, 24, 37, 29, 10, 56, 5, 61 \} && (49\text{th} \sim 64\text{th})
\end{aligned}$$



In the 802.11a standard, the matched-filter coefficients are generated by transforming the long OFDM training symbol into the time domain, which results in a large variance in power between the coefficients. In the most-significant taps scheme, the coefficients with the least power are regarded as redundant taps and removed from the matched filter. The scheme can therefore reduce correlation complexity with little performance degradation. FIG 3.6 plots the number of taps used in the most-significant taps scheme versus the fraction of total coefficient power they contain. The matched filter in [28] uses the first 32 matched filter coefficients for a low-power synchronizer design and retains 50% of the power of the conventional design (64 taps in total). The most-significant taps scheme with 32 taps saves the same 50% of correlation complexity as [28], but still retains 72.4% of the power of the conventional design, so it can achieve better performance than [28]. Alternatively, the most-significant taps scheme requires only 20 taps to reach a 50% power ratio, saving 37.5% complexity relative to [28].
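The tap selection and the reduced correlation of Eq 3.4 can be sketched as follows. The sketch uses 0-based indices and arbitrary example coefficients rather than the actual long training symbol, and all function names are illustrative.

```python
def sort_taps_by_power(coeffs):
    """Index-sorting list S: tap indices ordered from largest to smallest power."""
    return sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]) ** 2, reverse=True)


def power_ratio(coeffs, order, n_taps):
    """Fraction of total coefficient power contained in the first n_taps sorted taps."""
    total = sum(abs(c) ** 2 for c in coeffs)
    kept = sum(abs(coeffs[i]) ** 2 for i in order[:n_taps])
    return kept / total


def reduced_metric(r, coeffs, order, n_taps, k):
    """Delta(k) of Eq 3.4: correlate only over the n_taps most significant taps."""
    acc = sum(r[k + i] * coeffs[i].conjugate() for i in order[:n_taps])
    return abs(acc) ** 2
```

Evaluating `power_ratio` for increasing `n_taps` gives a curve of retained power versus tap count, and `reduced_metric` shows that each removed tap saves one complex multiplication per output sample.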

### 3.2.2 Quantization Approach

Another effective approach to reduce the complexity of cross-correlation was proposed in [29]. It quantizes the matched filter coefficients to values in the set  $\{0, \pm 2^0, \pm 2^{-1}, \pm 2^{-2}, \ldots, \pm 2^{-q}\}$. With coefficients that are powers of two, each multiplication in the cross-correlation can be replaced by a shift of up to q bits, so the multipliers used for correlation simplify to shifters. In the IEEE 802.11a standard, the time domain long training symbol can be quantized to  $\{0, \pm 2^{-3}, \pm 2^{-4}, \pm 2^{-5}, \pm 2^{-6}\}$. The drawback of this approach is a serious quantization error, as shown in FIG 3.7.



FIG. 3.7 Tap power analysis of quantized approach

We use the signal-to-quantization-noise ratio (SQNR) to measure the quantization error (Eq 3.5):

$$SQNR = 10 \log_{10} \left\{ \frac{\sum_{m=1}^{64} \left| C_m \right|^2}{\sum_{m=1}^{64} \left| C_m - Q_m \right|^2} \right\}$$
(Eq 3.5)

Parameter C is the original matched filter coefficient and Q is the coefficient after quantization. The SQNR of the quantization approach is 14.86dB. Although this SQNR is rather low, the FER simulation in FIG 3.8, run in a multipath channel with 150 ns RMS delay spread and CFO = 100KHz under perfect packet detection, shows that the SNR loss between the original 64 taps and the quantized 64 taps is only 0.5 dB at 1% FER.
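The power-of-two quantizer and the SQNR of Eq 3.5 can be sketched as below. The sketch assumes nearest-level rounding applied separately to the real and imaginary parts, with the level set quoted for the 802.11a long training symbol; the function names are illustrative and the inputs in any example are arbitrary, not the real coefficients.

```python
import math

# Allowed levels: 0 and +/- 2^-3 ... 2^-6.
LEVELS = [0.0] + [s * 2.0 ** -q for q in (3, 4, 5, 6) for s in (1, -1)]


def quantize_component(x):
    """Nearest allowed level to a real value."""
    return min(LEVELS, key=lambda t: abs(x - t))


def quantize(c):
    """Quantize the real and imaginary parts of a complex coefficient independently."""
    return complex(quantize_component(c.real), quantize_component(c.imag))


def sqnr_db(coeffs, quantized):
    """SQNR in dB (Eq 3.5); assumes at least one coefficient is not exactly on a level."""
    signal = sum(abs(c) ** 2 for c in coeffs)
    error = sum(abs(c - q) ** 2 for c, q in zip(coeffs, quantized))
    return 10.0 * math.log10(signal / error)
```

Because every non-zero level is a power of two, multiplying a received sample by a quantized coefficient reduces to a shift and a sign change in hardware.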





FIG. 3.8 FER between conventional and quantization approach

Finally, we propose a low complexity cross-correlation design for FFT-window detection that combines the most-significant taps scheme with the quantization approach. The algorithm is shown as follows:

$$\Delta(k) = \left| \sum_{m=1}^{N} R_{(k+S[m])} \times Q_{S[m]}^{*} \right|^{2}$$

$$Q_{X} = \arg \min_{T} \left| \operatorname{Re}[C_{X}] - T \right| + j \times \arg \min_{T} \left| \operatorname{Im}[C_{X}] - T \right|$$

$$T \in \{ 0, \pm 2^{-3}, \pm 2^{-4}, \pm 2^{-5}, \pm 2^{-6} \}$$

$$S \equiv \text{index-sorting matrix of the most-significant taps scheme}$$
(Eq 3.6)

As in Eq 3.4, 'R' is the received signal and 'N' is the number of taps used. The value of 'N' that reduces complexity while still maintaining performance depends on the channel condition and on the user's requirements. In chapter 5, we present simulation results relating channel model, complexity, and performance on our 802.11a system platform.


# CHAPTER 4

# A Low Complexity and High Throughput Frame Synchronizer for OFDM-Based UWB System

In this chapter, a novel frame synchronizer is proposed for OFDM-based UWB systems. By integrating the tap-reduction scheme, the register-sharing algorithm and a dynamic threshold, the proposed design saves over 50% of the area and power consumption of the conventional design with an acceptable performance loss. Moreover, the proposed design achieves 528MS/s throughput for UWB systems with 120~480Mb/s data rates in a 0.18µm CMOS process.

## 4.1 Motivation



For an OFDM-based UWB system, the frame synchronizer requires a throughput of several hundred megasamples per second. A conventional frame synchronizer with a single matched filter cannot achieve such a throughput efficiently, because of the long critical path of the complex multipliers used in the matched filter. On the other hand, parallel approaches with multiple matched filters [9-10] reach the required throughput at the cost of high area and high power consumption. Reducing matched filter complexity therefore becomes the main concern in implementing our design. In a matched filter, the tap number and the required throughput dominate design complexity. We thus propose a tap-reduction scheme that reduces the tap number to lower complexity. Furthermore, a register-sharing algorithm works together with the tap-reduction scheme to reduce the size of the register files required by the parallel architecture. Finally, a dynamic threshold design is adopted to improve frame error rate performance over the conventional fixed-threshold design. The platform of our OFDM-based UWB system was introduced in section 2.2. In the following, we first introduce the proposed algorithm for the LDPC-COFDM system, which reaches 528MS/s throughput and includes the tap-reduction scheme, the register-sharing algorithm, and the dynamic threshold design. We then apply the proposed algorithm to the MB-OFDM system and add a dynamic searching window algorithm to detect RF switching among the three time-interleaved sub-bands. The performance analysis and simulation results of the proposed design are presented in chapter 5.

## 4.2 LDPC-COFDM Design

In the LDPC-COFDM system, the transmitter sends the valid data in one fixed sub-band with 528MHz bandwidth. Because the OFDM symbols are not time-interleaved, the TFC of the RF remains constant. The frame synchronizer therefore does not need to consider the correct switching time between sub-bands.

### 4.2.1 Frame Synchronizer Flow

FIG 4.1 shows the data flow of the proposed frame synchronizer for the LDPC-COFDM UWB system. First, packet detection detects the valid packet in the received signals using the auto-correlation scheme. After the packet is announced, FFT window detection finds the correct FFT window boundary using matched filters. Preamble timing detection then distinguishes the three kinds of sync symbols (PS, FS, CES) in the preamble. Finally, using the control signals from these three main blocks, the FFT symbol gate passes the OFDM data symbols to the FFT for frequency domain transformation.



FIG. 4.1 Frame synchronizer flow

#### 4.2.1.1 Packet Detection

Noise signals and valid packet will be distinguished by using periodic packet sync symbols.

The normalized auto-correlation scheme of packet detection is shown as follows:

$$A_{X} = \sum_{n=0}^{N-1} r_{X \cdot N+n} \times r_{(X+3) \cdot N+n}^{*}$$

$$P_{X} = \sum_{n=0}^{N-1} \left| r_{(X+3) \cdot N+n} \right|^{2}$$

$$\lambda_{X} = \frac{\left| A_{X} \right|^{2}}{P_{X}^{2}}$$
(Eq 4.1)

In Eq 4.1, 'r' is the received signal from the ADC. Before a valid packet is announced, the received signal is divided into received symbols of 312.5ns duration (equal to one OFDM symbol duration). The parameter 'X' is the index of the received symbol, and 'N' is the number of samples in one received symbol. The result ' $A_X$ ' is the auto-correlation value of the Xth received symbol, ' $P_X$ ' is the power estimate of the Xth received symbol, and  $\lambda_X$  is the normalized auto-correlation value of the Xth received symbol. In [30], the AFC estimates the CFO from the phase of the auto-correlation value of an OFDM symbol pair separated by three symbol durations. To share the auto-correlation value with the AFC, packet detection calculates the auto-correlation between the Xth and (X+3)th received symbols, as shown in FIG 4.2. Moreover, to prevent false alarms, packet detection asserts the valid packet at index X only when both  $\lambda_X$  and  $\lambda_{X-1}$  are higher than the pre-defined threshold.
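Eq 4.1 and the two-symbol confirmation rule can be sketched as follows. The threshold value is a placeholder, the zero-power guard is an added safety check, and the function names are illustrative.

```python
def symbol_metric(r, x, n):
    """lambda_X of Eq 4.1: correlate received symbol X with symbol X+3."""
    a = sum(r[x * n + i] * r[(x + 3) * n + i].conjugate() for i in range(n))
    p = sum(abs(r[(x + 3) * n + i]) ** 2 for i in range(n))
    return abs(a) ** 2 / (p * p) if p > 0 else 0.0


def detect_packet_uwb(r, n, threshold):
    """Return the first X for which lambda_X and lambda_(X-1) both exceed threshold."""
    num_metrics = len(r) // n - 3          # symbol X+3 must lie inside r
    previous = 0.0
    for x in range(num_metrics):
        current = symbol_metric(r, x, n)
        if x > 0 and current > threshold and previous > threshold:
            return x
        previous = current
    return None
```

Requiring two consecutive symbols above the threshold plays the same role as the decision window in the 802.11a design: a single noisy symbol cannot trigger an assertion.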



FIG. 4.2 Packet detection flow

#### 4.2.1.2 FFT Window Detection

After packet detection, FFT window detection finds the FFT window boundary by comparing the sync sequences in the packet sync symbols. It is also based on the cross-correlation algorithm and a matched filter. As described in Section 2.2.2, the sync sequence has 128 points, so the matched filter has 128 taps. The cross-correlation algorithm is shown as follows:

$$\Lambda(m) = \left| \sum_{n=0}^{L_s - 1} r_{(m+n)} \times s_n^* \right|^2$$
(Eq 4.2)

In Eq 4.2, parameter 'r' is the received data from ADC, 's' is the corresponding sync sequences used as matched-filter coefficients, 'Ls'=128 is the total tap number, and 'm' is the index of pre-defined searching window with 312.5ns time duration (equal to one OFDM symbol duration).

#### 4.2.1.3 Preamble Timing Detection

In the proposed frame synchronizer, FFT window detection only finds the FFT window. We still need preamble timing detection to separate the preamble from the received data. The decision scheme of preamble timing detection is shown as follows:

$$D_{Y} = \sum_{n=0}^{L_{s}-1} r_{Y \cdot N+n} \times r_{(Y+1) \cdot N+n}^{*}$$

$$P_{Y} = \sum_{n=0}^{L_{s}-1} \left| r_{(Y+1) \cdot N+n} \right|^{2}$$

$$\left| D_{Y} + D_{Y-1} \right|^{2} \ge \Gamma \times \left( P_{Y} + P_{Y-1} \right)^{2}$$
(Eq 4.3)

In the above equation, 'D<sub>Y</sub>' is the auto-correlation value of the sync sequences in the Yth packet sync symbol, 'P<sub>Y</sub>' is the corresponding symbol power, and ' $L_s$ '=128 is the number of points in a sync sequence. Preamble timing detection is also based on the auto-correlation scheme, and ' $\Gamma$ ' is the comparison threshold. According to the proposal [19], a frame sync symbol equals a packet sync symbol multiplied by –1. The auto-correlation value between the last packet sync symbol and the first frame sync symbol therefore has the opposite sign to the auto-correlation values of the other sync symbol pairs. Preamble timing detection uses this property to locate the first frame sync symbol with Eq 4.3, as shown in FIG 4.3. Since the FFT window boundary has already been detected before preamble timing detection, we can remove the cyclic prefix, which is corrupted by ISI, from the sync symbols and use only the sync sequences for correlation estimation.
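One possible reading of the decision rule is sketched below. It assumes that the inequality of Eq 4.3 holds while consecutive packet sync symbols are received, and that the first failure marks the pair formed by the last packet sync symbol and the first frame sync symbol, where the two correlation values cancel. This interpretation, the function names, and the choice of `gamma` are assumptions and not details confirmed by the text.

```python
def pair_terms(r, y, n_sym, l_s):
    """D_Y and P_Y of Eq 4.3 for symbol pair (Y, Y+1), with r aligned to a sync sequence."""
    d = sum(r[y * n_sym + i] * r[(y + 1) * n_sym + i].conjugate() for i in range(l_s))
    p = sum(abs(r[(y + 1) * n_sym + i]) ** 2 for i in range(l_s))
    return d, p


def first_frame_sync_symbol(r, n_sym, l_s, gamma):
    """Return the index of the first symbol after the sign flip, or None if no flip is found."""
    num_pairs = len(r) // n_sym - 1
    d_prev, p_prev = pair_terms(r, 0, n_sym, l_s)
    for y in range(1, num_pairs):
        d, p = pair_terms(r, y, n_sym, l_s)
        # Same-sign correlations add up; at the PS/FS boundary they cancel.
        if abs(d + d_prev) ** 2 < gamma * (p + p_prev) ** 2:
            return y + 1
        d_prev, p_prev = d, p
    return None
```

In a noise-free case the sum D_Y + D_(Y-1) drops to zero exactly at the boundary, so any threshold between 0 and 1 separates the two situations; with noise and multi-path the choice of Γ becomes a trade-off between missed and false detections.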







#### 4.2.2 Proposed Algorithm

#### 4.2.2.1 Tap-Reduction Scheme

As mentioned earlier, parallel approaches to achieving 528MS/s throughput lead to high hardware cost and power consumption. To lower complexity, reducing the tap number of the matched-filter was proposed in [10]; the trade-off is performance degradation of the frame synchronizer. According to the UWB system proposal [19], the power of the sync sequences is constant for every sample point, so we cannot apply the most-significant-taps scheme introduced in section 3.2.1 to reduce the tap number of the matched-filter. Instead, we propose a tap-reduction scheme that reduces correlation complexity by down-sampling the received signals, exploiting the uniform power distribution of the sync sequences. The proposed tap-reduction scheme can also be applied to the auto-correlation scheme. In the following, we show the modified functions of Eq 4.1 ~ Eq 4.3:

Packet Detection :

$$A_{X} = \sum_{n=0}^{\lfloor (N-1)/w \rfloor} r_{X \times N+n\times w} \times r_{(X+3) \times N+n\times w}^{*}$$

$$P_{X} = \sum_{n=0}^{\lfloor (N-1)/w \rfloor} \left| r_{(X+3) \times N+n\times w} \right|^{2}$$

$$\lambda_{X} = \frac{\left| A_{X} \right|^{2}}{P_{X}^{2}}$$
(Eq 4.4)

FFT Window Detection :

$$\Lambda(m) = \left| \sum_{n=0}^{\lfloor (L_s-1)/w \rfloor} r_{(m+n\times w)} \times s_{n\times w+j}^* \right|^2 \quad (\text{Eq 4.5})$$

Preamble Timing Detection :

$$D_Y = \sum_{n=0}^{\lfloor (L_s-1)/w \rfloor} r_{Y \times N+n\times w} \times r_{(Y+1) \times N+n\times w}^*$$

$$P_Y = \sum_{n=0}^{\lfloor (L_s-1)/w \rfloor} \left| r_{(Y+1) \times N+n\times w} \right|^2 \quad (\text{Eq 4.6})$$

$$\left| D_Y + D_{Y-1} \right|^2 \ge \Gamma \times \left( P_Y + P_{Y-1} \right)^2$$

In Eq 4.4 ~ Eq 4.6, the parameter ' $\omega$ ' (written $w$ in the equations) is a reduction factor controlling the correlation complexity and tap number of each function block.

Unlike a conventional down-sampling scheme, whose throughput is only $1/\omega$ of the input data rate, the tap-reduction scheme keeps the same throughput rate (528MS/s) as the input data in order to preserve the timing resolution of FFT window detection. The sync sequences used as matched-filter taps are also divided into ' $\omega$ ' groups ( $s_{n \times w+j}$, $j \in \{0, 1, 2, ..., w-1\}$ ). Because of the uniform power distribution of the sync sequences, any one of the ' $\omega$ ' groups chosen as matched-filter taps gives equal performance. The detailed performance simulation of the tap-reduction scheme is shown in section 5.2.1. Based on the simulation results, we propose ' $\omega$ '=4 for our frame synchronizer. The data flows of the conventional design and of the design using the tap-reduction scheme (with ' $\omega$ '=4, 'j'=3) are shown in the following:



FIG. 4.4 Data flow of conventional design with 128 taps (*w*=1)



FIG. 4.5 Data flow of tap-reduction scheme with 32 taps (w = 4, j = 3)
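The tap-reduced correlation of Eq 4.5 can be sketched in a few lines of Python. The indices are copied from the equation as printed; the function names, the ±1 test sequence, and the guard against taps beyond the end of the sequence are assumptions of this sketch, not part of the thesis design.

```python
def num_taps(l_s, w):
    """Number of terms in the Eq 4.5 sum: floor((Ls-1)/w) + 1."""
    return (l_s - 1) // w + 1


def tap_reduced_metric(r, s, m, w, j=0):
    """Lambda(m) of Eq 4.5: every w-th received sample against tap group j."""
    acc = 0j
    for n in range(num_taps(len(s), w)):
        tap = n * w + j
        if tap < len(s):  # guard for lengths not divisible by w (assumption)
            acc += r[m + n * w] * s[tap].conjugate()
    return abs(acc) ** 2


# Hypothetical constant-power (+/-1) sequence of 128 points, placed at offset 10.
s = [complex(1 if (7 * n) % 5 < 3 else -1) for n in range(128)]
r = [0j] * 10 + s + [0j] * 20
metrics = [tap_reduced_metric(r, s, m, w=4, j=3) for m in range(20)]
```

With w=4 only 32 multiplications are needed per output, yet the metric is still evaluated at every sample index m, which is how the scheme keeps the full timing resolution. Note that with the indices as printed, tap group j lines up with the received samples at m equal to the sequence offset plus j.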

FIG 4.4 is the conventional design with 128 taps (' $\omega$ '=1); the register-files storing received samples for cross-correlation are 128 words long. FIG 4.5 is the tap-reduction scheme with 32 taps (' $\omega$ '=4, 'j'=3). Comparing FIG 4.4 and FIG 4.5, the tap-reduction scheme reduces the correlation complexity and the register-file length of the conventional design by 75%, from 128 taps to 32 taps. However, when a parallel architecture is applied for a high-throughput matched-filter design, the register-files must be parallelized too. To counter the resulting growth in register-file size, we propose a register-sharing algorithm. It cooperates with the tap-reduction scheme to share received samples among the parallel matched-filters and so reduces the required size of the register-files.



Assume: 8-tap matched-filter, parallelism=2, $\omega$=2

FIG 4.6 shows an example of an 8-tap matched-filter. With $\omega$=2, the register-files used to store received data are reduced to 4 words. However, with a parallelism of 2, two sets of register-files are needed for the two matched-filters, which increases the required register-file size. We therefore propose a register-sharing algorithm to solve this problem. By rescheduling the received data and the compared taps, the two matched-filters can share the same received data in a single set of register-files, reducing the hardware cost of the register-files.

The register-sharing algorithm is given in Eq 4.7. The left side is the tap-reduction scheme from Eq 4.5, and the right side is the proposed register-sharing algorithm, obtained by rescheduling the indices of the received data and the compared taps.

$$\Lambda(m) = \underbrace{\left| \sum_{n=0}^{\lfloor L_s/\omega \rfloor - 1} r_{(n\omega+m)} \times s_{n\omega}^* \right|^2}_{\text{Tap-reduction scheme}} \cong \underbrace{\left| \sum_{n=0}^{\lfloor L_s/\omega \rfloor - 1} r_{n\omega+\omega \lfloor m/\omega \rfloor} \times s_{n\omega-m+\omega \lfloor m/\omega \rfloor}^* \right|^2}_{\text{Data rescheduling}}$$
(Eq 4.7)

The detail derivation of register-sharing algorithm is shown as Eq 4.8:

$$\Lambda(m) = \left| \sum_{n=0}^{L_{s}-1} r_{(m+n)} \times s_{n}^{*} \right|^{2} = \left| \sum_{L=0+m}^{L_{s}-1+m} r_{L} \times s_{L-m}^{*} \right|^{2}$$

$$\approx \omega \left| \sum_{n=0}^{\lfloor L_{s}/\omega \rfloor - 1} r_{n\omega+\omega \lfloor m/\omega \rfloor} \times s_{n\omega-m+\omega \lfloor m/\omega \rfloor}^{*} \right|^{2}$$

$$\xrightarrow{tap-reduction} \left| \sum_{n=0}^{\lfloor L_{s}/\omega \rfloor - 1} r_{n\omega+\omega \lfloor m/\omega \rfloor} \times s_{n\omega-m+\omega \lfloor m/\omega \rfloor}^{*} \right|^{2}$$
(Eq 4.8)

Partition factor ' $\omega$ ' = 2



FIG. 4.7 Data flow example of the proposed design

As shown in FIG 4.7, the conventional design compares received data 1~8 with compared taps 1~8 at K=0, and received data 2~9 with compared taps 1~8 at K=1. For ' $\omega$ '=2, the tap-reduction scheme divides the information of the conventional design into two data-partition groups. Without the proposed algorithm, the matched-filter uses only one data-partition group of compared taps (1, 3, 5, 7) to compute the matched-filter power, so the register-files must be refreshed at every sample cycle. With the proposed algorithm, the matched-filter uses all data-partition groups of compared taps in turn to compute the matched-filter power for successive sample cycles. Thus the register-files in FIG 4.7 can share received data between K=0 and K=1. When we apply the register-sharing algorithm to a parallel architecture, the required size of the register-files can be reduced as in FIG 4.6.
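The index rescheduling on the right side of Eq 4.7 can be made concrete with a short Python sketch that lists which received sample is paired with which tap for a given search index m. The function name and the rule of dropping tap indices that fall outside the sequence are assumptions of this example (the equation as printed does not state how that edge is handled), and indices here are 0-based whereas FIG 4.7 counts from 1.

```python
def shared_indices(m, w, l_s):
    """(received index, tap index) pairs from the right side of Eq 4.7."""
    base = w * (m // w)  # w * floor(m / w): identical for w consecutive m
    pairs = []
    for n in range(l_s // w):
        r_idx = n * w + base
        s_idx = n * w - m + base  # equals n*w - (m mod w)
        if 0 <= s_idx < l_s:      # assumption: skip taps outside the sequence
            pairs.append((r_idx, s_idx))
    return pairs


# 8-tap example with w = 2, as in the assumption of FIG 4.6.
pairs_m0 = shared_indices(0, w=2, l_s=8)
pairs_m1 = shared_indices(1, w=2, l_s=8)
```

For m=0 and m=1 the received indices come from the same set of stored samples, while the tap group changes from the even to the odd taps. Every pair still satisfies received index minus tap index equal to m, which is the alignment required by the full correlation in Eq 4.8.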

Because it uses different tap groups to compare with shared received data, the register-sharing algorithm must work together with the tap-reduction scheme. Furthermore, it is only suitable for matched-filter coefficients with constant power distribution, because the correlation results are equivalent only when all tap groups have the same power. The data flow of the register-sharing algorithm with the proposed reduction factor ' $\omega$ '=4 is shown in FIG 4.8. The access ratio between the design with the register-sharing algorithm and the design with only the tap-reduction scheme is computed in Eq 4.9 by comparing FIG 4.5 and FIG 4.8. The conventional design (with only the tap-reduction scheme) accesses 32 words in the first cycle and re-accesses 32 words in every subsequent cycle. The proposed design accesses 32 words over the first 4 cycles but re-accesses only 1 word for every 4 subsequent cycles. The parameter 'N' is the searching window length of FFT window detection, equal to the number of samples in one OFDM symbol.

$$R_{access} = \frac{proposed}{original} = \frac{32 + 1 \times \lfloor (N-4)/4 \rfloor}{32 + 32 \times (N-1)} \cong 1.37\%$$
(Eq 4.9)
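As a quick check of Eq 4.9, the ratio can be evaluated numerically. The sketch below assumes N=165, which follows from one OFDM symbol of 312.5ns sampled at 528MS/s; the function name is hypothetical.

```python
def access_ratio(n_window, taps=32, w=4):
    """R_access of Eq 4.9 for a searching window of n_window samples."""
    proposed = taps + 1 * ((n_window - w) // w)
    original = taps + taps * (n_window - 1)
    return proposed / original


# Assumption: N = 165 samples (312.5 ns at 528 MS/s).
ratio = access_ratio(165)
percent = round(100 * ratio, 2)
```

Under this assumption the ratio comes out at 72/5280, a little under 1.4%, in line with the value quoted in Eq 4.9.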



#### 4.2.2.3 Dynamic Threshold Design

In general, detection using the auto-correlation scheme needs a pre-defined threshold to compare with the estimation result. In low SNR regions, however, the received data are seriously distorted by AWGN, which changes the optimal threshold value for the auto-correlation scheme. Therefore, a dynamic threshold is proposed to generate the comparison threshold automatically according to the channel conditions. We apply the dynamic threshold design to preamble timing detection in the proposed frame synchronizer. The decision function of preamble timing detection (Eq 4.6) has a pre-defined threshold ' $\Gamma$ ', and we calculate ' $\Gamma$ ' by the dynamic threshold design as (Eq 4.10):

$$\Gamma = \frac{\left| D_{(Y-1)} + D_{(Y-2)} \right|^2}{\left| P_{(Y-1)} + P_{(Y-2)} \right|^2} \times \varepsilon$$
(Eq 4.10)

In (Eq 4.10), the definitions of 'D' and 'P' are the same as in (Eq 4.6), and ' $\varepsilon$ ' is a constant factor chosen by the user according to simulation results. In our design, the first threshold ' $\Gamma$ ' is calculated from the normalized auto-correlation value at the valid packet announcement. The threshold ' $\Gamma$ ' for each later sync symbol is then calculated by multiplying the normalized auto-correlation value of its previous sync symbols by the constant factor ' $\varepsilon$ '.
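A minimal Python sketch of Eq 4.10 follows, combined with the threshold test of Eq 4.6. The function name and all numeric values (correlation values of 100, symbol power of 128, ε=0.5) are hypothetical and only serve to show how the threshold tracks the previous symbols.

```python
def dynamic_threshold(d_prev1, d_prev2, p_prev1, p_prev2, eps):
    """Gamma of Eq 4.10, from the two previous correlation and power values."""
    return abs(d_prev1 + d_prev2) ** 2 / abs(p_prev1 + p_prev2) ** 2 * eps


# Hypothetical values: two previous pairs with D = 100 and P = 128 each.
gamma = dynamic_threshold(100 + 0j, 100 + 0j, 128.0, 128.0, eps=0.5)

# Eq 4.6 test for the next step, where the newest correlation has flipped sign.
d_now, d_prev = -100 + 0j, 100 + 0j
p_now, p_prev = 128.0, 128.0
passes = abs(d_now + d_prev) ** 2 >= gamma * (p_now + p_prev) ** 2
```

Because Γ is derived from the normalized correlation just observed, a noisier channel (smaller |D| relative to P) lowers the threshold automatically, while ε sets the margin.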

# 4.3 Multi-Band OFDM Design

Unlike the LDPC-COFDM UWB system, the MB-OFDM system uses three sub-bands to transfer data. Therefore, the baseband frame synchronizer of the MB-OFDM system needs to control the RF receiver so that it detects the selected sub-band and changes it at the correct time. FIG 4.9 and FIG 4.10 show the received data of the LDPC-COFDM system and the MB-OFDM system respectively. Before the frame synchronizer detects the sub-bands successfully, the RF receiver of the MB-OFDM system fixes its bandwidth at one sub-band. Thus only data in the selected sub-band are received and data in the other two sub-bands are filtered out, as shown in FIG 4.10, which means that the effective preamble length available to the MB-OFDM frame synchronizer is only 1/3 of that of the LDPC-COFDM frame synchronizer. The frame synchronizer must therefore use as few packet sync symbols as possible for band detection. To meet this demand, we modified the shared auto-correlator by increasing its correlation complexity. This approach improves the accuracy of packet detection and reduces the number of packet sync symbols used. The trade-off is a doubling of the area cost and power consumption of the shared auto-correlator. To maintain the low-power feature, a low-cost dynamic searching window is proposed for band detection. It provides estimated power information for AGC and reduces the turn-on probability of the auto-correlator and matched-filter to save power. For AGC, the packet sync symbols spent are also reduced by adding a training packet and using the estimated power of the dynamic searching window to tune the correct RF gain.



FIG. 4.9 Baseband Received Data of LDPC-COFDM system



# 4.3.1



FIG. 4.11 Frame synchronizer flow of MB-OFDM UWB system

FIG 4.11 is the frame synchronizer flow of the MB-OFDM UWB system. Initially, the control FSM fixes the RF receiver at sub-band 1, and AGC tunes the RF gain on noise signals. After AGC has settled the RF gain, packet detection uses the auto-correlation scheme to detect a valid packet. At the same time, band detection decides the correct switching time of the time-interleaved OFDM symbols transferred in the three sub-bands. Once packet valid is asserted, the control FSM changes the sub-band of the RF receiver at the corresponding time according to the band boundary information from band detection. Then FFT window detection finds the FFT window boundary within ±16 sample cycles of the band boundary. Finally, preamble timing detection distinguishes the three kinds of sync symbols (PS, FS, CES) in the preamble and directs the control FSM to cut out the OFDM data symbols for the FFT.



# 4.3.2 Proposed Algorithm

#### 4.3.2.1 Training AGC

In our system platform, we assume that the variable gain amplifier (VGA) of the RF receiver can tune its gain from 0 to 70 dB, and we implement the AGC block with a signal power measurement algorithm. For low cost, we use the estimated power information from band detection and build one AGC lookup table with an effective range from –10 to 10 dB to tune the VGA gain. The drawback of the AGC lookup table is its long searching time under high SNR conditions. In the LDPC-COFDM system, there are sufficient packet sync symbols for AGC to tune the VGA gain. However, the MB-OFDM system greatly reduces the sync symbols available for AGC, and an overly long AGC time in the high SNR region will make the frame synchronizer fail because of insufficient sync symbols. To solve this problem, we propose a training AGC that uses binary search to tune the VGA gain. Before transferring valid data, the transmitter sends a training packet to the receiver, and AGC tunes the correct gain within at most 4 effective packet sync symbols (12 OFDM symbol durations). The valid gains tuned for the noise signal and the data signal in the training packet are stored as the training gain. When valid data are transferred, AGC refers to the training gain and fine-tunes the VGA gain with the AGC lookup table. Thus AGC costs only one effective packet sync symbol (3 OFDM symbol durations). The AGC algorithm is as follows:

$$GAIN_{next} = \begin{cases} GAIN_{now} + GAIN_{est}, & \text{if } |GAIN_{est}| \le 10 \\ GAIN_{now} + (GAIN_{max} - GAIN_{now})/2, & \text{else if } GAIN_{est} > 10 \\ GAIN_{now} - (GAIN_{now} - GAIN_{min})/2, & \text{else} \end{cases} \quad (Eq \ 4.11)$$

In (Eq 4.11), $GAIN_{est}$ is the estimated gain from the AGC lookup table, with an effective range from -10 to 10 dB; $GAIN_{max}$ and $GAIN_{min}$ are the maximum and minimum possible VGA gains (in our design, $GAIN_{max}$=70dB and $GAIN_{min}$=0dB); $GAIN_{now}$ is the current VGA gain and $GAIN_{next}$ is the gain computed for the next step. The detailed data flow of the training AGC is shown in FIG 4.12.



FIG. 4.12 Detail data flow of training AGC
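The update rule of Eq 4.11 translates directly into a small Python function. The constant names and the toy convergence loop are assumptions of this sketch; in particular, the loop models the lookup-table estimate as the exact gain error, which a real lookup table would only approximate.

```python
GAIN_MIN_DB = 0.0    # minimum VGA gain in the thesis design
GAIN_MAX_DB = 70.0   # maximum VGA gain in the thesis design
LUT_RANGE_DB = 10.0  # effective range of the AGC lookup table


def next_gain(gain_now, gain_est):
    """One AGC step following Eq 4.11 (all values in dB)."""
    if abs(gain_est) <= LUT_RANGE_DB:
        return gain_now + gain_est                     # fine step from the table
    if gain_est > LUT_RANGE_DB:
        return gain_now + (GAIN_MAX_DB - gain_now) / 2  # bisect upward
    return gain_now - (gain_now - GAIN_MIN_DB) / 2      # bisect downward


# Toy run: hypothetical target gain of 50 dB, starting from 0 dB.
target, gain, steps = 50.0, 0.0, 0
while gain != target and steps < 10:
    gain = next_gain(gain, target - gain)  # idealised estimate (assumption)
    steps += 1
```

In this toy run the gain moves 0 → 35 → 52.5 → 50 dB: two bisection steps bring the error inside the lookup-table range, and one table step finishes the tuning.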

#### 4.3.2.2 Band Detection

In the MB-OFDM system, band detection must decide the correct switching time of the sub-bands in order to receive the time-interleaved OFDM symbols, as in FIG 2.4. FIG 4.10 shows that, before band detection, only 242.4ns (128 samples) of every 937.5ns period (3 OFDM symbols) contains data. If we accumulate the power of the received signal over 128 consecutive samples, the accumulated value reaches a local maximum once in every 937.5ns period. FIG 4.13 shows the accumulated power distribution in the time domain; the end of sub-band 1 is located at the index of the local maximum.



FIG. 4.13 Accumulated power of continuous 128 samples

To detect the end of sub-band 1, we use a dynamic searching window to find the corresponding index of local maximum value. It compares the accumulated power at two different samples with 'm' sample distance to get a sub-detection flag. When all the sub-detection flags satisfy the pre-defined condition, the valid searching window will be announced. Eq 4.12 shows the algorithm of proposed dynamic searching window :

$$D(k) = \sum_{n=0}^{127} |r(k+n)|^2 - \sum_{n=0}^{127} |r(k-m+n)|^2$$

$$= \sum_{n=127-(m-1)}^{127} |r(k+n)|^2 - \sum_{n=0}^{m-1} |r(k-m+n)|^2$$

$$f(k) = \begin{cases} 1, & \text{if } D(k) > 0 \\ 0, & \text{else} \end{cases}$$
(Eq 4.12-1)

Detection valid asserts at index 'k' when

$$\begin{aligned} & f(k+2) = f(k+6) = f(k+10) = f(k+14) = 0 \ \& \\ & f(k+18) = f(k+22) = f(k+26) = f(k+30) = 0 \ \& \\ & f(k-2) = f(k-6) = f(k-10) = f(k-14) = 1 \ \& \\ & f(k-18) = f(k-22) = f(k-26) = f(k-30) = 1 \end{aligned}$$
(Eq 4.12-2)

In Eq 4.12, D(k) is the difference in accumulated power between two positions 'm' samples apart, with 'm'=8 in our proposed design. The comparison flag f(k) represents an increasing trend (f(k)=1) or a decreasing trend (f(k)=0) of the accumulated power, so the local maximum can be detected. Whenever a searching window has been cut, packet detection calculates the corresponding auto-correlation value and compares it with a pre-defined threshold to prevent false announcements at low SNR. The detected end index of sub-band 1 may also deviate from the true band boundary because of white noise and multi-path interference. The dynamic searching window therefore detects the boundary only coarsely. After the matched-filter detects the FFT window boundary, band detection finely adjusts the band boundary using the FFT window boundary information.
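The dynamic searching window of Eq 4.12 can be sketched in Python as below. This is a software illustration under stated assumptions, not the hardware implementation: the function names are hypothetical, the accumulated power is recomputed from scratch rather than updated incrementally as the second line of Eq 4.12-1 suggests, and the input is a noiseless toy stream with a single 128-sample burst.

```python
def accumulated_power(r, k, length=128):
    """Sum of |r|^2 over `length` consecutive samples starting at index k."""
    return sum(abs(x) ** 2 for x in r[k:k + length])


def trend_flag(r, k, m=8):
    """f(k) of Eq 4.12-1: 1 when the accumulated power rose over m samples."""
    d = accumulated_power(r, k) - accumulated_power(r, k - m)
    return 1 if d > 0 else 0


OFFSETS = range(2, 31, 4)  # 2, 6, 10, ..., 30 as in Eq 4.12-2


def detection_valid(r, k):
    """Eq 4.12-2: rising flags behind index k and falling flags ahead of it."""
    falling = all(trend_flag(r, k + o) == 0 for o in OFFSETS)
    rising = all(trend_flag(r, k - o) == 1 for o in OFFSETS)
    return falling and rising


# Toy stream: silence, one 128-sample burst of unit power, silence again.
BURST_START = 200
r = [0j] * BURST_START + [1 + 0j] * 128 + [0j] * 272
first_valid = next(k for k in range(38, len(r) - 157) if detection_valid(r, k))
```

On this input the first valid index falls a few samples after the position of the accumulated-power maximum, because D(k) compares values m samples apart. This small offset is consistent with treating the result as a coarse boundary that the matched-filter refines afterwards.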

#### 4.3.2.3 Other Function Block

Besides band detection, the other three function blocks (packet detection, FFT window detection, and preamble timing detection) can be implemented by applying the same design algorithms of LDPC-COFDM system (from Eq 4.4 to Eq 4.6). However, they still have some differences from LDPC-COFDM system design:

(1) Packet Detection: In the LDPC-COFDM system, the reduction factor ' $\omega$ ' of the normalized auto-correlation algorithm (Eq 4.4) is set to 4 and two packet sync symbol pairs are used for threshold comparison. In the MB-OFDM system, to save packet sync symbols, we propose reduction factor ' $\omega$ '=2 to improve the auto-correlator accuracy, so that only one packet sync symbol pair is sufficient (the matched-filter still uses ' $\omega$ '=4 with 32 taps). Moreover, the auto-correlator of LDPC-COFDM packet detection must stay on during noise signals until a valid packet arrives. With the dynamic searching window of the MB-OFDM system, the auto-correlator only needs to be turned on for the possible window candidates cut by band detection to check for a valid packet. We normalize the turn-on time of the single-band (LDPC-COFDM) system to 1 and measure the turn-on ratio by adding noise signals with a duration of 30 OFDM symbols. In our simulation, the turn-on ratio of the MB-OFDM system is only 4.76% of that of the LDPC-COFDM system.

(2) FFT Window Detection: In the LDPC-COFDM system, FFT window detection uses the matched-filter to find the FFT window boundary over one OFDM symbol duration (312.5ns). In the MB-OFDM system, band detection with the dynamic searching window decides the band boundary coarsely. The matched-filter therefore only needs to search for the FFT boundary within ±16 samples (60.6ns) of the coarse band boundary, so the turn-on time of the MB-OFDM system is only about 20% of that of the LDPC-COFDM system. Chapter 5 presents simulations of the FFT boundary variation probability for coarse timing synchronization (band boundary information only, without matched-filter) and fine timing synchronization (FFT window boundary information, with matched-filter).

(3) Preamble Timing Detection: In the MB-OFDM system, time interleaving over three sub-bands reduces the effective packet sync symbols to 1/3 of those in the LDPC-COFDM system. Therefore, the turn-on ratio of the MB-OFDM system is likewise only 33% of that of the LDPC-COFDM system.
# CHAPTER 5

# Simulation Result and Performance Analysis

In this chapter, the simulation results and performance analysis for the 802.11a and OFDM-based UWB systems are shown. Based on the system platforms introduced in chapter 2, we compare the frame error rate (FER) of the proposed design, and the packet error rate (PER) of our system between the perfect case (FER=0) and the proposed design.

# 5.1 Simulation of IEEE 802.11a System

The performance of the proposed low-complexity correlation scheme versus the number of correlation taps for the IEEE 802.11a system is shown in this section. To illustrate the influence of tap reduction, we simulate the FER of FFT window detection assuming perfect packet detection. In wireless communication, frame synchronization may dominate system performance at the lowest data rate. Thus, we derive our FER performance requirement from FIG 5.1, which simulates PER at the 6Mb/s data rate under perfect frame synchronization (FER=0) in AWGN and IEEE fading channels, with CFO=200KHz, SCO=40ppm, and RMS delay spread=100ns. FIG 5.1 shows that PER reaches the 10% requirement at SNR=2.7dB in the multi-path channel and at SNR=0.95dB in the AWGN channel. Therefore, we require FER=1% to be reached at SNR $\leq$ 1dB in the AWGN channel and at SNR $\leq$ 3dB in the multi-path channel.



(IEEE-fading channel: RMS=100ns, CFO=200KHz, SCO=40ppm)



FIG. 5.2 FER of pure AWGN channel, CFO=0KHz, RMS=0ns



FIG. 5.4 FER of IEEE-fading channel: RMS=150ns, CFO=0KHz



FIG. 5.6 FER of AWGN channel with CFO=100KHz



FIG. 5.8 FER of IEEE-fading channel: CFO=20KHz, RMS=100ns







FIG 5.2 shows the FER performance versus tap number from N=64 to N=8 in the AWGN channel. The 32 most significant taps reach FER=1% at SNR=-5dB and the 16 most significant taps reach 1% at SNR=-2.4dB. Since our system need not work at -5dB, 'N'=16 is sufficient for FFT window detection in the AWGN channel. For 'N'=8, FER=1% is reached only at SNR=2dB, which may degrade system performance in the AWGN channel. Furthermore, the SNR loss between 'N'=16 and 'N'=8 is more than 4dB. Thus 'N'=16 is an efficient and reasonable tap number in the AWGN channel.

Next, we take the multi-path fading channel into account. FIG 5.3 and FIG 5.4 are simulation results in the IEEE fading channel with RMS delay spread=100ns and 150ns. As mentioned earlier, multi-path strongly affects the cross-correlation scheme because of the different arrival paths. The largest performance degradation occurs for 'N'=8: using only the 8 most significant taps for the matched-filter is insufficient to resist the multi-path effect, and FER saturates at 10%. FIG 5.3 shows that 'N'=16 is sufficient to reach 1% FER at 2.4dB. For 150ns RMS delay spread, however, the 16 most significant taps are not robust enough, so we propose 'N'=20, which resists 150ns RMS delay with FER=1% at 2.8dB.

In addition, a large CFO may destroy the correlation between the received signal and the matched-filter coefficients because of the large linear phase error in the time domain. In our receiver data flow, coarse AFC roughly compensates the CFO before FFT window detection. Thus, we simulate FER versus tap number under three different CFO conditions in FIG 5.5 to FIG 5.7. FIG 5.5 shows FER versus tap number for a typical coarse AFC that compensates the CFO from 200KHz to $\leq$ 20KHz, as our system platform requires. The performance curves for 20KHz CFO (FIG 5.5) are very similar to those for 0KHz CFO (FIG 5.2), and the SNR loss of 'N'=16 from CFO=0KHz to 20KHz is only 0.1dB at FER=1%. This means that, with a typical coarse AFC, CFO degrades the proposed low-complexity correlation scheme only slightly. FIG 5.6 uses a coarse AFC with worse performance that compensates the CFO to $\leq$ 100KHz. In the 802.11a system, the performance of the cross-correlation algorithm starts degrading when CFO $\geq$ 50KHz [31]. Thus the SNR loss of 'N'=16 from CFO=20KHz to 100KHz increases to 1dB. It can still tolerate CFO=100KHz and reach FER=1% at -1.2dB, meeting the $\leq$ 1dB requirement in the AWGN channel. FIG 5.7 simulates CFO=200KHz without coarse AFC. In this figure, none of the simulated curves reaches our performance requirement (FER=1% at SNR $\leq$ 1dB). Without coarse AFC, the serious linear phase error distorts the received signals and FFT window detection fails. Furthermore, tap numbers 'N' from 24 to 16 perform better than N=64 to 32 at CFO=200KHz. Because the received data corresponding to the less significant taps are distorted by CFO, they interfere with the correlation result. In this circumstance, using only the 16 most significant taps may give a more accurate correlation value than using all the taps of the matched-filter.

| TAP Number        | 64   | 56   | 48   | 40   | 32   | 24   | 20  | 16  | 8    |
|-------------------|------|------|------|------|------|------|-----|-----|------|
| CFO=0KHz (dB)     | -3.4 | -3.4 | -3   | -2.4 | -1.7 | -0.2 | 1   | 2.3 | Fail |
| CFO=20KHz(dB)     | -3.4 | -3.2 | -2.9 | -2.2 | -1.5 | -0.1 | 1.1 | 2.6 | Fail |
| SNR loss (FER=1%) | 0    | 0.2  | 0.1  | 0.2  | 0.2  | 0.1  | 0.1 | 0.3 | X    |

TABLE 5.1 SNR loss from CFO=0 to 20KHz at IEEE fading channel delay spread=100ns.

| TAP Number        | 64   | 56   | 48   | 40   | 32   | 24  | 20  | 16 | 8    |
|-------------------|------|------|------|------|------|-----|-----|----|------|
| CFO=0KHz (dB)     | -2.4 | -2.3 | -2.1 | -1.3 | -0.5 | 1   | 2.8 | 5  | Fail |
| CFO=20KHz (dB)    | -2.4 | -2.2 | -1.9 | -1.2 | -0.3 | 1.1 | 3.1 | 5  | Fail |
| SNR loss (FER=1%) | 0    | 0.1  | 0.2  | 0.1  | 0.2  | 0.1 | 0.3 | 0  | X    |

TABLE 5.2 SNR loss from CFO=0 to 20KHz at IEEE fading channel delay spread=150ns

| TAP Number        | 64   | 56   | 48                 | 40   | 32   | 24 | 16 | 8    |
|-------------------|------|------|--------------------|------|------|----|----|------|
| CFO=0KHz (dB)     | -2.4 | -2.3 | -2.1               | -1.3 | -0.5 | 1  | 5  | Fail |
| CFO=100KHz (dB)   | 0    | 0.1  | 0.6                | 2    | 3.1  | 5  | 10 | Fail |
| SNR loss (FER=1%) | 2.4  | 2.4  | 2.7                | 3.3  | 3.6  | 4  | 5  | X    |

TABLE 5.3 SNR loss from CFO=0 to 100KHz at IEEE fading channel delay spread=150ns

We simulate FER versus tap number considering both CFO and multi-path effects in FIG 5.8 to FIG 5.10. FIG 5.8 is the typical case, with IEEE fading channel delay spread=100ns and residual CFO=20KHz after coarse AFC. It shows that 'N'=16 achieves the 1% FER requirement at 2.6dB. FIG 5.9 simulates a worse multi-path condition with RMS delay spread=150ns. This case is similar to FIG 5.4, where 'N'=20 resists RMS delay=150ns. The maximum SNR loss between CFO=0KHz and CFO=20KHz listed in TABLE 5.1 (RMS delay=100ns) and TABLE 5.2 (RMS delay=150ns) is $\leq$ 0.3dB. FIG 5.10 is the simulation of the worst case of our system, with IEEE fading channel delay spread=150ns and residual CFO=100KHz from a low-performance coarse AFC. Under such serious interference, the performance degradation depends strongly on the tap number 'N', as shown in TABLE 5.3. We suggest 'N'=32 for our proposed scheme, which reaches FER=1% at SNR=3.1dB in such a harsh channel condition.

FIG 5.11 and FIG 5.12 are the PER simulations of our system with CFO=200KHz, SCO=40ppm, and IEEE fading channels with different RMS delay spreads. For 100ns RMS delay spread (FIG 5.11), the SNR loss between perfect synchronization (FER=0) and using the 16 most significant taps is 0.35dB at 10% PER. Comparing the curve for all 64 taps with that for the 16 most significant taps, 75% of the correlation complexity can be saved at the cost of 0.3dB SNR loss at 10% PER. With 150ns RMS delay spread (FIG 5.12), however, the SNR loss between perfect synchronization and the 16 most significant taps increases to 1dB and begins to dominate system performance. The curve for the 20 most significant taps shows only 0.5dB loss at 10% PER compared with the ideal case. Thus N=20 is more suitable than N=16 for RMS delay spread=150ns, in agreement with our FER simulation results.

Although the proposed scheme efficiently reduces the complexity of the matched-filter used for FFT window detection with little performance degradation, it still weakens the ability to resist multi-path interference. According to our simulation results, the 16 most significant taps are sufficient for the IEEE fading channel with RMS delay=100ns. To leave a design margin, we propose using the 20 most significant taps, which is sufficient for the IEEE fading channel with RMS delay=150ns at only 0.5dB SNR loss for 10% PER.

# 5.2 Simulation Result of LDPC-COFDM System

#### 5.2.1 Frame Error Rate of Tap-Reduction Scheme

In the proposed design, the tap-reduction scheme reduces the correlation taps to lower complexity. The trade-off is degraded synchronizer FER. Therefore, a reasonable reduction factor ' $\omega$ ' is needed to save computation cost with acceptable performance loss. In this section, we first discuss FFT window detection versus the number of taps used for the matched-filter, and then add the auto-correlation scheme (packet detection and preamble timing detection) to our analysis.



FIG. 5.13 Tap number versus FFT window detection

Condition: Multi-path channel RMS delay spread=5ns [21], CFO=400KHz

FIG 5.13 shows the frame error rate of FFT window detection under perfect packet detection and preamble timing detection. As for 802.11a, we require FER=1% at SNR $\leq$ 3dB in the multi-path channel. The simulation environment is the Intel channel model proposed in [21] with RMS delay spread=5ns and CFO=400KHz, with AFC compensating to $\leq$ 100KHz. It shows that ' $\omega$ '=4 reaches 1% FER at 0dB SNR and meets our requirement. For ' $\omega$ '=8, FER converges somewhat slowly and reaches 2% FER at 5dB; ' $\omega$ '=16 fails the detection with too much performance loss.



FIG. 5.14 Tap-number versus Frame Synchronizer

Condition: Multi-path channel RMS delay spread=5ns [21], CFO=400KHz

FIG 5.14 shows the performance of the frame synchronizer with the tap-reduction scheme for different values of ' $\omega$ ' (with AFC compensating to $\leq 100$ KHz). Packet detection and preamble timing detection rely mainly on the auto-correlation scheme, which is sensitive to high AWGN noise, and this degrades FER by about 2 to 3dB compared with FIG 5.13. However, the proposed design with ' $\omega$ '=4 can still meet our performance requirement, reaching FER=1% at SNR=2.1dB. The design with ' $\omega$ '=8 fails, since its frame error rate converges slowly and seriously degrades system performance. The FER degradation from the conventional (' $\omega$ '=1) to the proposed (' $\omega$ '=4) design is about 2.5dB at FER=1%. For our system, however, the proposed frame synchronizer causes only a small loss at 8% PER, as shown in FIG 5.15. In FIG 5.15, the curve for ' $\omega$ '=8 shows a synchronization loss of more than 3dB SNR at 8% PER relative to the curve for ' $\omega$ '=4, which seriously degrades the system platform. The synchronization loss of ' $\omega$ '=4, on the other hand, is small, so we choose ' $\omega$ '=4 to reduce matched-filter complexity. By setting ' $\omega$ '=4, the proposed design saves 75% of the correlation complexity, which greatly reduces the area cost of the matched-filter and auto-correlator compared with the conventional design.



FIG. 5.15 Tap-number versus PER

Multi-path channel RMS delay spread=5ns [21], CFO=400KHz, SCO=40ppm



FIG. 5.17 Performance of dynamic and fixed threshold design

Multi-path channel: RMS delay spread=5ns [21], CFO=400KHz

#### 5.2.2 Performance of Dynamic Threshold

In preamble timing detection, the auto-correlation result has a different probability distribution in different channels, so the pre-defined threshold that gives the lowest error probability varies with the environment. In FIG 5.16, we simulate the threshold value that gives the lowest error probability (curve with square markers) in the multi-path channel with RMS delay=5ns and CFO=400KHz, for SNR from -3 to 5dB. Because the auto-correlation scheme is highly sensitive to AWGN, the simulated threshold value changes considerably across SNR regions. The other curve is the mean value of the proposed dynamic threshold (10000 packets for each SNR). It shows that, by tuning the constant factor ' $\epsilon$ ' in Eq 4.10 properly, the dynamic threshold design can approach the simulated lowest-error threshold value across SNR regions. Therefore, the dynamic threshold performs better than the conventional fixed threshold design by automatically varying its threshold according to channel conditions.

FIG 5.17 shows the error decision probability of preamble timing detection for several fixed threshold values and for the proposed dynamic threshold. In the fixed-threshold design, each threshold value has a low error decision rate only in a limited SNR region. For example, "threshold value=0.04" has a low error decision rate from -3 to -1dB (low SNR region), whereas "threshold value=0.1" has a low error decision rate from 4 to 5dB (high SNR region). The proposed dynamic threshold, however, has a low error decision rate from -3 to 5dB (suitable for both high and low SNR regions). It thus achieves a lower error decision probability over a much wider SNR range than the conventional fixed threshold design set at a single value.

#### 5.2.3 System Performance

Simulations of our system with perfect synchronization and with the proposed design (' $\omega$ '=4) are shown in FIG 5.18 and FIG 5.19. We focus on 8% PER [8] for performance loss comparison at the three supported data rates (120, 240, and 480Mb/s). FIG 5.18 simulates the pure AWGN channel, where the performance loss between perfect synchronization and the proposed design is less than 0.1dB SNR for all three data rates. FIG 5.19 simulates the multi-path channel with 5ns RMS delay spread [21], CFO=40ppm (400KHz), and SCO=40ppm. The performance loss at 120Mb/s is slightly larger than at 240Mb/s and 480Mb/s because the frame synchronizer has more influence on system performance at low data rates. The detailed SNR loss between perfect synchronization and the proposed design at 8% PER is listed in TABLE 5.4. The maximum SNR loss (at 120Mb/s) is still less than 0.4dB. Since this maximum SNR loss is acceptable, the proposed design is suitable for the 480Mb/s LDPC-COFDM UWB system with 528MS/s throughput.

| SNR loss [dB]           | 120 Mb/s | 240 Mb/s | 480 Mb/s |
|-------------------------|----------|----------|----------|
| AWGN channel            | 0.05     | 0.04     | 0.04     |
| Multi-path channel [21] | 0.38     | 0.31     | 0.25     |



FIG. 5.19 Multi-path channel RMS delay spread=5ns [21], CFO=400KHz, SCO=40ppm

### 5.3 Simulation Result of MB-OFDM System

#### 5.3.1 Boundary Variation Distribution

In this section, we simulate the boundary variation of FFT window detection in both the AWGN channel and the multi-path channels specified by the 802.15.3a channel modeling sub-committee report [22], for a low SNR condition (SNR=2dB) and a high SNR condition (SNR=20dB) with CFO=400KHz. In the following figures, "matched-filter off" means that only the dynamic searching window (coarse band detection) is turned on, without the matched-filter (FFT window detection), to find the boundary; "matched-filter 128 taps" means the conventional matched-filter using all 128 compared taps to find the boundary.

In the AWGN channel environment, as long as the estimated FFT window boundary falls within the preceding guard interval (32 sample indexes), the circular convolution property is preserved when the FFT transforms the received data into the frequency domain. FIG 5.20 and FIG 5.21 show the boundary variation in the AWGN channel. Using only the dynamic window is sufficient to converge the FFT window boundary within 16 variation samples (-8 to 7) with FER<1%, and the boundary variation of the dynamic window improves greatly as SNR increases, as shown in FIG 5.21.

In a multi-path channel environment, however, the FFT window boundary is not always located at the index estimated by the dynamic window because of multi-path interference. Moreover, boundary variation affects the switching time of the received time-interleaved OFDM symbols, degrading the circular convolution property of the received OFDM symbols and reducing the ability to resist multi-path interference. Multi-path energy that is not captured in the cyclic prefix (CP) results in ICI. To find the FFT window boundary as accurately as possible, a matched-filter is needed for FFT window detection. The 802.15.3a channel modeling sub-committee report specifies four multi-path channel characteristics and the corresponding measured model parameters, CM1 to CM4, and TI has proposed using the 90th percentile channel realization (the best 90% of channel realizations) of the CM1~CM4 environments for performance evaluation [7]. We therefore measure the boundary variation of CM1~CM4 for both the original and the 90th percentile channel realizations in FIG 5.22 to FIG 5.37.

The CM1 channel model is based on a 0~4m line of sight (LOS) channel environment with an RMS delay spread of 5ns. FIG 5.22 to FIG 5.25 show that turning on only dynamic window detection gives a boundary variation of 12 indexes in the 90th percentile CM1 channel, and the residual effective CP length in the worst variation case is only 62.5% of the original CP length (60.6ns). By adding a matched-filter for FFT window detection, the residual effective CP increases to 93.75% of the original CP for the conventional 128-tap matched-filter and to 87.5% for the proposed 32-tap matched-filter at SNR=2dB.

The CM2 channel model is based on a 0~4m non line of sight (NLOS) channel environment with an RMS delay spread of 8ns. It is usually regarded as the typical environment for the UWB channel model. FIG 5.26 to FIG 5.29 show the boundary variation of the 90th percentile CM2 channel. The residual effective CP of the dynamic window in the worst variation case is only 50% of the original CP, which seriously distorts the circular convolution property. The conventional 128-tap matched-filter still maintains an effective CP of 93.75% of the original CP, and the proposed 32-tap matched-filter maintains 81.25%, a slight degradation from the CM1 channel.

The CM3 channel model is based on a 4~10m NLOS channel environment with an RMS delay spread of 14ns, and its boundary variation distribution is shown in FIG 5.30 to FIG 5.33. The boundary variation of the dynamic window extends to 19 indexes, so in the worst variation case the residual effective CP is less than 40% of the original CP. The conventional 128-tap matched-filter has an effective CP of over 90% of the original CP, and the proposed 32-tap matched-filter has an effective CP of 78.125% of the original CP.

The CM4 channel model is generated to fit a 25ns RMS delay spread to represent an extreme NLOS channel environment and is the worst case of the UWB channel model. FIG 5.34 to FIG 5.37 show the boundary variation distribution. The serious multi-path interference makes the dynamic window vary over 25 sample indexes (from -10 to +15) in the 90th percentile CM4 channel. Thus the matched-filter must re-search the FFT window boundary within $\pm 16$ sample indexes of the boundary estimated by the dynamic window. The conventional 128-tap matched-filter converges the boundary variation to 4 sample indexes, still maintaining an effective CP of 87.5% of the original CP. The proposed 32-tap matched-filter, on the other hand, has an effective CP of 71.125% of the original CP, but it can still resolve the CM4 channel model with acceptable SNR loss in PER simulation compared with perfect synchronization.
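The residual effective CP percentages quoted above can be cross-checked with a short calculation. The following is only a sketch under an assumed model that is inferred from the quoted numbers rather than stated in the text: the CP is 32 samples long (60.6ns at 528MS/s) and the worst-case boundary variation shortens it sample for sample. The function name `residual_cp_percent` is hypothetical.

```python
CP_SAMPLES = 32          # guard interval length in sample indexes (from the text)
SAMPLE_RATE_HZ = 528e6   # 528 MS/s receiver sampling rate (from the text)


def residual_cp_percent(variation_samples, cp_samples=CP_SAMPLES):
    """Residual effective CP as a percentage of the original CP.

    Assumed model (not stated in the thesis): every sample of worst-case
    boundary variation removes one sample of usable CP.
    """
    if not 0 <= variation_samples <= cp_samples:
        raise ValueError("variation must lie between 0 and the CP length")
    return 100.0 * (cp_samples - variation_samples) / cp_samples


# CP duration in nanoseconds, roughly the 60.6 ns quoted for the original CP.
cp_duration_ns = CP_SAMPLES / SAMPLE_RATE_HZ * 1e9

if __name__ == "__main__":
    print(f"CP duration: {cp_duration_ns:.1f} ns")
    for variation in (12, 4):
        print(variation, "samples ->", residual_cp_percent(variation), "%")
```

Under this assumption, a 12-index variation leaves 62.5% of the CP (the CM1 dynamic-window case) and a 4-index variation leaves 87.5% (the CM4 128-tap case), which matches the figures given in the text.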



FIG. 5.20 Boundary variation in AWGN channel with CFO=400KHz, SNR=2dB



FIG. 5.21 Boundary variation in AWGN channel with CFO=400KHz, SNR=20dB



FIG. 5.22 Boundary variation in original CM1 channel, CFO=400KHz, SNR=2dB



FIG. 5.23 Boundary variation in best 90% CM1 channel, CFO=400KHz, SNR=2dB



FIG. 5.24 Boundary variation in original CM1 channel, CFO=400KHz, SNR=20dB



FIG. 5.25 Boundary variation in best 90% CM1 channel CFO=400KHz, SNR=20dB



FIG. 5.26 Boundary variation in original CM2 channel, CFO=400KHz, SNR=2dB



FIG. 5.27 Boundary variation in best 90% CM2 channel, CFO=400KHz, SNR=2dB



FIG. 5.28 Boundary variation in original CM2 channel, CFO=400KHz, SNR=20dB



FIG. 5.29 Boundary variation in best 90% CM2 channel, CFO=400KHz, SNR=20dB



FIG. 5.31 Boundary variation in best 90% CM3 channel, CFO=400KHz, SNR=2dB



FIG. 5.32 Boundary variation in original CM3 channel, CFO=400KHz, SNR=20dB



FIG. 5.33 Boundary variation in best 90% CM3 channel, CFO=400KHz, SNR=20dB



FIG. 5.35 Boundary variation in best 90% CM4 channel, CFO=400KHz, SNR=2dB



FIG. 5.36 Boundary variation in original CM4 channel, CFO=400KHz, SNR=20dB



FIG. 5.37 Boundary variation in best 90% CM4 channel, CFO=400KHz, SNR=20dB

#### 5.3.2 System Performance

In this section, the performance of our MB-OFDM system is shown by PER simulation, and the performance degradation between perfect synchronization and the proposed frame synchronizer is discussed. In wireless communication, the frame synchronizer usually dominates system performance at low SNR, when the system operates at low data rates. Thus we simulate PER at a data rate of 110Mb/s for the specified 802.15.3a channel models, using a perfect frame synchronizer, the conventional 128-tap matched-filter, and the proposed 32-tap matched-filter in the CM1 to CM4 environments. Finally, the performance of the proposed design is shown for 110Mb/s~480Mb/s data rates in the worst CM channel environment required at each rate.

FIG 5.38~5.41 show the PER simulation at 110Mb/s data rate for the CM1~CM4 channel environments, and TABLE 5.5 lists the required SNR for PER=8% at 110Mb/s data rate. In the simplest CM1 channel environment (FIG 5.38), the simulated curve of the proposed design is very close to those of the conventional design and perfect synchronization. In the typical CM2 channel environment (FIG 5.39), the proposed design degrades system performance from perfect synchronization by 0.1dB SNR for 8% PER, while the conventional design degrades it by 0.05dB. In the worse CM3 channel environment (FIG 5.40), the degradation of the proposed design increases to 0.16dB and that of the conventional design to 0.1dB. The additional degradation caused by reducing the tap number from 128 to 32 is almost the same in the CM2 and CM3 channels (0.05dB and 0.06dB). In the worst CM4 channel environment (FIG 5.41), the proposed design has 0.45dB SNR synchronization loss. FIG 5.42 shows the PER of the proposed design for CM4 at 110Mb/s, CM4 at 200Mb/s, and CM2 at 480Mb/s. The maximum synchronization loss is 0.45dB SNR for 8% PER at 110Mb/s data rate in the CM4 channel environment. The system required SNR and the performance of the proposed design for 8% PER are listed in TABLE 5.6, which shows that the proposed design meets the system SNR requirement for 110Mb/s~480Mb/s data rates. FIG 5.43 shows PER versus transmission distance for the proposed design, and the required and achieved distances are listed in TABLE 5.7. The transmission distance of the proposed design also meets the system requirement.





FIG. 5.39 PER of CM2 channel at data rate=110 Mb/s, CFO=400KHz, SCO=40ppm



FIG. 5.41 PER of CM4 channel at data rate=110 Mb/s, CFO=400KHz, SCO=40ppm

| SNR (dB) for PER=8%            | CM1  | CM2  | CM3  | CM4  |
|--------------------------------|------|------|------|------|
| Perfect (FER=0)                | 5.08 | 5.26 | 5.64 | 6.27 |
| Conventional (128 taps)        | 5.12 | 5.31 | 5.74 | 6.49 |
| Proposed (32 taps)             | 5.13 | 5.36 | 5.80 | 6.72 |
| SNR Loss (perfect vs 128 taps) | 0.04 | 0.05 | 0.1  | 0.22 |
| SNR Loss (perfect vs 32 taps)  | 0.05 | 0.1  | 0.16 | 0.45 |
| SNR Loss (128 taps vs 32 taps) | 0.01 | 0.05 | 0.06 | 0.23 |

TABLE 5.5 Required SNR for PER=8% of CM1~CM4 at 110Mb/s data rate
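The three SNR loss rows of TABLE 5.5 are pairwise differences of the required-SNR rows. The sketch below recomputes them from the table values; the dictionary layout and the function name `snr_loss_db` are illustrative choices, not part of the thesis.

```python
# Required SNR (dB) for PER = 8% at 110 Mb/s, copied from TABLE 5.5.
REQUIRED_SNR_DB = {
    "perfect":  {"CM1": 5.08, "CM2": 5.26, "CM3": 5.64, "CM4": 6.27},
    "128 taps": {"CM1": 5.12, "CM2": 5.31, "CM3": 5.74, "CM4": 6.49},
    "32 taps":  {"CM1": 5.13, "CM2": 5.36, "CM3": 5.80, "CM4": 6.72},
}


def snr_loss_db(reference, design):
    """SNR loss of `design` relative to `reference`, per channel model.

    The result is rounded to two decimals, the precision used in the table.
    """
    ref = REQUIRED_SNR_DB[reference]
    dut = REQUIRED_SNR_DB[design]
    return {channel: round(dut[channel] - ref[channel], 2) for channel in ref}


if __name__ == "__main__":
    print("perfect vs 128 taps:", snr_loss_db("perfect", "128 taps"))
    print("perfect vs 32 taps: ", snr_loss_db("perfect", "32 taps"))
    print("128 taps vs 32 taps:", snr_loss_db("128 taps", "32 taps"))
```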



FIG. 5.42 PER at 110 $\sim$ 480 Mb/s data rate for required worst CM channel

CFO=400KHz, SCO=40ppm

| Data Rate (Mb/s) | CM Channel | SNR loss(dB) | Required SNR(dB) | Proposed SNR(dB) |
|------------------|------------|--------------|------------------|------------------|
| 110              | CM4        | 0.45         | 7.1              | 6.72             |
| 200              | CM4        | 0.44         | 15.2             | 15.13            |
| 480              | CM2        | 0.105        | 21.1             | 19.16            |

TABLE 5.6 Performance of proposed design for 8% PER of 90th percentile CM channel realization



FIG. 5.43 PER versus transmission distance at CFO=400KHz, SCO=40ppm

| Data Rate (Mb/s) | CM Channel | Required Distance (m) | Proposed Distance (m) |
|------------------|------------|-----------------------|-----------------------|
| 110              | CM4        | 10                    | 10.53                 |
| 200              | CM4        | 4                     | 4.05                  |
| 480              | CM2        | 2                     | 2.49                  |

TABLE 5.7 Transmission distance of proposed design
# **CHAPTER 6**

# Hardware Implementation and Measured Result

In this chapter, the architecture of the proposed low complexity frame synchronizer with 528MS/s throughput for the 120Mb/s~480Mb/s UWB system is introduced. Measured results of the hardware implementation, including area cost, power consumption, and the CHIP micro-photo, are also shown.

## 6.1 Design Architecture

FIG 6.1 shows the architecture of the proposed frame synchronizer. It comprises a shared auto-correlator (used for packet detection and preamble timing detection), a tap-reduction matched-filter (used for FFT window detection), address-based register-files, and a control unit. After the 5-bit ADC working at 528MHz clock rate, the received signals are divided into 4 parallel data paths, so each path transfers signals at 132MHz clock rate. The shared auto-correlator then starts to detect a valid packet among the noise signals. With the tap-reduction scheme and reduction factor ' $\omega$ '=4, only one data path needs to be sent to the shared auto-correlator at 132MHz clock rate. The control unit controls a 4-to-1 MUX that changes the selected path in order every quarter symbol time to balance multi-path interference. The pre-defined threshold of packet detection can be modified according to the user's requirement to maintain flexibility. By applying the register-sharing algorithm, address-based register-files are proposed to replace the conventional FIFO; they update only one data word per cycle at 132MHz clock rate. After the tap-reduction matched-filter finds the FFT window boundary, the shared auto-correlator switches on the dynamic-threshold calculator to decide the first frame sync symbol and cuts the appropriate data for the FFT.
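The path selection described above can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the thesis's control logic: the 528MS/s stream is treated as groups of 4 consecutive samples (one group per 132MHz cycle), and the selected path advances by one every `cycles_per_quarter_symbol` cycles. Both the function name and that parameter are hypothetical.

```python
def select_rotating_path(samples, cycles_per_quarter_symbol, num_paths=4):
    """Keep one sample per group of `num_paths`, rotating the chosen path.

    samples: the full-rate sample stream (any sequence).
    cycles_per_quarter_symbol: number of reduced-rate cycles after which the
        selected path advances (hypothetical parameter; the real value
        depends on the symbol length of the system).
    Returns the reduced-rate stream that would feed the auto-correlator.
    """
    if cycles_per_quarter_symbol <= 0:
        raise ValueError("cycles_per_quarter_symbol must be positive")
    selected = []
    # Each iteration models one reduced-rate clock cycle (4 parallel samples).
    for start in range(0, len(samples) - num_paths + 1, num_paths):
        cycle = start // num_paths
        path = (cycle // cycles_per_quarter_symbol) % num_paths
        selected.append(samples[start + path])
    return selected


if __name__ == "__main__":
    stream = list(range(32))
    print(select_rotating_path(stream, cycles_per_quarter_symbol=2))
```

With a rotation period of 2 cycles, the model picks path 0 for the first two groups, path 1 for the next two, and so on, so every path contributes equally over a full rotation.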



FIG. 6.1 Architecture of proposed frame synchronizer



FIG. 6.2 Detail architecture of tap-reduction matched-filter

#### 6.1.1 Detail Architecture of Tap-Reduction Matched-Filter

FIG 6.2 shows the detail architecture of the tap-reduction matched-filter for the proposed frame synchronizer. When FFT window detection begins, the register-files send their stored data in parallel to be compared with the matched-filter taps. To maintain the time resolution of FFT window detection, the tap-reduction matched-filter requires 528MS/s throughput. We achieve this throughput with a parallel architecture of 4 sub matched-filters and use the tap-reduction scheme to eliminate the enormous hardware cost that the parallel architecture would otherwise incur. Each sub matched-filter therefore works at only 132MHz clock rate. With a reduction factor ' $\omega$ '=4, the tap number is reduced from 128 to 32, and register-files with 32 words are sufficient for the proposed design. With the conventional parallel algorithm, however, the register-files would also need to work at 528MHz clock rate. This is not feasible in the 0.18um CMOS process, so 4 parallel sets of register-files at 132MHz clock rate would be required, which costs considerable gate count and register-access power. This problem is also resolved by the register-sharing algorithm: only one set of register-files with 32 words working at 132MHz is needed, since the stored data can be shared by the 4 sub matched-filters in parallel. TABLE 6.1 lists the register-files requirement of the conventional parallel design and the proposed design.

|              | Data bit | Tap bit | Data cost (I/Q) | Tap cost | Total cost |
|--------------|----------|---------|-----------------|----------|------------|
| Conventional | 4        | 1       | 1024(bit)       | 32(bit)  | 1056(bit)  |
| Proposed     | 4        | 1       | 256(bit)        | 128(bit) | 384(bit)   |

TABLE 6.1 Register-files cost of the conventional and the proposed design
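The data and tap costs in TABLE 6.1 are consistent with a simple storage count. The sketch below assumes, as an interpretation inferred from the table rather than a formula given in the text, that each register-file set holds 32 words of 4-bit I and 4-bit Q data, that the conventional parallel design needs 4 data sets with one 32-tap set, and that the proposed design needs one shared data set with 4 rearranged 32-tap sets. The function name is hypothetical.

```python
def register_file_cost_bits(data_sets, tap_sets, words=32, data_bits=4, tap_bits=1):
    """Return (data_cost, tap_cost, total_cost) in bits.

    Assumed breakdown (inferred from TABLE 6.1, not stated in the thesis):
      data cost = sets x words x data bits x 2 (I and Q branches)
      tap cost  = sets x words x tap bits
    """
    data_cost = data_sets * words * data_bits * 2
    tap_cost = tap_sets * words * tap_bits
    return data_cost, tap_cost, data_cost + tap_cost


if __name__ == "__main__":
    print("conventional:", register_file_cost_bits(data_sets=4, tap_sets=1))
    print("proposed:    ", register_file_cost_bits(data_sets=1, tap_sets=4))
```

Under this reading, register sharing trades a fourfold reduction in data storage for a fourfold increase in the much cheaper 1-bit tap storage.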

Implementing the cross-correlation algorithm used by a matched-filter normally requires a large number of complex multipliers. In our design, however, the sync sequences have constant amplitude with opposite polarities (1 or -1), so each complex multiplier is simplified to an adder/subtractor whose function is controlled by the corresponding matched-filter tap. The 32 sub cross-correlation values of the 32 compared taps are then summed, and a squarer computes the cross-correlation power. Finally, the cross-correlation powers of the 4 sub matched-filters are sent to the 4-input peak sorter, which finds the correct FFT window boundary. The peak sorter implements the TOP '5' pre-cursor searching scheme [3] to resist multi-path interference.
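The multiplier-free correlation described above can be sketched in a few lines. This is a behavioural illustration only: the function names are hypothetical, the samples are Python complex numbers rather than fixed-point words, and a plain maximum search stands in for the TOP '5' pre-cursor peak sorter of [3], which is not reproduced here.

```python
def sub_matched_filter_power(window, taps):
    """Cross-correlation power of one sub matched-filter.

    window: complex received samples (one per compared tap).
    taps:   matched-filter taps, each +1 or -1.
    Each tap selects add or subtract, so no multiplier is needed until the
    final squaring step.
    """
    if len(window) != len(taps):
        raise ValueError("window and taps must have the same length")
    acc_i, acc_q = 0.0, 0.0
    for sample, tap in zip(window, taps):
        if tap == 1:        # tap = +1: add the sample
            acc_i += sample.real
            acc_q += sample.imag
        elif tap == -1:     # tap = -1: subtract the sample
            acc_i -= sample.real
            acc_q -= sample.imag
        else:
            raise ValueError("taps must be +1 or -1")
    # Squarer: power of the accumulated cross-correlation value.
    return acc_i * acc_i + acc_q * acc_q


def strongest_offset(powers):
    """Index of the largest power (simple stand-in for the peak sorter)."""
    return max(range(len(powers)), key=lambda k: powers[k])


if __name__ == "__main__":
    taps = [1, -1, 1, 1]
    aligned = [complex(2, 2) * t for t in taps]
    print(sub_matched_filter_power(aligned, taps))
```

The result equals the squared magnitude of the ordinary multiply-and-accumulate correlation, since multiplying by +1 or -1 is the same as adding or subtracting.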

#### 6.1.2 Detail Architecture of Shared Auto-Correlator

FIG 6.3 shows the detail architecture of the shared auto-correlator. First, the complex multiplier computes the sub auto-correlation value from the incoming data selected by the 4-to-1 MUX and the previous data stored in the register-files. Then the accumulator calculates the summation over each symbol (A(X) in Eq 4.4 or D(Y) in Eq 4.6). During packet detection, a squarer computes the power of the auto-correlation result and the AGC sends the estimated power of each symbol (P(X) in Eq 4.4) for normalization. The normalized auto-correlation power is compared with the pre-defined threshold to detect a valid packet. During preamble timing detection, a register delays D(Y) by one symbol time to get D(Y-1), and the squarer computes the power of D(Y) added to D(Y-1). At the same time, the dynamic threshold calculator generates the dynamic threshold $\Gamma$ for the comparator. To simplify our architecture, the decision function (Eq 6.1) is modified as Eq 6.2.

$$\frac{\left|D_{Y}+D_{Y-1}\right|^{2}}{\left|P_{Y}+P_{Y-1}\right|^{2}} \ge \frac{\left|D_{Y-1}+D_{Y-2}\right|^{2}}{\left|P_{Y-1}+P_{Y-2}\right|^{2}} \times \varepsilon$$
(Eq 6.1)

$$\left|D_{Y}+D_{Y-1}\right|^{2} \ge \left(\left|D_{Y-1}+D_{Y-2}\right|^{2} >> 2\right)$$
(Eq 6.2)

Since the AGC tunes the correct RF gain and holds it after valid packet detection, the estimated symbol power is almost constant  $(P_Y^2 \cong P_{Y-1}^2 \cong P_{Y-2}^2)$ . Thus the denominators of Eq 6.1 can be eliminated. Moreover, the constant factor ' $\varepsilon$ ' of Eq 4.10 is set to 1/4, so the multiplication can be replaced with a right shift by 2 bits. The dynamic threshold calculator can therefore be implemented with one bit shifter and one delay register.
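The simplified comparison of Eq 6.2 can be written directly on integer values. The following is a minimal sketch, assuming that D(Y) is available as a pair of integer (real, imaginary) fixed-point components; the function names are hypothetical and the word lengths of the actual hardware are not modelled.

```python
def sum_power(a, b):
    """|a + b|^2 for two complex values given as (real, imag) integer pairs."""
    re = a[0] + b[0]
    im = a[1] + b[1]
    return re * re + im * im


def eq_6_2_holds(d_y, d_y1, d_y2):
    """Evaluate Eq 6.2: |D_Y + D_{Y-1}|^2 >= (|D_{Y-1} + D_{Y-2}|^2 >> 2).

    The right shift by 2 bits realizes the constant factor 1/4, and the
    right-hand side plays the role of the dynamic threshold, which only
    needs the previous power value (one delay register).
    """
    dynamic_threshold = sum_power(d_y1, d_y2) >> 2
    return sum_power(d_y, d_y1) >= dynamic_threshold


if __name__ == "__main__":
    steady = (10, 0)
    print(eq_6_2_holds(steady, steady, steady))    # correlation stays high
    print(eq_6_2_holds((-9, 0), steady, steady))   # correlation collapses
```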



FIG. 6.3 Detail architecture of shared auto-correlator

#### 6.1.3 Address-Based Register-Files



FIG. 6.4 Architecture of Address-Based Register-Files

In the proposed design, FFT window detection needs register-files that work as a FIFO for the vector operations of the cross-correlation algorithm. The simplest FIFO, constructed from shift registers, has a low gate-count cost. However, all registers of such a FIFO shift their data in every cycle, which results in much dynamic power dissipation. For low power, we replace the FIFO with the address-based register-files shown in FIG 6.4, where "N" is the number of words in the register-files. When the user asserts the "EN" pin, an up counter counts the "Addr" signal from 0 to N-1 repeatedly until "EN" is disabled. By comparing against the "Addr" signal, only one of the N register-files accepts the input data to update its value, while the other N-1 register-files hold their stored values. Therefore, only one of N registers is written in each cycle, whereas the shifter-based FIFO writes all N, so the register update activity, and with it the dynamic power of register updates, falls to roughly 1/N of that of the shifter-based FIFO. The address-based register-files are similar to a RAM, but they can support the parallel data operations required by the cross-correlation algorithm.
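The difference in register activity between the two storage schemes can be made concrete with a small behavioural model. This is a sketch, not the RTL: the class names are hypothetical, and the `writes` counters simply count register updates as a rough proxy for dynamic power.

```python
class ShiftFifo:
    """Shift-register FIFO: every register takes a new value on each push."""

    def __init__(self, n):
        self.regs = [0] * n
        self.writes = 0

    def push(self, value):
        self.regs = [value] + self.regs[:-1]
        self.writes += len(self.regs)      # all N registers are rewritten


class AddressBasedRegisterFile:
    """Address-based register-files: one register is written per push."""

    def __init__(self, n):
        self.regs = [0] * n
        self.addr = 0                      # up counter, wraps from N-1 to 0
        self.writes = 0

    def push(self, value, en=True):
        if not en:                         # "EN" low: hold everything
            return
        self.regs[self.addr] = value
        self.writes += 1                   # only the addressed register changes
        self.addr = (self.addr + 1) % len(self.regs)


if __name__ == "__main__":
    fifo, regfile = ShiftFifo(32), AddressBasedRegisterFile(32)
    for sample in range(64):
        fifo.push(sample)
        regfile.push(sample)
    print(fifo.writes, regfile.writes)
```

Both structures end up holding the same most recent N samples, and all of them remain readable in parallel; only the position of each sample inside the storage differs, which is why the compared taps have to be addressed accordingly.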

## 6.2 Hardware Measured Result

TABLE 6.2 shows the hardware synthesis results for the proposed frame synchronizer in a 0.18um cell library at clock rate=166MHz (clock period=6ns). The component with the largest area cost is the 4 parallel sub matched-filters. Although the tap-reduction scheme removes 75% of the correlation complexity of the conventional design, the 4 parallel sub matched-filters still occupy 46% of the area of the proposed design. TABLE 6.3 compares the area cost of the proposed and the conventional design. It shows that the proposed design saves 65.65% of the area cost of the conventional design.

| Function                     | Sub-module                     | Gate Count |
|------------------------------|--------------------------------|------------|
| Memory                       | Address-based Register-files   | 7.2K       |
| Shared Auto-Correlator       | Auto-Correlator                | 3K         |
|                              | Squarer                        | 2.8K       |
| Tap-Reduction Matched-Filter | Parallel 4 Sub Matched-Filters | 23K        |
|                              | Peak Sorter                    | 4.4K       |
|                              | Squarer                        | 5.6K       |
| Control Unit & Others        |                                | 3.6K       |
| Total                        |                                | 49.6K      |

TABLE 6.2 Gate-count cost of the proposed frame synchronizer

| Gate-Count              | Memory | Matched-Filter | Auto-Correlator & Others | Total  |
|-------------------------|--------|----------------|--------------------------|--------|
| The Conventional Design | 28.8K  | 106K           | 9.6K                     | 144.4K |
| The Proposed Design     | 7.2K   | 33K            | 9.4K                     | 49.6K  |
| Reduced Percentage      | 14.96% | 50.55%         | 0.14%                    | 65.65% |

TABLE 6.3 Area cost comparison (0.18um cell library)

| Power Consumption (mW)  | Memory | Matched-Filter | Auto-Correlator & Others | Total  |
|-------------------------|--------|----------------|--------------------------|--------|
| The Conventional Design | 26.1   | 15.4           | 7.5                      | 49.0   |
| The Proposed Design     | 6.5    | 6.7            | 7.3                      | 20.5   |
| Reduced Percentage      | 40%    | 17.76%         | 0.4%                     | 58.16% |

TABLE 6.4 Power consumption comparison (post-layout simulation)

TABLE 6.4 lists the power consumption from post-layout simulation at 528MS/s throughput. By combining the tap-reduction scheme and the register-sharing algorithm, the proposed design saves 58.16% of the power consumption of the conventional design.
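The "Reduced Percentage" rows of TABLE 6.3 and TABLE 6.4 express each block's saving relative to the total of the conventional design, which is why the per-block percentages add up to the overall saving. The sketch below recomputes them from the table values; the function name and dictionary keys are illustrative.

```python
def reduction_percent(conventional, proposed):
    """Per-block and total savings as a percentage of the conventional total."""
    conventional_total = sum(conventional.values())
    result = {
        block: 100.0 * (conventional[block] - proposed[block]) / conventional_total
        for block in conventional
    }
    result["total"] = (
        100.0 * (conventional_total - sum(proposed.values())) / conventional_total
    )
    return result


# Values copied from TABLE 6.3 (K gates) and TABLE 6.4 (mW).
AREA_CONVENTIONAL = {"memory": 28.8, "matched_filter": 106.0, "others": 9.6}
AREA_PROPOSED = {"memory": 7.2, "matched_filter": 33.0, "others": 9.4}
POWER_CONVENTIONAL = {"memory": 26.1, "matched_filter": 15.4, "others": 7.5}
POWER_PROPOSED = {"memory": 6.5, "matched_filter": 6.7, "others": 7.3}

if __name__ == "__main__":
    for name, conv, prop in (
        ("area", AREA_CONVENTIONAL, AREA_PROPOSED),
        ("power", POWER_CONVENTIONAL, POWER_PROPOSED),
    ):
        savings = reduction_percent(conv, prop)
        print(name, {k: round(v, 2) for k, v in savings.items()})
```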

### 6.3 OFDM-Based UWB Baseband Transceiver

Our OFDM-based UWB baseband transceiver was implemented in a 0.18um standard CMOS process and fully tested. FIG 6.5 shows the microphoto of the whole baseband processor and the zoom-in microphoto of the proposed frame synchronizer. The core size of the proposed frame synchronizer is 2.35mm<sup>2</sup>, which is only 6.53% of the baseband transceiver core. The CHIP summary and measured results are listed in TABLE 6.5.



FIG. 6.5 Microphoto of the UWB transceiver CHIP in 0.18um process and zoom-in microphoto of the proposed frame synchronizer

| Item                                      | Specification               |
|-------------------------------------------|-----------------------------|
| Technology                                | 0.18um CMOS, 1P6M           |
| Supply Voltage                            | 1.8V Core, 3.3V I/O         |
| Package                                   | 208-pin CQFP                |
| Die Size (including PADs)                 | 6.5 x 6.5 mm<sup>2</sup>    |
| Core Size                                 | 6.05 x 6.05 mm<sup>2</sup>  |
| Gate-Count                                | 1.064M                      |
| Maximum working Freq.                     | 264MHz                      |
| Core Power Consumption at 480Mb/s (TX/RX) | 523mW/575mW                 |

TABLE 6.5 UWB transceiver CHIP summary

# CHAPTER 7

# **Conclusion and Future Work**

Based on the algorithm description and performance analysis, a low complexity frame synchronizer for OFDM system applications is proposed. It has three main features: the tap-reduction scheme, the register-sharing algorithm, and the dynamic threshold design. The tap-reduction scheme uses a reduction factor ' $\omega$ ' to remove redundant computation and reduces design complexity to  $1/\omega$  by using only a spread subset of the received data. The register-sharing algorithm resolves the linear growth of register-file size in parallel approaches: it shares the received data by rescheduling the compared taps of the parallel matched-filters. The proposed dynamic threshold improves the frame error rate by properly varying the compared threshold with channel conditions. To evaluate the performance of the proposed design, we simulate the system packet error rate in multi-path channels for different platforms. In the LDPC-COFDM system, the synchronization loss for 8% PER in the Intel channel model with RMS delay spread=5ns is 0.25~0.38 dB SNR at 120~480Mb/s data rates. In the MB-OFDM system, the synchronization loss for 8% PER of the 90th percentile channel realization in the IEEE 802.15.3a CM channels is 0.105~0.45 dB SNR at 110~480Mb/s data rates. The transmission distance of the proposed design is 2.49~10.53 meters at 110~480Mb/s data rates, meeting the system requirement of MB-OFDM-based UWB systems.

Within the proposed design, the matched-filters save 50% gate-count and 18% power consumption by applying the tap-reduction scheme, which reduces the number of parallel complex multipliers to one quarter of the conventional design. The register-files save 15% gate-count and 40% power consumption by applying the register-sharing algorithm, which reduces the required size of the register-files to one quarter of the conventional design. Overall, the proposed design saves 65.65% gate-count and 58.16% power consumption compared with conventional parallel approaches using a 128-tap matched-filter.

Although the tap-reduction scheme and register-sharing algorithm save 58%~65% of the hardware cost of conventional parallel approaches, the matched-filter still accounts for 66% of the gate count and 33% of the power consumption of the proposed design. If FFT window detection could be done without a matched-filter, a frame synchronizer with even lower hardware cost could be implemented. At present, coarse band detection without matched-filters can complete frame synchronization successfully in the AWGN channel. In a multi-path environment, however, serious boundary variation reduces the effective CP length and degrades the ability to resist ISI. Therefore, our future work will focus on improving the searching accuracy of the FFT window boundary in order to propose a matched-filter-free frame synchronizer.

# **Bibliography**

- [1] Saltzberg, B.R., "Performance of an efficient parallel data transmission system," *IEEE Trans. Comm.*, Vol. COM-15, pp.805-813, Dec. 1967.
- [2] Richard Van Nee, and Ramjee Prasad, "OFDM for Wireless Multimedia Communications", pp.20-51, 2000.
- [3] Chun-Chi Chen, "A SUCCESSIVE TIMING SYNCHRONIZATION METHOD FOR OFDM-BASED WIRELESS LOCAL AREA NETWORK," M.S. thesis, National Chiao Tung University, Summer 2003.
- [4] Wei-Che Chang, Lin-Hung Chen, Wan-Chun Liao, Hsuan-Yu Liu, and Chen-Yi Lee, "An Area and Power Efficient Frame Synchronizer for 480Mb/s OFDM-based UWB System" VLSI-TSA-DAT, April 2005
- [5] IEEE 802.11, IEEE Standard for Wireless LAN Medium Access Control and Physical Layer Specifications, Nov. 1999.
- [6] ETSI TS 101 475 "Broadband radio access network (BRAN); Hiperlan type 2; Physical layer," April 2001.
- [7] A. Batra, J. Balakrishnan, G.R. Aiello, J. R. Foerster, A. Dabak, "Design of A Multiband OFDM System for Realistic UWB Channel Environments," *IEEE Transactions on Microwave Theory and Techniques*, pp.2123-2138, Sept. 2004.
- [8] Hsuan-Yu Liu, Chien-Ching Lin, Yu-Wei Lin, Ching-Che Chang, Kai-Li Lin, Wei-Che Chang, Lin-Hong Chen, Hsie-Chia Chang, and Chen-Yi Lee, "A 480Mb/s LDPC-COFDM-based UWB Baseband Transceiver in 0.18um CMOS Process," *ISSCC*, Feb 2005
- [9] Marian Verhelst, Wim Vereecken, Michiel Steyaert, and Wim Dehaene, "Architecture for Low Ultra-Wideband Radio Receivers in The 3.1-5GHz Band for Data Rates <10Mbps," *International Symposium on Low Power Electronics And Design*, August 2004.
- [10] D. O'Donnell, S. W. Chen, B. T. Wang, and R. W. Brodersen "An Integrated, Low Power, Ultra-Wideband Transceiver Architecture for Low-Rate Indoor Wireless System," *IEEE CAS Workshop on Wireless Communications and Networking*, Sep. 2002.
- [11] Chia-Hsiang Yang, Yu-Hsuan Lin, Shih-Chun Lin, Tzi-Dar Chiueh, "Design of a low-complexity receiver for impulse-radio ultra-wideband communication systems," *Circuits and Systems, 2004 (ISCAS '04)*, Proceedings of the 2004 International Symposium on Volume 4, 23-26 Page(s):IV - 125-8 Vol.4, May 2004

- [12] Keller, T., Piazzo, L., Mandarini, P., Hanzo, L., "Orthogonal frequency division multiplex synchronization techniques for frequency-selective fading channels," *Selected Areas in Communications*, IEEE Journal on Volume 19, Issue 6, Page(s):999 – 1008, June 2001
- [13] L. Schwoerer, "VLSI Suitable Synchronization Algorithms and Architecture for IEEE 802.11a Physical Layer," *IEEE International Symposium on Circuits and Systems*, vol. 5, pp. 721-724, May 2002.
- [14] ETSI EN 300 401 "Radio broadcasting systems; digital audio broadcasting (DAB) to mobile; portable and fixed receivers," May 2001.
- [15] ETSI EN 300 744 "Digital video broadcasting (DVB); framing structure, channel coding and modulation for signal digital terrestrial television," Jan. 2001.
- [16] Weinstein, S.B. and P.M. Ebert, "Data Transmission by Frequency Division Multiplexing Using the Discrete Fourier Transform," *IEEE Trans. Comm.*, Vol. COM-19, pp.628-634, Oct. 1971.
- [17] Win, M.Z., Scholtz, R.A., "Impulse radio: how it works," *Communications Letters*, IEEE Volume 2, Issue 2, Page(s):36-38, Feb. 1998
- [18] Chia-Hsiang Yang, Yu-Hsuan Lin, Shih-Chun Lin, Tzi-Dar Chiueh, "Design of a low-complexity receiver for impulse-radio ultra-wideband communication systems," *Circuits* and Systems, 2004(ISCAS '04), Proceedings of the 2004 International Symposium on Volume 4, 23-26 Page(s):IV - 125-8 Vol.4, May 2004
- [19] A. Batra et al., "Multi-band OFDM Physical Layer Proposal," Submitted to *IEEE 802.15 TG3a*, Sep. 2003.
- [20] Bob O'Hara, Al Petrick, "The IEEE 802.11 Handbook", New York. IEEE press, 1999.
- [21] J. Foerster and Q. Li, "UWB Channel Modeling Contribution from Intel," *IEEE P802.15-02/279-SG3a*, June 2002.
- [22] J. Foerster, Ed., "Channel Modeling sub-committee report final," IEEE802.1f-02/490
- [23] P. H. Moose, "A Technique for OFDM frequency offset correction", IEEE TRANS. COMMUN, vol.42, Oct. 1994, pp2908-2914.
- [24] T. M. Schmidl, D. C. Cox, "Robust Frequency and Timing Synchronization for OFDM," *IEEE Transactions on Communication*, vol. 45, no. 12, Dec. 1997.
- [25] Kabulepa, L.D., Garcia Ortiz, A., Glesner, M., "Power reduction techniques for an OFDM burst synchronization core," *Circuits and Systems, 2002. (ISCAS 2002)*, IEEE International Symposium on Volume 1, 26-29, Page(s):I-265 - I-268 vol.1, May 2002

- [26] Krstic, M., Troya, A., Maharatna, K., Grass, E., "Optimized low-power synchronizer design for the IEEE 802.11a standard," *Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)*, IEEE International Conference on Volume 2, 6-10 Page(s):II-333-6 vol.2, April 2003
- [27] Pollet, T., Van Bladel, M., Moeneclaey, M, "BER sensitivity of OFDM systems to time synchronization error," *Communications, IEEE Transactions on*, Volume: 43 Issue: 2 Page(s): 191-193, Feb/Mar/Apr 1995
- [28] Krstic, M., Troya, A., Maharatna, K., Grass, E., "Optimized low-power synchronizer design for the IEEE 802.11a standard," *Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)*, IEEE International Conference on Volume 2, 6-10 Page(s):II-333-6 vol.2, April 2003
- [29] Taehyeun Ha, Seongjoo Lee, Jaseok Jim, "Low-complexity correlation system for timing synchronization in IEEE802.11a wireless LANs," *Radio and Wireless Conference, 2003*, RAWCON '03. Proceedings 10-13 Page(s):51-54, Aug. 2003
- [30] Lin-Hung Chen, Wei-Che Chang, Hsuan-Yu Liu, and Chen-Yi Lee, "A 528MS/s Frequency Synchronizer for OFDM-based UWB System" VLSI-TSA-DAT, April 2005
- [31] Fort, A., Weijers, J.-W., Derudder, V., Eberle, W., Bourdoux, A., "A performance and complexity comparison of auto-correlation and cross-correlation for OFDM burst synchronization," *Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP* '03) IEEE International Conference on Volume 2, 6-10, Page(s):II - 341-4 vol.2, April 2003

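Vita

I was born in Taipei City in 1981 and have lived in Zhonghe City, Taipei County, since birth. After graduating from Wangxi Elementary School and Yonghe Junior High School in Yonghe City, Taipei County, and from Taipei Municipal Jianguo High School, I studied in the Department of Electrical and Control Engineering at National Chiao Tung University. In 2003 I was admitted by recommendation to the System Group of the Institute of Electronics, National Chiao Tung University, where my advisor was Dr. Chen-Yi Lee. I developed a strong interest in Chinese classical music in elementary school, attended the Chinese music class at Wangxi Elementary School, and joined the Chinese music club at Jianguo High School. As an undergraduate in the Department of Electrical and Control Engineering, I received the academic excellence award in every semester from the first semester of my freshman year through the second semester of my junior year, graduated first in the department, and became an honorary member of the Phi Tau Phi Scholastic Honor Society of the Republic of China. My extracurricular activities centered on the university's Chinese music club: I took part in two of the university's winter Chinese music workshops and three northern-region collegiate Chinese ensemble competitions, and performed with fellow members at academic banquets on many occasions. Courses taken from my sophomore year onward showed me that I was strongly interested in digital integrated circuit design, so after graduation I entered the System Group of the Institute of Electronics by recommendation, where I also received the System Group academic excellence award in the second semester of my first year. My research area is baseband frame synchronization for wireless communication receivers, with particular experience in the IEEE 802.11a specification and OFDM-based UWB systems. My master's thesis is "Study on Low Complexity Baseband Frame Synchronization for OFDM Applications".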