# **Chapter 3**

# **Physical Layer Specification of DMT-based VDSL**

Here we introduce the physical layer specification of the ANSI DMT-based VDSL standard, T1E1.4 [3][4]. A suitable initialization procedure and transceiver architecture for the ANSI VDSL system are also described.

# 3.1 System Parameters and Requirements

Table 3.1 lists the key parameters of the ANSI VDSL physical layer specification.

| Parameter                | Value                                                                                        |
|--------------------------|----------------------------------------------------------------------------------------------|
| Number of Sub-carriers   | 256 / 512 / 1024 / 2048 / 4096                                                               |
| Block Size of IDFT/DFT   | 512 / 1024 / 2048 / 4096 / 8192                                                              |
| Cyclic Extension         | 40 / 80 / 160 / 320 / 640 samples (18.12 $\mu$s)                                             |
| Carrier Spacing          | 4.3125 kHz                                                                                   |
| Signal Bandwidth         | 12 MHz                                                                                       |
| Constellation            | BPSK to $2^{15}$-QAM                                                                         |
| Symbol Rate              | 4 kHz                                                                                        |
| Duplex Method            | FDD                                                                                          |
| Data Rate                | Asymmetric: 13 to 52 Mbps (Downstream), 1.6 to 6.4 Mbps (Upstream)<br>Symmetric: 13 to 26 Mbps |
| Forward Error Correction | Reed-Solomon code supporting $(n,k) = (144,128)$ or $(240,224)$                              |
| Required BER             | $<10^{-7}$                                                                                   |

Table 3.1 Parameters and requirements for ANSI VDSL standard

The cyclic extension lasts $18.12\,\mu s$, which is long enough to combat inter-symbol interference (ISI) on subscriber lines shorter than 1500 meters. Frequency-division duplexing (FDD) is chosen as the duplex scheme, which effectively eliminates near-end crosstalk (NEXT). The ANSI band allocation plan for the downstream and upstream bands is illustrated in Figure 3.1.
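The entries of Table 3.1 are mutually consistent, which can be checked with a few lines of arithmetic. The sketch below assumes that the sampling rate equals the IDFT block size times the carrier spacing and that the block sizes and cyclic extension lengths are paired in the order listed; the function name `dmt_timing` is ours and is not taken from the standard.

```python
CARRIER_SPACING_HZ = 4312.5  # 4.3125 kHz, from Table 3.1


def dmt_timing(n_idft: int, ce_samples: int) -> tuple[float, float, float]:
    """Return (sampling rate [Hz], cyclic extension duration [s], symbol rate [Hz])."""
    fs = n_idft * CARRIER_SPACING_HZ          # assumed: fs = block size x carrier spacing
    ce_duration = ce_samples / fs             # duration of the cyclic extension
    symbol_rate = fs / (n_idft + ce_samples)  # one DMT symbol spans N + CE samples
    return fs, ce_duration, symbol_rate


# Assumed pairing of block sizes and cyclic extension lengths (same order as Table 3.1).
for n, ce in [(512, 40), (1024, 80), (2048, 160), (4096, 320), (8192, 640)]:
    fs, ce_dur, rate = dmt_timing(n, ce)
    print(f"N={n:5d}  fs={fs / 1e6:7.3f} MHz  CE={ce_dur * 1e6:.2f} us  rate={rate:.1f} Hz")
```

Under these assumptions every pairing yields a cyclic extension of about 18.12 $\mu$s and a symbol rate of 4 kHz, matching the table.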

[Band diagram: a frequency axis with band edges marked at 0.138, 3.75, 8.5 and 12 MHz; the bands appear in the order 1st Downstream, 1st Upstream, 2nd Downstream, 2nd Upstream.]

Figure 3.1 Band allocation plan of ANSI DMT VDSL

# 3.2 Link Activation Procedures and Frame Formats [4]

The VDSL link activation process shall establish a VDSL link with the required transmission parameters between the VDSL transceiver units at the central office side (VTU-O) and the remote side (VTU-R). Either the VTU-R or the VTU-O shall be able to initiate the activation process. An overview of the initialization procedure is shown in Figure 3.2. Following the initial handshake procedure, a full-duplex link between the VTU-O and the VTU-R is established. During the training phase, timing errors are estimated. During the channel analysis and exchange state, the two modems measure the characteristics of the channel and agree on a contract that thoroughly defines the communication link.

[Timeline diagram: along the time axis, both the VTU-O and the VTU-R pass through Handshake Procedures, then Training, then Channel analysis & Exchange.]

Figure 3.2 Overview of DMT VDSL initialization procedures

Transitions between states occur on completion of the current state rather than at fixed times. The whole initialization procedure should complete within 0.2 to 1.0 second.

## 3.2.1 Handshake Procedures

Based on ITU-T Recommendation G.994.1 (G.hs), the following parameters should be transmitted during the handshake phase:

1) The size of the IDFT/DFT,

2) The length of cyclic extension,

3) Flags indicating the use of the optional band, 25~138 kHz.

All carrier sets defined in G.994.1 are simultaneously modulated with the same data bits using differentially encoded binary phase-shift keying (DPSK). After the handshake phase, the VTU-O initiates the training phase.

## 3.2.2 Training State





An O-P-TRAINING symbol consists of all allowed downstream tones modulated in 4QAM. The odd-indexed tones carry information about the PSD mask, the frequency band allocation and the location of RFI bands; only the even-indexed tones are modulated with known data. Timing information for acquisition must therefore be extracted from the even-indexed tones, and the channel attenuation should also be measured on them. A coarse channel estimate on the odd-indexed tones can then be obtained by interpolation.

An O-P-SYNCHRO symbol also consists of all allowed downstream tones modulated in 4QAM. O-P-SYNCHRO lasts 15 symbols, after which the system enters the Channel Analysis & Exchange State. Since the values modulated on all allowed downstream tones are known, they can be used to start fine channel estimation.

## 3.2.3 Channel Analysis & Exchange State

Channel estimation, SNR estimation and bit-loading calculation are carried out during this state. The VTU-O transmits O-P-MEDLEY and O-P-SYNCHRO type symbols. Figure 3.4 shows the timing of messages and symbol types during this state; the messages themselves are defined in the standard. An O-P-MEDLEY symbol consists of all allowed downstream tones modulated in 4QAM. Tones whose indexes are multiples of five carry known data, while all other tones carry information about bit allocation, noise margin and other settings. After the 15 O-P-SYNCHRO symbols, the system enters the ShowTime State, in which actual data transmission and reception start.



### 3.3.1 Transmitter

During the initialization phase, a training sequence is generated by a pseudo-random bit generator and a rotator, which are used to avoid a high peak-to-average ratio (PAR) [2]. Once training is done, the transmitted data bits are scrambled, encoded and then interleaved before they are mapped onto the upstream tones. The scrambler randomizes the transmitted bit stream and removes long runs of consecutive 0s or 1s. A Reed-Solomon (RS) code is used for forward error correction (FEC). By rearranging the coded bytes, the interleaver disperses burst errors among different RS codewords and improves the burst error correction capability. The transmitter architecture is shown in Figure 3.5.



Figure 3.5 Transmitter architecture of a VDSL system
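The way an interleaver spreads a burst over several RS codewords can be illustrated with a toy example. The sketch below assumes a simple block interleaver (write by rows, read by columns), chosen only for illustration; the interleaver structure is not specified in this section, and the sizes `DEPTH` and `WIDTH` are hypothetical and far smaller than real RS codewords.

```python
def interleave(data: bytes, depth: int, width: int) -> bytes:
    """Write `depth` codewords of `width` bytes row by row, read column by column."""
    assert len(data) == depth * width
    return bytes(data[r * width + c] for c in range(width) for r in range(depth))


def deinterleave(data: bytes, depth: int, width: int) -> bytes:
    """Inverse of interleave(): restore the row-by-row codeword order."""
    assert len(data) == depth * width
    return bytes(data[c * depth + r] for r in range(depth) for c in range(width))


DEPTH, WIDTH = 4, 6                       # hypothetical: 4 toy codewords of 6 bytes each
codewords = bytes(range(DEPTH * WIDTH))   # codeword r occupies bytes r*WIDTH .. r*WIDTH+5

sent = bytearray(interleave(codewords, DEPTH, WIDTH))
for pos in range(9, 13):                  # a burst corrupts 4 consecutive transmitted bytes
    sent[pos] ^= 0xFF

received = deinterleave(bytes(sent), DEPTH, WIDTH)
errors_per_codeword = [
    sum(received[r * WIDTH + c] != codewords[r * WIDTH + c] for c in range(WIDTH))
    for r in range(DEPTH)
]
print(errors_per_codeword)                # prints [1, 1, 1, 1]
```

A burst of four consecutive byte errors on the line thus becomes a single byte error in each of the four codewords, which a code with modest correction capability can handle.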

### 3.3.2 Receiver

The receiver must cope with channel impairments and timing errors. Its operation, including the initialization procedure, comprises five tasks: symbol synchronization, sampling clock acquisition, channel estimation, sampling clock tracking and channel tracking. The symbol boundary estimator locates the DFT window. The timing error detector estimates the sampling clock offset and feeds it to a loop filter to obtain a less noisy estimate, which is then used for timing error correction. Channel impairments are estimated and compensated by the frequency-domain equalizer (FEQ). The receiver architecture is shown in Figure 3.6. Details about synchronization and channel analysis are described in the following sections.



Figure 3.6 Receiver architecture of a VDSL system

### 3.3.2.1 Symbol Synchronization

During initialization, the first task is symbol synchronization. Proper selection of the DFT window has a great impact on the performance of all post-DFT operations. Since there is no pre-FFT training data in the VDSL system, a non-data-aided (NDA) algorithm should be used. Based on the cyclic-extension structure of DMT symbols, a simple correlator-based approach is considered [6,7]. The correlator output is given by Eq. (3.1).

$$
Corr(i) = \sum_{m=0}^{Ng-1} r^{*}(i+m-N)\, r(i+m) \tag{3.1}
$$

$$
\hat{i}_{boundary} = \arg\max_{i} Corr(i) \tag{3.2}
$$

where $r(\cdot)$ is the time-domain received signal, *N* is the block size of the FFT and *Ng* is the cyclic extension length. The symbol boundary estimate $\hat{i}_{boundary}$ is given by Eq. (3.2). With a large sampling frequency offset, the detection might fail. A possible solution is to implement three correlators with delays of *N*-1, *N* and *N*+1 [7]. The symbol-timing estimate is obtained by a maximum search over (*N*+*Ng*) consecutive correlator outputs. If the maximum correlation value is found at the output of the $(N-1)$-delay or $(N+1)$-delay correlator, the sampling frequency offset is still too large. The detection is repeated until the maximum correlation is found at the output of the *N*-delay correlator and sufficient accuracy is achieved.
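The single *N*-delay correlator of Eqs. (3.1) and (3.2) can be sketched as follows. This is a minimal illustration under several assumptions that are ours: the cyclic extension is modeled as a plain prefix, the maximum is taken over the magnitude of the correlator output, the toy sizes are not VDSL values, and the function names are hypothetical. The returned index is the first sample of the cyclic extension, i.e. $i - N$ in the indexing of Eq. (3.1).

```python
import numpy as np


def cp_correlation(r: np.ndarray, n_fft: int, n_ce: int) -> np.ndarray:
    """out[j] = Corr(j + N) of Eq. (3.1): sum over Ng products conj(r[j+m]) * r[j+m+N]."""
    prod = np.conj(r[:-n_fft]) * r[n_fft:]                 # prod[j] = conj(r[j]) * r[j+N]
    return np.convolve(prod, np.ones(n_ce), mode="valid")  # sliding sum over Ng products


def estimate_boundary(r: np.ndarray, n_fft: int, n_ce: int) -> int:
    """Index of the first cyclic-extension sample, searched over N + Ng outputs."""
    corr = cp_correlation(r, n_fft, n_ce)
    return int(np.argmax(np.abs(corr[: n_fft + n_ce])))    # magnitude is an assumption


# Toy demonstration (sizes are hypothetical, not VDSL values).
rng = np.random.default_rng(0)
N, NG, OFFSET = 64, 8, 23
symbols = []
for _ in range(4):
    body = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    symbols.append(np.concatenate([body[-NG:], body]))     # prefix = copy of the tail
r = np.concatenate([rng.standard_normal(OFFSET) + 0j, *symbols])
r = r + 0.05 * (rng.standard_normal(r.size) + 1j * rng.standard_normal(r.size))

print("true start of cyclic extension:", OFFSET)
print("estimated start               :", estimate_boundary(r, N, NG))
```

Because neighboring windows share all but one product, the correlation peak is broad, and with a short extension the estimate can land a sample or so away from the true boundary.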

The hardware complexity of a symbol boundary detector is dominated by the complex conjugate multiplications, the registers (or memory) storing an entire DMT symbol and the maximum correlation search. The word length for the multiplications should be chosen carefully to reduce both the storage for the received signal and the complexity of the complex multipliers. Since the correlations are computed by complex multiplications and summations over an *Ng*-point sliding window (see Eq. (3.1)), Corr($i+1$) can be obtained from Corr($i$) by subtracting the oldest product and adding the newest one, as shown in Eq. (3.3).

$$
\begin{aligned}
Corr(i+1) &= \sum_{m=0}^{Ng-1} r^{*}((i+1)+m-N)\, r((i+1)+m) \\
&= \sum_{m'=1}^{Ng} r^{*}(i+m'-N)\, r(i+m') \\
&= \left\{ \sum_{m'=0}^{Ng-1} r^{*}(i+m'-N)\, r(i+m') \right\} - r^{*}(i-N)\, r(i) + r^{*}(i+Ng-N)\, r(i+Ng)
\end{aligned} \tag{3.3}
$$

Therefore, only one complex multiplication and two additions are needed per sample, at the cost of storage for *Ng* old products.

To find the maximum correlation output and the clock timing offset, the three correlator outputs are compared on-line with a given threshold. The largest output that exceeds the threshold is then compared with a short list of the largest values found so far. If it is larger than any entry of that list, it replaces the smallest entry, and the record of which entry is the smallest is updated at the same time. The same process is repeated for every subsequent correlation output. This keeps the memory needed for the intermediate maxima small and the complexity of searching and comparing low. If the maximum correlation value comes from the (N+1)-correlator or the (N-1)-correlator, the clock offset is large and symbol boundary tracking is needed.

The corresponding block diagram of a symbol boundary detector is shown in Figure 3.7. Figure 3.8 shows the outputs of the three correlators versus sample index. Incorrect symbol synchronization causes phase shifts on all sub-carriers after the FFT. Figure 3.9 shows the phase rotations versus tone index when the detected boundary exceeds the exact position by one or two samples. This phase shift can be compensated by the FEQ.
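The recursive update of Eq. (3.3) can be sketched in software as follows. The FIFO of *Ng* stored products plays the role of the product memory mentioned above, so that each new output costs one complex multiplication, one subtraction and one addition. The function names are ours, and the direct evaluation of Eq. (3.1) is included only as a reference for comparison.

```python
from collections import deque

import numpy as np


def corr_direct(r: np.ndarray, i: int, n_fft: int, n_ce: int) -> complex:
    """Direct evaluation of Eq. (3.1) at index i (requires i >= N)."""
    m = np.arange(n_ce)
    return np.sum(np.conj(r[i + m - n_fft]) * r[i + m])


def corr_sliding(r: np.ndarray, i_start: int, count: int, n_fft: int, n_ce: int) -> np.ndarray:
    """Corr(i_start) ... Corr(i_start + count - 1) using the recursion of Eq. (3.3)."""
    # Initial window: the Ng products belonging to Corr(i_start).
    fifo = deque(np.conj(r[i_start + m - n_fft]) * r[i_start + m] for m in range(n_ce))
    corr = sum(fifo)
    out = [corr]
    for i in range(i_start, i_start + count - 1):
        newest = np.conj(r[i + n_ce - n_fft]) * r[i + n_ce]  # the only multiplication
        corr = corr - fifo.popleft() + newest                # drop oldest, add newest
        fifo.append(newest)
        out.append(corr)
    return np.array(out)


rng = np.random.default_rng(0)
r = rng.standard_normal(400) + 1j * rng.standard_normal(400)
N, NG = 64, 8                                                # toy sizes, not VDSL values
fast = corr_sliding(r, N, 100, N, NG)
slow = np.array([corr_direct(r, i, N, NG) for i in range(N, N + 100)])
print("recursion matches direct sum:", np.allclose(fast, slow))
```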



[Note] The *Corr.* blocks perform the operations defined by Eq. (3.3)

Figure 3.7 A symbol boundary detector under large clock offset



Figure 3.8 Outputs of the N-delay, (N-1)-delay and (N+1)-delay correlators



Figure 3.9 Phase rotations versus tone index under incorrect symbol synchronization
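The linear phase rotation caused by a misplaced DFT window can be reproduced with a short numerical experiment. The sketch below makes simplifying assumptions of our own: an ideal channel, a cyclic extension modeled as a plain prefix, and a window that starts `d` samples early so that it stays inside the extension and the shift is purely circular. A window that starts late would additionally pick up samples of the next symbol, which is not modeled here. The function name is hypothetical.

```python
import numpy as np


def rotation_from_early_window(n_fft: int, n_ce: int, d: int, seed: int = 0) -> np.ndarray:
    """Per-tone ratio Y_early[k] / Y_exact[k] when the DFT window starts d samples early."""
    assert 0 <= d <= n_ce
    rng = np.random.default_rng(seed)
    # Random 4QAM points on every tone (unit magnitude, so the ratio is well defined).
    big_x = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, n_fft)))
    x = np.fft.ifft(big_x)
    tx = np.concatenate([x[-n_ce:], x])                  # cyclic prefix + symbol body
    y_exact = np.fft.fft(tx[n_ce : n_ce + n_fft])        # correctly placed window
    y_early = np.fft.fft(tx[n_ce - d : n_ce - d + n_fft])
    return y_early / y_exact


N, NG = 64, 8                                            # toy sizes, not VDSL values
for d in (1, 2):
    ratio = rotation_from_early_window(N, NG, d)
    k = np.arange(N)
    matches = np.allclose(ratio, np.exp(-2j * np.pi * k * d / N))
    print(f"d={d}: rotation equals exp(-j*2*pi*k*d/N) on every tone: {matches}")
```

The rotation grows linearly with the tone index and has unit magnitude, so a one-tap-per-tone FEQ can absorb it together with the channel response.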

### 3.3.2.2 Sampling Timing Acquisition

A sampling clock offset causes a slow drift of the sampling instants, which rotates the data carried by the sub-carriers. Details of this effect are given in Section 4.1. The $n^{th}$ received symbol on the $k^{th}$ sub-carrier under a relative clock offset $t_{\Delta}$ can be simplified as:

$$
R_{n,k} = H_k X_{n,k}\, e^{j2\pi k t_{\Delta} n \frac{N + Ng}{N}} + N_{n,k} \tag{3.4}
$$

where $T$ is the sampling period at the transmitter, $T'$ is the sampling period at the receiver and $t_{\Delta} = \frac{T'-T}{T}$. The acquisition method adopted here uses the phase difference between two consecutive training symbols [8]. In Eq. (3.5), multiplying the $n^{th}$ symbol by the conjugate of the $(n-1)^{th}$ symbol, with the noise terms neglected, eliminates the channel phase and exposes the phase difference. From this phase difference, the sampling frequency offset can be estimated in several ways; Eq. (3.6) shows a simple estimation scheme.

$$
\begin{aligned}
Z_{n,k} &= R_{n,k} R^{*}_{n-1,k} \\
&= H_{k} X_{n,k}\, e^{j2\pi k t_{\Delta} n \frac{N + Ng}{N}} \left( H_{k} X_{n-1,k}\, e^{j2\pi k t_{\Delta} (n-1) \frac{N + Ng}{N}} \right)^{*} \\
&= |H_{k}|^{2} |X_{n,k}|^{2}\, e^{j\left(\phi_{n,k} + 2\pi k t_{\Delta} n \frac{N + Ng}{N}\right)}\, e^{-j\left(\phi_{n-1,k} + 2\pi k t_{\Delta} (n-1) \frac{N + Ng}{N}\right)} \\
&= |H_{k}|^{2} |X_{n,k}|^{2}\, e^{j\left(\phi_{n,k} - \phi_{n-1,k} + 2\pi k t_{\Delta} \frac{N + Ng}{N}\right)}
\end{aligned} \tag{3.5}
$$

$$
\hat{t}_{\Delta} = \underset{k_{1},k_{2},\; k_{1} \neq k_{2}}{\operatorname{average}} \left[ \frac{N}{2\pi (k_{2} - k_{1}) (N + Ng)} \left( (\angle Z_{n,k_{2}} - \phi_{n,k_{2}} + \phi_{n-1,k_{2}}) - (\angle Z_{n,k_{1}} - \phi_{n,k_{1}} + \phi_{n-1,k_{1}}) \right) \right] \tag{3.6}
$$

where $X_{n,k} = |X_{n,k}| e^{j\phi_{n,k}}$, $X_{n-1,k} = |X_{n-1,k}| e^{j\phi_{n-1,k}}$ and $|X_{n-1,k}| = |X_{n,k}|$. During the training state, O-P-TRAINING type symbols are available for timing offset estimation and compensation. Note that only the even-indexed tones of the O-P-TRAINING symbols can be used for sampling frequency offset estimation, because only they carry known constellation points. Given the clock offset estimate $\hat{t}_\Delta$, the sampling timing can be corrected through a feedback control loop, known as a phase-locked loop (PLL). Loop parameters such as the loop order and the loop bandwidth are adjusted to keep the feedback loop stable. The feedback control loop is illustrated in Figure 3.10.
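The estimator of Eqs. (3.5) and (3.6) can be sketched numerically as follows. The example is noise-free and uses a random toy channel, known 4QAM data on the even-indexed tones 2 to 254, and an offset of 50 ppm; all of these values, as well as the function and variable names, are assumptions made for illustration. The known data phases are removed by multiplying with $X^{*}_{n,k} X_{n-1,k}$ before taking the angle, which is equivalent to the subtraction in Eq. (3.6) but avoids phase wrapping.

```python
import numpy as np


def estimate_clock_offset(r_prev, r_curr, x_prev, x_curr, tones, n_fft, n_ce):
    """Average of Eq. (3.6) over all tone pairs k1 < k2."""
    z = r_curr * np.conj(r_prev)                       # Eq. (3.5)
    # angle(Z) - phi_n + phi_(n-1), computed in one step to avoid wrapping.
    theta = np.angle(z * np.conj(x_curr) * x_prev)
    i1, i2 = np.triu_indices(len(tones), k=1)          # all pairs with k1 < k2
    dk = tones[i2] - tones[i1]
    scale = n_fft / (2 * np.pi * dk * (n_fft + n_ce))
    return float(np.mean(scale * (theta[i2] - theta[i1])))


rng = np.random.default_rng(0)
N, NG, T_DELTA = 512, 40, 50e-6                        # assumed toy parameters
tones = np.arange(2, 256, 2)                           # even-indexed tones only
h = rng.standard_normal(tones.size) + 1j * rng.standard_normal(tones.size)
qam4 = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
x_prev = qam4[rng.integers(0, 4, tones.size)]
x_curr = qam4[rng.integers(0, 4, tones.size)]


def received(x, n):
    """Noise-free R_{n,k} of Eq. (3.4) on the selected tones."""
    return h * x * np.exp(2j * np.pi * tones * T_DELTA * n * (N + NG) / N)


est = estimate_clock_offset(received(x_prev, 4), received(x_curr, 5),
                            x_prev, x_curr, tones, N, NG)
print(f"true offset {T_DELTA:.3e}, estimate {est:.3e}")
```

With noise present, tone pairs with a larger spacing $k_2 - k_1$ give less noisy terms, so a weighted average may be preferable to the plain average used here.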



Figure 3.10 Architecture for sampling timing acquisition

The task of acquisition is to lock the clock offset within a small range, as shown in Figure 3.11. The loop filter provides a cleaner clock offset estimate $\tilde{t}_\Delta$ through low-pass filtering. Here we use a second-order loop filter [9,10] to provide a sufficiently narrow bandwidth and a fast settling time. As shown in Figure 3.12, a second-order loop filter consists of a first-order low-pass filter and an integrator. As suggested in [8], timing acquisition should be done within 0.2 seconds, which corresponds to 800 symbols at the 4 kHz symbol rate. The PLL must therefore settle within a few hundred symbols. As Figure 3.11 shows, the residual clock offset approaches zero within 200 symbols. After timing acquisition is done, channel estimation starts.



Figure 3.11 Residual clock offset in the acquisition phase and tracking phase



Figure 3.12 A $2^{nd}$-order loop filter
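The settling behavior of such a loop can be explored with a very small simulation. The sketch below is not the filter of Figure 3.12: it assumes a generic discrete-time proportional-plus-integral loop updated once per symbol, an ideal detector that measures the residual offset directly, and hypothetical gains `kp` and `ki` picked only so that the loop is stable.

```python
import random


def run_loop(true_offset: float, n_symbols: int, kp: float = 0.1, ki: float = 0.01,
             noise_std: float = 0.0, seed: int = 0) -> list[float]:
    """Return the residual clock offset seen at each symbol of a toy PI timing loop."""
    rng = random.Random(seed)
    integ = 0.0        # integral branch of the loop filter
    correction = 0.0   # offset currently removed by the timing correction
    residuals = []
    for _ in range(n_symbols):
        residual = true_offset - correction
        residuals.append(residual)
        measured = residual + rng.gauss(0.0, noise_std)  # noisy detector output
        integ += ki * measured
        correction += kp * measured + integ              # proportional + integral paths
    return residuals


res = run_loop(true_offset=100e-6, n_symbols=200)        # 100 ppm, assumed value
for n in (0, 50, 100, 199):
    print(f"symbol {n:3d}: residual offset = {res[n]:+.3e}")
```

Larger gains shorten the settling time but pass more detector noise to the correction, which is the bandwidth trade-off referred to in the text.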

### 3.3.2.3 Channel Estimation

Frequency-domain equalization is an important advantage of a DMT system over a single-carrier system. In the VDSL system, a time-domain equalizer (TEQ) is not needed when the guard interval is longer than the channel impulse response. Therefore, only a frequency-domain equalizer (FEQ) is adopted to combat channel distortion. After the synchronization training, coarse channel estimation starts with the O-P-TRAINING symbols at the end of the training state.

By using the pilot tones, an inverse channel response could be estimated with the received signal  $R_k$  in frequency domain. Since the residual clock offset becomes very small after sample timing acquisition, the received signal  $R_k$  becomes

$$
R_k = H_k X_k + N_k \tag{3.7}
$$

where $X_k$ is the known signal on the $k^{th}$ subcarrier and $N_k$ is the noise term on the $k^{th}$ subcarrier. The coarse inverse channel frequency response can be estimated by

$$
\hat{G}_k = \frac{X_k R_k^*}{|R_k|^2} = \frac{X_k}{R_k} \approx \frac{1}{H_k} \tag{3.8}
$$

However, since the pilot tones are only available on the even tones, the receiver has to interpolate to obtain channel information on the odd tones. A more accurate channel estimate can be obtained from the decision-directed data of the O-P-MEDLEY symbols in the channel analysis & exchange state. For this purpose, an adaptive FEQ is adopted in the receiver architecture. Its coefficients are updated with the least mean square (LMS) algorithm, which is given by

$$
\hat{G}_{n+1,k} = \hat{G}_{n,k} + \mu \cdot R_{n,k} \cdot e_{n,k}^* \tag{3.9}
$$

where $\mu$ is the step size and $e_{n,k}$ is the error signal between the equalized signal $\hat{X}_{n,k}$ and the decided signal $\widetilde{X}_{n,k}$. A simple block diagram of the LMS adaptive FEQ is shown in Figure 3.13.





Figure 3.13 A simple block diagram of LMS adaptive FEQ

If the initial estimation of $\hat{G}_k$ by division is omitted, the division circuit can be eliminated, but the LMS algorithm then converges slowly. Here a coarse inverse channel frequency response is estimated first, and the coefficients are then refined by the LMS algorithm to ensure both convergence speed and accuracy. Figure 3.14 shows the constellation of a 1024-QAM DMT signal equalized by an LMS adaptive FEQ.



Figure 3.14 The equalized 1024-QAM DMT signal
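The two-stage FEQ training described in this section, a coarse estimate by Eq. (3.8) on the even tones, interpolation to the odd tones and decision-directed LMS refinement, can be sketched as follows. Several choices are assumptions of ours: the 32-tone toy channel, linear interpolation of the real and imaginary parts, 4QAM data throughout, the step size, and all function names. The placement of the conjugate in an LMS update depends on how the equalizer output is defined; this sketch assumes $\hat{X} = \hat{G} R$ and $e = \widetilde{X} - \hat{X}$, for which the update term is $\mu\, e\, R^{*}$.

```python
import numpy as np

QAM4 = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)


def slicer(x):
    """Nearest unit-energy 4QAM point."""
    return (np.where(x.real >= 0, 1, -1) + 1j * np.where(x.imag >= 0, 1, -1)) / np.sqrt(2)


def coarse_feq(r, x_known, pilot_tones, n_tones):
    """Eq. (3.8) on the pilot tones, then linear interpolation to the remaining tones."""
    g_pilot = x_known * np.conj(r[pilot_tones]) / np.abs(r[pilot_tones]) ** 2
    k = np.arange(n_tones)
    return np.interp(k, pilot_tones, g_pilot.real) + 1j * np.interp(k, pilot_tones, g_pilot.imag)


def lms_update(g, r, mu):
    """One decision-directed LMS step, assuming X_hat = G * R and e = decision - X_hat."""
    x_hat = g * r
    e = slicer(x_hat) - x_hat
    return g + mu * e * np.conj(r)


def train_feq(h, n_lms, mu=0.5, noise_std=0.0, seed=0):
    """Return max |G*H - 1| after the coarse estimate and after n_lms LMS steps."""
    rng = np.random.default_rng(seed)
    n_tones = h.size

    def noise():
        return noise_std * (rng.standard_normal(n_tones) + 1j * rng.standard_normal(n_tones))

    pilots = np.arange(0, n_tones, 2)                  # known data on even tones only
    x = QAM4[rng.integers(0, 4, n_tones)]
    g = coarse_feq(h * x + noise(), x[pilots], pilots, n_tones)
    err_coarse = np.max(np.abs(g * h - 1))
    for _ in range(n_lms):
        x = QAM4[rng.integers(0, 4, n_tones)]          # unknown data, decided by the slicer
        g = lms_update(g, h * x + noise(), mu)
    return err_coarse, np.max(np.abs(g * h - 1))


k = np.arange(32)
h = np.exp(-0.1j * k) / (1 + 0.05 * k)                 # smooth toy channel (assumed)
err_coarse, err_lms = train_feq(h, n_lms=50, noise_std=0.01)
print(f"max |G*H - 1| after coarse estimate: {err_coarse:.4f}")
print(f"max |G*H - 1| after 50 LMS steps:    {err_lms:.4f}")
```

In this toy setting the coarse estimate is already close enough for the slicer to decide correctly, which is what allows the decision-directed LMS stage to reduce the interpolation error on the odd tones.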

### 3.3.2.4 Sampling Timing Tracking

As soon as the training of the FEQ is completed, actual data transmission starts. The sampling timing error should be tracked during data transmission. Since the residual timing error is very small, a decision-feedback scheme is applicable for timing error tracking [8], as shown in Figure 3.15.



Figure 3.15 Receiver architecture for the tracking of sampling timing

