# Design and Implementation of Low-Power ANSI S1.11 Filter Bank for Digital Hearing Aids

Yu-Ting Kuo, Tay-Jyi Lin, Member, IEEE, Yueh-Tai Li, and Chih-Wei Liu

Abstract—Because it matches the frequency characteristics of the human ear well, the ANSI S1.11 1/3-octave filter bank is popular in acoustic applications such as acoustic analyzers and equalizers. It is also desirable in hearing aids because the well-known hearing aid prescription formula NAL-NL1 prescribes its gains at the ANSI 1/3-octave frequencies. However, its high computational complexity limits its use in hearing aids, where power consumption is a critical concern. To address this issue, a low-power design and implementation of the ANSI S1.11 filter bank for digital hearing aids is presented. We first develop a complexity-effective multirate FIR filter bank algorithm. A systematic coefficient design flow is then elaborated for the proposed filter bank to minimize the orders of its FIR filters. In an 18-band digital hearing aid with a 24-kHz sampling rate, the proposed algorithm saves about 96% of the multiplications and additions compared with a straightforward FIR filter bank. Moreover, various low-power VLSI design techniques are investigated in detail and applied to our design. The proposed complexity-effective ANSI S1.11 FIR filter bank has been implemented in TSMC 0.13-µm CMOS technology with an area-efficient architecture. The test chip consumes only 87 $\mu$W, which is 30%–79% of the power of the other designs available in the literature. The proposed low-power ANSI 1/3-octave filter bank makes it possible to precisely apply the gains prescribed by the NAL-NL1 formula for hearing-impaired people.

Index Terms—Hearing aid, filter bank, low power.

## I. INTRODUCTION

HEARING AIDS [1] compensate for hearing loss with an auditory compensation algorithm and improve speech intelligibility with echo cancellation, noise reduction, and speech enhancement algorithms. The block diagram of an advanced digital hearing aid is illustrated in Fig. 1, which comprises these four function blocks. The auditory compensation algorithm makes up for perceptual distortion, such as raised hearing thresholds and squeezed hearing dynamic ranges, by performing frequency-dependent and nonlinear amplification on the input sound. As shown in Fig. 1, the filter bank decomposes the input signal into different bands so that the prescribed insertion gains can be applied to compensate for the raised hearing thresholds. Then, the compressor reduces the signal's dynamic range to fit the output sound into the diminished and limited hearing range of the hearing-impaired person. Since the degree of hearing loss varies considerably across frequencies, the multi-channel compressor shown in Fig. 1 is recommended. Modern hearing aids typically have a filter bank with more than 16 bands and a dynamic range compressor with three or four channels.

Manuscript received January 12, 2009; revised May 28, 2009 and August 03, 2009; accepted August 27, 2009. Date of publication December 18, 2009; date of current version July 16, 2010. This work was supported in part by the National Science Council, Taiwan, under Grant NSC97-2220-E-009-021 and by a chip fabrication grant from the National Chip Implementation Center (CIC), Taiwan. This paper was recommended by Associate Editor Y. Lian.

Y.-T. Kuo and C.-W. Liu are with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan 300 (e-mail: cwliu@twins.ee.nctu.edu.tw).

T.-J. Lin was with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan. He is now with the Microelectronics and Information Systems Research Center, National Chiao Tung University, Hsinchu, Taiwan 300.

Y.-T. Li was with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan. He is now with MediaTek Inc., Hsinchu, Taiwan 300.

Digital Object Identifier 10.1109/TCSI.2009.2033539

The auditory compensation, especially the filter bank, is the most important, yet power-hungry, function in digital hearing aids because it makes sound audible for hearing-impaired people. To the best of our knowledge, the filter banks designed for hearing instruments in the literature can be classified into two categories: uniform filter banks [2], [3] and nonuniform filter banks [4]–[9]. A 32-band discrete Fourier transform (DFT) filter bank is designed in [2], whereas an 8-band filter bank with equal-bandwidth finite-impulse response (FIR) filters is implemented in [3]. The main drawback of uniform filter banks is that they need more computation to meet the nonuniform frequency resolution of the human hearing system. Consequently, nonuniform filter banks, which can be further classified into octave-band [4]–[6], critical-band [7], and 1/3-octave-band [8]–[10] filter banks, are more suitable.

A 7-band octave filter bank has been designed in each of [4] and [5] by using interpolated FIR (IFIR) filter techniques. Lian and Wei [6] proposed an 8-band octave filter bank that uses the IFIR and frequency-response masking (FRM) techniques to reduce the computational complexity. In general, an 8-band filter bank can compensate well for typical presbycusis hearing loss, whose characteristic is relatively flat. However, more and more people have noise-induced hearing loss, which may involve large losses in some narrow frequency ranges, and this kind of hearing loss needs more bands to compensate. A 16-band critical-like filter bank is designed in [7]. The critical bands match human hearing perception; however, their irregular spacing makes implementation difficult. The design in [7], for example, realizes the 16 critical bands with 110-tap FIR filters, which entail significant computational complexity.

Even though the critical-band filter bank is good for hearing aids, the desired filter bank should be designed such that each insertion gain obtained from a hearing aid prescription formula can be applied to it directly. The well-known hearing aid prescription formula NAL-NL1 [11], for example, prescribes its insertion gains at the 1/3-octave frequencies defined by the ANSI S1.11 standard [10]. For Chinese speakers, Chang *et al.* [12] have designed a Mandarin-specific hearing aid prescription formula, which also prescribes gains at the ANSI S1.11 1/3-octave frequencies. Consequently, an ANSI S1.11 filter bank is necessary for auditory compensation algorithms that are based on these hearing aid prescription formulas.

Fig. 1. Functional block diagram of the advanced hearing aid.

Most existing ANSI S1.11 filter banks are implemented with infinite-impulse response (IIR) filters [8], [9]. Indeed, research in psychoacoustics has shown that the human ear is not sensitive to phase distortion, so a filter bank built from IIR filters may be a good design with low computational complexity. However, FIR filters are still preferred and adopted in [3]–[7], not only for their linear phase but also for their stability and regular structure. The round-off error of FIR filters is easier to analyze and control, which allows designers to use simpler hardware [13], [14]. Although normal human ears are not sensitive to phase delay, designing the filter bank of the auditory compensation with exact linear phase still has advantages for advanced hearing aids, which target music signals as well as speech. It has been shown that listeners are sensitive to phase relationships, particularly for tones of low frequency and rich harmonic structure; hence, phase has important effects on the perception of music signals [15]. Furthermore, the linear-phase property can make the processed sound more similar to the sound that users were accustomed to, which could improve speech intelligibility. In addition, preserving the phase cues helps the sound localization ability of bilateral hearing aid users [16] and is also important for acoustic cancellation algorithms [7].

Recently, an efficient method for designing FIR FRM filters with reduced passband group delays was proposed in [17]. By incorporating the group delay constraint into the overall design, where the group delay of the filter and its gradient are strongly related to the filter coefficients, optimum FRM filters with minimum passband group delay error can be obtained [17]. In addition, a straightforward approach to designing FIR filters with piecewise-polynomial impulse responses was proposed in [18]. Compared with IFIR filter designs, this approach has slightly lower computational complexity in some cases, but the hardware cost and computational complexity increase greatly if the polynomial degree is higher than four [18]. Moreover, because the design procedures of these two FIR filter design methods are complicated, it is difficult to apply them to a low-complexity ANSI S1.11 filter bank. To reduce the power and overall hardware cost of the ANSI S1.11 1/3-octave filter bank, a multistage, multirate FIR filter bank design was proposed in [19].

In this paper, the algorithm development, the architecture design, and the silicon implementation of an ultra-low-power ANSI S1.11 1/3-octave filter bank are presented. Several low-power VLSI techniques are evaluated thoroughly and then applied to our design. The proposed filter bank has been fabricated in TSMC 0.13-µm CMOS technology, and the test chip consumes only 87 $\mu$W, which shows that the proposed design is more energy-efficient than other filter banks for hearing aids in the literature. The rest of this paper is organized as follows. For readers' reference, the ANSI S1.11 standard is briefly introduced in Section II. Section III presents the proposed multirate algorithm for the efficient design of the ANSI S1.11 1/3-octave filter bank. Section IV then presents a systematic design flow for determining the filter coefficients with minimized filter orders that meet the ANSI S1.11 standard. The hardware architecture and several applicable low-power VLSI techniques are considered in Section V. Finally, Section VI summarizes the silicon implementation results, and concluding remarks are given in Section VII.

## II. ANSI S1.11 STANDARD

The ANSI S1.11 standard [10] defines 43 1/3-octave bands covering the frequency range of 0–20 kHz. Each 1/3-octave band is specified by its mid-band frequency  $f_m$  (or central frequency) and the bandwidth  $\Delta f$ . The mid-band frequency of the *n*th band, denoted by  $f_m(n)$ , is defined by

$$f_m(n) = 2^{\left(\frac{n-30}{3}\right)} \times f_r \tag{1}$$

where $f_r$, the reference frequency, is set to 1 kHz. For example, the mid-band frequency of the 22nd 1/3-octave band, $f_m(22)$, is about 157 Hz, and the mid-band frequency of the 39th band, $f_m(39)$, is 8 kHz. Given the mid-band frequency $f_m(n)$, the two band-edge frequencies $f_1(n)$ and $f_2(n)$ of the *n*th band are determined by

$$f_1(n) = f_m(n) \times 2^{-\frac{1}{6}} \quad \text{and} \quad f_2(n) = f_m(n) \times 2^{\frac{1}{6}}.$$

Then, the bandwidth of the *n*th band can be calculated by

$$\Delta f(n) = f_2(n) - f_1(n).$$
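As a quick numerical illustration of (1) and the band-edge definitions above, the following minimal Python sketch tabulates the bands 22 to 39 used later in this paper. The function names are ours and chosen only for illustration.

```python
def mid_band_frequency(n, f_ref=1000.0):
    """Mid-band frequency f_m(n) in Hz of the n-th 1/3-octave band, as in (1)."""
    return f_ref * 2.0 ** ((n - 30) / 3.0)


def band_edges(n, f_ref=1000.0):
    """Lower and upper band-edge frequencies (f_1(n), f_2(n)) in Hz."""
    f_m = mid_band_frequency(n, f_ref)
    return f_m * 2.0 ** (-1.0 / 6.0), f_m * 2.0 ** (1.0 / 6.0)


def bandwidth(n, f_ref=1000.0):
    """Bandwidth f_2(n) - f_1(n) in Hz."""
    f_1, f_2 = band_edges(n, f_ref)
    return f_2 - f_1


# Tabulate the 18 bands used by the hearing-aid filter bank.
for n in range(22, 40):
    f_1, f_2 = band_edges(n)
    print(f"band {n}: f_m = {mid_band_frequency(n):7.1f} Hz, "
          f"f_1 = {f_1:7.1f} Hz, f_2 = {f_2:7.1f} Hz, "
          f"bandwidth = {bandwidth(n):7.1f} Hz")
```

Because each band is one third of an octave wide, the upper edge of band $n$ coincides with the lower edge of band $n+1$.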



Fig. 2. ANSI S1.11 class-2 filter specification [10].

In the ANSI S1.11 standard, three classes of filters are described, i.e., class-0, class-1, and class-2. For each class, several performance requirements are specified, including the relative attenuation, the linear operating range, the environmental sensitivities (e.g., humidity and temperature), the maximum output signal, the terminating impedance, and so on. Fig. 2 illustrates the ANSI S1.11 class-2 filter specification for the *n*th 1/3-octave band, where $M_n(\omega)$ and $m_n(\omega)$ denote the limits on the minimum and maximum attenuations of the *n*th band filter, respectively. As shown in Fig. 2, the passband ripple must be less than or equal to 1 dB, while the filter should have at least 60 dB of attenuation at frequencies below $f'_1 (= f_m \times 0.184)$ and at frequencies above $f'_2 (= f_m \times 5.434)$. The relative attenuation is an important design parameter from the hardware implementation point of view because it strongly affects the filter order.

Before leaving this section, we note that the specifications of class-0 and class-1 filters are stricter than those of class-2 filters. However, the ANSI S1.11 class-2 1/3-octave band is chosen in our design because its stopband attenuation requirement is comparable to that of other filter banks for hearing aids [3], [7]. In addition, the sampling rate is set to 24 kHz in order to provide good sound quality.

## III. MULTIRATE ANSI S1.11 FILTER BANK

In this section, we present the complexity-effective design of an 18-band ANSI S1.11 filter bank, which implements the 22nd-39th 1/3-octave bands (denoted by  $F_{22} \sim F_{39}$ ) for hearing aids. Fig. 3(a) illustrates the desired auditory compensation system based on this filter bank, in which the symbols x and y stand for the input and output sequences, respectively. The 18 bands  $F_{22} \sim F_{39}$  cover six octaves, which are indexed by k = 1, 2, ..., and 6, starting from high frequencies toward low frequencies. With the filter bank, the input x(n) is first decomposed into 18 frequency-selected outputs,  $y_{22} \sim y_{39}$ , as shown in Fig. 3(a). They are separately amplified by the prescribed insertion gains and then processed by the compressor. Finally, the outputs  $y'_{22} \sim y'_{39}$  will be combined to form the output sequence y(n).

For simplicity, if the filter  $F_n$ ,  $22 \le n \le 39$ , is ideal, then we have the following magnitude response

$$|F_n(e^{j\omega})| = \begin{cases} 1, & f_1(n) \le \omega \le f_2(n) \\ 0, & \text{otherwise} \end{cases}$$



Fig. 3. Derivation of the proposed multirate 18-band ANSI S1.11 1/3-octave filter bank for auditory compensation. (a) 18 parallel filters. (b) Multirate filter bank of 18 filters. (c) Reduced synthesis bank.

where $f_1(n)$ and $f_2(n)$ are the lower and upper band-edge frequencies of the *n*th 1/3-octave band as described in Section II. According to the specification of the ANSI S1.11 1/3-octave bands, $f_2(36)$, the highest frequency in the 2nd octave, is below $\pi/2$ when the sampling rate is set to 24 kHz. Since each 1/3-octave band in the 2nd octave is band-limited, we can down-sample the 2nd octave by 2 to reduce the computational complexity. Similarly, the highest frequency $f_2(33)$ in the 3rd octave is below $\pi/4$, and so on. In other words, the *k*th octave only covers frequencies below $\pi/2^{(k-1)}$. Hence, the three 1/3-octave bands in the *k*th octave can be down-sampled by $2^{(k-1)}$, as shown in Fig. 3(b). By the theory of multirate systems [20], a synthesis bank with up-samplers and interpolation filters is necessary for reconstructing the signals.

Fig. 4. Illustrations of IFIR implementation and noble identity.

The interpolation filters, i.e.,  $F'_{22} \sim F'_{36}$  in Fig. 3(b), will filter out all imaging distortions due to up-sampling. Recall that the bandwidths of the three 1/3-octave bands in the *k*th octave cover frequencies below  $\pi/2^{(k-1)}$ . This fact implies that their imaging distortions only appear at frequencies higher than  $\pi/2^{(k-1)}$ . Then, an ideal low-pass filter  $I_k$ , whose response is

$$|I_k(e^{j\omega})| = \begin{cases} 1, & 0 \le \omega \le \pi/2^{(k-1)} \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

can be applied to remove the imaging distortions after up-sampling. Consequently, one can simplify the synthesis bank in Fig. 3(b) by replacing the three interpolation filters in the *k*th octave with an ideal low-pass filter  $I_k$ , as shown in Fig. 3(c). The multirate filter bank in Fig. 3(c) saves computations by reducing the sampling rates on the band-limited bands.
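The band-limiting argument behind the down-sampling factors can be checked numerically. The sketch below assumes the 24-kHz sampling rate stated in Section II, so that the normalized frequency $\pi/2^{(k-1)}$ corresponds to $12\,\text{kHz}/2^{(k-1)}$; the helper names are illustrative only.

```python
SAMPLING_RATE_HZ = 24000.0  # sampling rate chosen in Section II


def upper_edge_hz(n, f_ref=1000.0):
    """Upper band-edge frequency f_2(n) in Hz."""
    return f_ref * 2.0 ** ((n - 30) / 3.0) * 2.0 ** (1.0 / 6.0)


def top_band_of_octave(k):
    """Highest band index in octave k (k = 1 holds bands 37-39, k = 6 holds 22-24)."""
    return 42 - 3 * k


def octave_limit_hz(k, fs=SAMPLING_RATE_HZ):
    """Frequency in Hz that corresponds to pi / 2**(k-1) at sampling rate fs."""
    return (fs / 2.0) / 2 ** (k - 1)


def octave_is_band_limited(k, fs=SAMPLING_RATE_HZ):
    """True if every band of octave k lies below pi / 2**(k-1)."""
    return upper_edge_hz(top_band_of_octave(k)) < octave_limit_hz(k, fs)


for k in range(1, 7):
    n_top = top_band_of_octave(k)
    print(f"octave {k}: f_2({n_top}) = {upper_edge_hz(n_top):7.1f} Hz, "
          f"limit = {octave_limit_hz(k):7.1f} Hz, "
          f"down-sampling by {2 ** (k - 1)} allowed: {octave_is_band_limited(k)}")
```

For every octave the upper band edge also exceeds half of the corresponding limit, so $2^{(k-1)}$ is the largest power-of-two factor this particular argument permits.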

Now, consider the 18 1/3-octave filters $F_{22}$–$F_{39}$, among which $F_{22}$, $F_{23}$, and $F_{24}$ are very narrow-band filters. To reduce the computational complexity, the interpolated FIR filter technique is applied as follows. Assume that the filter $D$ is an ideal low-pass filter with the response

$$|D(e^{j\omega})| = \begin{cases} 1, & 0 \le \omega \le \pi/2 \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$

Hereafter, the z-transform is used for convenience. From (1) and (3), we have

$$|F_n(z)| = |D(z)F_{n+3}(z^2)|. \tag{4}$$

Then, by applying (4) iteratively, the filter  $F_{22}$  can be determined by

$$F_{22}(z) = D(z)D(z^2)\dots D(z^{16})F_{37}(z^{32}). \tag{5}$$

Similarly, assuming the filter I to be an ideal low-pass filter with the same response in (3), we can implement the filters  $I_2 \sim I_6$  in (2) by cascading the filter I. That is,

$$\begin{aligned}
I_{2}(z) &= I(z) \\
I_{3}(z) &= I(z)I(z^{2}) \\
I_{4}(z) &= I(z)I(z^{2})I(z^{4}) \\
I_{5}(z) &= I(z)I(z^{2})I(z^{4})I(z^{8}) \\
I_{6}(z) &= I(z)I(z^{2})I(z^{4})I(z^{8})I(z^{16})
\end{aligned} \tag{6}$$

Consequently, with (5) and (6), the IFIR technique, and the noble identity [20], we can implement the filters $F_{22}$–$F_{36}$ and $I_2$–$I_6$ in Fig. 3(c) with the filters $F_{37}$, $F_{38}$, $F_{39}$, $D$, and $I$, as demonstrated in Fig. 4. Moreover, we can restructure the filter bank in Fig. 3(c) into the complexity-effective multirate filter bank displayed in Fig. 5. In brief, the proposed filter bank comprises an analysis bank and a synthesis bank, where the former contains the filters $F_{37}$, $F_{38}$, $F_{39}$, and the decimation filter $D$, while the latter contains the interpolation filter $I$. With this multirate architecture, the hardware complexity is greatly reduced because the narrow-band filters, i.e., $F_{22}$, $F_{23}$, and $F_{24}$, no longer need to be implemented directly.
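The role of the noble identity can be illustrated with a small numerical experiment. The sketch below uses short, arbitrary integer coefficient lists as stand-ins for $D$ and $F_{39}$ (they are not the coefficients of this design) and checks that filtering with $D(z)F_{39}(z^2)$ followed by down-sampling by 2 gives the same samples as filtering with $D(z)$, down-sampling by 2, and then filtering with $F_{39}(z)$ at the lower rate. It also assembles the single-rate equivalent of (5) to show how long the narrow-band filter would be if implemented directly.

```python
def convolve(a, b):
    """Full linear convolution of two coefficient or sample lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def upsample_taps(h, m):
    """Taps of H(z**m): insert m - 1 zeros between the taps of H(z)."""
    out = [0] * ((len(h) - 1) * m + 1)
    out[::m] = h
    return out


def downsample(x, m):
    """Keep every m-th sample, starting with the first."""
    return x[::m]


def equivalent_f22(d, f37):
    """Single-rate taps of D(z) D(z^2) D(z^4) D(z^8) D(z^16) F_37(z^32), as in (5)."""
    taps = upsample_taps(f37, 32)
    for m in (16, 8, 4, 2, 1):
        taps = convolve(upsample_taps(d, m), taps)
    return taps


# Arbitrary stand-in coefficients and input samples (illustration only).
d = [1, 3, -2, 3, 1]
f39 = [2, -1, 4, -1, 2]
x = [5, -3, 8, 1, 0, -7, 2, 6, -4, 9, 3, -1]

# Single-rate IFIR form F_36(z) = D(z) F_39(z^2), then down-sampling by 2.
single_rate = downsample(convolve(convolve(x, d), upsample_taps(f39, 2)), 2)

# Multirate form: D(z), down-sampling by 2, then F_39(z) at the lower rate.
multirate = convolve(downsample(convolve(x, d), 2), f39)

print(single_rate == multirate)
print(len(equivalent_f22(d, f39)))
```

Even with these 5-tap stand-ins, the single-rate equivalent of the lowest octave spans a few hundred taps, which is why the multirate structure evaluates only the short filters at progressively lower rates.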

## IV. DESIGN & OPTIMIZATION OF FILTER COEFFICIENTS

Fig. 5. 18-band multirate auditory compensation system. (a) Analysis bank. (b) Dynamic range compressor and synthesis bank.

To realize the multirate filter bank in Fig. 5, the orders of the filters $F_{37}$, $F_{38}$, $F_{39}$, $D$, and $I$ should be made as small as possible to reduce the computational complexity. Here we use $H_n$ to denote the frequency response of the desired *n*th class-2 ANSI S1.11 1/3-octave band. To meet the specification shown in Fig. 2, the filter design constraint is

$$m_n(\omega) \le |H_n(e^{j\omega})| \le M_n(\omega), \quad \text{for } n = 22, 23, \dots, 39.$$

For simplicity, we apply the Parks-McClellan algorithm [21] to design the coefficients of FIR filters.

Because of the complicated multirate structure, it is difficult to design all of these filters' coefficients at the same time. The most difficult part is determining the filters' passband ripples, since filters $D$ and $I$ as well as filters $F_{37}$, $F_{38}$, and $F_{39}$ jointly influence the frequency responses of all 18 bands.

An efficient two-step filter coefficient design flow, outlined in Fig. 6, is proposed to design and optimize the filter coefficients. In the first step, shown in Fig. 6(a), the design of filters $F_{37}$, $F_{38}$, and $F_{39}$ and the design of filters $D$ and $I$ are considered independently. That is, when designing filters $F_{37}$, $F_{38}$, and $F_{39}$, we assume that filters $D$ and $I$ are ideal, and vice versa. The proposed algorithm explores the feasible solutions to make the filter orders as low as possible. The filters' passband ripples are then determined in the second step, shown in Fig. 6(b). Each filter's ripple has to be refined and adjusted so that the frequency responses of all 18 bands meet the ANSI S1.11 specification. Each step of the proposed algorithm is detailed below.

### A. Step 1a: Design $F_{37}$, $F_{38}$, and $F_{39}$

Consider the design parameters rp, rs, fs1, fs2, fp1, and fp2 of the band-pass filter, as defined in Fig. 7(a).

According to the ANSI S1.11 specification, we set rp = 1 dB and rs = 60 dB. We then need to find the values of fs1, fs2, fp1, and fp2 for each of the filters $F_{37}$, $F_{38}$, and $F_{39}$. These values cannot be derived directly, however, because the ANSI S1.11 standard also specifies the required attenuation in the transition regions (i.e., $f'_1 < f < f_1$ and $f_2 < f < f'_2$). Instead, we consider two transition bandwidths, defined by $TBW_1 \equiv fp1 - fs1$ and $TBW_2 \equiv fs2 - fp2$, for each filter.

The estimation of $TBW_1$ and $TBW_2$ for each band-pass filter is conducted as follows. Note that making $TBW_1$ and $TBW_2$ as large as possible minimizes the filter order. For simplicity, we first decompose the ANSI S1.11 class-2 band-pass filter specification into a low-pass and a high-pass specification, as illustrated in Fig. 8. We then determine $TBW_1$ by designing a high-pass filter that satisfies the high-pass specification with the maximal transition bandwidth. Similarly, we find $TBW_2$ by designing a low-pass filter that satisfies the low-pass specification with the maximal transition bandwidth. Given these estimates, we then check their feasibility. That is, we explore the possible values of fs1 and fs2 to find out whether, with the given $TBW_1$ and $TBW_2$, there exist feasible fs1 and fs2 that make the designed band-pass filter meet the ANSI S1.11 specification. If the pair $(TBW_1, TBW_2)$ is not feasible, both values are decreased by 5% each time, i.e.,

$$\text{Next}(TBW_1, TBW_2) = (TBW_1 \times 0.95,\; TBW_2 \times 0.95)$$

until the designed band-pass filter meets the specification.
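The shrinking loop of Step 1a can be sketched as follows. The feasibility test itself (designing a band-pass filter for candidate fs1 and fs2 and checking it against the class-2 mask) is not reproduced here; `is_feasible` is a hypothetical callback standing in for it, and the iteration cap is our own safeguard.

```python
def shrink_until_feasible(tbw1, tbw2, is_feasible, shrink=0.95, max_iterations=200):
    """Reduce both transition bandwidths by 5% per step until is_feasible accepts them.

    is_feasible(tbw1, tbw2) is a hypothetical predicate that should return True
    when some choice of fs1 and fs2 yields a band-pass filter meeting the mask.
    """
    for _ in range(max_iterations):
        if is_feasible(tbw1, tbw2):
            return tbw1, tbw2
        tbw1 *= shrink
        tbw2 *= shrink
    raise RuntimeError("no feasible transition bandwidths found")


# Toy predicate used only to exercise the loop; it is not derived from the standard.
def toy_is_feasible(tbw1, tbw2):
    return tbw1 <= 0.08 and tbw2 <= 0.10


final_tbw1, final_tbw2 = shrink_until_feasible(0.10, 0.12, toy_is_feasible)
print(final_tbw1, final_tbw2)
```

Starting from the largest estimates and shrinking keeps the accepted transition bandwidths as wide as the step size allows, which is what keeps the resulting filter order low.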

### B. Step 1b: Design D and I

Now, for filters $D$ and $I$, consider the design parameters rp, rs, fp, and fs of the low-pass filter, as defined in Fig. 7(b). According to the IFIR implementation shown in Fig. 4(a), $F_{36}$ is implemented by cascading $F_{39}(z^2)$ and $D(z)$, so we should find a filter $D$ such that the response of $F_{39}(z^2)D(z)$ satisfies the specification of the 36th band. Among the three bands in the second octave, we only need to consider the 36th band because it poses the strictest constraints on the filter $D$. Moreover, the other octaves automatically meet the specification if the second octave does, owing to the octave-related bandwidths of this filter bank. Hence, if the 36th band meets the specification, the filter $D$ also makes $F_{22}$–$F_{35}$ meet their specifications. We denote the passband and stopband frequencies of the filter $D$ by $fp_D$ and $fs_D$, respectively, and assume $F_{39}$ to be an ideal band-pass filter. Then we have the following constraints according to Fig. 9:

$$fp_D \ge f_2(36) \quad \text{and} \quad fs_D \le \pi - f_2(36). \tag{7}$$

Next, we consider the output of the 36th band, $y_{36}$. By the theory of multirate systems [20], we have

$$Y_{36}(z) = \frac{1}{2}\left[X(z)F_{39}(z^2)D(z)I(z)\right] + \frac{1}{2}\left[X(-z)F_{39}(z^2)D(-z)I(z)\right]. \tag{8}$$



Fig. 6. Filter coefficient design flow. (a) Step 1: to design  $F_{37}$ - $F_{39}$  and D&I independently. (b) Step 2: to fine-tune and to explore the optimal ripple distribution to jointly consider the  $F_{37}$ - $F_{39}$  as well as D&I.

Hence, $D(-z)I(z)$ should be designed to suppress the distortion term, i.e., the second term on the right-hand side of (8). According to the ANSI S1.11 class-2 filter specification, we require $|D(-z)I(z)| < -60$ dB. On the other hand, the desired term $X(z)F_{39}(z^2)D(z)I(z)$, i.e., the first term on the right-hand side of (8), is subject to the specification of the 36th ANSI S1.11 class-2 band. Denoting the passband frequency of the filter $I$ by $fp_I$, the above discussion gives the following constraints:

$$|D(-z)I(z)| < -60 \text{ dB} \quad \text{and} \quad fp_I \ge f_2(36). \tag{9}$$

From (7) and (9), we can set the  $fp_D$  and  $fp_I$  to  $f_2(36)$ . But we still need to explore the values of  $fs_D$  and  $fs_I$  to make |D(-z)I(z)| < -60 dB.

We explore the values of $fs_D$ and $fs_I$ with the flow shown in the right part of Fig. 6(a). The stopband attenuation rs of the filters $D$ and $I$ is first set to 60.5 dB, where the 0.5 dB is a design margin that compensates for the influence of the passband ripple of $F_{39}$. We tentatively set the passband ripple rp of these two filters to 1 dB. The values of $fs_D$ and $fs_I$ should be small enough to make $|D(-z)I(z)|$ smaller than $-60$ dB; on the other hand, they should be as large as possible so that the orders of filters $D$ and $I$ are minimized. Therefore, we explore $fs_D$ and $fs_I$ by gradually decreasing their values from $\pi - f_2(36)$ until $|D(-z)I(z)| < -60$ dB is satisfied, as depicted in Fig. 6(a). The final result is $fs_D = 0.54\pi$ and $fs_I = 0.52\pi$.
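The $X(-z)$ term in (8) originates from the down-sampling by 2 followed by up-sampling by 2 inside the second octave. A minimal time-domain sketch of this fact is given below: discarding the odd-indexed samples and re-inserting zeros is the same as averaging the signal with a copy modulated by $(-1)^n$, which is the time-domain counterpart of $\frac{1}{2}[U(z) + U(-z)]$. The sample values are arbitrary.

```python
def down_then_up_by_2(u):
    """Keep the even-indexed samples of u and put zeros at the odd indices."""
    out = [0] * len(u)
    out[::2] = u[::2]
    return out


def half_sum_with_modulated_copy(u):
    """Compute (u[n] + (-1)**n * u[n]) / 2, the time-domain form of (U(z) + U(-z)) / 2."""
    return [(s + (-1) ** n * s) // 2 for n, s in enumerate(u)]


u = [4, -7, 1, 9, -2, 6, 3, -5, 8]  # arbitrary integer samples
print(down_then_up_by_2(u))
print(half_sum_with_modulated_copy(u))
```

The modulated copy is the image that the product $D(-z)I(z)$ has to attenuate by at least 60 dB.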

### C. Step 2: Ripple Adjustment

In this step, we need to fine tune the passband ripples of filters  $F_{37}$ ,  $F_{38}$ , and  $F_{39}$ , respectively, with the influences of filters D and I considered. As shown in Fig. 6(b), the ripple adjustment algorithm decreases the passband ripples of  $F_{37}$ ,  $F_{38}$  and  $F_{39}$  from 1 to 0.1 dB with 0.1-dB stepping. Then, for each possible passband ripple, we explore how small the passband ripple of filters D and I should be to make all the 18 1/3-octave bands meet the ANSI S1.11 class-2 filter specification.

Table I summarizes the exploration results of the proposed algorithm. Generally speaking, the allowable ripple of filters $D$ and $I$ decreases as that of filters $F_{37}$–$F_{39}$ increases. Nevertheless, if the ripple peaks of filters $F_{37}$–$F_{39}$ are located where the ripples of filters $D$ and $I$ have valleys, the ripples partially cancel each other and the overall ripple is reduced. That is why in some cases, as shown in Table I, the ripple of filters $F_{37}$–$F_{39}$ and that of filters $D$ and $I$ both increase (e.g., 0.4 dB for $F_{37}$–$F_{39}$ with 0.3 dB for $D$ and $I$).

Fig. 7. Input parameters rp, rs, fs1, fs2, fp1, and fp2 for designing: (a) band-pass and (b) low-pass filters.

Fig. 8. Illustration of dividing the band-pass filter specification in Fig. 2 into the low-pass and high-pass filter specifications, respectively.

We conclude from Table I that the optimal tap-lengths of filters $F_{37}$, $F_{38}$, $F_{39}$, $D$, and $I$ are 41, 33, 26, 35, and 41, respectively. However, to reduce the hardware complexity further, it is desirable that the tap-lengths of these five filters be all odd (or all even) so that the filters' delay-lines can be shared. For this purpose, we redesign $F_{39}$ with only odd tap-lengths allowed. The tap-lengths of filters $F_{37}$, $F_{38}$, $F_{39}$, $D$, and $I$ are then 41, 33, 27, 35, and 41, respectively, in our design.

Fig. 9. Suppression of the imaging distortion in the 36th band.

TABLE I EXPLORATION FOR OPTIMAL RIPPLE DISTRIBUTION

| Ripple of $F_{37}$–$F_{39}$ (dB) | Ripple of D & I (dB) | Taps of $F_{37}$ | Taps of $F_{38}$ | Taps of $F_{39}$ | Taps of D | Taps of I | # mpy | # add |
|------|------|----|----|----|----|----|-----|-----|
| 0.1  | 0.20 | 57 | 46 | 36 | 32 | 37 | 172 | 346 |
| 0.2  | 0.30 | 56 | 45 | 36 | 32 | 37 | 170 | 342 |
| 0.3  | 0.10 | 53 | 44 | 33 | 35 | 41 | 168 | 335 |
| 0.4  | 0.30 | 52 | 41 | 33 | 32 | 37 | 160 | 321 |
| 0.5  | 0.10 | 52 | 41 | 33 | 35 | 41 | 164 | 328 |
| 0.6  | 0.10 | 48 | 38 | 33 | 35 | 41 | 156 | 314 |
| 0.7  | 0.10 | 47 | 37 | 30 | 35 | 41 | 152 | 304 |
| 0.8  | 0.10 | 42 | 33 | 26 | 35 | 41 | 138 | 278 |
| 0.9  | 0.10 | 41 | 33 | 26 | 35 | 41 | 138 | 276 |
| 1.0  | 0.02 | 40 | 33 | 25 | 41 | 48 | 142 | 285 |
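The selection made from Table I can be reproduced with a few lines of Python. The rows below are copied from Table I; ranking by multiplications first and additions second is our assumption about how the optimum is chosen, and the parity check simply identifies which tap-lengths stand in the way of a shared delay-line.

```python
# Rows of Table I: (ripple of F37-F39 in dB, ripple of D & I in dB,
#                   taps of F37, F38, F39, D, I, multiplications, additions)
TABLE_I = [
    (0.1, 0.20, 57, 46, 36, 32, 37, 172, 346),
    (0.2, 0.30, 56, 45, 36, 32, 37, 170, 342),
    (0.3, 0.10, 53, 44, 33, 35, 41, 168, 335),
    (0.4, 0.30, 52, 41, 33, 32, 37, 160, 321),
    (0.5, 0.10, 52, 41, 33, 35, 41, 164, 328),
    (0.6, 0.10, 48, 38, 33, 35, 41, 156, 314),
    (0.7, 0.10, 47, 37, 30, 35, 41, 152, 304),
    (0.8, 0.10, 42, 33, 26, 35, 41, 138, 278),
    (0.9, 0.10, 41, 33, 26, 35, 41, 138, 276),
    (1.0, 0.02, 40, 33, 25, 41, 48, 142, 285),
]
FILTER_NAMES = ("F37", "F38", "F39", "D", "I")

# Assumed ranking: fewest multiplications, ties broken by fewest additions.
best_row = min(TABLE_I, key=lambda row: (row[7], row[8]))
best_taps = dict(zip(FILTER_NAMES, best_row[2:7]))

# Filters whose even tap-length prevents sharing one odd-length delay-line.
even_length_filters = [name for name, taps in best_taps.items() if taps % 2 == 0]

print(best_row[0], best_taps, even_length_filters)
```

Only $F_{39}$ has an even tap-length in the selected row, which is consistent with it being the single filter that is redesigned.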

## V. LOW-POWER VLSI IMPLEMENTATION

This section considers the low-power VLSI implementation of the proposed multirate 18-band ANSI S1.11 1/3-octave filter bank. Here we only consider the analysis bank, since some applications, such as audiometers or acoustic analyzers, do not need the synthesis bank. Moreover, in the hearing aid SoC, the synthesis bank will likely be merged with the dynamic range compressors for further optimization.

### A. Folded Filter Bank Architecture

The proposed ANSI S1.11 1/3-octave filter bank contains six octaves, and each octave is based on the identical set of filters $F_{37}$, $F_{38}$, $F_{39}$, and $D$, as illustrated in Fig. 5(a). For an area-efficient implementation, it is natural to fold these six octaves, as demonstrated in Fig. 10; however, folding introduces a scheduling problem. A good scheduling algorithm should avoid computation conflicts (or stalls) and minimize the required storage elements. In this paper, the recursive pyramid algorithm (RPA) [22] is adopted. Suppose that each sampling period is divided into two time slots. Then, with RPA, the first octave is calculated every other slot, the second octave every four slots, the third every eight slots, and so on. Table II lists the computation scheduling of the folded multirate filter bank, in which the variables $y_n$ and $x_k$ are the output of the *n*th band and the input of the *k*th octave, respectively, as illustrated in Fig. 5(a). With RPA scheduling, only $KL$ storage elements are required, where $K$ is the number of octaves and $L$ is the number of filter taps. (Note that $K = 6$ and $L = 41$ in our design.)

Fig. 10. Area-efficient filter bank with recursive structure [19].

TABLE II RPA SCHEDULING

| Time slot | Samples being calculated                  | Octave |
|-----------|-------------------------------------------|--------|
| T=0       | $y_{37}[0], y_{38}[0], y_{39}[0], x_2[0]$ | 1      |
| T=1       | $y_{34}[0], y_{35}[0], y_{36}[0], x_3[0]$ | 2      |
| T=2       | $y_{37}[1], y_{38}[1], y_{39}[1]$         | 1      |
| T=3       | $y_{31}[0], y_{32}[0], y_{33}[0], x_4[0]$ | 3      |
| T=4       | $y_{37}[2], y_{38}[2], y_{39}[2], x_2[1]$ | 1      |
| T=5       | $y_{34}[1], y_{35}[1], y_{36}[1]$         | 2      |
| T=6       | $y_{37}[3], y_{38}[3], y_{39}[3]$         | 1      |
| T=7       | $y_{28}[0], y_{29}[0], y_{30}[0], x_5[0]$ | 4      |
| T=8       | $y_{37}[4], y_{38}[4], y_{39}[4], x_2[2]$ | 1      |
| T=9       | $y_{34}[2], y_{35}[2], y_{36}[2], x_3[1]$ | 2      |
| T=10      | $y_{37}[5], y_{38}[5], y_{39}[5]$         | 1      |
| T=11      | $y_{31}[1], y_{32}[1], y_{33}[1]$         | 3      |
| T=12      | $y_{37}[6], y_{38}[6], y_{39}[6], x_2[3]$ | 1      |
| T=13      | $y_{34}[3], y_{35}[3], y_{36}[3]$         | 2      |
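The pattern in Table II can be generated from the slot index alone. In the sketch below, the rule "the octave is one plus the number of trailing 1 bits of T" is our reading of the table rather than a quotation of the RPA reference; the assumptions that the decimation filter emits $x_{k+1}$ on every other computation of octave $k$ and that the sixth octave produces no further decimated output are likewise inferred from the table and from the six-octave structure.

```python
NUM_OCTAVES = 6


def octave_for_slot(t, num_octaves=NUM_OCTAVES):
    """Octave computed in time slot t (t >= 0), or None if the slot is idle."""
    k = 1
    while t & 1:  # count the trailing 1 bits of t
        t >>= 1
        k += 1
    return k if k <= num_octaves else None


def schedule_entry(t, num_octaves=NUM_OCTAVES):
    """Return (octave, list of sample names computed in slot t), or None if idle."""
    k = octave_for_slot(t, num_octaves)
    if k is None:
        return None
    m = t >> k  # index of this octave's output samples
    samples = [f"y{n}[{m}]" for n in (40 - 3 * k, 41 - 3 * k, 42 - 3 * k)]
    # Assumed: D feeds the next octave on every other computation of octave k.
    if k < num_octaves and m % 2 == 0:
        samples.append(f"x{k + 1}[{m // 2}]")
    return k, samples


for t in range(14):
    octave, samples = schedule_entry(t)
    print(f"T={t}: octave {octave}: {', '.join(samples)}")
```

Under this rule at most one octave is active per slot, and the slot whose low six bits are all 1 is idle for a six-octave bank.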

Based on the RPA scheduling, we design the proposed filter bank chip with the hardware block diagram shown in Fig. 11. The chip consists of the system controller, the memory controller (mem\_ctrl), the memory block, the multiply-and-accumulate (MAC) unit, the serializer, and the de-serializer. It is designed to operate at a 24-kHz sampling rate with a 16-bit wordlength. In addition to the reset and clock signals (i.e., rst and clk), there are 3-wire serial interfaces for the input, i.e., sdi, sdisel, and sdiclk, and for the output, i.e., sdo, sdosel, and sdoclk, in order to reduce the total pin count. The sdi and sdo pins carry the serial data bits of the input and output samples, respectively. The sdisel (or sdosel) and sdiclk (or sdoclk) pins are the select and word synchronization signals, respectively. The functionality and behavior of each module are described in the following.

The system controller coordinates the data flow according to the RPA scheduling algorithm and handles the input and output interfacing. The memory block contains a read-only memory (the coefficient ROM) that stores the coefficients of filters  $F_{37}$, $F_{38}$, $F_{39}$, and D, as well as a random access memory (the data RAM) that maintains the delay-line data of the six octaves. Although each octave's computation involves four filters, these four filters can share a single delay-line since all of them have an odd number of taps. Thus, only six delay-lines need to be implemented. Since the longest filter has 41 taps, the minimum size of the data RAM is  $6 \times 41 \times 16$  bits.

The input data of the data RAM may come from two sources: the external input sample and the output of the filter D, as shown in Fig. 10. The former is written into the delay-line of the first octave at the beginning of each sampling period, while the latter is written into the delay-line of the (k + 1)th octave when the kth octave's computation is finished. The input data selection is handled by the memory controller, which also decodes the control signals from the system controller to generate appropriate addresses, i.e., mem\_addr, and write enable signal, i.e., mem\_wen, for the memory block.

The datapath of the MAC unit is detailed in Fig. 12. It contains a multiplier, an adder, and a 'tmp' register together with four accumulators, namely acc\_f37, acc\_f38, acc\_f39, and acc\_d. The MAC unit performs the FIR filtering computations of the four filters, i.e.,  $F_{37}$,  $F_{38}$,  $F_{39}$, and D, in an interleaved style. For instance, the computations of the *i*th tap of these four filters are carried out as follows. One delay-line element is first read out from the data RAM and stored in the tmp register. Then, in order to exploit the coefficient symmetry of linear-phase FIR filters, another delay-line element is read out and added to the one stored in the tmp register. During the third to the sixth cycles, the *i*th coefficients of the four filters are consecutively read out from the coefficient ROM. Each is multiplied by the value stored in the tmp register, and the result is accumulated into the corresponding accumulator register: acc\_f37, acc\_f38, acc\_f39, or acc\_d. Hence, after the computations of all taps have been performed, the four accumulators acc\_f37, acc\_f38, acc\_f39, and acc\_d contain the outputs of the filters  $F_{37}$, $F_{38}$, $F_{39}$, and D, respectively.
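The per-tap sequence described above can be sketched in Python. This is a behavioral illustration only: the function name, the coefficient ordering (from the centre tap outwards), and the zero-padding of the shorter filters are assumptions made for the sketch, and fixed-point wordlengths are ignored.

```python
def interleaved_symmetric_mac(delay_line, half_coeffs):
    """Compute the outputs of several odd-length symmetric FIR filters.

    delay_line  : list of L samples (L odd), shared by all filters.
    half_coeffs : dict mapping a filter name to (L + 1) // 2 coefficients,
                  ordered from the centre tap outwards; shorter filters are
                  zero-padded (an assumption of this sketch).
    """
    centre = len(delay_line) // 2
    acc = {name: 0 for name in half_coeffs}
    for i in range(centre + 1):
        # First two cycles: read one sample into tmp, then add its mirror.
        tmp = delay_line[centre - i]
        if i != 0:
            tmp += delay_line[centre + i]
        # Next four cycles: one multiply-accumulate per filter, interleaved.
        for name, coeffs in half_coeffs.items():
            acc[name] += coeffs[i] * tmp
    return acc


# Toy example with a 5-sample delay line and two symmetric filters.
outputs = interleaved_symmetric_mac(
    [1, 2, 3, 4, 5],
    {"f": [3, 2, 1], "d": [1, 1, 0]},
)
print(outputs)
```

Because every filter has an odd number of taps, all of them share the same centre sample, which is why one pre-added value in tmp can serve all four multiplications of a tap.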

Note that each tap's computation requires 6 cycles. Since the longest filter has 41 taps, as indicated in Section IV.C, at most 21 coefficients need to be multiplied in the filtering operations. Consequently, the proposed architecture requires 126 cycles for each octave's computation. With the RPA scheduling algorithm, two octaves' computations are scheduled in each sampling period. The clock period can then be determined by dividing the sampling period by 253, which includes 252 cycles for two octaves' computations and one more cycle for writing the external input sample into the data RAM. The timing constraint of the chip is thus equal to 163 ns and a 6.13 MHz clock is used.
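The cycle budget above can be re-derived with a few lines of Python. The variable names are illustrative; the numbers are those given in the text.

```python
CYCLES_PER_TAP = 6        # two delay-line reads plus four multiply-accumulates
MAX_TAPS = 41             # tap-length of the longest filter
SAMPLE_RATE_HZ = 24_000

unique_coeffs = (MAX_TAPS + 1) // 2                  # coefficient symmetry
cycles_per_octave = CYCLES_PER_TAP * unique_coeffs
cycles_per_sample = 2 * cycles_per_octave + 1        # two octaves + input write

# Loosest clock that still fits all cycles into one sampling period.
max_clock_period_ns = 1e9 / (SAMPLE_RATE_HZ * cycles_per_sample)
min_clock_mhz = SAMPLE_RATE_HZ * cycles_per_sample / 1e6

print(unique_coeffs, cycles_per_octave, cycles_per_sample)
print(round(max_clock_period_ns, 1), round(min_clock_mhz, 3))
```

This arithmetic gives a bound of roughly 164.7 ns (about 6.07 MHz); the 163 ns period and 6.13 MHz clock quoted above are slightly faster than this bound and therefore satisfy it.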

## B. Low-Power Optimizations

The power consumption is indeed a critical issue for hearing aids. In this subsection, we investigate the low-power design techniques, available in the literature, for digital circuits.

1) Selective Coefficient Negation: The selective coefficient negation technique [23] is intended to reduce the power of the multiplier in the MAC unit by minimizing its switching activity, which is directly affected by the Hamming distance between two successive inputs. Because the MAC operation can be implemented as either  $A + (B \times C)$  or  $A - (-B \times C)$, where we denote the addend, multiplier, and multiplicand as A, B, and C,



Fig. 11. Hardware architecture of the proposed filter bank chip.



Fig. 12. Datapath of the MAC unit.

respectively, one can selectively negate the input B so as to reduce the Hamming distance between successive data at input B. However, this requires the MAC unit to be able to perform multiply-and-subtract operations. For each coefficient, say h, either h or -h is stored in the coefficient memory, depending on which value causes the smaller Hamming distance. The overhead is one additional bit associated with each coefficient, indicating whether the coefficient is negated.
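A minimal Python sketch of this idea follows. It uses a greedy pass over the coefficient sequence, which is an assumption of the sketch rather than the procedure of [23], and it does not guarantee a globally minimal total Hamming distance. The function names, the 16-bit two's-complement view, and the toy coefficients are illustrative.

```python
def hamming16(a, b):
    """Hamming distance between two values viewed as 16-bit two's complement."""
    return bin((a ^ b) & 0xFFFF).count("1")


def select_negations(coeffs):
    """Greedily choose h or -h for each coefficient.

    Returns (stored, negated): the values to store and one flag bit per
    coefficient telling the MAC unit to subtract instead of add.
    """
    stored, negated = [], []
    prev = 0  # assumed initial value at the multiplier input
    for h in coeffs:
        use_neg = hamming16(prev, -h) < hamming16(prev, h)
        value = -h if use_neg else h
        stored.append(value)
        negated.append(use_neg)
        prev = value
    return stored, negated


def mac(acc, stored, negated, data):
    """Accumulate products, using multiply-and-subtract for negated entries."""
    for c, neg, x in zip(stored, negated, data):
        acc = acc - c * x if neg else acc + c * x
    return acc


coeffs = [3, -2, 5, -1]          # toy values, not the chip's coefficients
stored, negated = select_negations(coeffs)
print(stored, negated, mac(0, stored, negated, [1, 2, 3, 4]))
```

The result of `mac` equals the ordinary sum of products with the original coefficients, since negating a stored coefficient and subtracting its product cancel out.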

2) Computation Reordering: In the FIR filtering computation, we can exploit the commutative and associative properties of summation to reorder the computation of the coefficient products. Thus, it is possible to reduce the Hamming distance at the multiplier's inputs by exploring the design space of accumulation orders [23]. In addition, in our design, we can explore the order in which the four filters  $F_{37}$,  $F_{38}$,  $F_{39}$, and D are calculated in the interleaved style. However, allowing the coefficient products to be accumulated in arbitrary orders may complicate the system control and address generation. In this paper, therefore, only the order in which the four filters are interleaved is explored to minimize the Hamming distance at the multiplier's inputs.
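Since only the interleaving order of four filters is explored, the search space has just 4! = 24 candidates and can be enumerated exhaustively. The Python sketch below illustrates such a search on the coefficient input of the multiplier; the function names, the cost definition, and the toy coefficient values are assumptions for illustration, not the chip's actual coefficients.

```python
from itertools import permutations


def hamming16(a, b):
    """Hamming distance between two values viewed as 16-bit two's complement."""
    return bin((a ^ b) & 0xFFFF).count("1")


def sequence_cost(order, coeff_table):
    """Total Hamming distance of the coefficient stream for one order.

    coeff_table maps a filter name to its per-tap coefficient list; for tap i
    the multiplier sees coeff_table[order[0]][i], coeff_table[order[1]][i], ...
    """
    num_taps = len(next(iter(coeff_table.values())))
    stream = [coeff_table[name][i] for i in range(num_taps) for name in order]
    return sum(hamming16(a, b) for a, b in zip(stream, stream[1:]))


def best_interleave_order(coeff_table):
    """Exhaustively search all interleaving orders of the filters."""
    return min(permutations(sorted(coeff_table)),
               key=lambda order: sequence_cost(order, coeff_table))


toy_table = {"F37": [1, 1], "F38": [3, 3], "F39": [7, 7], "D": [15, 15]}
best = best_interleave_order(toy_table)
print(best, sequence_cost(best, toy_table))
```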

3) Clock Gating: Considering the datapath of the MAC unit in Fig. 12, we can see that the four accumulators, acc\_f37, acc\_f38, acc\_f39, and acc\_d, have  $4 \times 33 + 17 = 149$  bits in total. These accumulator registers consume significant power. Since a register dissipates power at each transition of the clock signal, power is wasted whenever its value does not need to be updated. Furthermore, in our design, the computations of the four filters are processed in an interleaved manner, so each accumulator is updated only once every 6 cycles. Hence, we can ideally save the accumulators' power by 83%



Fig. 13. Multi- $\mathrm{V}_\mathrm{DD}$  implementation for power reduction.

with the clock-gating technique. We note that the clock-gating technique can also reduce the power on the clock tree network [24].
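As a rough check of the 83% figure, assume that each accumulator is clocked in exactly one of every six cycles and that its clock power scales with the number of clocked cycles (an idealization that ignores the gating logic itself):

$$\text{saving}_{\text{ideal}} = 1 - \frac{1}{6} \approx 83.3\%$$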

4) Operand Isolation: The aforementioned optimizations mainly try to reduce the power of the MAC unit. Here we describe the operand isolation technique [24], which is adopted in our design to save the power of the memory block. Recall that the data RAM and the coefficient ROM are exclusively accessed. That is, values stored in the former can be maintained unchanged when the latter is accessed, and vice versa. Hence, the operand isolation technique can be applied, so that the data RAM dissipates no power when the coefficient ROM is accessed, and vice versa.

5) Multi- $V_{DD}$  Implementation: The last low-power optimization technique applied to our design is the multi- $V_{DD}$ implementation. As is well known, lowering the supply voltage of a circuit reduces its power consumption, since the dynamic power dissipation of a CMOS circuit is directly proportional to the square of the supply voltage. Recently, blocks with different supply voltages have been integrated in SoCs. High voltages can be applied to the circuits on timing-critical paths to maintain system performance, while the rest of the chip runs at a lower voltage to save power. However, this increases the power planning complexity, and voltage level shifters are necessary at the interfaces between different power domains.

Fig. 13 shows the multi- $V_{DD}$  implementation of our design for further power reduction. Since the reliability of the memory block degrades severely when the supply voltage decreases, we maintain the supply voltage  $V_{DD}1$  for the memory block at the nominal value, while  $V_{DD}2$, for the other blocks as shown in Fig. 13, is decreased.



Fig. 14. Magnitude response of the proposed 18-band 1/3-octave filter bank.

## VI. SILICON IMPLEMENTATION RESULTS

## A. Filter Response and Computation Complexity

The frequency response of the 18-band multirate FIR filter bank is shown in Fig. 14. For comparison, we have also designed the following three ANSI S1.11 1/3-octave class-2 filter banks: (1) 18 parallel IIR Butterworth filters [12]; (2) a multirate filter bank with IIR Butterworth filters [8]; and (3) 18 parallel FIR filters. All filter coefficients are generated with the MATLAB filter design toolbox (version 7.2). We note that although elliptic filters may have smaller orders than Butterworth filters, the difference can be neglected, since the orders of both are quite small. The 24-kHz sampling rate is sufficient to satisfy the Nyquist sampling theorem for the 22nd–39th ANSI S1.11 1/3-octave bands.

We have compared the frequency response of the narrowest band of the proposed multirate FIR filter bank with that of the other three filter banks [19]. Fig. 15 shows the magnitude responses of the  $F_{22}$  bands of the four filter banks. We note that all four filter banks meet the ANSI S1.11 class-2 specification. Moreover, the FIR-based implementations of  $F_{22}$, shown in Fig. 15(b) and (d), have a sharper slope in the transition region than the IIR-based ones, shown in Fig. 15(a) and (c). Because of the influence of the decimation and interpolation filters, the ripple of  $F_{22}$  in Fig. 15(d) is slightly greater than that in Fig. 15(b). In addition, image distortions, circled in Fig. 15(c) and (d), are present in the responses of the two multirate filter banks.

Table III summarizes the numbers of multiplications and additions required to process a single input sample through the four filter banks. In our simulation results, to meet the specifications of the ANSI S1.11 standard with the parallel structure, the FIR filter bank requires filters of order up to 1488, whereas the IIR filter bank requires only an eighth-order Butterworth filter for  $F_{39}$  and sixth-order Butterworth filters for  $F_{22}$  to  $F_{38}$. In total, the straightforward parallel FIR and IIR filter banks require 3270 and 192 multiplications, respectively (the coefficient symmetry is considered for FIR filters). On the other hand, the filters  $F_{37}$,  $F_{38}$,  $F_{39}$  and D in the proposed multirate FIR filter



Fig. 15. Frequency response of the designed  $F_{22}$  band in (a) parallel IIR bank, (b) parallel FIR bank, (c) multirate IIR bank, and (d) multirate FIR bank, respectively. The circled parts show the imaging distortion due to the multirate processing.

bank have tap-lengths of 41, 33, 27 and 35, respectively. Exploiting the coefficient symmetry, they require only 21, 17, 14, and 18 multiplications, respectively. In addition, half of the outputs of filter D are discarded by the down-sampler, so filter D requires 9 multiplications on average. Therefore, each octave requires 61 multiplications. Moreover, the sampling rates of the 2nd, 3rd, 4th, 5th, and 6th octaves are only 1/2, 1/4, 1/8, 1/16, and 1/32 of that of the 1st octave; their computation complexity should be multiplied by 1/2, 1/4, 1/8, 1/16, and 1/32,

TABLE III COMPUTATION COMPLEXITY COMPARISONS OF FOUR AUDITORY COMPENSATION SYSTEMS

(Analysis bank only)

|           |       | IIR | FIR   |
|-----------|-------|-----|-------|
| Parallel  | # mpy | 192 | 3,270 |
| Parallel  | # add | 165 | 6,528 |
| Multirate | # mpy | 102 | 120   |
| Multirate | # add | 90  | 233   |

(Analysis bank + Synthesis bank)

|           |       | IIR | FIR   |
|-----------|-------|-----|-------|
| Parallel  | # mpy | 192 | 3,270 |
| Parallel  | # add | 182 | 6,545 |
| Multirate | # mpy | 139 | 140   |
| Multirate | # add | 127 | 278   |

TABLE IV COMPARISONS OF STORAGE COMPLEXITY

|          | # bands | structure | # storage elements |
|----------|---------|-----------|--------------------|
| [7]      | 16      | parallel  | 110                |
| [6]      | 8       | IFIR      | 668                |
| Proposed | 18      | multirate | 246                |

respectively. Hence, the number of multiplications required per sample in our design is about 120 (note that the 6th octave does not contain filter D). This saves about 96% of the computations of the parallel FIR filter bank. Even though the multirate approach also reduces the computations of the IIR filter bank, the saving is not as significant as that for the FIR filter bank. Therefore, the proposed multirate FIR filter bank has a computation complexity comparable to that of the IIR ones.
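The count of about 120 multiplications per sample can be reproduced from the figures above. The following Python sketch uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

half_taps = {"F37": 21, "F38": 17, "F39": 14}  # multiplications per output
d_mults = Fraction(18, 2)      # filter D: half of its outputs are discarded
num_octaves = 6

total = Fraction(0)
for k in range(1, num_octaves + 1):
    per_octave = Fraction(sum(half_taps.values()))
    if k < num_octaves:        # the 6th octave does not contain filter D
        per_octave += d_mults
    # Octave k runs at 1 / 2**(k - 1) of the input sampling rate.
    total += per_octave * Fraction(1, 2 ** (k - 1))

saving = 1 - total / 3270      # relative to the parallel FIR filter bank
print(float(total), float(saving))
```

The exact sum is 1917/16, or about 119.8 multiplications per input sample, which corresponds to a saving of a little over 96% relative to the 3270 multiplications of the parallel FIR bank.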

We also compare the storage complexity of the proposed design with that of the filter banks presented in [6] and [7]. Table IV lists the delay elements required by these filter banks. (Note that we consider only the delay-lines, not the coefficient memory.) The filter bank in [7] uses a parallel architecture, so a single delay-line can be shared by all of its FIR filters. Even though it consists of 16 FIR filters with 110 taps, it requires only one delay-line with 110 storage elements. The filter bank in [6] uses the IFIR approach, which requires more delay elements. Although it implements two filters with only 19 and 39 taps, respectively, the overall filter bank requires 668 delay elements. Finally, our design requires 246 delay elements, as described in Section V-A. Note that the design in [7] uses a dual-port memory, whereas our design uses a single-port one.

## B. Effectiveness of Low-Power Optimizations

To evaluate the effectiveness of each low-power design technique described in Section V.B, we have implemented the proposed filter bank by using the cell-based design flow with the Artisan Metro standard cell library in TSMC 0.13  $\mu$m technology. The data RAM in the memory block is implemented with a 256 × 16-bit register file generated by Artisan's Memory Compiler. In this subsection, the power is estimated through gate-level and circuit-level simulations using PrimePower and Nanosim, respectively. Three 250-ms input sequences are used for power estimation: a recorded female voice, a recorded male voice, and a random signal.

As discussed in Section V-B, both the selective coefficient negation and the computation reordering techniques can be applied



Fig. 16. Implementation results of applying low-power design techniques on our design.

to reduce the Hamming distance of the input pattern, which minimizes the switching activity of the MAC unit and thus saves power. Our simulations show that with computation reordering, the average input Hamming distance is reduced from 6.4 to 5.3. After applying the selective negation technique, the average Hamming distance is further reduced to 3.6. That is, the Hamming distance of the input sequence to the MAC unit is reduced by up to 43.75%, which lowers the power consumption of the MAC unit.
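The quoted percentage follows directly from the average Hamming distances reported above, first for reordering alone and then for reordering combined with selective negation:

$$\frac{6.4 - 5.3}{6.4} \approx 17.2\%, \qquad \frac{6.4 - 3.6}{6.4} = 43.75\%$$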

We now consider the operand isolation and clock-gating techniques in our design. The topmost bar in Fig. 16 shows the power distribution of the proposed filter bank, where the memory block consumes 67  $\mu$W and the MAC unit consumes 68  $\mu$W, and 178  $\mu$W is dissipated in total. With the clock-gating technique, the power dissipated by the MAC unit is reduced to 45  $\mu$W, as shown in the second bar in Fig. 16.

Finally, we consider the multi-$V_{DD}$ implementation. We first examine the multiplier, which is the main computing functional unit in the MAC unit. Fig. 17 depicts the simulated energy per operation, in pJ, and the critical-path delay, in ns, of the multiplier at different supply voltages. As expected, the energy dissipation of the multiplier decreases with the supply voltage; however, the critical-path delay increases as the supply voltage decreases. Because the timing constraint of the proposed architecture is moderate, as described in Section V.A, the chip can operate at a lower supply voltage of 0.6 V without degrading performance. With the multi-$V_{DD}$ implementation indicated in Fig. 13, we set  $V_{DD}2$  to 0.6 V while maintaining  $V_{DD}1$  at 1.2 V. The circuit-level simulation shows that the power consumption of the  $V_{DD}2$  power domain is reduced to 53  $\mu$W. Consequently, the proposed design consumes a total of 87  $\mu$W with the multi-$V_{DD}$ implementation.
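As a first-order estimate only, assume that the dynamic energy per operation scales with the square of the supply voltage and ignore leakage, short-circuit current, and level-shifter overhead. Lowering  $V_{DD}2$  from 1.2 V to 0.6 V then gives

$$\frac{E(0.6\,\text{V})}{E(1.2\,\text{V})} \approx \left(\frac{0.6}{1.2}\right)^2 = 0.25$$

that is, at most about a 75% reduction in the dynamic energy of the  $V_{DD}2$  domain. The simulated figures above include effects that this idealization omits.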

## C. Silicon Implementation

Fig. 18 shows the microphotograph of the test chip and the corresponding gate counts of the different modules. The core area is 0.3 mm  $\times$  0.3 mm (excluding I/O pads). The test chip contains two sets of power/ground rings around the core. In addition, it has two power cuts on the chip's right side, which are used to separate the power domains of  $V_{DD}1$  and  $V_{DD}2$.

Table V summarizes the comparison between the proposed design and other filter banks in the literature. Note that the designs in [3] and [5] have relatively few bands.



Fig. 17. Energy dissipations and critical-path delays plot of the multiplier operated at different supply voltages.



Fig. 18. Silicon implementation.

Moreover, the filter bank in [5] has only 40 dB of attenuation. The power consumption of these filter banks may increase greatly if more bands are needed and 60 dB of attenuation is required. On the other hand, the design in [7] is complicated, implementing each of the 16 bands with a 110-tap FIR filter. For a fair comparison across different process technologies, we normalize the power with respect to the square of the supply voltage [25], the process [25], and the number of filter bands, that is,

$$P_{\text{normalized}} = \text{Power} \times \left(\frac{0.13}{\text{Process}}\right) \left(\frac{1.2}{V_{\text{DD}}}\right)^2 \left(\frac{1}{\#\text{bands}}\right)$$
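The normalization can be checked against the entries of Table V with a short Python sketch. The function name is hypothetical, and using  $V_{DD}$ = 1.2 V for the proposed design is an assumption made here because that choice reproduces the tabulated value.

```python
def normalized_power(power_uw, process_um, vdd_v, bands):
    """P_normalized = Power * (0.13 / Process) * (1.2 / VDD)**2 / #bands."""
    return power_uw * (0.13 / process_um) * (1.2 / vdd_v) ** 2 / bands


designs = {
    "[5]": (471, 0.70, 1.55, 7),
    "[3]": (316, 0.18, 1.60, 8),
    "[7]": (220, 0.35, 1.10, 16),
    # Assumption: V_DD = 1.2 V is used for the proposed design.
    "Proposed": (87, 0.13, 1.20, 18),
}

for name, args in designs.items():
    print(name, round(normalized_power(*args), 1))
```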

As shown in Table V, we conclude that the proposed multirate ANSI S1.11 1/3-octave filter bank is the most energy-efficient. Compared with other filter banks, our filter bank has the advantages of low computation complexity and compliance with the ANSI S1.11 1/3-octave specification. The price we pay is a long group delay, which is due to the multirate structure and the severe filter specification. Research has shown that delays of more than 10 ms may be disturbing for hearing aid users listening to their own voices [26]. However, since the proposed iterative multirate filter bank algorithm is regular and the designed architecture is quite modular, the hardware design can easily be modified so that the number of bands, which is equal to the number of iterations, is programmable. That is, the delay of the proposed filter bank can be reduced gradually by using fewer bands, for example, 10, 13, 16, or 18 bands. A programmable filter bank for auditory compensation in advanced hearing aids is therefore a helpful and adequate solution for hearing-impaired people. For example, in applications such as watching movies or listening to the radio and music, we can use all of the bands for higher quality, and

TABLE V COMPARISONS OF FILTER BANKS FOR HEARING AIDS

|          | # bands | Process (µm) | $V_{DD}$ (V) | Power (µW) | $P_{\text{normalized}}$ |
|----------|---------|--------------|--------------|------------|-------------------------|
| [5]      | 7          | 0.70            | 1.55                   | 471           | 7.5                     |
| [3]      | 8          | 0.18            | 1.60                   | 316           | 16.0                    |
| [7]      | 16         | 0.35            | 1.10                   | 220           | 6.1                     |
| Proposed | 18         | 0.13            | 0.60/1.20              | 87            | 4.8                     |

in applications such as conversation, we can reduce the number of bands for a shorter delay. In addition, the experimental results in [27] show that people with more profound hearing loss may be less sensitive to the delay. Therefore, users who have severe hearing loss and are less sensitive to delay can use all 18 bands to match complicated prescriptions. Conversely, users whose prescriptions are flatter can use fewer bands to match such prescriptions and thus reduce the hearing aid's delay. Furthermore, the proposed filter bank is applicable not only to hearing aids but also to some assistive listening devices. For example, some new cell phones are designed for the hearing impaired with a built-in hearing loss compensator, and classroom broadcasting systems for children with hearing loss [28] also need the auditory compensation function. In these applications, the users do not hear their own voices through the instrument, so the delay has less effect and the proposed filter bank is applicable.

## VII. CONCLUSION

This paper addresses the low-power filter bank design for advanced digital hearing aids. In the literature, the standard ANSI S1.11 1/3-octave filter bank is rarely adopted in hearing aids because of its high computation complexity, even though it matches the human hearing characteristics well. We develop an efficient multirate filter bank algorithm to implement an 18-band ANSI S1.11 1/3-octave FIR filter bank. The proposed architecture needs only 4% of the multiplications and additions of a straightforward parallel FIR filter bank design. We also investigate several low-power VLSI techniques available in the literature and apply them to our design. Clock gating and operand isolation save a great amount of power. A test chip of this 18-band ANSI S1.11 1/3-octave filter bank has been fabricated in TSMC 0.13  $\mu$m CMOS technology. The chip consumes only 87  $\mu$W for 18-band 24-kHz audio signal processing. The proposed filter bank is energy-efficient and able to precisely match the prescribed gains generated by the widely used NAL-NL1 hearing aid prescription formula. Although the group delay, which is the price paid for the low computation complexity, is long, our design is still very useful for delay-insensitive applications.

## ACKNOWLEDGMENT

This paper was extensively discussed in a series of meetings and exchanges with Prof. S.-T. Young at the National Yang-Ming University, Taipei, Taiwan. The authors would like to thank Prof. Young for providing the Mandarin-specific hearing aid algorithm, the prescription formula, and much constructive advice on this work.

## REFERENCES

- [1] H. Dillon, *Hearing Aids*. New York: Thieme Medical Publisher, 2001.
- [2] R. Brennan and T. Schneider, "A flexible filter bank structure for extensive signal manipulations in digital hearing aids," in *Proc. IEEE Int. Symp. Circuits Syst.*, CA, 1998, pp. 569–572.
- [3] H. Li, G. A. Jullien, V. S. Dimitrov, M. Ahmadi, and W. Miller, "A 2-digit multidimensional logarithmic number system filter bank for a digital hearing aid architecture," in *Proc. IEEE Int. Symp. Circuits Syst.*, AZ, 2002, pp. II-760–763.
- [4] T. Lunner and J. Hellgren, "A digital filterbank hearing aid—Design, implementation and evaluation," in *Proc. ICASSP Conf.*, 1991, pp. 3661–3664.
- [5] L. S. Nielsen and J. Sparso, "Designing asynchronous circuits for low power: An IFIR filter bank for a digital hearing aid," in *Proc. IEEE*, Feb. 1999, vol. 87, no. 2, pp. 268–281.
- [6] Y. Lian and Y. Wei, "A computationally efficient nonuniform FIR digital filter bank for hearing aids," *IEEE Trans. Circuits Syst.*, vol. 52, no. 12, pp. 2754–2762, Dec. 2005.
- [7] K. S. Chong, B. H. Gwee, and J. S. Chang, "A 16-channel low-power nonuniform spaced filter bank core for digital hearing aid," *IEEE Trans. Circuits Syst.*, vol. 53, no. 9, pp. 853–857, Sep. 2006.
- [8] A. Lozano and A. Carlosena, "DSP-based implementation of an ANSI S1.11 acoustic analyzer," *IEEE Trans. Instrum. Meas.*, vol. 52, no. 4, pp. 1213–1219, Aug. 2003.
- [9] S. B. Davis, "Octave and fractional octave band digital filtering based on the proposed ANSI standard," in *Proc. ICASSP Conf.*, 1986, pp. 945–948.
- [10] Specification for Octave-Band and Fractional-Octave-Band Analog and Digital Filters, ANSI Standard S1.11-2004.
- [11] D. Byrne, H. Dillon, T. Ching, R. Katsch, and G. Keidser, "NAL-NL1 procedure for fitting nonlinear hearing aids: Characteristics and comparisons with other procedures," *J. Amer. Acad. of Audiology*, vol. 12, no. 1, pp. 37–54, Jan. 2001.
- [12] J. H. Chang, K. S. Tsai, P. C. Li, and S. T. Young, "Computer-aided simulation of multi-channel WDRC hearing aids," in *Proc. 17th Annu. Conv. & Expo Amer. Acad. of Audiology*, 2005.
- [13] E. Ozalevli, W. Huang, P. E. Hasler, and D. V. Anderson, "A reconfigurable mixed-signal VLSI implementation of distributed arithmetic used for finite-impulse response filtering," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 55, no. 2, pp. 510–521, Mar. 2008.
- [14] M. Aktan, A. Yurdakul, and G. Dundar, "An algorithm for the design of low-power hardware-efficient FIR filters," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 55, no. 6, pp. 1536–1545, Jul. 2008.
- [15] M. Chasin and F. A. Russo, "Hearing aids and music," *Trends in Amplification*, vol. 8, no. 2, pp. 35–47, 2004.
- [16] T. Van den Bogaert, J. Wouters, T. J. Klasen, and M. Moonen, "Distortion of interaural time cues by directional noise reduction system in modern digital hearing aids," in *Proc. IEEE Workshop on Appl. of Signal Process. to Audio and Acoust.*, 2005.
- [17] Y. Z. Liu and Z. P. Lin, "Optimal design of frequency-response masking filters with reduced group delays," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 55, no. 6, pp. 1560–1570, Jul. 2008.
- [18] R. Lehto, T. Saramaki, and O. Vainio, "Synthesis of narrowband linear-phase filters with a piecewise-polynomial impulse response," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 54, no. 10, pp. 2262–2276, Oct. 2007.
- [19] Y. T. Kuo, T. J. Lin, Y. T. Li, W. H. Chang, and C.-W. Liu, "Design of ANSI S1.11 filter bank for digital hearing aids," in *Proc. ICECS Conf.*, 2007, pp. 242–245.
- [20] P. P. Vaidyanathan, *Multirate Systems and Filter Banks*. New Jersey: Prentice Hall, 1993.
- [21] L. R. Rabiner and B. Gold, *Theory and Application of Digital Signal Processing*. New York: Prentice Hall, 1975.
- [22] M. Vishwanath, "The recursive pyramid algorithm for the discrete wavelet transform," *IEEE Trans. Signal Process.*, vol. 42, no. 3, pp. 673–676, Mar. 1994.
- [23] M. Mehendale and S. D. Sherlekar, *VLSI Synthesis of DSP Kernels: Algorithmic and Architectural Transformations*. Amsterdam, the Netherlands: Kluwer Academic, 2001.
- [24] A. P. Chandrakasan and R. W. Brodersen, *Low Power Digital CMOS Design*. Amsterdam, the Netherlands: Kluwer Academic, 1995.

- [25] T. C. Chen and R. B. Sheen, "A power-efficient wide-range phase-locked loop," *IEEE J. Solid-State Circuits*, vol. 37, no. 1, pp. 51–62, Jan. 2002.
- [26] M. A. Stone and B. C. J. Moore, "Tolerable hearing-aid delays: III. Effects on speech production and perception of across-frequency variation in delay," *Ear and Hearing*, vol. 24, no. 2, pp. 175–183, 2003.
- [27] M. A. Stone and B. C. J. Moore, "Tolerable hearing-aid delays: IV. Effects on subjective disturbance during speech production by hearing-impaired subjects," *Ear and Hearing*, vol. 26, no. 2, pp. 225–235, 2005.
- [28] D. E. Lewis, "Assistive devices for classroom listening: FM systems," *Amer. J. Audiology*, vol. 3, pp. 70–83, 1994.



**Yu-Ting Kuo** received the B.S. and the M.S. degrees in electronics engineering from National Chiao Tung University, Taiwan, in 2004 and 2006, respectively. He is currently working towards the Ph.D. degree in electronics engineering at National Chiao Tung University, Taiwan.

His research interests include low-power signal processing, digital hearing aids, and computer architecture.



**Tay-Jyi Lin** (S'00–M'06) received the B.S. degree in electrical and control engineering and the Ph.D. degree in electronics engineering, from National Chiao Tung University, Taiwan, in 1998 and 2005, respectively.

He is currently with the Microelectronics and Information Systems Research Center, National Chiao Tung University, as a research assistant professor. His research interests include VLSI signal processing, low-power design methodology, and computer architecture.









**Chih-Wei Liu** (M'03) was born in Taiwan. He received the B.S. and Ph.D. degrees, both in electrical engineering, from National Tsing Hua University, Hsinchu, Taiwan, in 1991 and 1999, respectively.

From 1999 to 2000, he was an integrated circuits design engineer at the Electronics Research and Service Organization (ERSO) of the Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan. Near the end of 2000, he started to work for the SoC Technology Center (STC) of ITRI as a project leader, and he left ITRI at the end of September 2003. He is currently with the Department of Electronics Engineering and the Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan, as an associate professor. His current research interests are SoC and VLSI system design, processors for embedded computing systems, digital signal processing, digital communications, and coding theory.