# 10-ms 18-Band Quasi-ANSI S1.11 1/3-Octave Filter Bank for Digital Hearing Aids

Chih-Wei Liu, Member, IEEE, Kuo-Chiang Chang, Ming-Hsun Chuang, and Ching-Hao Lin

Manuscript received November 14, 2011; revised April 17, 2012, and June 15, 2012; accepted June 28, 2012. Date of publication August 14, 2012; date of current version February 21, 2013. This work was supported in part by the National Science Council, Taiwan, under Grant NSC99-2220-E-009-057- and by a chip-fabrication grant from the National Chip Implementation Center (CIC), Taiwan. This paper was recommended by Associate Editor M. Chakraborty. The authors are with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan (e-mail: cwliu@twins.ee.nctu.edu.tw). Digital Object Identifier 10.1109/TCSI.2012.2209731

Abstract—The ANSI S1.11 1/3-octave filter bank is suitable for digital hearing aids, but its large group delay and high computational complexity make it difficult to deploy. This study presents a 10-ms 18-band quasi-ANSI S1.11 1/3-octave filter bank for processing 24 kHz audio signals. We first present a filter order optimization algorithm that defines the quasi-ANSI filters, with the group delay of the filters limited to 10 ms. The proposed design adopts an efficient prescription-fitting algorithm to reduce inter-band interference, enabling the quasi-ANSI filter bank to compensate for any type of hearing loss (HL) using the NAL-NL1 or HSE prescription formulas. Simulation results reveal that the maximum matching error for the prescriptions of mild, moderate, and severe-to-profound HL is less than 1.5 dB. This study also investigates a complexity-effective multirate IFIR quasi-ANSI filter bank. For an 18-band digital hearing aid with a 24 kHz sampling rate, the proposed architecture eliminates approximately 93% of the multiplications and up to 74% of the storage elements compared with a parallel FIR filter architecture. The proposed analysis filter bank (AFB) was designed in UMC 90 nm CMOS high-VT technology; post-layout simulations show that it consumes 73 μW at $V_{DD}$ = 1 V. With voltage scaling to 0.6 V, the simulated power consumption decreases to 27 μW, which is approximately 30% of that consumed by the most energy-efficient AFB for hearing aids available in the literature.

Index Terms—Filter bank, hearing aid, low group delay.

# I. INTRODUCTION

HEARING loss [1]–[3] can be classified as *conductive*, *sensorineural*, or *mixed* hearing loss. Conductive hearing loss means that sound is not conducted well through a disordered outer or middle ear. Sensorineural hearing loss (SNHL) means that the sensory cells in the cochlea are absent or not functioning properly. If both conductive and sensorineural losses are present, the result is mixed hearing loss. Conductive hearing loss can often be corrected with appropriate treatment, but most people with SNHL are fitted with hearing aids. SNHL can degrade the function of the human ear in several ways, causing a raised hearing threshold, a reduced and compressed hearing range, reduced temporal and spectral resolution, and loss of noise tolerance [1]. These factors make hearing aids more complex than simple sound amplifiers [1]–[3].

Audiologists usually identify and diagnose hearing loss with the pure tone audiogram (PTA) test, which uses sinusoidal signals at octave frequencies from 250 Hz to 8 kHz to measure the minimum audible sound levels (i.e., the hearing thresholds). The results of the PTA test are generally recorded on an audiogram. Fig. 1 shows a typical example of moderate-to-severe hearing loss. Fitting hearing aids usually requires a prescription formula. The widely used NAL-NL1 [4], or the HSE for Chinese [5], generates the ideal electro-acoustic response (i.e., the gain-curve) of a hearing aid.
The gain-curve specifies the insertion gain, or amplification, at each standard 1/3-octave frequency from 150 Hz to 8 kHz. The goal of NAL-NL1 is to maximize speech intelligibility while keeping the loudness of the amplified sound equal to or less than that perceived by people with normal hearing. NAL-NL1 produces different gain-curves for different input sound pressure levels (SPLs) (e.g., 40, 50, 60, 65, 70, 80, and 90 dB). The right side of Fig. 1 illustrates an example prescription for a 40 dB SPL input level.

Advanced hearing aids are currently battery-powered digital devices consisting of a microphone, a digital signal processing (DSP) circuit, and a receiver (i.e., the loudspeaker) [1]–[3]. The microphone and the receiver convert between acoustic and electrical signals. The DSP circuit performs sophisticated functions, including the auditory compensation algorithm that overcomes the hearing loss, and noise reduction and feedback cancellation that improve speech quality and intelligibility. The DSP circuit also uses adaptive directional microphones and spectral shaping for speech enhancement. According to Kates [3], the DSP block performing the entire set of DSP functions typically consumes up to 61% of the overall power budget of a digital hearing aid.

One common approach to realizing the auditory compensation algorithm, which makes sound audible for the hearing-aid wearer, is to employ an analysis filter bank (AFB) followed by sub-band amplification and multi-channel wide dynamic-range control (WDRC), and then a synthesis filter bank (SFB) [1]–[3], [6], [14]. A low-power Mandarin-specific hearing aid test chip was recently implemented in UMC 90 nm CMOS technology with high-VT standard cells [6]. The test chip contains an 18-band filter bank, a 3-channel WDRC auditory compensator, and multi-band noise reduction with entropy-enhanced voice activity detection (VAD). The AFB consumes approximately 27% of the total power of the chip in [6].
Filter banks designed for hearing aids can be classified as uniform filter banks [7], [8] and non-uniform filter banks [9]–[14]. Fig. 2 shows the different types of filter bank for the reader's reference. A 32-band discrete Fourier transform (DFT) filter bank was designed in [7], and an 8-band filter bank with equally spaced finite-impulse response (FIR) filters was reported in [8]. Non-uniform filter banks can be further classified into octave-band [9]–[11], critical-band [12], symmetric-band [13], and 1/3-octave-band [14] filter banks. A 7-band octave filter bank was designed in [9] and [10] using the interpolated FIR (IFIR) filter technique. Lian and Wei [11] proposed an 8-band octave filter bank with the IFIR and frequency-response masking (FRM) techniques to reduce the computational complexity. Chong *et al.* designed a critical-band filter bank that closely matches human perception [12]; however, the irregular spacing of the critical bands makes it difficult to implement. In addition to [11], Wei and Lian proposed a 16-band symmetric filter bank [13] that provides high frequency resolution in both the low- and high-frequency regions but rather low resolution near $\pi/2$. Kuo *et al.* recently proposed an efficient 18-band ANSI S1.11 1/3-octave filter bank [14], which is suitable for both the NAL-NL1 and HSE prescription formulas.

To design a filter bank for hearing aids, the frequency response should match the prescription as closely as possible. Suppose that the NAL-NL1 or HSE prescriptions in Fig. 1 are the target specification, and consider the matching capability of different types of filter bank (Fig. 3). Further assume that each filter bank in Fig. 3 has 18 bands and that the prescription-fitting algorithm described in Section II is applied to minimize the matching error.

Fig. 1. Audiogram versus prescription formula plot for 40 dB SPL input level.
The uniform filter bank has sub-bands of equal bandwidth and therefore a fixed frequency resolution. This resolution is too coarse in the low-frequency region, where the 1/3-octave prescription frequencies are closely spaced, and this causes the maximum matching error of approximately 8.4 dB. The symmetric filter bank, on the other hand, has rather low frequency resolution near $\pi/2$; its maximum matching error, which appears in the middle frequency region, equals 6.2 dB. Because it follows human hearing characteristics, the critical-band-like filter bank reduces the maximum matching error to 3 dB. Finally, the 18-band ANSI S1.11 1/3-octave filter bank achieves zero matching error because the frequency sampling points of NAL-NL1 and HSE coincide with the midband frequencies of the ANSI filter bank [15].

Filters usually introduce delay into the datapath of a hearing aid. Although the 1/3-octave filter bank has the best matching capability, it suffers from a 78 ms delay when processing 24 kHz audio [14]. Even with parallel minimum-phase infinite-impulse response (IIR) filters, the delay of the 1/3-octave filter bank is still up to 27 ms [14]. This is because the ANSI filters in the low-frequency region require very narrow transition bands [15]. Except for the ANSI filter bank, the filter banks in Fig. 3 have delays of approximately 10 ms. The matching capability of different filter banks is governed by the acoustic uncertainty principle, which states that the time-bandwidth product has a fixed lower bound; that is, increasing the spectral resolution decreases the temporal resolution, and vice versa.

Fig. 2. Different types of filter bank.

Fig. 3. Matching capability comparisons for different types of filter bank.

Hearing aids deliver signals into the ear canal through two different paths: one is the directly received sound, and the other is the sound processed by the hearing aid. Previous studies have
The general requirement of less than 12 ms [2] prevents the loss of visual cues (un-synchronized) with respect to hearing. Stone and Moore [16], [17] indicated that a delay of 20–30 ms can be judged as objectionable for mild-to-moderate hearing loss. The popularity of open-canal (OC) fitting hearing aids, which leave the ear canal much more open than traditional close-fitting earmolds, makes hearing aid delays even more concerning. In the OC fitting hearing aid, more sounds would travel directly into the ear canal. A delay of approximately 10 ms might create the comb filter effect [18], [19] (which will not be the case at most of frequencies) if the direct path signal amplitude is comparable to the one produced by the hearing aid. Using the high performance ANSI 1/3-octave filter bank, a relaxed-version with a low group delay filter bank, called the quasi-ANSI filter bank, for the digital hearing aid is designed and implemented. This study proposes a filter order optimization algorithm for developing the FIR filters. The delay constraint of each filter is limited to 10 ms. To reduce the match error, this study also considers an efficient prescription fitting algorithm. Simulation results show that the maximum matching error to various prescriptions of different types of hearing loss is less than 1.5 dB. Moreover, a low complexity multirate IFIR filter bank architecture is proposed. Compared with an 18-band parallel FIR filters, this design saves approximately 93% of the multiplications and 74% of the storage elements. The proposed analysis filter bank has also been implemented in UMC 90 nm CMOS technology with a high-VT standard cell library. By processing 24 kHz audio, the chip consumes only 73 $\mu$ W. Applying voltage scaling enables further energy savings. 
If the supply voltage is reduced to 0.6 V, simulation shows that the power consumption of the proposed analysis filter bank is 27 μW, which is about 30% of that consumed by the most energy-efficient AFB for hearing aids available in the literature [14].

The rest of this paper is organized as follows. Section II presents a low-delay quasi-ANSI S1.11 1/3-octave filter bank that uses a filter order optimization algorithm and an efficient prescription-fitting algorithm to minimize the matching error; simulation results in that section verify the effectiveness of the proposed filter bank. Section III develops a low-complexity VLSI architecture for the proposed filter bank by exploiting the IFIR and multirate signal processing techniques. Section IV presents the implementation results of the proposed filter bank. Finally, Section V presents concluding remarks.

# II. LOW-DELAY FILTER BANK DESIGN

This section presents a 10-ms 18-band quasi-ANSI S1.11 1/3-octave FIR filter bank, denoted by $G_1$–$G_{18}$, for digital hearing aids.

## A. Quasi-ANSI S1.11 1/3-Octave Filter Bank

The ANSI S1.11 standard [15] defines three filter classes and 43 1/3-octave bands covering the frequency range of 0–20 kHz. Each 1/3-octave band is specified by its midband (or central) frequency and its bandwidth. The midband frequency of the nth band, denoted by $f_m(n)$, is defined by

$$f_m(n) = 2^{\left(\frac{n-30}{3}\right)} f_r \tag{1}$$

where $f_r$, the reference frequency, is set to 1 kHz. For example, the midband frequency of the 22nd 1/3-octave band, $f_m(22)$, is 157 Hz, and that of the 39th band, $f_m(39)$, is 8 kHz. Given the midband frequency $f_m(n)$, the two band-edge frequencies $f_1(n)$ and $f_2(n)$ of the nth band are

$$f_1(n) = 2^{-\frac{1}{6}} f_m(n), \qquad f_2(n) = 2^{\frac{1}{6}} f_m(n). \tag{2}$$

The bandwidth of the nth band is then

$$\Delta f(n) = f_2(n) - f_1(n). \tag{3}$$

Fig.
4(a) illustrates the ANSI S1.11 class-2 filter specification, where $\min A(\omega)$ and $\max A(\omega)$ denote the limits on the minimum and maximum attenuation of the nth band filter, respectively. The stop-band attenuation of each band is at least 60 dB. Note that NAL-NL1 and HSE both generate hearing-aid prescriptions at the standard 1/3-octave frequencies from 150 Hz to 8 kHz. If ANSI filters are used to fit the prescription given by NAL-NL1 or HSE, it is natural to design the filters $F_{22}$–$F_{39}$ (i.e., the 22nd–39th 1/3-octave filters).

Fig. 4. (a) ANSI S1.11 class-2 filter specification [15] and (b) parameters of the designed filter.

An efficient algorithm to optimize the coefficients of filters $F_{22}$–$F_{39}$ was proposed in [14]. Given the parameters $[\delta_p, \delta_s, f_{s1}, f_{s2}, f_{p1}, f_{p2}]$ in Fig. 4(b), where $\delta_p$ and $\delta_s$ are the pass-band ripple and stop-band attenuation, $(f_{s1}, f_{s2})$ is the pair of stop-band edge frequencies, and $(f_{p1}, f_{p2})$ is the pair of pass-band edge frequencies, the algorithm in [14] applies the Parks-McClellan algorithm to design a linear-phase FIR filter. If $P$ denotes the order of the filter, then $P$ can be estimated by [24]

$$P \approx \frac{-10\log_{10}(\delta_s \delta_p) - 13}{2.43B_{\text{TW}}} \tag{4}$$

where $B_{\rm TW} = \min(B_{\rm TW1}, B_{\rm TW2})$ and $(B_{\rm TW1}, B_{\rm TW2})$ is the transition bandwidth pair of the bandpass filter (i.e., $B_{\rm TW1} = f_{p1} - f_{s1}$ and $B_{\rm TW2} = f_{s2} - f_{p2}$). The group delay $T_g$ of the filter is

$$T_g = \frac{P}{2f_s} \tag{5}$$

where $f_s$ is the sampling frequency. Note that maximizing $B_{\rm TW}$ minimizes the estimated filter order in (4) and hence, by (5), the group delay. Instead of searching over $[\delta_p, \delta_s, f_{s1}, f_{s2}, f_{p1}, f_{p2}]$, the algorithm in [14] searches for the maximum feasible transition bandwidth of the bandpass filter.
If the transition bandwidth pair is not feasible, $(B_{\rm TW1}, B_{\rm TW2})$ is decreased by 5% at each iteration until the designed filter meets the ANSI specifications. The 60 dB stop-band edge of $F_{39}$ in [14] lies at approximately 11.216 kHz; therefore, the sampling frequency is set to 24 kHz to satisfy the Nyquist sampling theorem. Simulation results show that the order of $F_{22}$, the sharpest filter in [14], is 1488. Hence, by (5), the group delay of that filter bank is 31 ms in a straightforward parallel implementation. To reduce the multiplicative complexity, Kuo *et al.* [14] applied an area-efficient iterative architecture that saves approximately 96% of the multiplications and additions. However, this design suffers from a large delay of 78 ms.

Fig. 5. Quasi-ANSI S1.11 1/3-octave filter coefficient optimization algorithm.

Building on the good matching performance of the ANSI filters, this study designs relaxed versions of the standard ANSI filters with constrained tap length (i.e., $G_1$–$G_{18}$, called the quasi-ANSI filter bank) for digital hearing aids. Fig. 5 outlines the proposed filter coefficient optimization algorithm, which contains two iterative design procedures: one meets the 10 ms group delay constraint, and the other limits the increase in matching error caused by the relaxation. Note that an advanced noise reduction algorithm, such as the Siemens SoundSmoothing noise reduction algorithm [20], contributes a group delay of nearly 1 ms [19]. Because 10 ms plus approximately 1 ms remains below the 12 ms general requirement [2], a 10 ms group delay constraint on the filter bank is sufficient to preserve synchronization between hearing and visual cues. Moreover, the frequency response of a hearing-aid filter bank should match the prescription as closely as possible, so a maximum matching error of 3 dB is also imposed as a constraint to achieve adequate compensation for each hearing loss pattern. The proposed algorithm starts from the standard ANSI filters $F_{22}$–$F_{39}$.
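As a numerical illustration of (1)–(5), the following Python sketch lists the 18 midband frequencies $f_m(22)$–$f_m(39)$ with their band edges, evaluates the order estimate (4), and converts filter orders to group delays with (5). The helper names are hypothetical. Two conventions are assumptions, because the text does not state them: $\delta_p$ and $\delta_s$ are derived from a ripple and an attenuation given in dB, and $B_{\rm TW}$ is expressed in radians per sample.

```python
import math

F_REF = 1000.0  # reference frequency f_r (Hz), eq. (1)
FS = 24000.0    # sampling frequency f_s used in the paper (Hz)


def midband(n: int) -> float:
    """Midband frequency f_m(n) of the nth 1/3-octave band, eq. (1)."""
    return 2.0 ** ((n - 30) / 3.0) * F_REF


def band_edges(n: int) -> tuple[float, float]:
    """Lower and upper band-edge frequencies (f_1(n), f_2(n)), eq. (2)."""
    fm = midband(n)
    return 2.0 ** (-1.0 / 6.0) * fm, 2.0 ** (1.0 / 6.0) * fm


def order_estimate(ripple_db: float, atten_db: float, b_tw: float) -> float:
    """Order estimate of eq. (4).

    Assumptions (not stated in the text): delta_p comes from a peak-to-peak
    pass-band ripple in dB, delta_s from the stop-band attenuation in dB,
    and b_tw is given in radians per sample.
    """
    g = 10.0 ** (ripple_db / 20.0)
    delta_p = (g - 1.0) / (g + 1.0)
    delta_s = 10.0 ** (-atten_db / 20.0)
    return (-10.0 * math.log10(delta_s * delta_p) - 13.0) / (2.43 * b_tw)


def group_delay_s(order: int, fs: float = FS) -> float:
    """Group delay T_g = P / (2 f_s) of a linear-phase FIR filter, eq. (5)."""
    return order / (2.0 * fs)


def max_order_for_delay(delay_s: float, fs: float = FS) -> int:
    """Largest filter order whose group delay does not exceed delay_s, from (5)."""
    return math.floor(delay_s * 2.0 * fs + 1e-9)  # small guard against rounding


if __name__ == "__main__":
    for n in range(22, 40):
        f1, f2 = band_edges(n)
        print(f"band {n}: f_m = {midband(n):7.1f} Hz, "
              f"edges = ({f1:7.1f}, {f2:7.1f}) Hz, bandwidth = {f2 - f1:6.1f} Hz")
    print(group_delay_s(1488))         # order of F_22 reported in [14]
    print(max_order_for_delay(0.010))  # order budget under the 10 ms constraint
```

With $f_s$ = 24 kHz, an order-1488 filter gives $1488/(2 \times 24\,000) = 0.031$ s, and a 10 ms budget allows an order of at most 480. Because (4) is inversely proportional to $B_{\rm TW}$, doubling the transition bandwidth halves the estimated order.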
The "Design filter coefficient with minimal order" procedure (Fig. 5) is almost the same as the optimization flow in [14], except that whenever the group delay of the filter exceeds 10 ms, it stretches the transition bandwidth pair $(B_{\rm TW1}, B_{\rm TW2})$ step by step according to

$$\operatorname{Next}\left(\log_2 B_{\text{TW1}}, \log_2 B_{\text{TW2}}\right) = (1 + k)\left(\log_2 B_{\text{TW1}}, \log_2 B_{\text{TW2}}\right). \tag{6}$$

By (4) and (5), expanding the transition bandwidth reduces the group delay of the designed filter. However, it also increases the interference between adjacent filters, which degrades the matching performance of the filter bank. Table I presents the exploration results of filter $G_1$ for different values of the factor $k$.

TABLE I EXPLORATION RESULTS OF FILTER $G_1$

| Factor k | Group delay (ms) | Matching error (dB) |
|----------|------------------|---------------------|
| 0.4 | 17.0 | 0.8 |
| 0.6 | 13.4 | 1.4 |
| 0.8 | 10.0 | 1.5 |
| 0.9 | 9.9 | 1.5 |
| 1.0 | 9.8 | 1.5 |
| 1.1 | 9.6 | 1.8 |
| 1.2 | 9.4 | 1.9 |
| 1.4 | 9.4 | 2.0 |

The "Minimize matching error" procedure reduces the matching error caused by inter-band interference. Suppose that $G_{m,n}$, $n = 1, 2, \ldots, 18$, is the sampled amplitude response (in dB) of filter $G_m$ at the 18 standard 1/3-octave frequencies. If $I'_m$ is the prescribed gain assigned to the mth band, then $\sum_{m=1}^{18} G_{m,n} I'_m$ can be regarded as the resulting frequency response sampled at the 18 1/3-octave frequencies. Suppose further that $I_n$, $n = 1, 2, \ldots, 18$, is the target prescription given by NAL-NL1. The gains $I'_m$ are then obtained by minimizing the maximum matching error:

$$\min\left(\max_{n} E_n\right) = \min\left(\max_{n} \left| I_n - \sum_{m=1}^{18} G_{m,n} I_m' \right| \right) \tag{7}$$

where $E_n$, $n = 1, 2, \ldots, 18$, is the matching error at the nth prescribed gain. If the 3 dB matching error constraint is not satisfied, the stop-band attenuation $\delta_s$ may need to be fine-tuned and the filter re-designed, as shown in Fig. 5.
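Problem (7) is a linear minimax (Chebyshev) fit. One way to solve it, not necessarily the method used by the authors, is to recast it as a linear program: minimize an auxiliary bound $t$ subject to $-t \le I_n - \sum_m G_{m,n} I'_m \le t$ for every $n$. The sketch below assumes SciPy's `linprog` with the HiGHS solver is available. `fit_prescription` is a hypothetical helper, and the small example uses made-up numbers rather than real filter responses. The same formulation also accepts more evaluation points than bands.

```python
import numpy as np
from scipy.optimize import linprog


def fit_prescription(G_db, target_db):
    """Minimax fit of band gains, following the form of eq. (7).

    G_db[m][n]   : sampled response of band filter m at frequency point n (dB)
    target_db[n] : target prescription I_n at frequency point n (dB)
    Returns (gains, max_error): the band gains I'_m and the achieved max_n E_n.
    """
    G = np.asarray(G_db, dtype=float)
    I = np.asarray(target_db, dtype=float)
    M, N = G.shape
    A = G.T                    # (A @ x)[n] = sum_m G[m, n] * x[m]
    ones = np.ones((N, 1))

    # Decision variables z = [x_1, ..., x_M, t]; minimize the bound t.
    c = np.zeros(M + 1)
    c[-1] = 1.0
    # |I_n - (A x)_n| <= t is split into  A x - t <= I  and  -A x - t <= -I.
    A_ub = np.vstack([np.hstack([A, -ones]), np.hstack([-A, -ones])])
    b_ub = np.concatenate([I, -I])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (M + 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:M], res.x[-1]


# Toy example with made-up numbers: one band with G[0, n] = 1 at three
# frequency points whose targets are 0, 1, and 4 dB.
gains, err = fit_prescription([[1.0, 1.0, 1.0]], [0.0, 1.0, 4.0])
print(gains, err)  # the minimax gain lies midway between the extreme targets
```

In the toy case, the optimum places the single gain halfway between the smallest and largest targets. This equalizes the worst-case errors at the two ends, which is the defining property of a minimax fit.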
In Table I, the group delay gradually decreases as the value of $k$ increases. For $k > 0.8$, the group delay is smaller than 10 ms, which meets the design constraint. From an implementation point of view, $k$ should be as large as possible. However, because of inter-band interference, the matching error increases as $B_{\rm TW}$ expands. The simulation results in Table I show that the matching error does not exceed 1.5 dB for $k \leq 1$. This suggests choosing $k = 1$ for filter $G_1$, the largest value that keeps the error within 1.5 dB. Similar design procedures can be applied to define the other filters $G_2$–$G_{18}$. In summary, the upper 9 filters (i.e., $G_{10}$–$G_{18}$) are simply the standard filters $F_{31}$–$F_{39}$, while the lower 9 filters (i.e., $G_1$–$G_9$) are relaxed versions of the ANSI filters $F_{22}$–$F_{30}$. For comparison, Fig. 6 depicts the magnitude responses of the proposed quasi-ANSI filters $G_1$–$G_{10}$ and the corresponding ANSI filters $F_{22}$–$F_{31}$.

Fig. 6. Magnitude response comparison between (a) standard ANSI filters and (b) quasi-ANSI S1.11 1/3-octave filters.

## B. Verification Results

To evaluate the effectiveness of the proposed filter bank, this study uses audiograms from the Independent Hearing Aid Information, a public service of the Hearing Alliance of America [21]. These audiograms include mild, moderate, and severe-to-profound hearing loss. They also appear in [11], but that work considered fitting only the audiograms, not their prescriptions.

The audiogram in Fig. 7(a) depicts mild-to-moderate hearing loss at low frequencies and mild hearing loss at high frequencies. People with this type of hearing loss lose overall loudness because most vowels cannot be heard, and conversations at very close distance may be necessary. The maximum matching error of the proposed filter bank is approximately 0.1 dB. The audiogram in Fig. 7(b), like that in Fig.
1, reveals moderate-to-severe hearing loss in the middle-to-high frequency region, the common type of hearing loss caused by aging. Sensitivity at low frequencies is good enough to convey some vowel information, which helps the person realize that someone is talking; without consonants, however, the person cannot easily distinguish one word from another. The maximum matching error of the proposed filter bank is approximately 0.4 dB. This is slightly worse than the 0 dB of the standard ANSI filter bank but much better than the errors of the other filter banks in Fig. 3. The audiogram in Fig. 7(c) reveals severe-to-profound hearing loss in the middle-to-high frequency region, which commonly occurs in older workers exposed to noisy environments for prolonged periods. The maximum matching error of the proposed filter bank is approximately 0.6 dB. Finally, the audiogram in Fig. 7(d) shows severe flat hearing loss at all frequencies, with hearing thresholds above 70 dB. Although this case is difficult to compensate for, the maximum matching error is less than 1.5 dB, which validates the effectiveness of the proposed filter bank.

Fig. 7. Matching results for different types of hearing loss: (a) mild to moderate hearing loss in low frequencies, (b) hearing loss due to aging, (c) noise induced deafness, (d) severe to profound flat hearing loss.

# III. MULTIRATE IFIR QUASI-ANSI FILTER BANK

This section presents an efficient VLSI architecture for the proposed filter bank that exploits the IFIR and multirate signal processing techniques. Given the filter parameters $[\delta_p, \delta_s, f_{s1}, f_{s2}, f_{p1}, f_{p2}]$ of $G(z)$, the basic IFIR structure consists of an image suppression filter $I(z)$ and a model filter $H(z)$ [22]. The minimum transition bandwidth of $H(z)$ is $L$ times that of $G(z)$, i.e., $B_{\mathrm{TW}}^H = L \times B_{\mathrm{TW}}^G$, where $L$ is the interpolation factor. The $L$-fold interpolated filter $H(z^L)$ contains periodic images of the spectrum of $H(z)$ with period $2\pi/L$, and filtering by $I(z)$ suppresses the unwanted images to produce $G(z)$:

$$G(z) = I(z)H(z^L) \tag{8}$$

Suppose that filters $I(z)$ and $H(z)$ are linear-phase FIR filters. According to (4), increasing the transition bandwidth $L$-fold reduces the order of the model filter $H(z)$ almost $L$-fold. However, this tightens the design constraint that $I(z)$ must meet to satisfy the given specification. For simplicity, assume that the pass-band ripple of each filter in (8) is approximately $\delta_p/2$. Then, based on (4) and the IFIR technique, the number of multiplications per sample for $G(z)$ can be estimated from $P_H$, the order of $H(z)$, and $P_I$, the order of $I(z)$:

$$\frac{P_H + P_I}{2} \approx \frac{-10\log_{10}\left(\frac{\delta_p}{2}\delta_s\right) - 13}{4.86} \left(\frac{1}{B_{\text{TW}}^H} + \frac{1}{B_{\text{TW}}^I}\right) \tag{9}$$

where $B_{\rm TW}^I = (\frac{2\pi}{L} - f_{s2}) - f_{p2}$ is the minimum transition bandwidth of $I(z)$. Choosing the best factor $L$ is equivalent to minimizing the right-hand side of (9), which is a simple convex optimization problem in $L$. Careful selection of $L$ yields the filter design with minimum complexity.

Consider filter $G_1$ as an illustrative example. The parameters $[\delta_p, \delta_s, f_{s1}, f_{s2}, f_{p1}, f_{p2}]$ of filter $G_1$ are [1 dB, 60 dB, 19 Hz, 298 Hz, 149 Hz, 168 Hz]. Using the Parks-McClellan algorithm directly, the filter order is 380 and the number of multiplications per sample is 191. Setting the derivative of (9) with respect to $L$ to zero shows that the best integer value of $L$ is either 10 or 11. Table II lists the multiplicative complexity for $L = 10$ and $L = 11$: both require 32 multiplications per sample, but $L = 11$ yields the lower overall order. Hence, the order of $G_1(z)$ is 384 and the required number of multiplications per sample is 32, which greatly reduces the computational complexity compared with the direct implementation. The cost of applying IFIR is a slight increase in the overall group delay.
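The following Python sketch illustrates (8) and the search over $L$ in (9). It builds $G(z) = I(z)H(z^L)$ by inserting $L-1$ zeros between the taps of $H(z)$ and convolving the result with $I(z)$, which makes the order of $G(z)$ equal to $P_I + L P_H$. It then evaluates only the bracketed factor of (9), because the ripple-dependent prefactor does not change the minimizing $L$. The helper names are hypothetical. Converting band edges from hertz to radians per sample at $f_s$ = 24 kHz is an assumption about the units in (9). For the $G_1$ parameters above, the integer search returns $L = 11$, which agrees with the choice made in the text.

```python
import math

import numpy as np

FS = 24000.0  # sampling frequency (Hz)


def to_rad(f_hz: float, fs: float = FS) -> float:
    """Convert a frequency in Hz to radians per sample."""
    return 2.0 * math.pi * f_hz / fs


def ifir_compose(i_taps, h_taps, L: int) -> np.ndarray:
    """Impulse response of G(z) = I(z) H(z^L), eq. (8)."""
    h = np.asarray(h_taps, dtype=float)
    h_up = np.zeros((len(h) - 1) * L + 1)
    h_up[::L] = h  # H(z^L): L-1 zeros between consecutive taps of H(z)
    return np.convolve(np.asarray(i_taps, dtype=float), h_up)


def ifir_cost(L: int, b_tw_g: float, f_s2: float, f_p2: float) -> float:
    """Bracketed factor of (9): 1/B_TW^H + 1/B_TW^I (radians per sample)."""
    b_h = L * b_tw_g                         # B_TW^H = L * B_TW^G
    b_i = (2.0 * math.pi / L - f_s2) - f_p2  # B_TW^I
    if b_i <= 0.0:
        return math.inf                      # images too close: I(z) infeasible
    return 1.0 / b_h + 1.0 / b_i


def best_interpolation_factor(b_tw_g, f_s2, f_p2, L_max: int = 64) -> int:
    """Integer L that minimizes the complexity estimate (9)."""
    return min(range(1, L_max + 1), key=lambda L: ifir_cost(L, b_tw_g, f_s2, f_p2))


# Filter G_1 from the text: f_s1, f_s2, f_p1, f_p2 = 19, 298, 149, 168 Hz.
f_s1, f_s2, f_p1, f_p2 = (to_rad(f) for f in (19.0, 298.0, 149.0, 168.0))
b_tw_g = min(f_p1 - f_s1, f_s2 - f_p2)  # B_TW of G_1
L_best = best_interpolation_factor(b_tw_g, f_s2, f_p2)
print("best L for G_1:", L_best)
```

The two terms of the cost pull in opposite directions. A larger $L$ widens the transition band of $H(z)$ but pushes the first image toward the pass band, which narrows the transition band left for $I(z)$.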
TABLE II COMPUTATIONAL COMPLEXITY COMPARISON WITH L=10 and 11

| L | 10 | 11 |
|-----------------|-----|-----|
| # mpy of $H(z)$ | 18 | 16 |
| # mpy of $I(z)$ | 14 | 16 |
| # mpy of $G(z)$ | 32 | 32 |
| order of $G(z)$ | 388 | 384 |

In addition to the IFIR technique, multirate signal processing can be used to further reduce the computational complexity. According to multirate system theory [22], if the stop-band edge of a band-limiting filter is below $\pi/M$, its output can be down-sampled by a factor of $M$ to reduce the complexity; $M$ is called the decimation factor. Fig. 8(a) shows the multirate IFIR architecture that implements (8), including both the analysis filter and the synthesis filter. Using the noble identity, the structure of Fig. 8(a) can be reduced to that of Fig. 8(b). A down-sampling factor $M$ lowers the data rate for processing and can thus decrease the computational complexity of the filter. Unfortunately, increasing $M$ also tightens the design constraints on the interpolation filters $I_A(z)$ and $I_S(z)$, so the optimum design again relies on careful selection of $M$. Assuming for simplicity that the pass-band ripple of each filter is approximately $\delta_p/3$, the multiplicative complexity of the multirate IFIR architecture in Fig. 8(b) can be estimated by

$$\frac{-10\log_{10}\left(\frac{\delta_{p}}{3}\delta_{s}\right) - 13}{4.86} \left(\frac{1}{B_{\text{TW}}^{H}} + \frac{1}{B_{\text{TW}}^{IA}} + \frac{1}{B_{\text{TW}}^{IS}}\right) \tag{10}$$

where $B_{\mathrm{TW}}^{H} = LM \times B_{\mathrm{TW}}^{G}$, $B_{\mathrm{TW}}^{IA} = (\frac{2\pi}{L} - f_{s2}) - f_{p2}$, and $B_{\mathrm{TW}}^{IS} = (\frac{2\pi}{M} - f_{s2}) - f_{p2}$. For simplicity, assume that $L = M$ when searching for the minimum of (10).
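Two ingredients of this step can be checked numerically. The sketch below first verifies the noble identity on arbitrary test data: filtering with $H(z^M)$ and then down-sampling by $M$ gives the same samples as down-sampling first and then filtering with $H(z)$. It then searches integer values of $L$ for the minimum of the bracketed factor of (10) under the $L = M$ simplification. The helper names are hypothetical. Frequencies are converted to radians per sample at 24 kHz, which is an assumption about the units in (10). For the $G_1$ parameters, the search returns $L = 5$.

```python
import math

import numpy as np

FS = 24000.0  # sampling frequency (Hz)


def upsample_taps(h, M: int) -> np.ndarray:
    """Coefficients of H(z^M): insert M-1 zeros between the taps of H(z)."""
    h = np.asarray(h, dtype=float)
    h_up = np.zeros((len(h) - 1) * M + 1)
    h_up[::M] = h
    return h_up


def noble_identity_holds(x, h, M: int) -> bool:
    """Compare H(z^M) then down-sampling by M with down-sampling then H(z)."""
    x = np.asarray(x, dtype=float)
    y_filter_first = np.convolve(x, upsample_taps(h, M))[::M]
    y_decimate_first = np.convolve(x[::M], h)
    n = min(len(y_filter_first), len(y_decimate_first))
    return bool(np.allclose(y_filter_first[:n], y_decimate_first[:n]))


def multirate_cost(L: int, b_tw_g: float, f_s2: float, f_p2: float) -> float:
    """Bracketed factor of (10) with M = L (radians per sample)."""
    b_h = L * L * b_tw_g                     # B_TW^H = L * M * B_TW^G with M = L
    b_i = (2.0 * math.pi / L - f_s2) - f_p2  # B_TW^IA = B_TW^IS when M = L
    if b_i <= 0.0:
        return math.inf
    return 1.0 / b_h + 2.0 / b_i


def to_rad(f_hz: float, fs: float = FS) -> float:
    """Convert a frequency in Hz to radians per sample."""
    return 2.0 * math.pi * f_hz / fs


x = (np.arange(37) % 7) - 3.0  # arbitrary test signal
print("noble identity holds:", noble_identity_holds(x, [1.0, -2.0, 3.0, 0.5], 4))

b_tw_g = to_rad(130.0)  # B_TW of G_1: 149 - 19 = 298 - 168 = 130 Hz
f_s2, f_p2 = to_rad(298.0), to_rad(168.0)
L_best = min(range(1, 33), key=lambda L: multirate_cost(L, b_tw_g, f_s2, f_p2))
print("best L = M for G_1:", L_best)
```

Compared with the IFIR-only estimate, the model-filter term now shrinks with $L^2$ instead of $L$. The image-suppression term, however, is paid twice (analysis and synthesis), so the optimum shifts to a much smaller factor.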
Setting the derivative with respect to $L$ to zero gives the minimum at $L = 5$, where the number of multiplications per sample is 23, fewer than the 32 required with the IFIR technique alone. Fig. 9 presents the exploration results of $G_1$ for different values of $L$. Instead of $L = 5$, consider $L = 4$: the number of multiplications per sample is still 23, although the order of $G_1$ is slightly larger than that of the minimum solution. Nevertheless, $L = 4$ is preferable for $G_1$ because a power-of-two factor is easy to implement. Applying the same multirate IFIR exploration procedure to the other filters $G_2$–$G_{18}$ indicates that the down-sampling factor is 4 for filters $G_1$–$G_{12}$ and 2 for filters $G_{13}$–$G_{15}$.

Fig. 8. (a) Illustrations of multirate IFIR implementation, and (b) noble identity.

Fig. 9. Exploration results for the estimated number of multiplications per sample versus the factor L for filter $G_1(z)$.

TABLE III TAP-LENGTH OF EACH SUB-FILTER IN THE QUASI-ANSI FILTER BANK

| Filter | $I_{A1}$ | $I_{A2}$ | $H_{16}$ | $H_{17}$ | $H_{18}$ |
|------------|----------|----------|----------|----------|----------|
| Tap-length | 35 | 49 | 41 | 33 | 27 |

| Filter | $H_9$ | $H_8$ | $H_7$ | $H_6$ | $H_5$ | $H_4$ | $H_3$ | $H_2$ | $H_1$ |
|------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| Tap-length | 67 | 83 | 95 | 97 | 97 | 97 | 97 | 97 | 97 |

With these exploration results, a low-complexity multirate IFIR quasi-ANSI filter bank for digital hearing aids can be developed (Fig. 10). Because filters $G_{10}$–$G_{18}$ are simply the standard ANSI filters $F_{31}$–$F_{39}$, bands 10–18 can be constructed recursively from the same three filters ($H_{16}$–$H_{18}$) using the 1/3-octave symmetry property [14].
Moreover, because filters $G_1$–$G_{12}$ have the same down-sampling factor, they can share one image suppression filter ($I_{A2}$), which saves hardware. Only filter $G_{12}$ needs to be considered when designing $I_{A2}$, because it imposes the strictest constraint among $G_1$–$G_{12}$. Fig. 10 also includes a synthesis bank with up-samplers and interpolation filters to reconstruct the signal; the interpolation filters are designed to remove all imaging distortion caused by up-sampling. For simplicity, this design uses $I_{A1} = I_{S1}$ and $I_{A2} = I_{S2}$. Consequently, the proposed quasi-ANSI filter bank contains 14 sub-filters: $I_{A1}$, $I_{A2}$, $H_1$–$H_9$, and $H_{16}$–$H_{18}$. Table III shows the tap length of each sub-filter, which ranges from 27 to 97.

Fig. 11 shows the data scheduling for the proposed multirate IFIR filter bank, which follows the recursive pyramid algorithm (RPA) [25]. Consider the computational complexity of the proposed quasi-ANSI filter bank. The RPA processes the 1st octave at every sample. Because linear-phase FIR filters have symmetric coefficients, this requires $21 + 17 + 14 = 52$ multiplications. Excluding $I_{A1}$, the 2nd octave is processed every two samples and also requires 52 multiplications. Excluding $I_{A2}$, the 3rd–6th octaves are processed every four samples, which requires $52 + 34 + 42 + 48 + 49 \times 6 = 470$ multiplications. The filters $I_{A1}$ and $I_{A2}$ can be implemented using polyphase decomposition to reduce complexity.
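The per-octave tally above can be reproduced from the tap lengths in Table III. The Python sketch below assumes, as the counts 21, 17, and 14 imply, that an odd-length symmetric filter with $N$ taps needs $(N+1)/2$ multiplications. It also assumes that the polyphase $I_{A1}$ and $I_{A2}$ each cost their symmetric count once per output sample at their decimated rates, which reproduces the $2(18)$ and $25$ terms per four input samples. The longest-path group delay follows from (5), with each order equal to the tap length minus one. The dictionary and function names are hypothetical.

```python
from fractions import Fraction

FS = 24000  # sampling frequency (Hz)

# Tap lengths from Table III.
TAPS = {"IA1": 35, "IA2": 49, "H16": 41, "H17": 33, "H18": 27,
        "H9": 67, "H8": 83, "H7": 95, "H6": 97, "H5": 97,
        "H4": 97, "H3": 97, "H2": 97, "H1": 97}


def sym_mults(name: str) -> int:
    """Multiplications of an odd-length linear-phase FIR filter (symmetric taps)."""
    return (TAPS[name] + 1) // 2


def order(name: str) -> int:
    """Filter order P = tap length - 1."""
    return TAPS[name] - 1


top = sum(sym_mults(k) for k in ("H16", "H17", "H18"))  # one octave: 52
low = sum(sym_mults(f"H{i}") for i in range(1, 10))      # H1..H9: 418

# Multiplications per block of four input samples, following the RPA schedule.
afb_block = (4 * top                 # 1st octave, every sample
             + 2 * top               # 2nd octave, every two samples
             + 2 * sym_mults("IA1")  # polyphase I_A1, two outputs per block
             + (top + low)           # 3rd-6th octaves, once per block (470)
             + sym_mults("IA2"))     # polyphase I_A2, one output per block
sfb_block = 2 * sym_mults("IA1") + sym_mults("IA2")  # I_S1 = I_A1, I_S2 = I_A2

afb_per_sample = Fraction(afb_block, 4)
sfb_per_sample = Fraction(sfb_block, 4)

# Longest path (6th octave), eq. (5): I_A2, then H_1 at 1/4 rate, then I_S2 = I_A2.
total_delay = Fraction(order("IA2") + 4 * order("H1") + order("IA2"), 2 * FS)
afb_delay = Fraction(order("IA2") + 4 * order("H1"), 2 * FS)

print(float(afb_per_sample), float(sfb_per_sample))      # multiplications per sample
print(float(total_delay) * 1e3, float(afb_delay) * 1e3)  # delays in ms
```

Exact fractions keep the per-sample counts and delays free of rounding, so they can be compared directly with the values derived in the text.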
Counting these terms over every four input samples, the number of multiplications per sample for the analysis filter bank (AFB) is

$$\frac{4(52) + 2(52) + 2(18) + 470 + 25}{4} = 210.75 \approx 211.$$

Similarly, for the synthesis filter bank (SFB), the number of multiplications per sample is

$$\frac{2(18) + 25}{4} = 15.25 \approx 15.$$

Table IV summarizes the multiplicative complexity of three architectures of the 18-band 1/3-octave filter bank: parallel low-delay quasi-ANSI FIR filters, the iterative standard ANSI filter bank [14], and the proposed architecture. Compared with the parallel FIR filters, this design saves approximately 93% of the multiplications per sample ($1 - 226/3144 \approx 0.93$). However, its complexity is approximately 61% higher than that of the iterative architecture in [14] ($226/140 \approx 1.61$).

This study also evaluates the storage complexity of the proposed architecture. Recall from Fig. 11 that three independent delay lines are needed to complete the filtering process. The first delay line serves the filtering calculations for $I_{A1}$, $I_{A2}$, and $H_{16}$–$H_{18}$ and requires 49 registers. The second serves $H_{13}$–$H_{15}$ and requires 41 registers, and the third, for $H_1$–$H_{12}$, requires 97 registers. Hence, the proposed AFB requires 187 data registers.

The path through the 6th-octave bands has the longest latency. Consequently, by (5), the group delay of the proposed filter bank is

$$\frac{P_{I_{A2}} + 4P_{H_1} + P_{I_{S2}}}{2f_s} = \frac{48 + 4(96) + 48}{2(24 \text{ kHz})} = 10 \text{ ms}.$$

This confirms that the latency of the proposed filter bank is 10 ms for processing 24 kHz sound; for the AFB alone, the group delay is 9 ms. For the other paths, buffer registers must be added in the SFB for the first two octave bands so that all 18 bands have the same phase shift, which avoids frequency-dependent delay [16].
Hence, by sharing the first delay line, an additional $240 - (24 + 2 \times 20 + 17) = 159$ buffer registers are required for the second octave bands, where 240 is the 10-ms group delay expressed in samples at 24 kHz. The first octave bands can share these 159 buffer registers, so they require only an additional $240 - 159 - 24 = 57$ buffer registers. Moreover, filters $I_{S1}$ and $I_{S2}$ require 35 and 49 data registers, respectively. Consequently, the proposed SFB requires $159 + 57 + 35 + 49 = 300$ registers. Although normal human ears are not sensitive to phase delay, designing the filter bank with exact linear phase, as in [11], [12], and [14], offers advantages for advanced binaural hearing aids, which target not only hearing-loss compensation but also music signals and sound localization [1]–[3].

Fig. 10. Proposed 18-band multirate IFIR quasi-ANSI filter bank.

Fig. 11. Computation scheduling for the proposed filter bank.

TABLE IV
COMPLEXITY COMPARISON OF DIFFERENT 1/3-OCTAVE FILTER BANKS

| | # mpy (AFB) | # mpy (SFB) | # mpy (Total) | Group delay |
|------------------|------|----|------|--------|
| Parallel filters | 3144 | 0 | 3144 | 7.7 ms |
| [14] | 120 | 20 | 140 | 78 ms |
| This work | 211 | 15 | 226 | 10 ms |

Table V compares the storage complexity of different filter banks. The filter bank in [11] uses the FRM parallel architecture, which is generated by two prototype filters $H(z)$ and $F_m(z)$ with tap lengths of 19 and 39, respectively. Because linear-phase FIR filters have symmetric coefficients, the coefficient memory requires only $10 + 20 = 30$ registers. Using a similar calculation, the FRM parallel architecture requires 714 data registers for the AFB and 196 buffer registers for the SFB. The filter bank in [12] uses a parallel bank of 16 FIR filters, each with 110 taps. Hence, it requires $55 \times 16 = 880$ registers for coefficients. With a fully parallel architecture, the delay line can be shared by all FIR filters, and the SFB does not need any buffer registers.
In this case, the AFB requires only 110 data registers. The standard ANSI filter bank in [14] is implemented using an iterative architecture with five FIR filters of tap lengths 41, 33, 27, 35, and 41, respectively. Hence, the coefficient memory requires $21 + 17 + 14 + 18 + 21 = 91$ registers. With the iterative architecture, the delay lines of the octave bands cannot be shared, and each requires 41 registers. Thus, the AFB requires $6 \times 41 = 246$ data registers. To guarantee the same phase shift for each octave band, the SFB in [14] requires as many as 3432 buffer registers. As Table V shows, the storage complexity of the proposed architecture is comparable to that of [11] and [12] and much lower than that of [14].

TABLE V
COMPARISON OF STORAGE COMPLEXITY OF DIFFERENT FILTER BANKS

| | # bands | fs (kHz) | Structure | # storage elements (AFB) | # storage elements (SFB) | # storage elements (Coef.) |
|-----------|----|----|------------------|-----|------|-----|
| [11] | 8 | 16 | FRM + Parallel | 714 | 196 | 30 |
| [12] | 16 | 16 | Parallel | 110 | 0 | 880 |
| [14] | 18 | 24 | Multirate | 246 | 3432 | 91 |
| This work | 18 | 24 | IFIR + Multirate | 187 | 300 | 506 |

Fig. 12. Filter-oriented RPA data scheduling algorithm.

# IV. LOW-POWER VLSI IMPLEMENTATION

An important issue in the early stage of system design is choosing appropriate design parameters among the possible design alternatives, i.e., the design space. The design space usually involves multiple metrics of interest, such as timing, resource usage, power, and cost. In general, fewer functional units require a higher clock rate, more temporary storage, or more complicated control logic. Consider the silicon implementation in [14] as an example. Using a single multiply-and-accumulate (MAC) unit, the standard ANSI analysis filter bank was implemented in TSMC 130 nm CMOS technology, and the chip operated at 6.13 MHz for real-time processing of 24 kHz data. However, 6.13 MHz may be too high for hearing aid applications.
Moreover, the MAC unit occupies only approximately 25% of the chip area and consumes approximately 30% of the total power [14]. That is, the control logic and storage dominate, which may not make for a good low-power VLSI architecture [27].

## A. Multi-MAC Architecture

Instead of a single MAC unit, consider a set of 25 parallel multipliers, which can perform the filtering calculation of a linear-phase FIR filter of up to 49 taps in one cycle (owing to coefficient symmetry, a 49-tap filter needs only 25 multiplications). With 25 multipliers, the first delay line requires 5 cycles to complete its filtering calculations for every sample, the second delay line requires 3 cycles for every two samples, and the third delay line requires 21 cycles for every four samples. Consequently, the number of cycles per sample needed to complete the 18-band filtering process is $\lceil 5 + 3/2 + 21/4 \rceil = 12$. If the data are well scheduled, there are no stall cycles, and the hardware can operate at $12 \times 24$ kHz $= 288$ kHz for real-time processing of 24 kHz audio; otherwise, a higher clock rate is necessary.

1) Filter-Oriented RPA Algorithm: For simplicity, assume that in each clock cycle exactly one sub-filter has access to the set of 25 multipliers. An efficient data scheduling algorithm, called the filter-oriented RPA algorithm, can be derived by modifying the RPA in Fig. 11. As Fig. 12 shows, the algorithm leaves the calculations for the first two delay lines unchanged (i.e., the same as the RPA in Fig. 11), divides those for the third delay line into five groups (i.e., $H_{12}$-$H_{10}$, $H_{9}$-$H_{8}$, $H_{7}$-$H_{6}$, $H_{5}$-$H_{4}$, and $H_{3}$-$H_{1}$), and distributes the groups to balance the load. Therefore, at most 12 cycles per sample are required to accomplish the filtering operations. Note that the unused multipliers in each cycle can be clock-gated to save power. The second row of Table VI shows the implementation result of the proposed quasi-ANSI AFB using the filter-oriented RPA in UMC 90 nm CMOS high-VT technology.
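The 12-cycle figure can be checked with a short sketch that charges each sub-filter $\lceil m/25 \rceil$ cycles on the shared bank, where $m$ is its number of symmetry-folded multiplications. The per-sub-filter counts are inferred from the totals in Section III (e.g., $21+17+14=52$ and $52+34+42+48+49\times 6=470$); treating 18 and 25 as the costs of $I_{A1}$ and $I_{A2}$ (35 and 49 taps) is likewise an inference, and all names in the code are hypothetical.

```python
import math
from fractions import Fraction

# (processing period in input samples, symmetry-folded multiplications per
# sub-filter) for each delay line. Counts are inferred from the text; the
# mapping of individual counts to specific H_k is an assumption.
DELAY_LINES = [
    (1, [18, 25, 21, 17, 14]),                 # I_A1, I_A2, H16-H18
    (2, [21, 17, 14]),                         # H13-H15
    (4, [21, 17, 14, 34, 42, 48] + [49] * 6),  # H1-H12
]


def cycles_on_bank(mults, n_multipliers):
    """Cycles for one pass over a delay line when only one sub-filter may use
    the shared multiplier bank per cycle (filter-oriented scheduling)."""
    return sum(math.ceil(m / n_multipliers) for m in mults)


def cycles_per_sample(n_multipliers=25):
    """Average cycles per input sample, rounded up to a whole cycle."""
    avg = sum(Fraction(cycles_on_bank(mults, n_multipliers), period)
              for period, mults in DELAY_LINES)
    return math.ceil(avg)


if __name__ == "__main__":
    per_line = [cycles_on_bank(m, 25) for _, m in DELAY_LINES]
    n = cycles_per_sample()
    print(per_line, n, "cycles/sample ->", n * 24_000, "Hz clock")
```

This average-based count assumes the scheduler spreads the third delay line's 21 cycles over the four samples, which is the purpose of the load-balanced grouping of $H_1$-$H_{12}$ described above.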
Three 250 ms input sequences were used for power estimation: a female voice, a male voice, and a random signal. Synopsys PrimeTime suite and Nanosim were applied to gate-level and circuit-level simulations, respectively, to evaluate the power performance. The proposed quasi-ANSI AFB ran at a clock rate of 288 kHz and consumed 91 $\mu$W.

2) Delay-Line-Oriented RPA Algorithm: The filter-oriented RPA is easy to understand; however, the data fed into the set of 25 multipliers switch between delay lines frequently, which might consume extra dynamic power. To address this issue, the 25 multipliers are partitioned into three independent subsets dedicated to the three delay lines (i.e., 9 multipliers for the first delay line, 3 for the second, and 13 for the third). With 9 multipliers, the five sub-filters of the first delay line ($I_{A1}$, $I_{A2}$, and $H_{16}$-$H_{18}$) require $2+3+3+2+2=12$ cycles to complete the filtering operations. Similarly, the second and third delay lines require $7+6+5=18$ and 41 cycles, respectively. This is called the delay-line-oriented RPA algorithm. Note that $18 \le 2 \times 12$ and $41 \le 4 \times 12$, which satisfies the real-time constraint. Consequently, at most 12 cycles per sample are required to accomplish the 18-band filtering operations.

TABLE VI
SEVERAL HARDWARE IMPLEMENTATIONS OF AFB @ UMC 90 NM CMOS HIGH-VT CELL LIBRARY

| | Dynamic power (µW) | Total power (µW) | Area (gate) | Clock (kHz) |
|-------------------------------------|----|-----|-------|------|
| Filter-oriented RPA<br>(25 MACs) | 41 | 91 | 46724 | 288 |
| Delay-line-oriented RPA<br>(25 MACs) | 31 | 84 | 49499 | 288 |
| Multiplier-less<br>(45 adders) | 65 | 137 | 82390 | 288 |
| Re-implementation of [14]<br>(one MAC) | 82 | 102 | 18764 | 6130 |
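The real-time check for a partitioned multiplier bank can be written as a small function. In this sketch (function and constant names are hypothetical; the per-sub-filter multiplication counts are inferred from the totals in the text), each delay line owns its multipliers, so a per-sample budget of $B$ cycles is feasible if the first, second, and third delay lines finish within $B$, $2B$, and $4B$ cycles, respectively. The same check also applies to the 3/1/4 MAC allocation used for the optimized architecture in Section IV-C.

```python
import math

# Symmetry-folded multiplications per sub-filter, inferred from the text;
# only the multiset of counts on each delay line matters here.
FIRST = [18, 25, 21, 17, 14]                 # I_A1, I_A2, H16-H18 (every sample)
SECOND = [21, 17, 14]                        # H13-H15 (every 2 samples)
THIRD = [21, 17, 14, 34, 42, 48] + [49] * 6  # H1-H12 (every 4 samples)


def cycles(mults, k):
    """Cycles for one pass over a delay line with k dedicated multipliers."""
    return sum(math.ceil(m / k) for m in mults)


def check_partition(k1, k2, k3, fs_hz=24_000):
    """Return per-line cycles, cycles per sample, and clock rate (Hz).

    The per-sample budget B must satisfy c1 <= B, c2 <= 2B, and c3 <= 4B.
    """
    c1, c2, c3 = cycles(FIRST, k1), cycles(SECOND, k2), cycles(THIRD, k3)
    budget = max(c1, math.ceil(c2 / 2), math.ceil(c3 / 4))
    return (c1, c2, c3), budget, budget * fs_hz


if __name__ == "__main__":
    print(check_partition(9, 3, 13))  # ((12, 18, 41), 12, 288000)
    print(check_partition(3, 1, 4))   # ((33, 52, 125), 33, 792000)
```

In both cases the first delay line is the bottleneck, so its cycle count sets the clock rate (288 kHz and 792 kHz, respectively).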
The third row of Table VI outlines the implementation result of the proposed AFB using the delay-line-oriented RPA. The clock rate was 288 kHz, and the power consumption was 84 $\mu$W. Note that switching between delay lines less frequently reduces the dynamic power to 31 $\mu$W, compared with 41 $\mu$W for the filter-oriented RPA.

## B. Adder-Based FIR Architecture

Although the control logic is simple, the results in Section IV-A suggest that allocating 25 multipliers is an overdesign. One efficient way to reduce redundant operations is to apply multiple constant multiplications (MCMs) [28] or common sub-expression elimination (CSE) [29]. An efficient multiplier-less (i.e., adder-based) quantization framework for FIR filters was recently proposed in [30]; it allows explicit tradeoffs between hardware complexity and quantization error to facilitate FIR filter design exploration. Simulation results reveal that the adder-based architecture saves approximately 43% of the additions compared with a direct implementation of each sub-filter. To achieve the same clock rate (i.e., 288 kHz), a chain of 45 adders is allocated. The fourth row of Table VI shows the implementation results of the adder-based 18-band AFB, which consumes 137 $\mu$W. Both the chip area and the power consumption are significantly worse than those of the multi-MAC cases, because the adder-based architecture usually entails a large increase in storage elements for temporary values [27]. Moreover, despite its limited arithmetic units, the adder-based filter bank requires overly complicated control logic with many large multiplexers, which outweighs the benefit of the reduced resource usage.

Fig. 13. Hardware architecture of the proposed filter bank.

For a fair comparison, we have re-implemented the design in [14] using the same CMOS technology (i.e., UMC 90 nm CMOS high-VT technology).
The simulation results in Table VI show that the single-MAC architecture of [14] consumes 102 $\mu$W. Despite an approximately 60% increase in multiplicative complexity, the proposed quasi-ANSI AFB with the multi-MAC architecture consumes less power than the standard ANSI AFB with a single MAC unit. This result shows that, without design space exploration, an area-efficient architecture does not necessarily lead to the best power performance. We note that with a single-MAC architecture, the proposed quasi-ANSI AFB would require a clock rate of up to 6.8 MHz and would consume nearly 198 $\mu$W; these results are therefore not included in Table VI.

## C. The Optimized Low-Power Architecture

The implementation results in Table VI show that the optimized hardware should be a compromise design consisting of fewer, but sufficient, parallel multipliers together with limited storage and control logic. As described in Section III, the multiplicative complexities of the three delay lines are approximately in the integer ratio 3:1:4. Because the second delay line has the lowest complexity, one MAC unit is allocated to it. To guarantee adequate computing power and prevent stall or wait cycles, 3 and 4 MACs are allocated to the first and third delay lines, respectively. With 3 multipliers, 33 cycles are required to complete the filtering calculations for the first delay line. The second and third delay lines require 52 and 125 cycles with 1 and 4 multipliers, respectively. Note that $52 \le 2 \times 33$ and $125 \le 4 \times 33$, which satisfies the timing constraint. The clock rate of the optimized AFB is therefore set to $33 \times 24$ kHz $= 792$ kHz for real-time processing of 24 kHz audio. Fig.
13 shows the optimized hardware architecture of the proposed AFB, which consists of three modules: the system controller (sys\_ctrl), the register module (reg), and the filter engine (filter). The data word length is fixed at 16 bits. In addition to the clock (clk) and reset (rst) signals, the input and output each have a valid signal (i.e., in\_valid and out\_valid) to communicate with other subsystems (e.g., the ADC and noise reduction) in a hearing aid SoC. The system controller coordinates the data flow according to the scheduling algorithm and handles the input interface. The register module contains the coefficient memory and the data memory: the coefficient memory stores the coefficients of the 14 sub-filters, while the data memory maintains the 3 separate delay lines. The filter engine contains 3 independent sets of MAC units (with 3, 1, and 4 MACs, respectively) dedicated to the three delay lines.

The optimized 10 ms 18-band quasi-ANSI AFB has been implemented with the UMC 90 nm CMOS high-VT standard cell library. The chip has an area of approximately 33274 (2-input NAND) gates and operates at 792 kHz. For processing 24 kHz audio, the power consumption is approximately 73 $\mu$W, estimated using the same three 250 ms input sequences: the female voice, male voice, and random signal. Table VII summarizes detailed comparisons between different analysis filter banks, including the 10 ms 18-band critical-like AFB, the 3.44 ms 16-band critical-like AFB [12], the 78 ms 18-band standard ANSI AFB [14], and the proposed one. The stopband attenuation of each filter bank is at least 60 dB. Note that the SFB is not included in Table VII because the filter banks in the literature [7]–[14] all consider the AFB only. Moreover, the SFB in a hearing aid SoC will likely be merged with dynamic range compressors for further optimization [6], [26]. The comparison results in Table VII validate the effectiveness of the proposed quasi-ANSI AFB.

Fig. 14.
Simulation results of applying low-power design techniques (power in $\mu$W).

TABLE VII
SILICON COMPARISONS BETWEEN DIFFERENT ANALYSIS FILTER BANKS

| | Delay (ms) | Matching error (dB) | # mpy | fs (kHz) | VDD @ process | CLK (kHz) | Area (gate) | Power (µW) |
|-------------------------------------|------|------|------|----|-----------------|------|-------|-------|
| 18 bands<br>(critical-like) | 10 | ~3.0 | 4320 | 24 | - | - | - | - |
| [12]<br>16 bands<br>(critical-like) | 3.44 | ~5.9 | 880 | 16 | 1.1 V @ 0.35 µm | 960 | 27959 | 247.5 |
| [14]<br>18 bands<br>(ANSI) | 78 | ~0.0 | 120 | 24 | 1 V @ 90 nm | 6130 | 18764 | 102 |
| This work<br>18 bands<br>(quasi-ANSI) | 9 | ~0.4 | 211 | 24 | 1 V @ 90 nm | 792 | 33274 | 73 |

The matching error in Table VII is evaluated for the case of moderate-to-severe hearing loss in Fig. 1. Recall from Figs. 3 and 7(b) that the matching error of the standard ANSI filter bank is zero, while the matching errors of the proposed quasi-ANSI filter bank and the 10-ms critical-like filter bank are approximately 0.4 dB and 3.0 dB, respectively. In contrast, with only 110 taps (i.e., approximately 3.44 ms group delay) for each critical-like filter [12], simulation results reveal a rather large matching error of 5.9 dB.

Instead of RAM and ROM, all memory modules in this study are synthesized from 16-bit registers. Hence, the area and power dissipation of the chip are a little higher than they would be with standard RAM (for data) and ROM (for coefficients) cells. In addition, the coefficient memory can be optimized because the 16-bit coefficients of a particular filter usually contain runs of leading 0's (for small positive values) or 1's (sign extension for negative values). For example, the coefficient memory of the 110-tap, 16-band critical-like filter bank is estimated at 14080 bits (880 unique coefficients of 16 bits each); the design in [12] reduced it to 10615 bits.
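The sign-extension redundancy described above can be quantified with a small sketch. As an assumption for illustration only (not necessarily the scheme used in [12]), each filter stores all of its coefficients with a single word length that is just wide enough, in two's complement, for its largest-magnitude coefficient, so redundant leading 0's or 1's are dropped. The helper names and the toy coefficient values are hypothetical.

```python
def twos_complement_width(value):
    """Minimum number of bits needed to hold an integer in two's complement."""
    if value >= 0:
        return value.bit_length() + 1        # magnitude bits plus a 0 sign bit
    return (-value - 1).bit_length() + 1     # e.g. -1 -> 1 bit, -128 -> 8 bits


def full_memory_bits(filters, word_bits=16):
    """Coefficient memory when every coefficient uses the full word length."""
    return word_bits * sum(len(coeffs) for coeffs in filters)


def trimmed_memory_bits(filters, word_bits=16):
    """Coefficient memory with one trimmed word length per filter."""
    total = 0
    for coeffs in filters:
        width = max(twos_complement_width(c) for c in coeffs)
        if width > word_bits:
            raise ValueError("coefficient does not fit in the word length")
        total += width * len(coeffs)
    return total


if __name__ == "__main__":
    # 16 filters x 55 unique coefficients (110-tap symmetric) at 16 bits each.
    print(full_memory_bits([[0] * 55] * 16))  # 14080
    # Two toy filters with made-up quantized integer coefficients.
    toy = [[3, -4, 1], [100, -7]]
    print(full_memory_bits(toy), trimmed_memory_bits(toy))  # 80 25
```

A per-filter word length keeps the memory addressing regular; finer-grained schemes (e.g., per-coefficient widths) could save more bits at the cost of extra decoding logic.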
For further power reduction, consider voltage scaling. Decreasing the supply voltage of a circuit reduces its power consumption; however, the critical-path delay increases as the supply voltage decreases. Because the clock period is long (the clock runs at only 792 kHz), the proposed AFB can operate at a lower supply voltage, such as 0.6 V, without violating the timing constraint. Simulation results show that voltage scaling decreases the power consumption of the proposed AFB to 27 $\mu$W. Fig. 14 presents detailed information. Note that in a multi-$V_{DD}$ SoC, voltage scaling increases power-planning complexity and requires voltage level shifters for interfacing across power domains; these are not included in Fig. 14.

# V. CONCLUSION

This study presents a low-delay, high-performance, and low-power filter bank design for advanced digital hearing aids. The standard ANSI S1.11 1/3-octave filter bank is rarely adopted in hearing aids because of its high computational complexity and rather large group delay, even though it closely matches human hearing characteristics. This study proposes a 10-ms 18-band quasi-ANSI S1.11 1/3-octave filter bank with a slight relaxation of the ANSI specification. The computational complexity is 226 multiplications per sample (211 for the AFB and 15 for the SFB). The storage complexity is 187 delay-line registers, 506 coefficients, and 300 buffer registers to meet the linear-phase requirement. The proposed AFB was implemented in UMC 90 nm CMOS high-VT technology; it operates at 792 kHz for real-time processing of 24 kHz audio and consumes approximately 73 $\mu$W with a $V_{DD} = 1 \text{ V}$ supply. The chip can also operate at a low voltage (0.6 V), reducing the power consumption to 27 $\mu$W without any performance degradation.
The contributions of this study include the following: (1) a systematic framework for developing a quasi-ANSI filter specification for hearing aids that is more easily implementable, as Section II shows; (2) a thorough design space exploration that exploits multirate and IFIR techniques to construct a VLSI architecture that significantly reduces the multiplicative complexity of the filter bank without unduly increasing the latency, as described in Section III; and (3) an efficient data scheduling algorithm and appropriate hardware resource allocation for a small-area, ultra-low-power implementation of the proposed filter bank, as Section IV shows. For commercial reasons, the detailed specifications of modern hearing aids are not disclosed, which makes a direct comparison with the proposed filter bank difficult. Nevertheless, we believe that the proposed design is superior when the NAL-NL1 or HSE prescription formula is applied.

# REFERENCES

- [1] H. Dillon, Hearing Aids. New York: Thieme Medical Publisher, 2001.
- [2] J. Katz, Handbook of Clinical Audiology, 5th ed. New York: Lippincott Williams & Wilkins, 2001.
- [3] J. M. Kates, Digital Hearing Aids. Plural Publishing, 2008.
- [4] D. Byrne, H. Dillon, T. Ching, R. Katsch, and G. Keidser, "NAL-NL1 procedure for fitting nonlinear hearing aids: characteristics and comparisons with other procedures," *J. Amer. Acad. Audiol.*, vol. 12, no. 1, pp. 37–54, Jan. 2001.
- [5] J. H. Chang, K. S. Tsai, P. C. Li, and S. T. Young, "Computer-aided simulation of multi-channel WDRC hearing aids," in Proc. 17th Ann. Convention Expo Amer. Acad. Audiology, Washington, DC, 2005.
- [6] C.-W. Wei et al., "A low-power Mandarin-specific hearing aid chip," in Proc. IEEE Asian Solid-State Circuits Conf., Beijing, China, 2010, pp. 1–4.
- [7] R. Brennan and T. Schneider, "A flexible filter bank structure for extensive signal manipulations in digital hearing aids," in *Proc.
IEEE Int. Symp. Circuits Syst.*, 1998, pp. 569–572.
- [8] H. Li, G. A. Jullien, V. S. Dimitrov, M. Ahmadi, and W. Miller, "A 2-digit multidimensional logarithmic number system filter bank for a digital hearing aid architecture," in *Proc. IEEE Int. Symp. Circuits Syst.*, Arizona, USA, 2002, pp. II-760–II-763.
- [9] T. Lunner and J. Hellgren, "A digital filterbank hearing aid—Design, implementation and evaluation," in *Proc. ICASSP Conf.*, 1991, pp. 3661–3664.
- [10] L. S. Nielsen and J. Sparso, "Designing asynchronous circuits for low power: An IFIR filter bank for a digital hearing aid," *Proc. IEEE*, vol. 87, no. 2, pp. 268–281, Feb. 1999.
- [11] Y. Lian and Y. Wei, "A computationally efficient nonuniform FIR digital filter bank for hearing aids," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 52, no. 12, pp. 2754–2762, Dec. 2005.
- [12] K. S. Chong, B. H. Gwee, and J. S. Chang, "A 16-channel low-power nonuniform spaced filter bank core for digital hearing aid," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 53, no. 9, pp. 853–857, Sep. 2006.
- [13] Y. Wei and Y. Lian, "A 16-band nonuniform FIR digital filterbank for hearing aid," in *Proc. IEEE Biomed. Circuits Syst. Conf.*, 2006, pp. 186–189.
- [14] Y. T. Kuo, T. J. Lin, Y. T. Li, and C. W. Liu, "Design & implementation of low-power ANSI S1.11 filter bank for digital hearing aids," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 7, pp. 1684–1696, Jul. 2010.
- [15] Specification for Octave-Band and Fractional-Octave-Band Analog and Digital Filters, ANSI Standard S1.11-2004.
- [16] M. A. Stone and B. C. J. Moore, "Tolerable hearing-aid delays III—Effects on speech production and perception of across frequency variation in delay," *Ear and Hearing*, vol. 24, no. 2, pp. 175–183, 2003.
- [17] M. A. Stone and B. C. J. Moore, "Tolerable hearing-aid delays IV—Effects on subjective disturbance during speech production by hearing-impaired subject," *Ear and Hearing*, vol. 26, no. 2, pp. 225–235, 2005.
- [18] M. A. Stone, B. C. J. Moore, K. Meisenbacher, and R. Derleth, "Tolerable hearing-aid delays V—Estimation of limits for open canal fittings," *Ear and Hearing*, vol. 29, no. 4, pp. 601–617, 2008.
- [19] R. Herbig and J. Chalupper, "Acceptable processing delay in digital hearing aids," *Hearing Review*, vol. 17, no. 1, pp. 28–31, 2010.
- [20] [Online]. Available: http://hearing.siemens.com/fr/01-professional/audiologie/01-etudes-rapports-audiologiques/\_resources/pdf/sat-sound-smoothing.pdf
- [21] [Online]. Available: http://www.earinfo.com
- [22] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice Hall, 1993.
- [23] R. J. Zatorre, P. Belin, and V. B. Penhune, "Structure and function of auditory cortex: Music and speech," *Trends in Cognitive Sciences*, pp. 37–46, 2002.
- [24] J. F. Kaiser, "Nonrecursive digital filter design using the I<sub>0</sub>-sinh window function," in *Proc. Int. Symp. Circuits Syst.*, 1974, pp. 20–23.
- [25] M. Vishwanath, "The recursive pyramid algorithm for the discrete wavelet transform," *IEEE Trans. Signal Process.*, vol. 42, no. 3, pp. 673–676, Mar. 1994.
- [26] K.-C. Chang, Y.-T. Kuo, and C.-W. Liu, "Low-complexity dynamic range compression for digital hearing aids," *IEEE Trans. Circuits Syst.*, submitted for publication.
- [27] M. Mehendale and S. D. Sherlekar, VLSI Synthesis of DSP Kernels—Algorithmic and Architectural Transformations. Norwell, MA: Kluwer, 2001.
- [28] M. Potkonjak, M. Srivastava, and A. Chandrakasan, "Multiple constant multiplications—Efficient and versatile framework and algorithms for exploring common sub-expression elimination," *IEEE Trans. Comput.-Aided Design*, vol. 15, pp. 151–165, Feb. 1996.
- [29] R. I. Hartley, "Subexpression sharing in filters using canonic signed digit multipliers," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 43, pp. 677–688, Oct. 1996.
- [30] Y. T.
Kuo, T. J. Lin, and C. W. Liu, "Complexity-aware quantization and lightweight VLSI implementation of FIR filters," EURASIP J. Adv. Signal Processing, 2011. [Online]. Available: http://asp.eurasipjournals.com/content/pdf/1687-6180-2011-357906.pdf

**Chih-Wei Liu** (M'03) was born in Taiwan. He received the B.S. and Ph.D. degrees, both in electrical engineering, from National Tsing Hua University, Hsinchu, Taiwan, in 1991 and 1999, respectively. From 1999 to 2000, he was an integrated circuit design engineer at the Electronics Research and Service Organization (ERSO) of the Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan. Near the end of 2000, he joined the SoC Technology Center (STC) of ITRI as a project leader, and he left ITRI at the end of September 2003. He is currently an Associate Professor with the Department of Electronics Engineering and the Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan. His current research interests are SoC and VLSI system design, processors for embedded computing systems, digital signal processing, digital communications, and coding theory.

**Kuo-Chiang Chang** was born in Taiwan. He received the B.S. degree in electrical and control engineering from National Chiao-Tung University, Hsinchu, Taiwan, in 1998, where he is currently working toward the Ph.D. degree in electronics engineering. His research interests include SoC design, VLSI system design, and low-power digital signal processing.

**Ming-Hsun Chuang** was born in Taiwan. He received the B.S. degree in engineering and system science from National Tsing Hua University, Hsinchu, Taiwan, in 2007 and the M.S. degree in electronics engineering from National Chiao Tung University, Taiwan, in 2010. He is currently an integrated circuit design engineer at MStar Semiconductor, Inc., Hsinchu, Taiwan. His research interests include SoC and VLSI system design and digital signal processing.

**Ching-Hao Lin** was born in Taiwan.
He received the B.S. degree in engineering science from National Cheng Kung University in 2010. He is currently an M.S. candidate in the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan. His research interests include low-power signal processing, digital hearing aids, and computer architecture.