

Moreover, the symmetry between the two complementary logic trees equalises the worst-case delay of the outputs to the discharge of a chain of six transistors. The use of scaling techniques is advisable at the design stage, as this improves the gate speed and reduces the total channel area of the NMOS chain. This CLA also accepts the alternative charge compensation scheme derived from domino logic, based on PMOS transistor feedback from the output, which solves the charge sharing and charge leakage problems. Fig. 2 shows the gate which generates the signals $G_i$, $P_i$, $\bar{P}_i$ and $N_i$ according to eqns. 2 and 6. This implementation uses the recurrence of the logic to reduce the device count. Fig. 3 shows the EXOR gate which generates the output sum according to eqn. 4.



Fig. 3 EXOR gate

The worst-case propagation time of the output carry ($C_4$, $\bar{C}_4$) can be improved through the additional circuitry (11 transistors) shown by the dashed lines in the CLA gate. The PMOS transistor marked by \* avoids the charge sharing problem. This circuit consists of a four-bit dynamic AND domino gate which turns on the two bypass transistors if all carry propagate signals are true. The overall delay of four-bit CLA parallel adders connected in series should be greatly reduced because all $P_i$ are generated in parallel and are evaluated simultaneously in this circuitry.
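The bypass condition can be checked at the logic level with a short behavioural sketch. It assumes the usual definitions $G_i = A_i B_i$ and $P_i = A_i \oplus B_i$ (eqns. 2 and 6 are not reproduced here, so these are assumptions), and the function names are illustrative only. It confirms exhaustively that whenever all four propagate signals are true the output carry equals the input carry, which is what allows the bypass transistors to pass $C_0$ directly to $C_4$.

```python
from itertools import product


def ripple_carries(a_bits, b_bits, c0):
    """Return (C4, all_propagate) for four bit pairs, LSB first.

    ASSUMPTION: generate G_i = A_i AND B_i, propagate P_i = A_i XOR B_i.
    """
    carry = c0
    all_propagate = True
    for a, b in zip(a_bits, b_bits):
        g = a & b
        p = a ^ b
        carry = g | (p & carry)          # C_{i+1} = G_i + P_i C_i
        all_propagate = all_propagate and bool(p)
    return carry, all_propagate


def bypass_is_safe():
    """Check every 4-bit operand pair and input carry exhaustively."""
    for bits in product((0, 1), repeat=9):
        a_bits, b_bits, c0 = bits[0:4], bits[4:8], bits[8]
        c4, all_propagate = ripple_carries(a_bits, b_bits, c0)
        if all_propagate and c4 != c0:
            return False
    return True


if __name__ == "__main__":
    print("bypass safe:", bypass_is_safe())
```

This models only the Boolean behaviour; it says nothing about the timing of the dynamic gate.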

Table 1: Comparison between conventional and proposed adder

| 4 bit                               |                                |                                |                 |
|-------------------------------------|--------------------------------|--------------------------------|-----------------|
|                                     | Conventional                   | Proposed                       | Reduction ratio |
| Area                                | $289 \times 315 \mu\text{m}^2$ | $251 \times 264 \mu\text{m}^2$ | 27.2%           |
| Number of transistors               | 209                            | 176                            | 15.8%           |
| Worst-case carry                    | 2.91ns                         | 2.58ns                         | 11.3%           |
| Worst-case sum                      | 3.70ns                         | 3.24ns                         | 12.4%           |
| Average power consumption (at 1MHz) | 3.81mW                         | 2.7mW                          | 29.1%           |

| 32 bit                              |              |          |                 |
|-------------------------------------|--------------|----------|-----------------|
|                                     | Conventional | Proposed | Reduction ratio |
| Worst-case carry                    | 18.87ns      | 15.43ns  | 18.2%           |
| Worst-case sum                      | 19.51ns      | 16.10ns  | 17.5%           |
| Average power consumption (at 1MHz) | 29.7mW       | 20.3mW   | 31.6%           |
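The reduction ratios in Table 1 appear to be computed as (conventional − proposed)/conventional. The following sketch reproduces a few of them from the tabulated values; the function name is illustrative.

```python
def reduction_ratio(conventional, proposed):
    """Percentage reduction of the proposed design relative to the conventional one."""
    return 100.0 * (conventional - proposed) / conventional


if __name__ == "__main__":
    rows = {
        "4 bit area (um^2)": (289 * 315, 251 * 264),
        "4 bit transistors": (209, 176),
        "4 bit worst-case carry (ns)": (2.91, 2.58),
        "32 bit worst-case carry (ns)": (18.87, 15.43),
        "32 bit power (mW)": (29.7, 20.3),
    }
    for name, (conv, prop) in rows.items():
        print(f"{name}: {reduction_ratio(conv, prop):.1f}%")
```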

Two four-bit CLA adders have been designed in a double-metal $1.0 \mu\text{m}$ CMOS technology: one using the proposed enhanced CLA in MODCVS, and one using a conventional DCVS scheme with a MODL CLA to generate the $C_i$. These adders were in turn used to design 32 bit CLA adders. The circuits were simulated using the HSPICE circuit simulator at level six with $V_{DD} = 5\text{V}$ and $C_L = 0.2\text{pF}$ (Table 1). The input operands were $A = 0xF$, $\bar{A} = 0x0$, $B = 0x0$ and $\bar{B} = 0xF$ in the four-bit case, and $A = 0xFFFFFFFF$, $\bar{A} = 0x00000000$, $B = 0x00000000$ and $\bar{B} = 0xFFFFFFFF$ in the 32 bit case, with input carry $C_0 = 0$ and $C_0 = 1$, which corresponds to the worst case. The proposed CLA adder reduces the device count, wire length, fan-in and fan-out, conferring substantial advantages in terms of speed, area and power consumption over the standard CLA in DCVS logic.

## References

- 1 CHU, L.K.M., and PULFREY, D.L.: 'Design procedures for Differential Cascode Voltage Switch Circuits', *IEEE J. Solid-State Circuits*, 1986, **21**, (6), pp. 1082-1087
- 2 MENG, T.H.Y.: 'Synchronisation design for digital systems' (Kluwer, Norwood, MA, 1991)
- 3 HWANG, I.S., and FISHER, A.L.: 'Ultrafast compact 32-bit CMOS adders in multiple-output domino logic', *IEEE J. Solid-State Circuits*, 1989, **24**, (2), pp. 358-369

## Parallel pixel processing using programmable gate arrays

D.M. Budgett, P.E. Tang, J.H. Sharp, C.R. Chatwin, R.C.D. Young, R.K. Wang and B.F. Scott

*Indexing terms:* Programmable logic devices, Image processing, Parallel algorithms

A reconfigurable hardware design permits very fast feature extraction from high frame rate video images. By implementing parallel pixel processing paths in programmable gate arrays, a wide range of image processing algorithms can be implemented in real time.

*Introduction:* Programmable gate arrays (PGAs) can be used to perform image analysis tasks at very high frame rates. Fast cameras output data at 16 million pixel/s. Using a digital signal processor, only simple real-time algorithms (e.g. thresholding) can be implemented at these data rates. However, PGAs can implement a wide range of algorithms at high speed by using parallel pixel processing paths.

A hybrid optical/digital correlator system is currently under construction at Glasgow University as part of the Brite EuRAM II project contract number BRE2-CT93-0542 [1]. Objects contained in the field of view of an input CCD camera are compared against a database of images stored in a holographic memory [2]. This system will be capable of computing the correlation of a $512 \times 512$ pixel input image with a database of images at a rate of 2900 frame/s. Application areas include high speed quality control in manufacturing processes and real-time object identification and tracking.



Fig. 1 Energy distribution in correlation plane

a Sharp peak indicates good match between images

b Broad distribution indicates poor match between images

Each time a correlation between the input and a database image is performed, a correlation plane output image is produced. Fig. 1a illustrates the energy distribution in the correlation plane when there is a good match between an object in the input scene and the object contained in a database image. The position of the object can also be obtained by locating the peak in the correlation image. A poor match is illustrated in Fig. 1b. For this application, the image processing task is to discriminate between good and poor correlations.

**PGA-based hardware:** The hardware architecture exploits the ability of PGAs to tackle the computationally demanding tasks which must be performed on every pixel in the image (e.g. summation, counting, comparisons, and logic operations). It is the ability to configure different parts of a PGA to compute multiple image features in parallel that distinguishes this design from conventional high performance image processing systems. Functions such as divisions, which are not required for every pixel, can be easily executed by a host central processing unit (CPU).



Fig. 2 Functional components of imaging system

The functional architecture is shown in Fig. 2. The main components are the PGA processors, host CPU, high speed memory and supporting interfaces. The division of the hardware into two computational groups greatly enhances the ability to implement more sophisticated real-time algorithms. During the period in which phase one processing is implemented on the incoming raw data for the current frame period, a second data processing stage is implemented using the image captured during the previous frame period together with the results from the phase one processing of that image.

Careful planning of the internal routing permits very fast logic and registers to be implemented using the 100 configurable logic blocks (CLBs) available in each Xilinx 3130 PGA device. This PGA-based system provides both the high speed performance advantages of dedicated hardware and the adaptability advantages of software solutions.

**Algorithm:** The first phase of processing determines a threshold level for the input image. It is necessary that this threshold be adaptable to compensate for variations in the light intensity of the correlation image which changes with different database image search strategies. The output from this process is a binary image where all pixels greater than the threshold are mapped to 1, and all others to 0. Pixels set to a value of 1 contribute to the correlation target. It is the assessment of the target pixels in the phase two processing which quantifies the correlation quality and location.
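The phase one mapping can be sketched in software as follows. The letter does not state how the adaptive threshold is derived, so the `adaptive_threshold` rule below (a fixed fraction of the peak intensity) is purely a hypothetical stand-in; only the binarisation step, where pixels greater than the threshold map to 1 and all others to 0, follows the description above.

```python
def adaptive_threshold(image, fraction=0.5):
    """HYPOTHETICAL rule: threshold as a fraction of the brightest pixel.

    The source only says the threshold must adapt to the light intensity
    of the correlation image; the actual rule is not given.
    """
    peak = max(max(row) for row in image)
    return fraction * peak


def binarise(image, threshold):
    """Map pixels strictly greater than the threshold to 1, all others to 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


if __name__ == "__main__":
    grey = [
        [0, 10, 20],
        [30, 200, 40],
    ]
    level = adaptive_threshold(grey)
    print(level, binarise(grey, level))
```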

Table 1: Image parameters computed in parallel by PGAs

| Parameter | Description                                           |
|-----------|-------------------------------------------------------|
| $xMin$    | Minimum $X$ co-ordinate of the target bounding box    |
| $xMax$    | Maximum $X$ co-ordinate of the target bounding box    |
| $yMin$    | Minimum $Y$ co-ordinate of the target bounding box    |
| $yMax$    | Maximum $Y$ co-ordinate of the target bounding box    |
| $xSum$    | Summation of $X$ co-ordinate of all the target pixels |
| $ySum$    | Summation of $Y$ co-ordinate of all the target pixels |
| $nSize$   | Number of pixels set to 1 in the target               |

A more complicated algorithm is implemented in phase two where multiple correlation peak parameters (listed in Table 1) are derived. Four of the parameters specify the bounding box which contains all of the image pixels set to 1. The  $nSize$  parameter counts the total number of '1' pixels within the bounding box.
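A software sketch of the Table 1 parameters is given below. In the hardware these quantities are accumulated by separate PGA functions operating in parallel on the pixel stream; here they are simply updated one after another for each pixel, so the sketch illustrates what is computed rather than how the PGA computes it. The function name and the dictionary layout are illustrative assumptions.

```python
def extract_features(binary_image):
    """Accumulate the Table 1 parameters in a single pass over a binary image.

    Returns None for the bounding-box entries if no pixel is set to 1.
    """
    features = {
        "xMin": None, "xMax": None, "yMin": None, "yMax": None,
        "xSum": 0, "ySum": 0, "nSize": 0,
    }
    for y, row in enumerate(binary_image):
        for x, pixel in enumerate(row):
            if pixel != 1:
                continue
            # Bounding-box registers (MIN/MAX functions in the hardware).
            features["xMin"] = x if features["xMin"] is None else min(features["xMin"], x)
            features["xMax"] = x if features["xMax"] is None else max(features["xMax"], x)
            features["yMin"] = y if features["yMin"] is None else min(features["yMin"], y)
            features["yMax"] = y if features["yMax"] is None else max(features["yMax"], y)
            # Co-ordinate summations and target pixel count.
            features["xSum"] += x
            features["ySum"] += y
            features["nSize"] += 1
    return features


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
    ]
    print(extract_features(image))
```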

Table 2: Parameters computed by host CPU

| Parameter  | Description                                     | Formulae                                             |
|------------|-------------------------------------------------|------------------------------------------------------|
| $xGravity$ | Mean value of $X$ -position of the target       | $xGravity = xSum/nSize$                              |
| $yGravity$ | Mean value of $Y$ -position of the target       | $yGravity = ySum/nSize$                              |
| $nDensity$ | Concentration index of all pixels in the target | $nDensity = \frac{nSize}{((xMax-xMin)*(yMax-yMin))}$ |

These parameters are then used to estimate the density of the correlation peak within the bounding box. The  $xSum$  and  $ySum$  parameters are used to identify the peak location. These are all high speed operations where pixel data is processed at a rate of 40 MHz. Following the numerically intensive pixel by pixel phase two PGA processing, the final correlation parameters specified in Table 2 are calculated by six software computations performed by the host CPU.
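The host CPU step amounts to two divisions for the centre of gravity and two subtractions, one multiplication and one division for the density. A minimal sketch follows, taking $xGravity$ as $xSum/nSize$; the function name, the input dictionary and the handling of an empty or degenerate bounding box are assumptions not taken from the letter.

```python
def host_parameters(features):
    """Compute xGravity, yGravity and nDensity from the PGA outputs.

    ASSUMPTION: returns None when there are no target pixels or when the
    bounding box has zero width or height, since the source does not say
    how these cases are handled.
    """
    n_size = features["nSize"]
    if n_size == 0:
        return None
    box_area = (features["xMax"] - features["xMin"]) * (features["yMax"] - features["yMin"])
    if box_area == 0:
        return None
    return {
        "xGravity": features["xSum"] / n_size,
        "yGravity": features["ySum"] / n_size,
        "nDensity": n_size / box_area,
    }


if __name__ == "__main__":
    # Six target pixels: (10,21) (14,21) (12,20) (12,22) (11,21) (13,21)
    example = {"xMin": 10, "xMax": 14, "yMin": 20, "yMax": 22,
               "xSum": 72, "ySum": 126, "nSize": 6}
    print(host_parameters(example))
```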



Fig. 3 PGA layout for multiple feature extraction

Six classes of functions are implemented: counters (CNT), maximum and minimum registers (MAX, MIN), summations (SUM) and thresholding (BIN)

**Results:** The PGA design implemented for the grey-level image analysis used 52 CLBs. The arrangement of PGA functions for implementing phase two processing is shown in Fig. 3. This design used 92 CLBs which enables a single Xilinx 3130 PGA device to implement each phase. The performance of the system has been evaluated using a  $64 \times 64$  pixel array DALSA CA-D1-0064 operating at 2900 frame/s. The pixel data clock speed is 16MHz which determines the clock speed for phase one processing. The longest PGA task requires 103 $\mu$ s which is well within the 345 $\mu$ s camera frame period. The few computations performed by the host CPU add an additional 4 $\mu$ s to the algorithm execution time.
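The timing figures can be cross-checked with simple arithmetic. The sketch below computes the frame period at 2900 frame/s and the time for one pass over a $64 \times 64$ image at the two pixel rates mentioned in the letter. The correspondence between the quoted 103µs task and a single pass at 40MHz is an inference from the numbers, not something the letter states.

```python
def frame_period_us(frames_per_second):
    """Frame period in microseconds."""
    return 1e6 / frames_per_second


def pass_time_us(pixel_count, pixel_rate_hz):
    """Time in microseconds to stream pixel_count pixels at pixel_rate_hz."""
    return 1e6 * pixel_count / pixel_rate_hz


if __name__ == "__main__":
    pixels = 64 * 64
    period = frame_period_us(2900)          # about 345 us
    readout = pass_time_us(pixels, 16e6)    # camera pixel clock
    fast_pass = pass_time_us(pixels, 40e6)  # rate quoted for PGA processing
    print(f"frame period   : {period:.1f} us")
    print(f"pass at 16 MHz : {readout:.1f} us")
    print(f"pass at 40 MHz : {fast_pass:.1f} us")
    # Quoted longest PGA task plus host CPU time must fit in one frame period.
    print("within budget  :", 103 + 4 < period)
```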

Table 3: Performance of  $nDensity$  parameter over test images

| Object | Reference |     |     |     |     |
|--------|-----------|-----|-----|-----|-----|
|        | R1        | R2  | R3  | R4  | R5  |
| R1     | 100       | 83  | 57  | 14  | 11  |
| R2     | 83        | 100 | 75  | 14  | 8   |
| R3     | 50        | 75  | 100 | 19  | 7   |
| R4     | 9         | 7   | 23  | 100 | 7   |
| R5     | 10        | 7   | 6   | 7   | 100 |

A set of 20 input scene images of a Rover cam shaft bearing cap component viewed at different angles was used to test the accuracy of the algorithm. The hybrid optical/digital correlator is still under construction, and the correlation plane images have been evaluated digitally. Five images labelled R1-R5 are used to illustrate the discrimination ability of the PGA-based algorithm in Table 3. Image R2 is separated from R1 and R3 by only a 5° in-plane rotation. A more prominent attitude difference involving both in-plane and out-of-plane rotations occurred between the remaining images in the test set. Excellent discrimination results were achieved, and the $nDensity$ parameter was also capable of quantifying the quality of the correlation. The algorithm has been tested on a larger image database and was found to be robust in the presence of realistic quantities of image noise.

**Summary:** Based on programmable gate array devices, this hardware-based technique satisfies requirements for fast response, complex feature extraction, and algorithm flexibility.

© IEE 1996  
*Electronics Letters Online No: 19961025*

3 June 1996

D.M. Budgett, J.H. Sharp, R.K. Wang and B.F. Scott (*Department of Mechanical Engineering, University of Glasgow, Glasgow G12 8QQ, United Kingdom*)

P.E. Tang (*Department of Control Engineering, National Chiao-Tung University, Hsin-Chu, Taiwan, Republic of China*)

C.R. Chatwin and R.C.D. Young (*School of Engineering, University of Sussex, Brighton, Sussex, United Kingdom*)

## References

- 1 BUDGETT, D.M., SHARP, J.H., TANG, P.C., and CHATWIN, C.R.: 'Implementation of a High Speed Optical/Digital Correlator System'. *Optique et Information, European Optical Society Topical Digest Series*, 1995, Vol. 6, p. 2.2
- 2 SHARP, J.H., BUDGETT, D.M., TANG, P.C., and CHATWIN, C.R.: 'Automated recording system for page oriented volume holographic memories', *Rev. Sci. Instrum.*, 1995, **66**, pp. 5174-5177

## Shielding effectiveness of a rectangular enclosure with a rectangular aperture

M.P. Robinson, J.D. Turner, D.W.P. Thomas, J.F. Dawson, M.D. Ganley, A.C. Marvin, S.J. Porter, T.M. Benson and C. Christopoulos

*Indexing terms:* Shielding, Electromagnetic compatibility

An analysis is presented of the shielding effectiveness (SE) of a rectangular enclosure with an aperture in one face. The enclosure is treated as a length of rectangular waveguide and the aperture as a length of coplanar strip transmission line. Theoretical values of SE agree with measurements for a range of enclosures, apertures and frequencies. The variation of SE with position in the enclosure is also correctly predicted.

**Introduction:** Shielding effectiveness (SE) is an important parameter that affects the electromagnetic compatibility of equipment. It is defined as the ratio of fields in the presence and absence of an enclosure. The SE of practical metal enclosures is determined mainly by their apertures.

A formula that is frequently quoted is Ott's equation  $SE = 20 \log_{10} \lambda/2l$ , where  $l$  is the longest dimension of the aperture and  $\lambda$  the wavelength in free space [1]. Experimentally, however, SE is found to depend also on the dimensions of the enclosure and the point at which SE is measured. More complicated calculations of SE in the literature generally either rely on numerical computation for each enclosure, or apply only to restricted frequency ranges [2-4].

We present here a new analytical formulation for a rectangular aperture in a rectangular box. We represent the box by a short-circuited length of rectangular waveguide (operating either above or below cut-off) and the aperture by a length of transmission line shorted at both ends.

**Theory:** Fig. 1 shows an enclosure with an aperture illuminated by a plane wave, and its equivalent circuit, where $Z_0 = 377\Omega$ and $k_0 = 2\pi/\lambda$. The waveguide has characteristic impedance $Z_g = Z_0 / \sqrt{1 - (\lambda/2a)^2}$ and propagation constant $k_g = k_0 \sqrt{1 - (\lambda/2a)^2}$. The transition between free space and waveguide is represented by considering the aperture as a length of coplanar strip transmission line of total width $b$ and separation $w$. Its characteristic impedance is given by Gupta *et al.* [5] as $Z_{0s} = 120\pi K(k_e)/K'(k_e)$ where $k_e = w_e/b$.

The effective width  $w_e$  is

$$w_e = w - \frac{5t}{4\pi} \left( 1 + \ln \frac{4\pi w}{t} \right) \quad (1)$$

where  $t$  is the thickness of the enclosure wall. If  $0 < k_e < 1/\sqrt{2}$  (which is true for most practical apertures) then the following is accurate to three parts per million:

$$\frac{K(k_e)}{K'(k_e)} = \pi \left[ \ln \left( \frac{2 \left( 1 + \sqrt[4]{1 - k_e^2} \right)}{1 - \sqrt[4]{1 - k_e^2}} \right) \right]^{-1} \quad (2)$$
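The accuracy of this closed form can be checked numerically. The sketch below uses the approximation in the form $\pi / \ln[2(1 + \sqrt[4]{1-k_e^2})/(1 - \sqrt[4]{1-k_e^2})]$ and compares it with the exact ratio obtained from the arithmetic-geometric mean, using the standard identity $K(k) = \pi / (2\,\mathrm{AGM}(1, \sqrt{1-k^2}))$. The function names and the sampling grid are illustrative choices.

```python
import math


def agm(x, y, iterations=40):
    """Arithmetic-geometric mean of two positive numbers."""
    for _ in range(iterations):
        x, y = 0.5 * (x + y), math.sqrt(x * y)
    return x


def ratio_exact(k):
    """K(k)/K'(k), using K(k) = pi / (2 AGM(1, k')) with k' = sqrt(1 - k^2)."""
    k_prime = math.sqrt(1.0 - k * k)
    return agm(1.0, k) / agm(1.0, k_prime)


def ratio_approx(k):
    """Closed-form approximation, intended for 0 < k < 1/sqrt(2)."""
    r = (1.0 - k * k) ** 0.25
    return math.pi / math.log(2.0 * (1.0 + r) / (1.0 - r))


def max_relative_error(k_values):
    return max(abs(ratio_approx(k) / ratio_exact(k) - 1.0) for k in k_values)


if __name__ == "__main__":
    grid = [0.05 + 0.01 * i for i in range(66)]   # 0.05 ... 0.70
    print(f"max relative error: {max_relative_error(grid):.2e}")
```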

To calculate SE, we first transform the short circuits at the ends of the aperture to an impedance  $Z_{ap}$  at  $A$ . It is necessary to include a factor  $l/a$  to account for the coupling between the aperture and the enclosure:

$$Z_{ap} = \frac{1}{2} \frac{l}{a} j Z_{0s} \tan \frac{k_0 l}{2} \quad (3)$$

Combining $Z_0$, $v_0$ and $Z_{ap}$ gives a voltage $v_1 = v_0 Z_{ap}/(Z_0 + Z_{ap})$ with a source impedance $Z_1 = Z_0 Z_{ap}/(Z_0 + Z_{ap})$. We transform $v_1$, $Z_1$ and the short circuit at the end of the waveguide to $P$, giving a voltage $v_2$, source impedance $Z_2$ and load impedance $Z_3$:

$$v_2 = \frac{v_1}{\cos k_g p + j(Z_1/Z_g) \sin k_g p} \quad (4)$$

$$Z_2 = \frac{Z_1 + j Z_g \tan k_g p}{1 + j(Z_1/Z_g) \tan k_g p} \quad (5)$$

$$Z_3 = j Z_g \tan k_g (d - p) \quad (6)$$

The voltage at  $P$  is now  $v_p = v_2 Z_3/(Z_2 + Z_3)$ .

In the absence of the enclosure, the load impedance at  $P$  is simply  $Z_0$ , so the voltage at  $P$  is  $v'_p = v_0/2$ , and the shielding effectiveness is therefore  $SE = -20 \log_{10} |v_p/v'_p| = -20 \log_{10} |2v_p/v_0|$ .
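The chain of eqns. 1-6 can be evaluated numerically as sketched below. This is a minimal illustration rather than the authors' code. It assumes the TE10 wave impedance $Z_g = Z_0/\sqrt{1-(\lambda/2a)^2}$, $k_e = w_e/b$, and the elliptic-integral ratio in the form $\pi/\ln[2(1+\sqrt[4]{1-k_e^2})/(1-\sqrt[4]{1-k_e^2})]$. Complex arithmetic is used so that the same expressions hold below cut-off, where the square root is imaginary. The aperture size and measurement position in the example are hypothetical values chosen for illustration.

```python
import cmath
import math

Z0 = 377.0            # free-space impedance, ohms
C = 299_792_458.0     # speed of light, m/s


def shielding_effectiveness(f, a, b, d, t, l, w, p):
    """SE in dB at distance p from the aperture; all lengths in metres.

    Not valid exactly at cut-off (lambda = 2a), where Z_g is singular.
    """
    lam = C / f
    k0 = 2.0 * math.pi / lam
    root = cmath.sqrt(1.0 - (lam / (2.0 * a)) ** 2)   # imaginary below cut-off
    zg = Z0 / root          # ASSUMED TE10 wave impedance
    kg = k0 * root

    # Eqn. 1: effective aperture width; eqn. 2: K/K' approximation.
    we = w - (5.0 * t / (4.0 * math.pi)) * (1.0 + math.log(4.0 * math.pi * w / t))
    ke = we / b
    r = (1.0 - ke * ke) ** 0.25
    z0s = 120.0 * math.pi * math.pi / math.log(2.0 * (1.0 + r) / (1.0 - r))

    # Eqn. 3: aperture impedance, then the equivalent source (v1, Z1).
    zap = 0.5 * (l / a) * 1j * z0s * math.tan(k0 * l / 2.0)
    v0 = 1.0
    v1 = v0 * zap / (Z0 + zap)
    z1 = Z0 * zap / (Z0 + zap)

    # Eqns. 4-6: transform to the observation point P.
    v2 = v1 / (cmath.cos(kg * p) + 1j * (z1 / zg) * cmath.sin(kg * p))
    z2 = (z1 + 1j * zg * cmath.tan(kg * p)) / (1.0 + 1j * (z1 / zg) * cmath.tan(kg * p))
    z3 = 1j * zg * cmath.tan(kg * (d - p))
    vp = v2 * z3 / (z2 + z3)

    return -20.0 * math.log10(abs(2.0 * vp / v0))


if __name__ == "__main__":
    # Box dimensions as quoted for Fig. 2; l, w and p are HYPOTHETICAL.
    se = shielding_effectiveness(f=400e6, a=0.300, b=0.120, d=0.300,
                                 t=0.0015, l=0.100, w=0.005, p=0.150)
    print(f"SE at 400 MHz: {se:.1f} dB")
```

Sweeping `f` or `p` with this function gives the frequency and position dependence that the single-parameter estimate $20 \log_{10}(\lambda/2l)$ cannot capture.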



Fig. 1 Rectangular box with aperture, and its equivalent circuit

**Measurements:** SE was measured in screened rooms with either a Bilog antenna (100-1000MHz), a log periodic antenna (200-1000MHz) or a stripline (1-300MHz). The field at various points in the enclosure was obtained with a short wire in the lid, coupled optically or electrically to a network analyser. The dimensions of the enclosures varied from 222 × 146 × 55 mm to 483 × 483 × 120mm;  $l$  varied from 80 to 200mm,  $w$  from 5 to 80mm.

**Results:** The new analytical formulation gives values within 10dB of measurements, and predicts enclosure resonances. Fig. 2 shows theoretical and measured values of SE at two points in a box with dimensions  $a = 300\text{mm}$ ,  $b = 120\text{mm}$ ,  $d = 300\text{mm}$ ,  $t = 1.5\text{mm}$  and