# Unit 2 MOS Inverters

# CMOS Design With Delay Constraints: *Design* for Performance

- The propagation delay equations on chart 4-5 can be rearranged to solve for W/L, substituting $C_{ox}\mu_n(W_n/L_n)$ for $k_n$ and, similarly, $C_{ox}\mu_p(W_p/L_p)$ for $k_p$
- These equations can then be used to "size" a CMOS circuit to meet a target rising or falling propagation delay, assuming C<sub>load</sub> and the other parameters are known
  - After determining the required W/L values, the device widths W follow from the minimum device length L of the technology
- Other constraints such as rise time/fall time or rise/fall symmetry may also need to be considered in addition to rise and fall delay
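The chart 4-5 equations are not reproduced here, so the sketch below assumes a generic first-order model $t_d = K\,C_{load}/(\mu C_{ox}(W/L)V_{DD})$ and rearranges it for W/L. The constant `K_FIT`, the helper names, and all numeric values are hypothetical placeholders, not the chart's coefficients.

```python
K_FIT = 2.0  # hypothetical model constant standing in for the chart 4-5 coefficients


def required_w_over_l(t_target, c_load, mu_cox, vdd, k_fit=K_FIT):
    """Rearranged assumed model: t_d = k_fit*C_load/(mu_cox*(W/L)*VDD) solved for W/L."""
    return k_fit * c_load / (mu_cox * vdd * t_target)


def device_width(w_over_l, l_min):
    """Width from the required W/L and the technology's minimum channel length."""
    return w_over_l * l_min


# Hypothetical numbers: 100 fF load, 3.3 V supply, 200 ps target, 0.35 um minimum L
wl_n = required_w_over_l(200e-12, 100e-15, mu_cox=100e-6, vdd=3.3)  # NMOS sets falling delay
wl_p = required_w_over_l(200e-12, 100e-15, mu_cox=40e-6, vdd=3.3)   # PMOS sets rising delay
print(f"Wn/Ln = {wl_n:.2f}, Wn = {device_width(wl_n, 0.35e-6) * 1e6:.2f} um")
print(f"Wp/Lp = {wl_p:.2f}, Wp = {device_width(wl_p, 0.35e-6) * 1e6:.2f} um")
```

Because the PMOS has the lower $\mu C_{ox}$, it needs a proportionally larger W/L to meet the same delay target.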

#### **Computing Intrinsic Transistor Capacitance**





- Intrinsic PN junction capacitance of the driving circuit must be added to the load capacitance C<sub>load</sub>
- Consider the inverter example at left:
  - Area and perimeter of the PMOS and NMOS transistors are calculated from the layout and inserted into the circuit model
    - NMOS drain area = $W_n \times D_{drain}$
    - PMOS drain area = $W_p \times D_{drain}$
    - NMOS drain perimeter = $2(W_n + D_{drain})$
    - PMOS drain perimeter = $2(W_p + D_{drain})$
- SPICE simulations (bottom left) were run for a fixed extrinsic load of 100 fF while increasing transistor width (Wp/Wn = 2.75)
  - Results show diminishing returns beyond a certain Wn (about 6 µm) because the growing drain capacitance adds to the overall capacitive load
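The diminishing-returns trend can be reproduced with a simple model. This sketch assumes hypothetical junction parameters (area capacitance, sidewall capacitance, drain length), uses the area/perimeter formulas above, and treats delay as proportional to total load divided by Wn (drive current scaling linearly with width). It is an illustration, not the chart's SPICE data.

```python
# Hypothetical process values for illustration only (not from the original charts)
CJ = 0.5e-3       # F/m^2 (0.5 fF/um^2) bottom-wall junction capacitance
CJSW = 0.4e-9     # F/m (0.4 fF/um) sidewall junction capacitance
D_DRAIN = 1.5e-6  # m, drain diffusion length D_drain
C_EXT = 100e-15   # F, fixed extrinsic load used in the SPICE sweep
P_TO_N = 2.75     # Wp/Wn ratio used in the SPICE sweep


def drain_cap(w):
    """Junction capacitance of one drain from its area and perimeter."""
    area = w * D_DRAIN
    perimeter = 2 * (w + D_DRAIN)
    return CJ * area + CJSW * perimeter


def relative_delay(wn):
    """Delay ~ C_total / drive; drive scales with Wn, so delay ~ C_total / Wn."""
    wp = P_TO_N * wn
    c_total = C_EXT + drain_cap(wn) + drain_cap(wp)
    return c_total / wn


base = relative_delay(1e-6)
for wn_um in (1, 2, 4, 6, 8, 12):
    ratio = relative_delay(wn_um * 1e-6) / base
    print(f"Wn = {wn_um:2d} um -> delay = {ratio:.3f} x delay at Wn = 1 um")
```

Each doubling of width buys a smaller relative improvement, because the drain-capacitance term grows with W while the fixed 100 fF load does not.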

#### Area x Delay Figure of Merit



- Increasing device width shows diminishing returns on propagation delay time (inverter circuit of chart 4-24)
- Define a figure of merit as area x delay for the inverter circuit
  - Sweeping device width Wn reveals a minimum in the area x delay product
- Increasing transistor width without limit to improve circuit delay is often a poor tradeoff because silicon area on the wafer is expensive
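A minimal sketch of why area x delay has a minimum, assuming hypothetical linear models: area $= A_0 + k W_n$ (a fixed overhead plus device area) and delay $= R_0(C_{ext} + c\,W_n)/W_n$. The product $A_0 C_{ext}/W_n + k c W_n + \text{const}$ is minimized at $W_n^* = \sqrt{A_0 C_{ext}/(k c)}$; the sweep below checks this numerically. All constants are illustrative assumptions.

```python
import math

A0 = 20.0      # um^2, hypothetical fixed cell area (contacts, wiring)
K_AREA = 10.0  # um^2 per um of Wn (n + p devices), hypothetical
C_EXT = 100.0  # fF, extrinsic load
C_PER_W = 6.0  # fF of drain capacitance per um of Wn, hypothetical
R0 = 1.0       # normalized drive resistance of a 1-um device


def delay(wn):
    return R0 * (C_EXT + C_PER_W * wn) / wn


def area(wn):
    return A0 + K_AREA * wn


def area_delay(wn):
    return area(wn) * delay(wn)


widths = [0.5 + 0.1 * i for i in range(200)]       # 0.5 um to 20.4 um sweep
w_best = min(widths, key=area_delay)
w_analytic = math.sqrt(A0 * C_EXT / (K_AREA * C_PER_W))
print(f"sweep minimum at Wn = {w_best:.1f} um, analytic Wn* = {w_analytic:.2f} um")
```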

#### Transistors in Series: CMOS NAND





- Several devices in series each with effective channel length L<sub>eff</sub> can be viewed as a single device of channel length equal to the combined channel lengths of the separate series devices
  - e.g. 3-input NAND: a single device with channel length 3L<sub>eff</sub> can model the behavior of three series devices, each with channel length L<sub>eff</sub>, assuming the gate voltages of the three N pull-down devices rise together with no skew
  - The source/drain junctions between the three devices are treated as simple zero-resistance connections
  - During the switching transient, the bottom two devices are in their linear region and only the top device is pinched off (saturated)
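Under the first-order assumption that each series device behaves as a channel resistance proportional to L/W, the resistances add and the stack has an effective $(W/L)_{eq} = 1/\sum_i (L_i/W_i)$. For identical devices this reduces to the single $W/(N L_{eff})$ device described above. The helper name and units are hypothetical.

```python
def series_w_over_l(devices):
    """Effective W/L of a series stack; devices is a list of (W, L_eff) pairs.
    First-order assumption: channel resistance ~ L/W, and series resistances add."""
    return 1.0 / sum(l_eff / w for w, l_eff in devices)


# 3-input NAND pull-down: three identical devices, W = 3, L_eff = 1 (arbitrary units)
nand3 = [(3.0, 1.0)] * 3
print(series_w_over_l(nand3))          # behaves like one device with W = 3, L = 3*L_eff
print(series_w_over_l([(1.0, 1.0)]))   # reference inverter pull-down, W/L = 1
```

Widening each of the N series devices by a factor of N restores the pull-down strength of a single-device inverter, at the cost of extra area and input capacitance.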

## Delay Dependence on Input Rise/Fall Time

- For non-abrupt input signals, circuit delays show some dependency upon the input rise/fall time
  - Case a with input a and output x shows minimum rising and falling delays
  - Case b with input b and output y shows added delay due to the delay in getting the input to the switching voltage.
  - Empirical relationships that include the effect of input rise/fall time on the output fall/rise delay are:

 $t_{df} = [t_{df}^{2}(\text{step input}) + (t_{r}/2)^{2}]^{\frac{1}{2}}$ 

 $t_{dr} = [t_{dr}^{2}(\text{step input}) + (t_{f}/2)^{2}]^{\frac{1}{2}}$ 

- For CMOS, the effect of input rise (fall) time on the output fall (rise) time is less severe than its impact on the falling (rising) delay.
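The empirical relationship above combines the step-input delay and half the input transition time in quadrature. A minimal sketch (the function name and example values are hypothetical):

```python
import math


def delay_with_ramp(t_step, t_input_transition):
    """t_d = sqrt(t_d(step)^2 + (t_in/2)^2), the empirical ramp-input correction above.
    Use t_r with the falling delay t_df, and t_f with the rising delay t_dr."""
    return math.hypot(t_step, t_input_transition / 2.0)


# Hypothetical: 100 ps step-input falling delay, 200 ps input rise time
t_df = delay_with_ramp(100e-12, 200e-12)
print(f"t_df = {t_df * 1e12:.1f} ps")
```

When the input transition is short compared with twice the step delay, the correction is small; a slow input edge dominates the result.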



#### **Bootstrapping Effect on Inverter Delay**



- Gate-to-drain capacitance Cgd in a CMOS inverter (or other MOS logic circuit) feeds the output transient back to the input gate
  - This is called *bootstrapping* or the *Miller effect*
  - As the input rises and the output falls, Cgd couples a portion of the output transient back to the input, slowing the rising input waveform
  - The SPICE simulation at left shows the disturbance on input node 'a', which has a 0.05 pF bootstrap capacitor, versus no disturbance on input node 'c' of an inverter without a bootstrap capacitor
  - The effect is small in most small inverters and logic circuits
- Voltage doubling circuits and certain large swing drivers use intentionally designed bootstrap capacitors to provide overdrive to the gate of pullup devices
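The Miller effect can be put into numbers. This sketch assumes the average gain over a full rail-to-rail switch is about -1 (the output falls by V<sub>DD</sub> while the input rises by V<sub>DD</sub>), so C<sub>gd</sub> effectively counts twice. It also gives a worst-case capacitive-divider estimate for a node that is not actively driven during the output edge. The 0.05 pF value is from the simulation above; the other capacitances and the helper names are hypothetical.

```python
def miller_input_cap(c_gs, c_gd, gain=-1.0):
    """Effective input capacitance C_gs + C_gd*(1 - A_v).
    For rail-to-rail switching A_v ~= -1, so C_gd appears doubled."""
    return c_gs + c_gd * (1.0 - gain)


def coupled_glitch(delta_vout, c_gd, c_node):
    """Worst-case step coupled onto an undriven input node through C_gd (capacitive divider).
    A real driver restores the node, so the actual disturbance is smaller."""
    return delta_vout * c_gd / (c_gd + c_node)


c_eff = miller_input_cap(c_gs=20e-15, c_gd=50e-15)
glitch = coupled_glitch(delta_vout=-5.0, c_gd=50e-15, c_node=200e-15)
print(f"C_in,eff = {c_eff * 1e15:.0f} fF, coupled step = {glitch:.2f} V")
```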

#### Modeling Parasitic Capacitances: 4 input NAND



- Capacitances Cab, Cbc, Ccd exist at internal nodes of series-connected devices and add to delay of circuit
  - must be discharged to ground along with Cout through N1, N2, N3 series devices when all inputs go high
  - must be charged through N2, N3, N4 and P1 when input D goes low
- Modeling approaches:
  - Simple RC delay: $t_d = (R_{n1} + R_{n2} + R_{n3} + R_{n4})(C_{out} + C_{ab} + C_{bc} + C_{cd})$
    - not very accurate (pessimistic, since every capacitance is charged through the full series resistance)
  - Elmore ladder delay: $t_d = R_{n1}C_{cd} + (R_{n1} + R_{n2})C_{bc} + (R_{n1} + R_{n2} + R_{n3})C_{ab} + (R_{n1} + R_{n2} + R_{n3} + R_{n4})C_{out}$
    - more accurate
- The Penfield-Rubinstein slope delay model factors the input rise (fall) time into the above
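Both the lumped RC and the Elmore ladder expressions are easy to evaluate side by side. In this sketch the resistances are ordered from ground upward (Rn1 to Rn4) and each capacitance sits directly above its resistor ([Ccd, Cbc, Cab, Cout]), matching the Elmore expression above. The component values are hypothetical.

```python
def lumped_rc_delay(resistances, capacitances):
    """Total series resistance times total capacitance (pessimistic)."""
    return sum(resistances) * sum(capacitances)


def elmore_ladder_delay(resistances, capacitances):
    """Each capacitance is weighted by the resistance between it and ground.
    resistances[i] lies directly below capacitances[i] in the ladder."""
    delay, r_path = 0.0, 0.0
    for r, c in zip(resistances, capacitances):
        r_path += r
        delay += r_path * c
    return delay


# Hypothetical 4-input NAND pull-down: 2 kOhm per device, 5 fF internal nodes, 50 fF output
R = [2e3, 2e3, 2e3, 2e3]              # Rn1 .. Rn4
C = [5e-15, 5e-15, 5e-15, 50e-15]     # Ccd, Cbc, Cab, Cout
print(f"lumped: {lumped_rc_delay(R, C) * 1e12:.0f} ps, Elmore: {elmore_ladder_delay(R, C) * 1e12:.0f} ps")
```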

#### Effect of Loading Capacitance on Gate Delay





- Delay equations are often written to factor the impact of the fan-out and load capacitance to the circuit delay
- $t_{d} = t_{d,intrinsic} + k_1 C_L + k_2 \cdot FO$
  - where $C_L$ is the load capacitance, FO is the fan-out, $t_{d,intrinsic}$ is the unloaded delay of the circuit, and $k_1$, $k_2$ are the delay sensitivities to load capacitance and fan-out
- Tables of delay versus load condition are built up from simulation models and used for path delay prediction.
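Both forms of the delay model can be sketched briefly: the linear equation above, and a lookup table with linear interpolation between simulated load points. The coefficients and table entries below are hypothetical, and clamping outside the table range is an assumed policy (real timing tools may extrapolate instead).

```python
from bisect import bisect_left


def linear_delay(td_intrinsic, k1, c_load, k2, fanout):
    """t_d = t_d,intrinsic + k1*C_L + k2*FO."""
    return td_intrinsic + k1 * c_load + k2 * fanout


def table_delay(table, c_load):
    """table: (C_load, delay) pairs sorted by load, e.g. from SPICE runs.
    Linear interpolation inside the table, clamping outside it (an assumption)."""
    loads = [c for c, _ in table]
    i = bisect_left(loads, c_load)
    if i == 0:
        return table[0][1]
    if i == len(table):
        return table[-1][1]
    (c0, d0), (c1, d1) = table[i - 1], table[i]
    return d0 + (d1 - d0) * (c_load - c0) / (c1 - c0)


# Hypothetical characterization data: (load in F, delay in s)
TABLE = [(10e-15, 50e-12), (50e-15, 110e-12), (100e-15, 185e-12)]
print(f"table:  {table_delay(TABLE, 30e-15) * 1e12:.1f} ps")
print(f"linear: {linear_delay(30e-12, 1.5e3, 20e-15, 5e-12, 3) * 1e12:.1f} ps")
```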

#### Body Effect on Delay: 4 input NAND





- In a logic gate with series devices whose source voltages sit above ground (for a NAND) or below Vdd (for a NOR), the circuit response is slowed because the body effect raises the threshold voltage Vtn (or |Vtp|).
  - If only the bottom series N device is switched, nodes ab, bc, and cd are sitting at Vdd – Vtn prior to the switching
  - Each node must be discharged to ground successively prior to discharging Cout through the 4 series N devices
    - See figure at left
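The level at which the internal nodes sit can be estimated from the standard body-effect expression $V_{tn} = V_{t0} + \gamma(\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F})$. A node pulled up through an N device stops one body-affected threshold below Vdd, so $V = V_{dd} - V_{tn}(V)$, which the sketch solves by fixed-point iteration. All device parameters are hypothetical.

```python
import math

VT0 = 0.7    # V, zero-bias NMOS threshold (hypothetical)
GAMMA = 0.4  # V^0.5, body-effect coefficient (hypothetical)
PHI2F = 0.6  # V, surface potential term 2*phi_F (hypothetical)
VDD = 5.0    # V


def vtn(vsb):
    """NMOS threshold with source-to-body bias vsb."""
    return VT0 + GAMMA * (math.sqrt(PHI2F + vsb) - math.sqrt(PHI2F))


def precharged_node_voltage(vdd=VDD, iters=50):
    """Solve V = Vdd - Vtn(V) by fixed-point iteration (converges since dVtn/dV < 1)."""
    v = vdd - VT0
    for _ in range(iters):
        v = vdd - vtn(v)
    return v


v_node = precharged_node_voltage()
print(f"internal node ~ {v_node:.3f} V, body-affected Vtn = {vtn(v_node):.3f} V")
```

The raised threshold both lowers the precharged node level and reduces the drive of the upper series devices when they switch.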



#### **CMOS Ring Oscillator Circuit**



- An odd number of inverters connected in series, with the last output fed back to the first input, is astable and can be used as an oscillator (called a ring oscillator)
- Ring oscillators are typically used to characterize the intrinsic device performance of a new technology
- Oscillation frequency, number of stages, and stage delay are related as follows:
  $f = 1/T = 1/(2n\tau_P)$

where n is the number of stages and  $\tau_{\text{P}}$  is the stage delay
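The relation works in both directions: predicting frequency from a stage delay, or extracting the stage delay from a measured oscillation frequency, which is how a ring oscillator characterizes a technology. A minimal sketch with hypothetical numbers; the check for at least three stages reflects that a single inverter fed back to itself settles at its switching threshold rather than oscillating.

```python
def ring_frequency(n_stages, stage_delay):
    """f = 1/(2 n tau_P) for an odd number of stages n."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of stages (>= 3)")
    return 1.0 / (2 * n_stages * stage_delay)


def stage_delay_from_frequency(n_stages, f_measured):
    """Invert the relation: tau_P = 1/(2 n f)."""
    return 1.0 / (2 * n_stages * f_measured)


# Hypothetical: a 31-stage ring measured at 50 MHz
print(f"tau_P = {stage_delay_from_frequency(31, 50e6) * 1e12:.1f} ps")
```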




#### **CMOS Gate Transistor Sizing**





- Symmetrical inverter design (case a):
  - P mobility = $\frac{1}{2}$ x N mobility
  - Wp = 2 x Wn
  - Input gate capacitance = 3 x Ceq, where Ceq is the gate capacitance of the pull-down device
  - Pair delay: $t_{fall} + t_{rise} = R(3C_{eq}) + 2(R/2)(3C_{eq}) = 6RC_{eq}$ (the P device has twice the resistance of an equal-width N device, halved here by its doubled width)
- Non-symmetrical inverter design (case b):
  - Wp = Wn
  - Input gate capacitance = 2 x Ceq
  - Pair delay: $t_{fall} + t_{rise} = R(2C_{eq}) + 2R(2C_{eq}) = 6RC_{eq}$
- In the simple case where the load consists mainly of input gate capacitance, the non-symmetrical sizing (Wp = Wn) leaves the total delay of the inverter pair unchanged
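The two cases generalize to any ratio Wp = r x Wn under the same simple RC model: the load on each stage is $(1 + r)C_{eq}$, the pull-down resistance is R, and the pull-up resistance is $2R/r$ for the assumed 2:1 mobility ratio. The normalized pair delay is then $(1 + r)(1 + 2/r)$. The function name is hypothetical.

```python
import math


def pair_delay(r, mobility_ratio=2.0):
    """Normalized t_fall + t_rise (units of R*Ceq) for Wp = r*Wn, gate-dominated load."""
    load = 1.0 + r                       # next stage's gate capacitance, units of Ceq
    t_fall = 1.0 * load                  # pull-down resistance R
    t_rise = (mobility_ratio / r) * load  # pull-up resistance (mu_n/mu_p)*R/r
    return t_fall + t_rise


for r in (1.0, math.sqrt(2.0), 2.0):
    print(f"Wp/Wn = {r:.3f}: pair delay = {pair_delay(r):.3f} R*Ceq")
```

Cases a (r = 2) and b (r = 1) both give 6RC<sub>eq</sub>; within this model the pair delay is minimized at $r = \sqrt{\mu_n/\mu_p} = \sqrt{2}$, giving $3 + 2\sqrt{2} \approx 5.83\,RC_{eq}$, only slightly better than either case.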

#### Driving Large Capacitive Loads: Stage Ratio



- For driving large load capacitance  $C_L$ , can use N buffer drivers in series, each with stage ratio  $C_{out}/C_{in} = a$ 
  - Input capacitance of the first stage is C<sub>g</sub>
  - Delay per stage = at<sub>d</sub>, given that the delay of a minimum-size stage driving another minimum-size stage is t<sub>d</sub>
  - Let $R = C_L / C_g = a^N$, so that $N = \ln R / \ln a$
  - Then the total delay is given by
    $\text{Total Delay} = N a t_d = a t_d (\ln R / \ln a)$
  - Setting the derivative of the total delay with respect to a equal to zero yields the optimum stage ratio
    $a = e$
- If the inverter output drain capacitance is included in the analysis, the optimum stage ratio is given by $a_{opt} = e^{(k + a_{opt})/a_{opt}}$, where $k = C_{drain}/C_{gate}$
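The implicit equation for $a_{opt}$ is equivalent to $\ln a = 1 + k/a$, whose left side minus right side increases monotonically for a > 1, so bisection finds the unique root (k = 0 recovers a = e). The sketch then picks an integer stage count and the actual ratio that exactly spans $C_L/C_g$. Function names and example capacitances are hypothetical.

```python
import math


def optimum_stage_ratio(k=0.0, lo=1.0 + 1e-9, hi=1e3, tol=1e-12):
    """Solve a = exp((k + a)/a), i.e. ln(a) = 1 + k/a, by bisection."""
    g = lambda a: math.log(a) - 1.0 - k / a   # increasing for a > 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def buffer_chain(c_load, c_in, k=0.0):
    """Integer stage count N near ln(R)/ln(a_opt), and the ratio that spans R exactly."""
    ratio = c_load / c_in
    n = max(1, round(math.log(ratio) / math.log(optimum_stage_ratio(k))))
    return n, ratio ** (1.0 / n)


print(f"a_opt(k=0) = {optimum_stage_ratio(0.0):.4f}, a_opt(k=1) = {optimum_stage_ratio(1.0):.3f}")
n, a_used = buffer_chain(10e-12, 10e-15)   # hypothetical 10 pF load, 10 fF first stage
print(f"N = {n} stages with stage ratio {a_used:.3f}")
```

Including drain capacitance pushes the optimum ratio above e, which means fewer, larger stages.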

#### Increasing Importance of Interconnect Delay

- ICs are moving to 6-7 levels of metal interconnect in advanced technologies
- Chart at bottom left shows typical distribution of wire length on a processor chip or an ASIC



- As feature size drops, interconnect delay often exceeds gate delay
  - Chart below shows that for very long wires, interconnect delay already exceeds gate delay at feature sizes above 1 µm
- Interconnect delay is becoming the most serious performance problem to be solved in future IC design



#### Interconnect Delay with Inductive Effects



 For the design of critical performance nets (such as clock distribution) on a processor chip, inductance must be taken into consideration

- Simulation result in (b) shows the effect of ringing on a rising transition due to reflections at a discontinuity on an inductive net
  - Additional delay due to settling time is incurred if such ringing cannot be eliminated by proper transmission line design techniques

#### Power Dissipation in a CMOS Inverter: Summary



For complementary CMOS circuits where no dc current flows, average dynamic power is given by

 $P_{ave} = C_L V_{DD}^2 f$ 

where $C_L$ represents the total load capacitance, $V_{DD}$ is the power supply voltage, and f is the frequency of full output transition cycles (one charge and one discharge of $C_L$ per period)

- above formula applies to a simple CMOS inverter or to complex, combinational CMOS logic
- applies only to dynamic (capacitive) power
- dc power and/or short-circuit power must be computed separately
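A quick numeric use of the formula, with hypothetical values for the switched capacitance, supply, and frequency. Because power goes as $V_{DD}^2$, supply reduction is the strongest lever on dynamic power.

```python
def dynamic_power(c_load, vdd, f):
    """P_ave = C_L * VDD^2 * f for one full charge/discharge cycle per period."""
    return c_load * vdd ** 2 * f


# Hypothetical: 100 fF total switched load, 3.3 V supply, 100 MHz
p = dynamic_power(100e-15, 3.3, 100e6)
print(f"P_ave = {p * 1e6:.1f} uW")
# Same load and frequency with the supply lowered from 3.3 V to 2.5 V
print(f"power ratio = {dynamic_power(100e-15, 2.5, 100e6) / p:.3f}")
```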

#### Average Dynamic Power in CMOS Inverter



- Average dynamic power derivation:
  - On negative going input, pull-up device charges the load capacitance. On positive going input, pull-down device discharges the load into ground.
  - Average power is the energy dissipated in the pull-up device while charging plus the energy dissipated in the pull-down device while discharging, divided by the period T:
  - $P_{ave} = \frac{1}{T}\int_{0}^{T/2} C_L \frac{dv_{out}}{dt}\left(V_{dd} - v_{out}\right)dt + \frac{1}{T}\int_{T/2}^{T} (-1)\, C_L \frac{dv_{out}}{dt}\, v_{out}\, dt$
  - where the first integral is taken from 0 to T/2 (output rising) and the second from T/2 to T (output falling); changing the integration variable to $v_{out}$, each integral evaluates to $C_L V_{dd}^2/2$, which yields

 $P_{ave} = C_L V_{dd}^2 f$  where f = 1/T

 Note that the dynamic power does not depend on device parameters such as transistor W/L or threshold voltage; it is simply a function of the power supply, the load capacitance, and the switching frequency.
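The independence from device parameters can be checked numerically. This sketch replaces the pull-up device with a linear resistor (an assumption; a real MOSFET is nonlinear, but the result holds for any resistive path), charges C<sub>L</sub> from 0 to V<sub>DD</sub>, and integrates $i^2R$ with the midpoint rule. The dissipated energy comes out as $C_L V_{DD}^2/2$ regardless of R. Values are hypothetical.

```python
import math


def energy_in_pullup(r, c, vdd, n_steps=100_000, n_tau=20.0):
    """Integrate i^2*R while a resistor (linear stand-in for the pull-up) charges c to vdd."""
    tau = r * c
    dt = n_tau * tau / n_steps
    energy = 0.0
    for k in range(n_steps):
        t_mid = (k + 0.5) * dt                 # midpoint rule
        i = (vdd / r) * math.exp(-t_mid / tau)
        energy += i * i * r * dt
    return energy


C_L, V_DD = 100e-15, 3.3   # hypothetical load and supply
for r in (1e3, 1e5):       # a strong and a weak pull-up
    print(f"R = {r:.0e} ohm: E = {energy_in_pullup(r, C_L, V_DD):.4e} J, "
          f"C*V^2/2 = {0.5 * C_L * V_DD ** 2:.4e} J")
```

The discharge through the pull-down dissipates another $C_L V_{DD}^2/2$, so each full cycle costs $C_L V_{DD}^2$, and multiplying by f gives the formula above.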

#### **CMOS Short-Circuit Power Dissipation**



- The total power in a CMOS circuit is given by $P_{total} = P_d + P_{sc} + P_s$, where $P_d$ is the dynamic average power (previous chart), $P_{sc}$ is the short-circuit power, and $P_s$ is the static power due to ratioed-circuit current, junction leakage, and subthreshold leakage current $I_{off}$
- Short circuit current flows during the brief transient when the pull down and pull up devices both conduct at the same time where one (or both) of the devices are in saturation
- For a balanced CMOS inverter with βn=βp, and Vtn = |Vtp|, the short circuit power can be expressed by

 $P_{sc} = (\beta/12)(V_{dd} - 2V_t)^3 (t_{rf}/t_p)$, where $t_p$ is the period of the input waveform and $t_{rf}$ is the input rise (or fall) time, with $t_r = t_f = t_{rf}$
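A direct evaluation of the short-circuit expression above for a balanced inverter, with hypothetical device and timing values. When $V_{dd} \le 2V_t$ the two devices never conduct at the same time, so the sketch returns zero in that case.

```python
def short_circuit_power(beta, vdd, vt, t_rf, t_p):
    """P_sc = (beta/12)*(Vdd - 2Vt)^3*(t_rf/t_p) for beta_n = beta_p, Vtn = |Vtp|."""
    overdrive = vdd - 2.0 * vt
    if overdrive <= 0.0:
        return 0.0  # no input voltage range where both devices are on
    return (beta / 12.0) * overdrive ** 3 * (t_rf / t_p)


# Hypothetical: beta = 100 uA/V^2, 3.3 V supply, Vt = 0.6 V, 1 ns edges, 10 ns period
p_sc = short_circuit_power(100e-6, 3.3, 0.6, 1e-9, 10e-9)
print(f"P_sc = {p_sc * 1e6:.2f} uW")
```

The $t_{rf}/t_p$ factor shows why sharp input edges keep short-circuit power small relative to dynamic power.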

#### Power Meter for use in SPICE Simulation



- Add a zero-valued voltage source Vs in series between V<sub>DD</sub> and the circuit in question
  - i<sub>s</sub> is the current through Vs, i.e. the supply current i<sub>DD</sub>
- Add current source βi<sub>s</sub>, resistor Ry, and capacitor Cy in parallel, as shown
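- Applying KCL at the meter node gives $C_y(dV_y/dt) = \beta i_s - V_y/R_y$; with R<sub>y</sub> chosen large enough that the $V_y/R_y$ term is negligible over one period, integrating yields the solution

 $V_y(T) = (V_{DD}/T) \int_0^T i_{DD}(\tau) d\tau$ where β = V<sub>DD</sub>C<sub>y</sub>/T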

 V<sub>y</sub>(T) will be the average power dissipated over the period T and can be plotted or printed out during the SPICE simulation
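The meter's behavior can be mimicked outside SPICE. This sketch assumes the meter node obeys $C_y\,dV_y/dt = \beta i_s - V_y/R_y$ with $\beta = V_{DD}C_y/T$ and a very large R<sub>y</sub>, integrates it with forward Euler for a hypothetical supply-current pulse, and compares V<sub>y</sub>(T) with V<sub>DD</sub> times the average current. Function and parameter names are illustrative.

```python
def power_meter(i_dd, vdd, period, c_y=1e-12, r_y=1e12, n_steps=10_000):
    """Forward-Euler integration of C_y dV_y/dt = beta*i_s - V_y/R_y, beta = VDD*C_y/T.
    i_dd is a function of time returning the supply current in A."""
    beta = vdd * c_y / period
    dt = period / n_steps
    v_y = 0.0
    for k in range(n_steps):
        v_y += dt * (beta * i_dd(k * dt) - v_y / r_y) / c_y
    return v_y  # numerically equal to the average power in W


# Hypothetical supply current: 1 mA for the first 1 ns of a 10 ns period
pulse = lambda t: 1e-3 if t < 1e-9 else 0.0
p_meter = power_meter(pulse, vdd=5.0, period=10e-9)
print(f"V_y(T) = {p_meter * 1e3:.3f} mV  ->  average power {p_meter * 1e3:.3f} mW")
```

The direct calculation gives 5 V x (1 mA x 1 ns / 10 ns) = 0.5 mW, which the meter voltage reproduces.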

#### **Charge Sharing Principle**

- At time t=0-, switch is open and each capacitor contains some initial charge
- At time t=0+, the switch is closed and the charge redistributes across both capacitors
- Conserve the total charge:
  - Sum up initial charge Qt = Qb + Qs = CbVb + CsVs
  - Final charge is given by Qt = (Cb + Cs)Vf
  - Therefore, Vf = (CbVb + CsVs)/(Cb + Cs)
- If Vb = Vdd and Vs = 0, then

Vf = Vdd Cb/(Cb + Cs) (which is similar to the equation for a resistor divider)

- Charge sharing plays an important role in many dynamic circuits, especially pulsed DOMINO and NORA logic as well as in DRAM operation.
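The charge-conservation result is a one-line function. The two examples are hypothetical: a precharged dynamic node losing voltage to an uncharged internal node, and a DRAM-style read where a small cell capacitor shares charge with a bit line precharged to half the supply.

```python
def charge_share(c_b, v_b, c_s, v_s):
    """Final voltage after connecting two capacitors: Vf = (Cb*Vb + Cs*Vs)/(Cb + Cs)."""
    return (c_b * v_b + c_s * v_s) / (c_b + c_s)


# Dynamic node: 50 fF precharged to 5 V shares with a 10 fF internal node at 0 V
print(f"dynamic node: {charge_share(50e-15, 5.0, 10e-15, 0.0):.3f} V")
# DRAM-style read: 300 fF bit line at 1.65 V, 30 fF cell storing 3.3 V
v_bl = charge_share(300e-15, 1.65, 30e-15, 3.3)
print(f"bit line: {v_bl:.3f} V (signal {1e3 * (v_bl - 1.65):.0f} mV)")
```

The small signal in the DRAM case illustrates why sense amplifiers are needed when the storage capacitance is much smaller than the bit-line capacitance.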



#### **Process Variation: Normal Distribution**



- CMOS and other MOSFET circuit design requires designing around tolerances in the technology and process, the supply voltage Vdd, and the temperature.
  - Process parameter distributions are typically normal (Gaussian) where operation out to the 3 sigma point is usually a requirement
  - Statistical models are often derived with Gaussian or log normal distributions for each process parameter such as Tox, Xj, Vt, W, L, and the various mask dimensional images
  - Rejecting product outside the ±3σ limits excludes only about 0.3% of the product
  - Power supply and temperature are normally given uniform distributions
- Defining the design space involves identifying the corners of the multidimensional space where circuit performance, power, and operability are most critical
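The 0.3% figure is the two-sided tail of a normal distribution beyond ±3σ, which Python's `math.erfc` gives directly as $\text{erfc}(k/\sqrt{2})$. The helper name is hypothetical.

```python
import math


def fraction_outside(k_sigma):
    """Two-sided probability that a normal variable falls beyond +/- k_sigma."""
    return math.erfc(k_sigma / math.sqrt(2.0))


for k in (1, 2, 3):
    frac = fraction_outside(k)
    print(f"beyond +/-{k} sigma: {100 * frac:.2f}%  ({frac * 1e6:.0f} ppm)")
```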

#### Definition of Design Window Corners

- Worst Case Design Methodology:
  - Identify corners of the design space where the circuit is slowest, or power is highest, or circuit ratio effects are critical
- Slow Circuit:
  - Vdd is low (say 10% below nominal), temperature is high, and the n and p transistors are slow due to thick tox, high Vt, long L, and narrow W
- Fast Circuit/High Power:
  - Vdd is high, temperature is low, n and p transistors are fast
- Ratio circuit down level worst case:
  - Vdd is high, temperature is low, n device is slow, p device is fast

**TABLE 4.11: CMOS Digital System Checks (Commercial)**

| PROCESS       | TEMP  | VOLTAGE     | TESTS                                                                 |
|---------------|-------|-------------|-----------------------------------------------------------------------|
| Fast-n/fast-p | 0°C   | 5.5V (3.6V) | Power dissipation (DC), races, hold time constraints                  |
| Slow-n/slow-p | 125°C | 4.5V (3.0V) | Circuit speed, setup time constraints                                 |
| Slow-n/fast-p | 0°C   | 5.5V (3.6V) | Pseudo-nMOS noise margins, level shifters, memory … read, ratioed circuits |
| Fast-n/slow-p | 0°C   | 5.5V (3.6V) | Memories, ratioed circuits, level shifters                            |

## MOSFET Device Technology Scaling

| PARAMETER                              | Constant field scaling | Constant voltage scaling | Lateral scaling |
|----------------------------------------|------------------------|--------------------------|-----------------|
| Length (L)                             | 1/α                    | 1/α                      | 1/α             |
| Width (W)                              | 1/α                    | 1/α                      | 1               |
| Supply voltage (V)                     | 1/α                    | 1                        | 1               |
| Gate-oxide thickness ($t_{ox}$)        | 1/α                    | 1/α                      | 1               |
| Current ($I = (W/L)(1/t_{ox})V^2$)     | 1/α                    | α                        | α               |
| Transconductance ($g_m$)               | 1                      | α                        | α               |
| Junction depth ($X_j$)                 | 1/α                    | 1/α                      | 1               |
| Substrate doping ($N_A$)               | α                      | α                        | 1               |
| Electric field across gate oxide (E)   | 1                      | α                        | 1               |
| Depletion layer thickness (d)          | 1/α                    | 1/α                      | 1               |
| Load capacitance ($C = WL/t_{ox}$)     | 1/α                    | 1/α                      | 1/α             |
| Gate delay (VC/I)                      | 1/α                    | 1/α²                     | 1/α²            |
| **RESULTANT INFLUENCE**                |                        |                          |                 |
| DC power dissipation ($P_s$)           | 1/α²                   | α                        | α               |
| Dynamic power dissipation ($P_d$)      | 1/α²                   | α                        | α               |
| Power-delay product                    | 1/α³                   | 1/α                      | 1/α             |
| Gate area ($A = WL$)                   | 1/α²                   | 1/α²                     | 1/α             |
| Power density (VI/A)                   | 1                      | α³                       | α²              |
| Current density                        | α                      | α³                       | α²              |

- Bob Dennard of IBM Watson Research Labs developed scaling theory for reducing device dimensions, power supply voltage and junction depths, while maintaining roughly constant electric fields
- Scaling theory is the basis for the SIA's NTRS (National Technology Roadmap for Semiconductors) which has been the roadmap for the industry for many technology generations
  - Moore's Law (Gordon Moore of Intel) has quantified the reduction in dimensions and increase in density and performance
    - 4X increase in DRAM and logic density every generation (2-3 years)
    - 2X increase in logic device performance every generation (2-3 yrs)
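The derived rows of the scaling table follow from the primitive ones (L, W, V, t<sub>ox</sub>) through the first-order formulas listed in the table. This sketch recomputes them and reports each result as an exponent of α, as a consistency check on the table. The dictionary layout and function names are hypothetical; α = 2 keeps every value an exact power of two.

```python
import math

# Exponent of alpha applied to each primitive parameter (-1 means 1/alpha, 0 means unchanged)
MODELS = {
    "constant field":   {"L": -1, "W": -1, "V": -1, "tox": -1},
    "constant voltage": {"L": -1, "W": -1, "V": 0,  "tox": -1},
    "lateral":          {"L": -1, "W": 0,  "V": 0,  "tox": 0},
}


def derived(alpha, L, W, V, tox):
    """Scale factors of derived quantities using the table's first-order formulas."""
    l, w, v, t = alpha ** L, alpha ** W, alpha ** V, alpha ** tox
    current = (w / l) * (1.0 / t) * v ** 2
    cap = w * l / t
    delay = v * cap / current
    power = v * current  # DC power; C*V^2*f with f ~ 1/delay gives the same factor for these models
    area = w * l
    return {
        "current": current, "capacitance": cap, "delay": delay, "power": power,
        "power_delay": power * delay, "area": area,
        "power_density": power / area, "current_density": current / area,
    }


def exponents(alpha=2.0):
    """Express each derived factor as alpha**n and return n, per scaling model."""
    return {
        name: {q: round(math.log(f) / math.log(alpha), 6) for q, f in derived(alpha, **prims).items()}
        for name, prims in MODELS.items()
    }


for model, exps in exponents().items():
    print(model, exps)
```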