CoE 197U Scaling


Moore's Law

In 1965, Gordon Moore published a 4-page paper entitled Cramming more components onto integrated circuits[1], where he predicted that the number of components in an integrated circuit would increase by a factor of two every year, as shown in Fig. 1. Note that he based his extrapolation on just 4 data points!

Figure 1: Gordon Moore's 1965 prediction[1].

Why is this paper and the graph in Fig. 1 important? Gordon Moore's prediction, also known as Moore's Law, has reflected and, more importantly, driven the steady and rapid progress in computing technology[2]. Thus, satisfying Moore's Law has become the goal instead of being merely a prediction.

Evolution of Complexity

As Gordon Moore predicted, the cost and performance advantages of putting more and more devices into a single integrated circuit (IC) led to a rapid increase in circuit complexity. One convenient indicator of circuit complexity is the number of transistors contained in a single IC, which reached the 1-billion mark in 2010 and almost 50 billion in 2020, as shown in Fig. 2.

Figure 2: Transistor Count (1970 - 2020)[3].
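
As a quick sanity check on the growth rate implied by these numbers, the short Python sketch below computes the doubling period implied by going from about 1 billion transistors in 2010 to about 50 billion in 2020. The two endpoint values come from the text above; the rest is simple arithmetic.

    import math

    # Implied doubling period, going from roughly 1e9 transistors (2010)
    # to roughly 50e9 transistors (2020), per the figures quoted above.
    count_2010 = 1e9
    count_2020 = 50e9
    years = 2020 - 2010

    growth = count_2020 / count_2010                      # about 50x over the decade
    doubling_period = years * math.log(2) / math.log(growth)
    print(f"Implied doubling period: {doubling_period:.1f} years")  # about 1.8 years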

Challenges in Digital Design

Figure 3: Technology node and transistor gate length versus calendar year[4].

For more than 30 years, the development of integrated circuits continued roughly on the trajectory predicted by Moore's Law. This steady advance has relied heavily on improvements in device and fabrication technologies. Device dimensions have gone down by a factor of approximately 0.7 every generation; since both the width and length of a device scale, this corresponds to roughly a 50% reduction in device area (0.7² ≈ 0.5) every 2 years, as seen in Fig. 3. However, in the past decade, we have seen a slowdown in the pace of development and performance of integrated circuits, as seen in Fig. 6, as we face various physical and manufacturing limitations.

The Power Wall

Figure 4: Scaling and processor performance[5].

As MOS device sizes continue to scale, digital gate delays also decrease, leading to higher maximum operating frequencies. This results in higher average dynamic power consumption, as expressed in Eq. 1 below.

This gives rise to higher power densities or power per unit volume. Since the power is dissipated as heat, adequate cooling mechanisms must be put in place to prevent the temperature from rising to levels that can reduce the performance or cause permanent damage to the IC. In most applications, the additional cooling costs might not be worth the extra performance. Thus, instead of a steady increase as predicted by Moore's Law, clock frequencies are starting to level off, as seen in Fig. 4.

As we approach various physical limits, with device sizes reaching atomic scales, the characteristics of the devices themselves have started to limit the performance of digital circuits, leading to larger leakage currents and increased variability. Further reductions in device dimensions also require more precise manufacturing processes with more stringent requirements, significantly increasing fabrication costs.

As the speed of digital circuits increases at smaller geometries, the power consumed by the chip also increases. This is because the average dynamic power dissipated (as heat) by a digital gate is proportional to the frequency of operation and to the square of the supply voltage, or equivalently,

    P_{dyn} = \alpha \cdot C_{L} \cdot V_{DD}^{2} \cdot f        (1)

where α is the switching activity factor, C_L is the switched load capacitance, V_DD is the supply voltage, and f is the operating frequency.

The small chip size makes it difficult to remove the heat from the chip, and without removing this heat, the temperature builds up to levels that can damage or even melt the device.
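
To make Eq. 1 concrete, the short Python sketch below evaluates the dynamic power for one set of purely illustrative values. The activity factor, switched capacitance, supply voltage, and clock frequency used here are assumptions chosen for the example, not figures from the text or the references.

    # Illustrative evaluation of Eq. 1: P_dyn = alpha * C_L * VDD^2 * f
    # All parameter values below are hypothetical, chosen only to show the arithmetic.
    alpha = 0.1    # switching activity factor (fraction of capacitance switched per cycle), assumed
    C_L = 1e-9     # total switched load capacitance in farads (1 nF), assumed
    VDD = 1.0      # supply voltage in volts, assumed
    f = 2e9        # clock frequency in hertz (2 GHz), assumed

    P_dyn = alpha * C_L * VDD**2 * f
    print(f"Average dynamic power: {P_dyn:.2f} W")   # 0.1 * 1e-9 * 1.0 * 2e9 = 0.20 W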

Frequency Scaling

One way to reduce the chip power dissipation and heat buildup is to reduce the operating frequency, as seen in Fig. 4, where the frequency curve starts to level off. Operating at lower frequencies limits the power consumption, but gives up the performance gains expected from Moore's Law. To work around this power problem, other "more-than-Moore" technologies and strategies are employed, such as the increased use of parallelism or the use of non-traditional devices.
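
The frequency-versus-power trade-off, and the parallelism strategy mentioned above, can be sketched numerically with Eq. 1. In the hypothetical comparison below (same assumed parameters as the previous sketch, plus an assumed reduced supply voltage of 0.8 V), two blocks running at half the frequency deliver the same throughput as one full-speed block; the dynamic power saving appears only when the lower frequency also permits a lower supply voltage.

    # Hypothetical comparison based on Eq. 1: one block at full speed vs. two parallel
    # blocks at half speed. All parameter values are assumed for illustration.
    alpha, C_L, VDD, f = 0.1, 1e-9, 1.0, 2e9

    def p_dyn(alpha, C_L, VDD, f):
        """Average dynamic power per Eq. 1."""
        return alpha * C_L * VDD**2 * f

    p_single      = p_dyn(alpha, C_L, VDD, f)          # one block at f and VDD
    p_parallel    = 2 * p_dyn(alpha, C_L, VDD, f / 2)  # two blocks at f/2, same VDD, same throughput
    p_parallel_lv = 2 * p_dyn(alpha, C_L, 0.8, f / 2)  # two blocks at f/2 with an assumed VDD of 0.8 V

    print(f"{p_single:.3f} W, {p_parallel:.3f} W, {p_parallel_lv:.3f} W")  # 0.200 W, 0.200 W, 0.128 W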

Voltage Scaling

Another way to reduce the average power dissipated by an integrated circuit is to lower its supply voltage since, as evident in Eq. 1, the dynamic power is proportional to the square of the supply voltage. Thus, scaling the supply voltage can be used to prevent dangerously high power densities, or power per unit volume. This power problem is a typical consideration when designing high-performance devices such as servers, GPUs, or network switches. However, in moderate- and low-performance battery-operated devices such as mobile phones, reducing the power is still critical since it determines how long a device can operate on battery power. This dichotomy of power regimes can be seen in Fig. 5.

The thickness of the gate-oxide layer, t_ox, is also scaled along with the transistor width and length. In deep sub-micron devices, the gate-oxide layer can be only a few atomic layers thick. Thus, the dielectric breakdown voltage of the gate oxide determines the maximum supply voltage that can be used. This reduction of the supply voltage to accommodate the process technology and to control the power dissipation is shown in Fig. 5.
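
As a quick illustration of the quadratic dependence on the supply voltage in Eq. 1, the sketch below compares the relative dynamic power at two supply voltages; the 1.2 V and 0.9 V values are assumptions chosen only for the example.

    # Relative dynamic power when only VDD changes (Eq. 1: P_dyn is proportional to VDD^2).
    VDD_nominal = 1.2   # volts, assumed
    VDD_scaled = 0.9    # volts, assumed

    ratio = (VDD_scaled / VDD_nominal) ** 2
    print(f"Scaling VDD from {VDD_nominal} V to {VDD_scaled} V retains {ratio:.0%} of the dynamic power")
    # (0.9 / 1.2)^2 = 0.5625, i.e. roughly a 44% reduction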

Figure 5: Supply voltage and power scaling[6].
Figure 6: Active and Leakage Power[7].
Figure 7: Supply and threshold voltage scaling[8].

Process Variations

Figure 8: Leakage and frequency variations[9].

Why Scale?

Figure 9: Semiconductor power density[10].
Figure 10: Calculations per second for a fixed cost[11].

The Cost of Integrated Circuits

Figure 11: A wafer showing the individual die[12].

Cost is the main motivation for reducing device sizes. Aside from the marked improvement in speed, more transistors can be integrated in the same die or wafer, resulting in a significant reduction in the cost per transistor. The total cost of producing an integrated circuit can be broken down into two components: (1) the fixed or non-recurrent engineering (NRE) costs, and (2) the variable or recurrent costs.

Non-Recurrent Engineering (NRE) Costs
  • NRE costs are overhead costs independent of volume, i.e. the number of integrated circuits made or sold.
  • Examples of NRE costs include design time and manpower, CAD tools and equipment, mask generation costs, etc.
Recurrent Costs
  • Costs that are proportional to the volume of ICs produced.
  • This includes the fabrication costs, packaging costs, and testing costs.
  • In most cases, these costs are proportional to the chip area.

Thus, the cost of an integrated circuit can be expressed as:

    \text{cost per IC} = \text{variable cost per IC} + \frac{\text{fixed cost}}{\text{volume}}        (2)

Where the components of the variable cost are:

    \text{variable cost} = \frac{\text{cost of die} + \text{cost of die test} + \text{cost of packaging}}{\text{final test yield}}        (3)
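
A minimal sketch of Eqs. 2 and 3, using entirely hypothetical cost figures, shows how the fixed (NRE) cost is amortized over the sales volume while the variable cost is paid on every part:

    # Hypothetical cost breakdown per Eq. 2 and Eq. 3. All figures are illustrative only.
    nre_cost = 2_000_000        # fixed (NRE) cost in dollars, assumed
    volume = 500_000            # number of ICs sold, assumed

    cost_of_die = 8.00          # dollars per die, assumed
    cost_of_die_test = 1.50     # dollars per die, assumed
    cost_of_packaging = 2.00    # dollars per part, assumed
    final_test_yield = 0.95     # fraction of packaged parts passing final test, assumed

    variable_cost = (cost_of_die + cost_of_die_test + cost_of_packaging) / final_test_yield  # Eq. 3
    cost_per_ic = variable_cost + nre_cost / volume                                          # Eq. 2

    print(f"Variable cost per IC: ${variable_cost:.2f}")  # about $12.11
    print(f"Total cost per IC:    ${cost_per_ic:.2f}")    # about $16.11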

Die Cost

The cost of a die or chip depends on (1) the cost of a wafer, shown in Fig. 11, (2) the number of chips or dies per wafer, and (3) the die yield:

    \text{cost of die} = \frac{\text{cost of wafer}}{\text{dies per wafer} \times \text{die yield}}        (4)

Note that increasing the number of dies in a wafer can reduce the cost per die. This can be done by increasing the size of the wafer, as long as the increase in cost due to the larger wafer size is offset by the reduction in overall cost due to more dies in a wafer, as seen in Fig. 12. However, increasing the wafer size can cause structural problems such as warping of the wafer, leading to increased device variability. One way to solve this problem is to increase the thickness of the wafer, as shown in Fig. 13.

Figure 12: The evolution of wafer size[13].
Figure 13: Wafer thickness[13].

Yield

The yield, Y, is defined as the ratio of the number of good dies in a wafer to the total number of dies in the wafer, or equivalently, in percent:

    Y = \frac{\text{number of good dies per wafer}}{\text{total number of dies per wafer}} \times 100\%        (5)

We can estimate the total number of dies, each with area A, in a wafer with diameter d, as:

    \text{dies per wafer} = \frac{\pi \left( d/2 \right)^{2}}{A} - \frac{\pi d}{\sqrt{2A}}        (6)

The first term in Eq. 6 gives the total wafer area divided by the die area. However, as seen in Fig. 14, there are partial dies at the edge of the wafer that must be discounted since they cannot be used; the number of these partial dies is estimated by the second term in Eq. 6. Note that by reducing the die area, the yield can increase significantly for the same wafer defect pattern, as shown in Fig. 15. Thus, for large die sizes, the die cost is higher since the yield is lower, and the cost of the bad dies must be shouldered by the good dies.

Figure 14: Wafer defects due to fabrication non-idealities[14].
Figure 15: The effect of die size on yield[15].
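
Putting these relations together, the sketch below estimates the number of dies per wafer (Eq. 6) and the cost per good die (Eq. 4), assuming a die yield as defined in Eq. 5. The wafer diameter, wafer cost, die area, and die yield are assumed values for illustration, not data taken from the figures.

    import math

    # Hypothetical die-cost estimate using Eq. 6 (dies per wafer) and Eq. 4 (cost of die).
    wafer_diameter = 300.0   # wafer diameter d in mm, assumed
    die_area = 100.0         # die area A in mm^2, assumed
    wafer_cost = 5000.0      # cost of a processed wafer in dollars, assumed
    die_yield = 0.80         # fraction of good dies per wafer (Eq. 5), assumed

    # Eq. 6: gross dies per wafer, discounting the partial dies at the wafer edge.
    dies_per_wafer = (math.pi * (wafer_diameter / 2) ** 2) / die_area \
                     - (math.pi * wafer_diameter) / math.sqrt(2 * die_area)

    # Eq. 4: spread the wafer cost over the good dies only.
    cost_of_die = wafer_cost / (dies_per_wafer * die_yield)

    print(f"Dies per wafer:    {dies_per_wafer:.0f}")   # about 640
    print(f"Cost per good die: ${cost_of_die:.2f}")     # about $9.76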

References

  1. Gordon E. Moore, Cramming more components onto integrated circuits, Electronics, Volume 38, Number 8, April 19, 1965 (pdf)
  2. Gordon Moore: The Man Whose Name Means Progress, IEEE Spectrum, March 2015.
  3. Wikimedia
  4. S. E. Thompson, S. Parthasarathy, Moore's law: the future of Si microelectronics, Materials Today, Volume 9, Issue 6, 2006, Pages 20-25. (link)
  5. K. Rupp, 42 Years of Microprocessor Trend Data, link
  6. B. Kunert, Integration and Application of Epitaxial Systems: III/V on Silicon for Optoelectronics, IMEC Belgium 2015 (link)
  7. W. Haensch et al., Silicon CMOS devices beyond scaling, IBM Journal of Research and Development, vol. 50, no. 4.5, pp. 339-361, July 2006, doi: 10.1147/rd.504.0339.
  8. ITRS, The International Technology Roadmap for Semiconductors (2004 edition), 2004. Technical Report, http://public.itrs.net
  9. S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi and V. De, Parameter variations and impact on circuits and microarchitecture, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451), Anaheim, CA, USA, 2003, pp. 338-342, doi: 10.1145/775832.775920.
  10. Chen (IBM), ISS Europe 2007, (link).
  11. BCA Research (link).
  12. J. Dennis, CMOS Compatible Bulk Micromachining, IntechOpen 2013 (link)
  13. P. Stallhofer, Why Are Silicon Wafers as Thick as They Are? In: Burghartz J. (eds) Ultra-thin Chip Technology and Applications, Springer 2011, New York, NY. (link)
  14. WikiChip (link)
  15. K. Choi's lecture notes on Design Metrics, CMPEN 411: VLSI Digital Circuits, Pennsylvania State University, Fall 2016 (link)