Intel and AMD discussed some of their most advanced chip designs at the International Solid-State Circuits Conference this week, and both highlighted the role that advanced packaging plays in their future high-end chip products. In both cases, the impressive new performance capabilities come from modular approaches that combine building blocks made at different fabs using different manufacturing processes. Together, the two designs illustrate the vast potential of chip packaging in the future of semiconductor innovation.
Intel targets Ponte Vecchio as a high-performance module to be built into large datacenter systems. It is a graphics processing unit (GPU) designed for applications in artificial intelligence, machine learning, and computer graphics. It is named after the medieval stone bridge in Florence, Italy that crosses the Arno River, connecting the Piazza della Signoria on one side with the Palazzo Pitti on the other. One of the highlights of the design is how it connects a multitude of specialized chiplets – integrated circuit building blocks that are meant to be combined to make complete systems.
Ponte Vecchio uses eight compute “tiles” manufactured on Taiwan Semiconductor Manufacturing Company’s (TSMC) most advanced 5 nm process. Each tile has eight “Xe” cores, and each of those cores in turn has eight vector engines and eight specialized matrix engines. The tiles are placed on top of a “base tile,” which connects them to memory and the outside world with a giant switch fabric. The base tile is built using the “Intel 7” process, a new name for the company’s enhanced 10 nm SuperFin manufacturing process. There is also a high-performance memory system called “RAMBO” (Random Access Memory, Bandwidth Optimized), also built on Intel 7 and stacked onto the base tile with the company’s Foveros interconnect technology. Lots of other building blocks are incorporated as well.
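For a rough sense of scale, those counts multiply out quickly. A minimal tally, using only the numbers above:

```python
# Tally of Ponte Vecchio's compute hierarchy, using only the counts
# cited above: 8 tiles x 8 Xe cores, each core carrying 8 vector
# engines and 8 matrix engines.
compute_tiles = 8
cores_per_tile = 8
vector_engines_per_core = 8
matrix_engines_per_core = 8

xe_cores = compute_tiles * cores_per_tile              # 64 Xe cores
vector_engines = xe_cores * vector_engines_per_core    # 512 vector engines
matrix_engines = xe_cores * matrix_engines_per_core    # 512 matrix engines

print(f"{xe_cores} Xe cores, {vector_engines} vector engines, "
      f"{matrix_engines} matrix engines")
```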
The Ponte Vecchio design is a case study in heterogeneous integration – combining 63 different tiles (47 that perform computing functions and 16 for thermal management) with a total of over 100 billion transistors in a single package that measures 77.5 x 62.5 mm (approximately 3 x 2.5 inches). It wasn’t that long ago that that much computing power filled a warehouse and required its own connection to the electrical grid. The engineering challenges in such a design are plentiful:
Connecting all the parts. Designers need a way to move signals between all of the disparate chips. In the old days, this was done with wires or traces on printed circuit boards, and chips were attached by soldering them to the boards. But that approach ran out of steam long ago as signal counts and speeds increased. If you put everything onto a single chip, you can connect the pieces with metal traces in the back end of the manufacturing process; if you use multiple chips, you need a lot of connecting pins, and you want the connecting distances to be short. Intel uses two technologies to support this. The first is its “embedded multi-die interconnect bridge” (EMIB), a small sliver of silicon that can provide hundreds or thousands of connections at a time; the second is its Foveros die-to-die stacking technology, first used in its Lakefield mobile processor.
Making sure all the parts are synchronized. Once you connect lots of disparate pieces, you need to ensure that all of the parts can talk to each other in synchrony. This usually means distributing a timing signal known as a clock, so that all the chips can work in lockstep. This turns out not to be trivial, as signals tend to get skewed and the environment is very noisy, with lots of signals bouncing around. Each compute tile, for example, has more than 7,000 connections in a space of 40 square millimeters, so that’s a lot to keep in sync.
Managing heat. The modular tiles each require a lot of power, and delivering it uniformly across the whole surface while removing the heat that is generated is a huge challenge. Memory chips have been stacked for some time, but the heat they generate is fairly uniformly distributed. Processor chips or tiles can have hot spots depending on how heavily they are being used, and managing heat in a 3D stack of chips is not easy. Intel used a metallization process on the back sides of the chips and integrated them with heat spreaders to handle the projected 600 watts the Ponte Vecchio system produces. A back-of-envelope look at these interconnect and power densities follows below.
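The figures quoted in the last two items invite some quick arithmetic. Here is a hedged sketch using only the numbers above; the implied bump pitch assumes the connections form a uniform square grid, which is an illustrative assumption rather than a published Intel specification:

```python
import math

# Die-to-die connection density: >7,000 connections in 40 mm^2
# on each compute tile (figures from the text).
connections = 7_000
tile_area_mm2 = 40.0
density_per_mm2 = connections / tile_area_mm2              # ~175 per mm^2

# Implied pitch if those connections formed a uniform square grid --
# an assumption for illustration, not a published spec.
pitch_um = math.sqrt(tile_area_mm2 / connections) * 1_000  # ~76 um

# Average power density: ~600 W over the 77.5 mm x 62.5 mm package.
package_area_mm2 = 77.5 * 62.5                             # ~4,844 mm^2
avg_w_per_mm2 = 600 / package_area_mm2                     # ~0.12 W/mm^2

print(f"~{density_per_mm2:.0f} connections/mm^2, implied pitch ~{pitch_um:.0f} um")
print(f"average ~{avg_w_per_mm2:.2f} W/mm^2 across the package")
```

The average power density looks modest, but that is exactly the hot-spot problem: the heat is not spread evenly, so local densities on busy compute tiles run far higher than the package-wide average.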
Initial lab results that Intel reported included more than 45 Teraflops of performance. The Aurora supercomputer being built at Argonne National Laboratory will use more than 54,000 Ponte Vecchios along with more than 18,000 next-generation Xeon processors. Aurora has a targeted peak performance of over 2 Exaflops; an Exaflop is a million Teraflops. Back in the mid 1990s when I was in the supercomputer business, a one Teraflop machine was a $100 million science project.
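Those numbers are roughly self-consistent, as a quick sanity check shows. This sketch uses only the figures above and ignores the Xeon processors’ contribution:

```python
# Sanity check on Aurora's target, using the article's figures:
# >54,000 Ponte Vecchio GPUs at >45 Teraflops each, ignoring the
# Xeon processors' contribution.
gpus = 54_000
teraflops_per_gpu = 45

total_teraflops = gpus * teraflops_per_gpu     # 2,430,000 Teraflops
exaflops = total_teraflops / 1_000_000         # 1 Exaflop = 1e6 Teraflops

print(f"~{exaflops:.2f} Exaflops peak from the GPUs alone")  # ~2.43
```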
AMD’s Zen 3
AMD talked about Zen 3, its second-generation microprocessor core built on TSMC’s 7 nm process. The core was designed to be used across AMD’s market segments, from low-power mobile devices and desktop computers all the way to its most powerful datacenter servers. The central tenet of this strategy was packaging the Zen 3 core with support functions as a “core complex” on a single chiplet, which serves as a modular building block much like Intel’s tiles. AMD can thus package eight chiplets together for a high-performance desktop or server, or four chiplets for a value system, like a cheap home system I might buy. AMD also stacks chips vertically using what are called through-silicon vias (TSVs), a way of connecting multiple chips placed on top of each other. And it can combine two to eight of these chiplets with a server die made on a GlobalFoundries 12 nm process to make its 3rd generation EPYC server chips.
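To make that modular arithmetic concrete, here is a small illustrative sketch. The eight-cores-per-chiplet figure comes from AMD’s public Zen 3 disclosures; the configurations shown are examples, not a product list:

```python
# Illustrative chiplet math for AMD's approach: each Zen 3 "core
# complex" chiplet carries 8 cores (per AMD's public Zen 3
# disclosures), and products combine varying chiplet counts with
# a separate server/I/O die.
CORES_PER_CHIPLET = 8

for chiplets in (1, 2, 4, 8):
    print(f"{chiplets} chiplet(s) -> {chiplets * CORES_PER_CHIPLET} cores")
# 8 chiplets -> 64 cores, the top end of 3rd-generation EPYC.
```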
The great opportunity that Ponte Vecchio and Zen 3 highlight is the ability to mix and match chips made using different processes. In Intel’s case, this included parts made on both its own and TSMC’s most advanced processes. AMD combined parts from TSMC and GlobalFoundries. A big advantage of connecting smaller chiplets or tiles together, rather than building one big chip, is that the smaller ones have better manufacturing yields and are therefore less costly. You can also mix and match new chiplets with older proven ones that you know are good, or that are made on a less expensive process.
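The yield argument is easy to quantify with the textbook Poisson defect model, where yield = e^(-D*A) for die area A and defect density D. The defect density below is an assumed illustrative value, not a published fab number; the key point is that chiplets are tested individually (“known good die”) before assembly, so a defect scraps one small chiplet rather than one big expensive die:

```python
import math

def die_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Poisson yield model: probability that a die of the given
    area has zero killer defects."""
    return math.exp(-defects_per_mm2 * area_mm2)

D = 0.001            # assumed defect density (defects/mm^2) -- illustrative only

big_die_mm2 = 800.0  # one hypothetical monolithic 800 mm^2 die
chiplet_mm2 = 100.0  # vs. eight 100 mm^2 chiplets of the same total area

y_big = die_yield(big_die_mm2, D)        # ~45% of big dies are good
y_chiplet = die_yield(chiplet_mm2, D)    # ~90% of chiplets are good

# Because bad chiplets are discarded individually before assembly,
# roughly 90% of the wafer area becomes sellable product here,
# versus roughly 45% for the monolithic design.
print(f"monolithic die yield: {y_big:.0%}, chiplet yield: {y_chiplet:.0%}")
```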
Both the AMD and Intel designs are technical tours de force. No doubt they represent a lot of hard work and learning, and huge investments of resources. But just as IBM introduced modular subsystems in its mainframe System/360 in the 1960s, and personal computers went modular in the 1980s, the modular partitioning of silicon microsystems exemplified by these two designs and enabled by advanced chip packaging heralds a significant technology shift. Granted, many of the capabilities displayed here are still out of the reach of most start-ups, but we can imagine that when the technology becomes more accessible, it will unleash a wave of mix-and-match innovation.
Source: https://www.forbes.com/sites/willyshih/2022/02/22/intels-ponte-vecchio-and-amds-zen-3-show-the-promise-of-advanced-semiconductor-packaging-technology/