Moore’s Law is dead – Long live Moore’s Law
It is no longer a secret; it is becoming increasingly difficult to make transistors smaller. The number of transistors in a chip would, depending on interpretation, double every one to two years, a principle known since the mid-1960s as Moore’s Law, named after Intel co-founder Gordon Moore. By making transistors smaller and smaller, a process known as scaling, more and more of them fit on the same surface area and costs decrease. The law has grown from an observation into a guideline for the entire semiconductor industry. However, it is proving increasingly difficult to keep following that guideline.
One of the most telling examples of scaling becoming more difficult is Intel’s tick-tock cadence. Previously, the company shrank its processors to a new, smaller production node every year and a half, a step known as a tick. The tock that followed brought a new microarchitecture, giving rise to Intel’s well-known tick-tock cycle. Since the 22nm generation, however, the pace has slowed: Haswell even got a Refresh because the step to 14nm, codenamed Broadwell, turned out to be more difficult than expected. For the step from Skylake to Cannon Lake, from 14 to 10nm, an intermediate stop has also been announced, in the form of Kaby Lake.
Scaling is therefore becoming increasingly difficult, with rising costs to make smaller transistors. Higher density thanks to smaller transistors theoretically yields cheaper chips, but as the number of production steps grows and the required equipment becomes more expensive, there is a point at which the two balance out. Some say we are getting close to that point and that increasing transistor density simply by scaling is becoming less and less effective. That was the theme of the Imec Technology Forum, a two-day symposium at which semiconductor research institute Imec and partners such as ASML and Intel presented the latest developments in semiconductor manufacturing. In this background article we review the main points of this symposium.
Traditional scaling
In recent years, scaling has continued to follow Moore’s Law, with smaller geometries yielding more transistors per wafer, more than offsetting the wafer’s higher manufacturing costs. Still, each smaller node costs more and more money to produce: the equipment needed for smaller transistors becomes more expensive and the number of production steps increases. The latter is partly caused by the increasingly difficult techniques needed to make smaller transistors. For example, strained silicon was introduced for the step to 90nm, the hkmg transistor arrived with the 45nm node, double patterning is used for 20/22nm transistors, and for smaller transistors multiple patterning is combined with finfets. With that last technology Intel was a generation ahead: it already applied finfets at its 22nm node and thus took the finfet cost item a generation earlier.
All of this leads to higher costs. Up to the 28nm node, the cost increase was always roughly 15 to 20 percent per node; with the step to 20nm, however, the extra costs rose to almost 30 percent, and at 14nm to more than 40 percent. The higher transistor density still made the step to 28nm cost-effective, but the step to 20nm was the first that did not yield cheaper transistors, and the same applies to the 14nm node. Incidentally, it is not only production costs that are rising; R&D costs are also climbing to such an extent that fewer and fewer companies can afford them. The result is consolidation among chip manufacturers, smaller companies that keep producing on larger nodes, and ever longer waits before new, smaller nodes become available.
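To make that tipping point concrete, here is a minimal back-of-the-envelope sketch in Python. The wafer-cost increases follow the percentages quoted above, but the per-node density gains are assumptions chosen purely to illustrate the mechanism, not industry figures.

```python
# Illustrative sketch: cost per transistor only drops if the density gain of a
# new node outpaces the increase in processed-wafer cost.

def relative_cost_per_transistor(density_gain: float, wafer_cost_increase: float) -> float:
    """Cost per transistor relative to the previous node.

    density_gain        -- factor by which transistors per wafer increase (e.g. 2.0)
    wafer_cost_increase -- fractional rise in wafer cost (e.g. 0.30 for +30%)
    """
    return (1.0 + wafer_cost_increase) / density_gain

# Hypothetical scenarios, loosely echoing the percentages above:
for node, cost_up, density in [("28nm", 0.20, 2.0), ("20nm", 0.30, 1.3), ("14nm", 0.40, 1.4)]:
    rel = relative_cost_per_transistor(density, cost_up)
    print(f"{node}: ~{rel:.2f}x the cost per transistor of the previous node")
```

With a full density doubling, a 20 percent more expensive wafer still yields much cheaper transistors; once the density gain shrinks while wafer costs rise 30 to 40 percent, the cost per transistor stops falling.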
For smaller nodes, the cost of scaling only increases. Finfets are again needed for the 10nm node, in combination with double or even triple patterning and self-aligned patterning. The delay of euv lithography, which would allow fewer patterning steps, also drives up costs for the 10nm and 7nm nodes. For even smaller transistors, at the 7nm/5nm node, as many as five patterning steps and/or directed self-assembly would be required. All those extra steps cost a lot of time and therefore money.
To illustrate: the switch to the delayed euv lithography could shorten the so-called cycle time, the time a wafer spends in the factory, by twenty days. The number of masking steps, each taking about a day and a half, would drop from about 30 for 193nm immersion lithography to fewer than 10 for euv. However, the window for euv is limited: we are now at 14nm and 193nm will still be used for 10nm. The insertion point for euv would be the 7nm node, but euv would not go much further than 5nm.
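A rough sketch of where that saving comes from, using only the per-step figure quoted above. This simplification counts every masking step at a day and a half and ignores the fact that only some layers would actually switch to euv, which is presumably why the article’s estimate of twenty days is somewhat lower.

```python
# Simplified cycle-time estimate based on the figures in the text.

DAYS_PER_MASKING_STEP = 1.5  # each masking step takes roughly a day and a half

def masking_days(steps: int) -> float:
    """Days of cycle time spent on masking steps alone."""
    return steps * DAYS_PER_MASKING_STEP

immersion_steps = 30  # ~30 masking steps with 193nm immersion and multi-patterning
euv_steps = 10        # fewer than 10 with euv single exposures

print(f"193nm immersion: ~{masking_days(immersion_steps):.0f} days of masking")
print(f"euv:             ~{masking_days(euv_steps):.0f} days of masking")
print(f"saving on the order of {masking_days(immersion_steps) - masking_days(euv_steps):.0f} days")
```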
So what is the solution? The costs of further scaling are already starting to outweigh the benefits, and for smaller nodes it will only get worse. In addition, ‘flat’ transistors no longer work well at the 14nm node and a switch to more complex finfets has to be made. But even those will only last for a while; for smaller transistors other techniques are needed. Is making transistors smaller actually the only way to build cheaper, more efficient and smaller chips, sensors and memories?
The salvation of memory?
There remains one hope for the semiconductor industry to keep Moore’s Law intact: memory. Not only is dram reaching ever higher densities thanks to new techniques, but new types of memory are also being developed. Nand memory, for example, has gone vertical with 3d nand from various manufacturers, and Intel developed 3D XPoint, a memory type that sits between dram and nand. Nand in particular follows the Moore’s Law curve nicely, and since it accounts for the vast majority of transistor production, the law seems safe.
In theory, then, memory production, with ever increasing densities thanks to 3d nand, tlc memory and new techniques, saves Moore’s Law: for the vast majority of transistors produced, which are memory transistors, the law still applies. Memory can keep making progress thanks to stacking more and more layers of memory cells in a 3d structure. For logic chips such as processors, however, such stacking is much harder to achieve. Memory is very simple in structure and organization and can therefore be stacked quite easily; that is not the case for processor circuits, where the complexity and the heat produced by high clock speeds make 3d structures difficult.
The package
Of course, the stacking of memory structures cannot go on indefinitely. To get more efficient, cheaper and smaller processors and devices, something more fundamental has to happen. Fortunately, before we get into the sub-10nm regions, there is another quick win to reduce the power consumption and footprint of electronics. In the packaging of the chip, the so-called package, an average of about thirty percent of the total energy of the processor or soc is said to be lost. This is partly due to the increased complexity and integration of chips; a soc must have inputs and outputs for memory, pcie lanes, usb and a host of other i/o. Those connections scale far less well than the silicon on the dies and place a disproportionate burden on the power and area budget. Silicon components, for example, have scaled by three orders of magnitude in the time that the package only scaled by a factor of three.
One method to make the package more efficient could be better integration. Instead of a relatively large motherboard with soldered-on chips that each sit in their own package, more stacking or interposers could be used. We have already seen some tentative steps in that direction. Memory such as nand is often stacked within a package, but usually with the wiring outside the dies. A more advanced form is connecting the dies with through-silicon vias, or tsvs, and nand manufacturers have now all arrived at 3d integration on the die. Another improvement that counteracts energy loss and enables more compact hardware is the use of interposers. AMD, among others, uses this for its video cards, with HBM memory sitting next to the ASIC on an interposer. This allows shorter traces to the memory and therefore less energy loss, smaller products and higher speeds.
That gives us two of the pillars supporting scaling: the production process, including euv, and architectural changes, with interposers and 3d structures. These are not the only improvements within those pillars, incidentally. Plasma-enhanced atomic layer deposition, or peald, for example, can be used to deposit atom-thick layers of material onto a substrate with great precision. And in the field of architecture we will look at further techniques that are supposed to keep Moore’s Law alive, but we will do so in combination with the third pillar: materials.
Beyond finfets
In the history of integrated circuits, many innovations have been developed to enable smaller transistors; in recent years, for example, strained silicon, hafnium oxide and, most recently, finfets. It is almost self-evident that new architectural techniques, for interconnects or for the construction of a transistor, go hand in hand with new materials. Materials cannot always scale along with production nodes: the insulating capacity of silicon oxide turned out to be insufficient for the gates of the 45nm node, which is why hafnium oxide was adopted, together with metal gates instead of polysilicon gates. This resulted in the well-known hkmg transistor generation, a fine example of new materials combined with a new architecture.
Scaling has thus always been an interplay between ways of building transistors and other components on the one hand and developing and deploying new materials on the other, and that is how it will have to be for future scaling too. The trend of ‘3d transistors’ with finfets will continue, initially with ever higher fins to achieve a larger contact area between gate and channel, but that will reach its limits. An interim solution, probably already at the 7nm node and certainly at 5nm, will be the use of nanowires for the channels. The gate can then wrap all the way around the channel, which provides more contact area and therefore better control of the channel and higher speeds. Those nanowires will initially be horizontal, but to make 3d integration easier they will eventually have to be built vertically, standing upright. That could probably keep nanowires usable down to 3nm.
Of course, more techniques are being developed; silicon nanowires too can only scale so far, and much beyond 5 and 3nm there is still little certainty about materials and architectures. There is a lot of exploration going on, though. For example, it is becoming increasingly clear that the traditional copper interconnect has too high a resistance, which may be a reason to switch to materials such as cobalt. In any case, the interconnects, the metal traces that connect transistors electrically, need to be revised. With more and smaller transistors on a chip, more and more traces are needed to connect them, but the space on the flat surface is of course limited. Here too, stacking the traces should provide a solution, together with, for example, cobalt and other improvements in the back-end. As an example: a 14nm chip has thirteen layers of wiring, and every subsequent node adds one or two more.
To compensate for the lack of surface area, future chips will probably have to go up. By stacking different dies and giving the individual dies different layers, a lot of logic can be accommodated in a small area. For example, one or more v-nand dies can be placed on the bottom of a substrate, with an insulating layer in between and one or more layers of processor dies on top. This has the additional advantage that the memory can achieve higher speeds and lower latencies, while, thanks to its simple, repetitive structure, it is also much easier to scale than processor dies. Finally, the various dies can be connected to each other with tsvs. All this would result in a reduction of more than fifty percent in the wiring of the soc and a halving of the energy required for the interconnects.
There are plenty of other developments going on in the field of scaling. For example, it is being investigated whether other materials, of which so-called III-V semiconductors are the best known, function better on a small scale. Other ways of switching currents, such as tunnel-fets or spin-fets, are also being developed and tested. Work is even being done on DNA as a storage medium, but that is so slow that the practical applicability is still far in the future.
Interposers and scm
It should be clear that the era of ‘easy’ scaling is definitely over. For the past two nodes, the production costs of smaller transistors have risen to such an extent that the higher density no longer translates into a lower price per transistor, and this will only get worse with the development of smaller nodes. The required techniques are becoming ever more expensive while the number of production steps increases, so it takes more time and more money to make smaller chips. It is no longer sufficient, or even feasible, to look for cost, performance and energy gains in simply smaller processes. Call it a buzzword if you like, but a holistic approach to materials, the production process and the architecture or structure of chips is needed. In the coming years it is important to look for materials other than silicon and to tackle the interconnect. This must be accompanied by a new type of transistor, in which gate-all-around, gaa for short, in the form of nanowires instead of finfets will represent the first major change in architecture.
Particularly in the packaging of the chips there is still considerable profit to be made, with interposers bringing the classic Von Neumann components, such as processor and memory, closer together. Interposers are literally and figuratively just an interim solution before chips become 3d-integrated, with not only 3d nand but also 3d processors in a single package, connected with tsvs. However, that poses another problem: heat and energy consumption. To limit the latter, transistors are being developed that operate at ever lower voltages, along with interconnects that have a low intrinsic resistance. Lower resistance means less loss in the circuit and lower voltages allow for more economical circuits. For example, InGaAs transistors in gaa nanowires can be driven at 0.5V.
That does not alter the fact that processors are struggling with a phenomenon called ‘dark silicon’. In order to stay within the heat and energy budgets, large parts of a processor have to be switched off so as not to generate too much heat or consume too much power. As circuits become more complex, the proportion of this dark silicon increases. The impact of dark silicon is partly counteracted by finfets and by dynamic voltage and frequency scaling, running at lower speeds with lower voltages to limit power, but around 7 or 5nm the share of dark silicon becomes so large that these techniques can no longer compensate enough.
Again, there are a number of methods to squeeze better performance out of chips. One is providing more memory: cache memory close to the cores in particular yields performance gains, but with traditional architectures this is a costly affair, especially in terms of surface area. Sram is still the fastest memory we can make, but it is too expensive for the larger caches; as many as six transistors are needed per memory cell. For the larger caches, such as the L3 and L4 caches, mram could be a solution.
This magnetic memory is simple in structure, yet fast. As with dram, only one transistor is needed per memory cell, but mram is much faster because it does not rely on charging a capacitor but on the much quicker tunnel-magnetoresistance effect. The fact that mram works on a magnetic principle, just like hard drives, has the added advantage that it is not volatile: bits remain stored and do not have to be refreshed as with dram.
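A minimal sketch of why the six-transistor sram cell is such a burden for large caches; the cache size and per-cell transistor counts below are textbook figures used purely for illustration, and overhead such as decoders, sense amplifiers and tags is ignored.

```python
# Illustrative comparison of transistor counts for the storage cells of a cache.

TRANSISTORS_PER_CELL = {
    "sram (6T)": 6,      # six transistors per bit
    "dram (1T1C)": 1,    # one transistor plus a capacitor per bit
    "mram (1T1MTJ)": 1,  # one transistor plus a magnetic tunnel junction per bit
}

def cell_transistors(cache_bytes: int, cell_type: str) -> int:
    """Transistors spent on the storage cells alone."""
    return cache_bytes * 8 * TRANSISTORS_PER_CELL[cell_type]

hypothetical_l3 = 8 * 1024 * 1024  # an 8MB last-level cache, chosen as an example
for cell_type in TRANSISTORS_PER_CELL:
    count = cell_transistors(hypothetical_l3, cell_type)
    print(f"{cell_type}: ~{count / 1e6:.0f} million transistors")
# sram spends roughly 400 million transistors on the cells of an 8MB cache,
# six times what a dram- or mram-style array of the same capacity would need.
```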
That same mram would therefore be a good candidate to replace today’s dram, the second layer in the memory hierarchy. Traditionally, the third layer consisted of magnetic storage such as hard disks, but nand, in the form of ssds, has been pushing in there for some time now. Work is also being done on yet another intermediate layer, between working memory and fast storage: a storage class memory, sitting between mram and 3d nand and formed by phase-change memory. That is fast, addressable per bit and non-volatile; Intel’s 3D XPoint memory is a good example. In future computers, this would both expand the memory hierarchy and make it faster.
Neural or Neumann?
This means that we are still left with very classical architectures; an input system supplies data to a data bus, a cpu calculates with the data and exchanges data with memory and finally a result is sent to an output. That principle is known as a Von Neumann architecture, and that is precisely what researchers and developers of future computer systems may want to move away from. After all, there are systems that are much more suitable for many tasks than classical architectures, namely brains. They are extremely good at pattern recognition and working in parallel. That is precisely where many workloads are increasingly moving and where a neural infrastructure would perform much better than a Von Neumann architecture.
Of course, classic computers will continue to be developed and will continue to exist, but more and more computing tasks are becoming visual. Think of the huge amount of photos and video sent daily to YouTube, Facebook and other services. Almost all of that visual information needs to be indexed, categorized and analyzed to make it searchable, but also to tag people you know on Facebook and to search for images or video via Google. The data from the growing number of security cameras must also be analyzed. Millions of video streams yield exabytes of data, an amount that only keeps increasing. The automated analysis and storage of only the relevant images is a job for computers with advanced machine-vision capabilities.
All these visual tasks can be performed much more efficiently by neural networks than by classical computer architectures. It makes little difference how that neural network is created, but an architecture that is built specifically for this purpose is always more efficient than an emulated system. That is one reason to develop computers that are modeled after neurons, in order to build neural networks that are better able to recognize patterns and process visual information. A large number of research institutes are working on this, not only for short-term applications as described above, but also to eventually simulate human brains and perhaps to realize artificial intelligence.
Exaflops, and then?
What are we supposed to do with all that computing power? The desktop, or laptop if you will, has barely gotten any faster for quite a few years now. Based on our daily use, including smartphones and tablets, you could easily argue that growth has leveled off and that faster is not really necessary. However, scaling is not just about speed but also about efficiency, and there in particular is still a lot to gain, with batteries that currently barely last a day. We are also becoming more demanding in terms of computing power. We often consider a computer without an ssd unusable, so cheap nand is needed, made possible by scaling, new 3d architectures and new materials. New interfaces, such as voice assistants in the form of Cortana, Google Voice, Alexa and Siri, are also becoming increasingly important and cost a lot of computing power. Other new interfaces, such as Microsoft’s HoloLens and virtual reality, not only demand a lot of computing power but also benefit from hardware that has become more economical thanks to scaling. Just think of the backpack PCs that some manufacturers showed at Computex to enable a ‘cable-free’ VR experience.
Our expectations make another large part of that computing power indispensable; we want everything fast, instant and without waiting. Whether it is ordering, uploading content, searching for anything or streaming, everything depends on the cloud. The ultimate buzzword according to many, but absolutely necessary for just about everything we do with our gadgets, and the demand for computing power only keeps growing; just think of the visual workloads of Google, Facebook and YouTube, among others, that eat up computing power.
Cars or robots?
Another task that is expected to take off in the coming years is observing traffic. According to various car manufacturers, not just Tesla but just about every maker working hard on autonomous cars, the car is being completely reinvented. It would become ‘the smartest robot in our lives’, according to Audi, and smart cars obviously need computing power. All the sensors on a car must deliver meaningful data that has to be analyzed and shared. Moreover, the analysis must happen very quickly; a car cannot afford to think for three seconds when you are driving on the highway at 120km/h, since in that time it covers roughly 100 meters. Here too there are plenty of visual compute tasks, from cameras that monitor the surroundings to lidar and radar images that need to be processed.
To illustrate: the same Audi that promises us car robots almost shudders at the costs involved. Currently, a good 30 percent of production costs already go to electronics; that will be 35 percent around 2020 and no less than 50 percent in 2025. These are costs the company has little control over: Audi is a car manufacturer, not a radar or soc maker, so those components have to be bought in. That is strange for a manufacturer that traditionally kept the entire production of a car in-house, but illustrative of the changes in the car market.
Incidentally, a few things have to change if these smart, driving robots are to become the success that manufacturers and futurologists have in mind. Work is underway and progress is being made on the hardware side: radar is becoming more powerful and smaller, machine vision is getting better, and Tesla and Google, among others, are proving that autonomous vehicles can drive safely. However, to enable intelligent traffic, all those cars must also be connected via a fast network with low latency, so that data can be exchanged quickly. That hurdle, too, is being tackled with the development of 5G data connections. In two years’ time, at the 2018 Winter Olympics, a 5G network of 1Gbit/s will be tested.
That leaves security on the hardware and software side, both traffic safety and data security. The biggest challenge, however, will be in people’s perceptions: regulators must be convinced of the safety and drivers must relinquish control. Another obstacle is ownership; we like having our own car. Yet the investment in smart cars only really pays off when the majority of cars are driving around almost continuously, with autonomous Uber-style taxis transporting people. For that, cars must be designed for a lifespan of more than 120,000 hours, good for 22.5 hours a day for fifteen years, instead of a lifespan of 8,000 hours, good for 1.5 hours a day for fifteen years. That way the polluting vehicle fleet can be shrunk enormously and cars can make optimal use of the roads: goodbye traffic jams.
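As a quick sanity check of those lifespan figures, a couple of lines of arithmetic; the numbers are simply the ones quoted above, rounded.

```python
# Lifespan arithmetic for the two usage patterns mentioned in the text.

def lifetime_hours(hours_per_day: float, years: int = 15) -> float:
    """Total operating hours over the vehicle's life."""
    return hours_per_day * 365 * years

print(f"shared, near-continuous use: ~{lifetime_hours(22.5):,.0f} hours")  # ~123,000 -> 'more than 120,000'
print(f"privately owned car:         ~{lifetime_hours(1.5):,.0f} hours")   # ~8,200  -> 'about 8,000'
```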
To your health
A final major consumer of computing power will be healthcare, if it is up to GSK, Johnson & Johnson, Intel Life Sciences and of course Imec itself. All kinds of fields, from processing data from clinical trials to the search for new antibiotics against resistant bacteria and the simulation of cells to test drugs and conduct research, are hungry for a lot of computing power, new ways of processing data and methods to handle the huge data flows. And then we are not even talking about genetic screening, for example for personalized treatments against specific forms of cancer instead of a blunt axe that also damages a lot of healthy tissue. The human genome is about 1TB in size, and sequencing and analyzing it quickly could save lives.
In short, Moore’s Law may be over fifty years old and has been reformulated several times, but it is still the guiding principle, or perhaps self-fulfilling prophecy, that propels the semiconductor industry forward. The years of easy scaling, where the geometry of transistors and chips could be reduced quite easily, are definitely over, however. It is becoming more and more difficult to make smaller transistors and especially to do so at a decreasing cost. The materials, the research, the equipment and the time it takes to produce: everything becomes more expensive and complex. Yet the industry believes that scaling not only should, but can continue, albeit in ways that are much more complex than before.
Scaling must become an interplay of all elements and parties involved, from architecture to materials and production methods. Chemical companies such as BASF are also working with manufacturers to develop new materials and new ways to control them down to the atom. Because even though laptops seem to have barely gotten faster in five years, the demand for more computing power, for control of energy on both the generation and consumption side, and for new technology such as visual computing, machine vision, big data, cloud technology and smart cities will only increase in the coming years. Moore’s Law has made all that possible and will continue to do so.