Intel’s new Kaby Lake processors – Same architecture, but slightly faster

By Faryad On Sep 2, 2016

The tick-tock cadence that Intel maintained for years, in which a new processor architecture, or tock, was followed by a shrink of the production process, or tick, has not been feasible for several generations. We already saw a second, refresh generation of 22nm Haswell processors while the die shrink to 14nm was still being worked on. That shrink arrived with Broadwell, Intel’s first 14nm generation, in September 2014. The 14nm tock, Skylake, came eleven months later and brought improvements to the microarchitecture of the processors.

The successor to Skylake brings no new architecture, but a refresh of the 14nm generation. Kaby Lake has an almost unchanged microarchitecture and is still produced at 14nm. It is slightly more than a plain refresh of Skylake, which is why it gets its own code name instead of simply ‘Skylake Refresh’. Even so, Kaby Lake can largely be seen as a Skylake refresh, and we treat it that way: we list the improvements and adjustments compared to Skylake and review the new features.

Introduction: step by step

The first products of the Kaby Lake generation are, as with the Broadwell introduction, only suitable for mobile use. The desktop market is still shrinking, and the biggest gains in energy consumption and integration can be made in the mobile market. The Kaby Lake processors are being developed for very thin laptops, 2-in-1 devices and small, compact PCs, such as NUCs and compute sticks. The first series of Kaby Lake processors therefore consists of six laptop processors: three from the U series and three from the Y series. This lets Intel cover laptops with TDPs from 4.5 to 15W. All six processors are dual-cores with hyperthreading support and, of course, a built-in GPU. The first products with these seventh-generation Core processors should be available in September.

The six processors have the following specifications; no details about the GPUs have been released yet:

Y series	Core m3-7Y30	Core i5-7Y54	Core i7-7Y75
Cores/threads	2/4	2/4	2/4
Clock speed	1GHz	1.2GHz	1.3GHz
Turbo speed	2.6GHz	3.2GHz	3.6GHz
Memory	2x DDR3L 1600MHz, 2x LPDDR3 1866MHz	2x DDR3L 1600MHz, 2x LPDDR3 1866MHz	2x DDR3L 1600MHz, 2x LPDDR3 1866MHz
Price	281 dollars	281 dollars	393 dollars

U series	Core i3-7100U	Core i5-7200U	Core i7-7500U
Cores/threads	2/4	2/4	2/4
Clock speed	2.4GHz	2.5GHz	2.7GHz
Turbo speed	2.4GHz	3.1GHz	3.5GHz
Memory	2x DDR3L 1600MHz, 2x LPDDR3 1866MHz, 2x DDR4 2133MHz	2x DDR3L 1600MHz, 2x LPDDR3 1866MHz, 2x DDR4 2133MHz	2x DDR3L 1600MHz, 2x LPDDR3 1866MHz, 2x DDR4 2133MHz
Price	281 dollars	281 dollars	393 dollars

In January, no doubt with CES in Las Vegas as a backdrop, the next batch of Kaby Lake processors will follow. Intel will then announce not only desktop processors, but also the high-end SKUs for laptops, such as the HQ-series quad-cores and the H-series. Overclockable K processors and models for business applications, such as Xeons for servers and workstations, will also follow.

Enhancements: 4k

One of the biggest improvements, and in terms of processor architecture one of the few, is the updated media engine. The GPU itself is virtually unchanged and based on the Gen9 architecture, which was introduced with Skylake. The execution units, divided into slices and subslices, therefore still perform the graphics calculations. They are generic processing units, and for specific tasks it is more efficient to build separate blocks that can perform only that task. Such fixed-function blocks, or DSPs, are more economical and have the additional advantage that most of the GPU can remain switched off. Intel has such DSP blocks in its processors to decode video, for example. When watching a movie, the video can be decoded by a hardware decode engine. The rest of the GPU or the CPU then does not have to decode it in software, which would take more calculations and therefore more energy.

The decode engine in Kaby Lake has been updated to support new video codecs. The processors could of course already display 4k, but now there is hardware support for 4k video encoded with the 10-bit HEVC or VP9 codecs. The former, HEVC, is also known as H.265 and is used for Ultra HD Blu-rays and Netflix streams, among other things. YouTube, among others, uses the VP9 codec for its 4k streams.

In addition to the decode engine for 4k HEVC and VP9 video, the processing engine and display engine have been adapted for more efficient processing of video streams. The result of the adjustments is that 4k video can be watched with much lower energy consumption. In demonstrations, Intel showed a sixth-generation Core laptop at about 50 percent CPU load while playing 4k video, while a seventh-generation Core laptop showed only five to ten percent CPU load. This should lead to laptops that can stream more than nine hours of 4k HEVC content. VP9 content from YouTube could be played 1.75 times as long: a test system without VP9 hardware support could play four hours of video, while a Kaby Lake system would manage seven hours. For that comparison, a laptop with a Core i7-6500U was set against a laptop with a Core i7-7500U, both with a 4k screen and a 66Wh battery.
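The VP9 playback figures can be checked with a little arithmetic. The sketch below is only an illustration: it assumes, which Intel has not stated, that the full 66Wh battery is drained at a constant rate during playback. The function and variable names are our own.

```python
# Back-of-the-envelope check of the quoted VP9 playback figures.
# Assumption (ours, not Intel's): the whole 66Wh battery is used
# and the power draw is constant during playback.

BATTERY_WH = 66.0        # battery capacity of both test laptops
SKYLAKE_HOURS = 4.0      # Core i7-6500U, no VP9 hardware decode
KABY_LAKE_HOURS = 7.0    # Core i7-7500U, VP9 hardware decode


def average_power_watts(battery_wh: float, runtime_hours: float) -> float:
    """Average system power implied by draining the battery in the given time."""
    return battery_wh / runtime_hours


runtime_ratio = KABY_LAKE_HOURS / SKYLAKE_HOURS
skylake_watts = average_power_watts(BATTERY_WH, SKYLAKE_HOURS)
kaby_lake_watts = average_power_watts(BATTERY_WH, KABY_LAKE_HOURS)

print(f"Runtime ratio: {runtime_ratio:.2f}x")
print(f"Implied average power, Skylake:   {skylake_watts:.1f} W")
print(f"Implied average power, Kaby Lake: {kaby_lake_watts:.1f} W")
```

Under these assumptions, the whole system, screen included, would draw about 16.5W on average on the Skylake laptop and roughly 9.4W on the Kaby Lake laptop.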

In addition to decoding, the Kaby Lake GPU is also suitable for encoding video. 4k video can be transcoded through Intel’s Quick Sync functionality. You can choose from two settings: a fast FF mode that uses the fixed-function hardware, or DSP, and a setting that offers better quality and uses the GPU. The latter is obviously more energy-intensive.

The enhanced media block consists of three fixed-function media engines: the multi-format codec or MFX, the video quality engine or VQE, and the scaler and format converter or SFC. In the MFX, encoding and decoding functionality for 8- and 10-bit HEVC and for VP9 has been added, and the support for wireless display has been improved. The improved fixed-function mode for Quick Sync and the better-performing AVC decode function are also housed here. Support for HDR has been added in the VQE. Finally, the media engine should be able to handle up to eight 4k30p AVC or HEVC streams simultaneously, and for higher quality, 4k60p streams of up to 120Mbit/s are supported. A 4k30p stream can be transcoded on a Y-series processor at twice real-time speed when converting from AVC to AVC; transcoding from AVC to HEVC runs in real time. On the U-series processors, that happens at three times and twice real-time speed, respectively.
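To make those transcoding speeds a little more concrete, the sketch below converts the quoted real-time multiples into the time needed for a clip of a given length. The table of speeds comes from the figures above; the function name and the one-hour clip are purely illustrative.

```python
# Transcoding time for a 4k30p clip, derived from the quoted
# real-time multiples. The one-hour clip is a hypothetical example.

# (processor series, conversion) -> speed as a multiple of real time
TRANSCODE_SPEED = {
    ("Y", "avc->avc"): 2.0,
    ("Y", "avc->hevc"): 1.0,
    ("U", "avc->avc"): 3.0,
    ("U", "avc->hevc"): 2.0,
}


def transcode_minutes(clip_minutes: float, series: str, conversion: str) -> float:
    """Minutes needed to transcode a clip at the quoted real-time multiple."""
    speed = TRANSCODE_SPEED[(series, conversion)]
    return clip_minutes / speed


for (series, conversion), speed in TRANSCODE_SPEED.items():
    minutes = transcode_minutes(60.0, series, conversion)
    print(f"{series} series, {conversion}: {minutes:.0f} min for a 60 min clip")
```

A one-hour 4k30p clip would thus take between twenty minutes (U series, AVC to AVC) and a full hour (Y series, AVC to HEVC).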

The 14nm+ process

It is not unusual that during the life of a technology node, in this case the 14nm node, increasingly better chips are produced with increasingly higher yields. This is also the case with what is now the third generation of 14nm processors. After all, Intel could already practice with Broadwell and Skylake. According to the manufacturer, the process has now matured so far that Intel speaks of a 14nm+ process, or 14Plus. However, the company remains vague about the improvements and adjustments that this entails.

One of the improvements Intel claims for 14nm+ is an improved fin design, which is said to increase performance. Since its 22nm processors, Intel has used FinFETs, transistors with one or more tall fins around which the gate is wrapped. The larger the area of that gate, the better the transistor can perform. The fins in Kaby Lake transistors are taller and thus have a larger surface area. The strain in the silicon of the channel, which improves the mobility of the electrons and therefore the conductivity, has also been increased. Finally, the design of the processors has gone hand in hand with improvements in the manufacturing process. According to Intel, this can lead to a twelve percent increase in performance.

That’s where the catch is, because Intel shows figures in which an i7-7500U does indeed perform twelve percent better in Sysmark 2014 than an i7-6500U. However, the latter runs at a maximum of 3.1GHz, while the Kaby Lake processor tops out at 3.5GHz. That clock speed difference of roughly thirteen percent could fully explain the performance gain. With short, intensive workloads, which Intel calls “bursty”, the difference between the Skylake and Kaby Lake processors would be up to nineteen percent, a difference that cannot be explained by the higher clocks alone. Or can it? The Speed Shift technology has also been adjusted and improved. Speed Shift takes the switching between different power states out of the hands of the operating system and handles it in hardware. It can do this much faster than is possible through, for example, Windows.
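The comparison between clock speed and claimed performance is easy to work out. The sketch below rests on a deliberately crude assumption of our own, namely that performance scales linearly with the maximum turbo clock; real workloads will not follow this exactly.

```python
# How much of Intel's claimed gain can the higher turbo clock explain?
# Crude assumption (ours): performance scales linearly with max turbo clock.

SKYLAKE_TURBO_GHZ = 3.1    # Core i7-6500U
KABY_LAKE_TURBO_GHZ = 3.5  # Core i7-7500U

SYSMARK_GAIN = 0.12        # Intel's Sysmark 2014 figure
BURSTY_GAIN = 0.19         # Intel's figure for "bursty" workloads

# Relative increase in maximum clock speed
clock_gain = KABY_LAKE_TURBO_GHZ / SKYLAKE_TURBO_GHZ - 1.0

# Gain left over for bursty workloads after dividing out the clock increase
bursty_residual = (1.0 + BURSTY_GAIN) / (1.0 + clock_gain) - 1.0

print(f"Clock speed increase:         {clock_gain:.1%}")
print(f"Claimed Sysmark gain:         {SYSMARK_GAIN:.1%}")
print(f"Bursty gain beyond the clock: {bursty_residual:.1%}")
```

With these numbers the clock increase alone, about 12.9 percent, already exceeds the twelve percent Sysmark gain, while for bursty workloads about 5.4 percentage points remain that the maximum clock does not account for.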

Apart from higher clock speeds and turbos that ramp up faster, there are few architectural changes in Kaby Lake. With the more mature process, higher clock speeds are possible within the same TDPs, but the idle consumption would be virtually unchanged, since little can be done about the already low leakage currents. Idle consumption would be 40 to 50mW for both the Skylake and Kaby Lake generations. For now, the biggest change seems to be the support for 4k content with the HEVC and VP9 codecs. As soon as we get our hands on new-generation laptops, we’ll be able to test that. That should satisfy us until we can play with the desktop processors in January.