Intel’s Tiger Lake 11th Gen Core i7-1185G7 Review and Deep Dive: Baskin’ for the Exotic

by Dr. Ian Cutress & Andrei Frumusanu on September 17, 2020 9:35 AM EST

New Instructions and Updated Security

A new generation of processors usually brings more than physical design and layout changes: it is also the opportunity to optimize instruction flow, increase throughput, and enhance security.

Core Instructions

When Intel first stated to us in our briefings that by-and-large, aside from the caches, the new core was identical to the previous generation, we were somewhat confused. Normally we see something like a common math function get sped up in the ALUs, but no – the only additional changes made were for security.

As part of our normal benchmark tests, we do a full instruction sweep, covering throughput and latency for all (known) supported instructions inside each of the major x86 extensions. We did find some minor enhancements within Willow Cove.

CLD/STD - Clearing and setting the data direction flag - Latency reduced from 5 to 4 clocks
REP STOS* - Repeated String Stores - Throughput increased from 53 to 62 bytes per clock
CMPXCHG16B - Compare and exchange 16 bytes - Latency reduced from 17 to 16 clocks
LFENCE - Serializes load instructions - Throughput increased from 5/cycle to 8/cycle

There were two regressions:

REP MOVS* - Repeated Data String Moves - Decreased throughput from 101 to 93 bytes per clock
SHA256MSG1 - SHA256 message scheduling - throughput down from 5/cycle to 4/cycle
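To put the six changes listed above on a common scale, the figures can be converted into relative changes. The following is a minimal Python sketch that uses only the numbers quoted in the two lists; the helper names `percent_change` and `is_improvement` are illustrative and are not the tooling used for the instruction sweep. Note that for latency a lower number is the better one, while for throughput a higher number is better.

```python
# Before/after figures copied from the two lists above.
# kind is "latency" (lower is better) or "throughput" (higher is better).
MEASUREMENTS = [
    ("CLD/STD", "latency", 5, 4),
    ("REP STOS*", "throughput", 53, 62),
    ("CMPXCHG16B", "latency", 17, 16),
    ("LFENCE", "throughput", 5, 8),
    ("REP MOVS*", "throughput", 101, 93),
    ("SHA256MSG1", "throughput", 5, 4),
]


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def is_improvement(kind, old, new):
    """Latency improves when it drops; throughput improves when it rises."""
    return new < old if kind == "latency" else new > old


if __name__ == "__main__":
    for name, kind, old, new in MEASUREMENTS:
        verdict = "better" if is_improvement(kind, old, new) else "worse"
        change = percent_change(old, new)
        print(f"{name:<11} {kind:<10} {old:>3} -> {new:<3} {change:+6.1f}%  ({verdict})")
```

On these numbers the largest relative gain is LFENCE throughput (+60%), and the largest relative loss is SHA256MSG1 throughput (-20%).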

It is worth noting that Willow Cove, while supporting SHA instructions, does not have any form of hardware-based SHA acceleration. By comparison, Intel’s lower-power Tremont Atom core does have SHA acceleration, as do AMD’s Zen 2 cores, and even the cores from VIA and from VIA’s Zhaoxin joint venture. I’ve asked Intel exactly why the Cove cores don’t have hardware-based SHA acceleration (whether because current performance is sufficient, or because of timing, power, or die area), but have yet to receive an answer.

From a pure x86 instruction performance standpoint, Intel is correct in that there aren’t many changes here. By comparison, the jump from Skylake to Cannon Lake was bigger than this.

Security and CET

On the security side, Willow Cove will now enable Control-Flow Enforcement Technology (CET) to protect against control-flow hijacking attacks. These attacks take advantage of control transfer instructions, such as returns, calls and jumps, to divert the instruction stream to undesired code.

CET is the combination of two technologies: Shadow Stacks (SS) and Indirect Branch Tracking (IBT).

For returns, the Shadow Stack creates a second stack elsewhere in memory, addressed through a shadow stack pointer register, that holds a copy of each return address, with page tracking. When a function returns, if the return address on the regular stack does not match the one expected in the shadow stack, the attack is caught. Shadow stacks are implemented without code changes; however, software will still need to be programmed to manage what happens in the event of an attack.
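The return check described above can be illustrated with a toy model. This is a sketch in Python, not a description of the hardware: the `ToyCpu` class, its method names, and the `ControlProtectionFault` exception are all hypothetical stand-ins, and the addresses are made up. The only behavior it models is that a call records the return address in two places, and a return compares them.

```python
class ControlProtectionFault(Exception):
    """Hypothetical stand-in for the fault raised on a mismatch."""


class ToyCpu:
    def __init__(self):
        self.stack = []         # regular stack: ordinary code can overwrite it
        self.shadow_stack = []  # shadow stack: only call/ret touch it here

    def call(self, return_address):
        # A call pushes the return address onto both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # A return pops both copies and refuses to continue if they differ.
        address = self.stack.pop()
        expected = self.shadow_stack.pop()
        if address != expected:
            raise ControlProtectionFault(
                f"return to {address:#x}, shadow stack expected {expected:#x}"
            )
        return address


if __name__ == "__main__":
    cpu = ToyCpu()
    cpu.call(0x401000)
    print(hex(cpu.ret()))        # matching copies: the return proceeds

    cpu.call(0x401000)
    cpu.stack[-1] = 0xDEADBEEF   # simulated overwrite of the return address
    try:
        cpu.ret()
    except ControlProtectionFault as fault:
        print("caught:", fault)
```

In this model the overwrite of the regular stack succeeds, but the return that would use it does not, which is the property the shadow stack is there to provide.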

New instructions are added for shadow stack page management:

INCSSP: increment shadow stack pointer (i.e. to unwind shadow stack)
RDSSP: read shadow stack pointer into general purpose register
SAVEPREVSSP/RSTORSSP: save/restore shadow stack (i.e. thread switching)
WRSS: Write to Shadow Stack
WRUSS: Write to User Shadow Stack
SETSSBSY: Set Shadow Stack Busy Flag to 1
CLRSSBSY: Clear Shadow Stack Busy Flag to 0

Indirect Branch Tracking is added to defend against equivalent misdirected jump/call targets, but requires software to be built with new instructions:

ENDBR32/ENDBR64: Terminate an indirect branch in 32-bit/64-bit mode
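As a rough illustration of the rule that Indirect Branch Tracking enforces, the sketch below models a program as a map from addresses to instruction mnemonics and only allows an indirect branch to land on an ENDBR instruction. Everything here is a simplification under stated assumptions: the `indirect_branch` function, the `ControlProtectionFault` exception, and the addresses in `PROGRAM` are hypothetical, and the real check happens in hardware on the instruction that follows the branch.

```python
class ControlProtectionFault(Exception):
    """Hypothetical stand-in for the fault raised on a bad branch target."""


# Hypothetical code layout: address -> instruction mnemonic.
PROGRAM = {
    0x1000: "ENDBR64",       # marked as a valid indirect branch target
    0x1004: "PUSH RBP",
    0x1005: "MOV RBP, RSP",
    0x1008: "POP RDI",       # mid-function code an attacker might aim for
    0x1009: "RET",
}


def indirect_branch(program, target, mode_bits=64):
    """Allow the branch only if the target instruction is the ENDBR marker."""
    marker = "ENDBR64" if mode_bits == 64 else "ENDBR32"
    landed_on = program.get(target)
    if landed_on != marker:
        raise ControlProtectionFault(
            f"indirect branch to {target:#x} landed on {landed_on!r}, not {marker}"
        )
    return target


if __name__ == "__main__":
    print(hex(indirect_branch(PROGRAM, 0x1000)))  # function entry: allowed
    try:
        indirect_branch(PROGRAM, 0x1008)          # middle of a function
    except ControlProtectionFault as fault:
        print("caught:", fault)
```

This is why the software has to be rebuilt: the compiler needs to place an ENDBR marker at every location that is a legitimate target of an indirect call or jump.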

Full details about Intel’s CET can be found in Intel’s CET Specification.

At the time of presentation, we were under the impression that CET would be available for all of Intel’s processors. ~~However we have since learned that Intel’s CET will require a vPro enabled processor as well as operating system support for Hardware-Enforced Stack Protection.~~ This is currently available on Windows 10’s Insider Previews. I am unsure about Linux support at this time.

Update: Intel has reached out to say that their text implying that CET was vPro only was badly worded. What it was meant to say was 'All CPUs support CET, however vPro also provides additional security such as Intel Hardware Shield'.
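For readers who want to check a specific processor, CET support is reported through CPUID. The sketch below decodes the two relevant feature bits from CPUID leaf 7, sub-leaf 0 (shadow stack in ECX bit 7, indirect branch tracking in EDX bit 20). It assumes the raw register values have already been obtained by some other means, since Python cannot execute CPUID directly; the function name `decode_cet_support` and the sample values are illustrative. A set bit only says the hardware has the feature, not that the operating system has enabled it.

```python
# Bit positions in CPUID.(EAX=7, ECX=0)
CET_SS_BIT = 7     # ECX bit 7: CET shadow stack
CET_IBT_BIT = 20   # EDX bit 20: CET indirect branch tracking


def decode_cet_support(ecx, edx):
    """Decode CET feature bits from raw CPUID leaf 7 / sub-leaf 0 registers."""
    return {
        "shadow_stack": bool((ecx >> CET_SS_BIT) & 1),
        "indirect_branch_tracking": bool((edx >> CET_IBT_BIT) & 1),
    }


if __name__ == "__main__":
    # Made-up register values for illustration, not read from real hardware.
    print(decode_cet_support(ecx=0x00000080, edx=0x00100000))
    print(decode_cet_support(ecx=0x00000000, edx=0x00000000))
```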

AI Acceleration: AVX-512, Xe-LP, and GNA2.0

One of the big changes for Ice Lake last time around was the inclusion of an AVX-512 unit on every core, which enabled vector acceleration for a variety of code paths. Tiger Lake retains Intel’s AVX-512 instruction unit, with support for the VNNI instructions introduced with Ice Lake.
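To give a sense of what the VNNI instructions do, the sketch below emulates a single 32-bit lane of VPDPBUSD, one of the VNNI instructions: four unsigned bytes are multiplied by four signed bytes, the products are summed, and the result is added to a 32-bit accumulator with wraparound. This is a scalar Python model for illustration only; the function names are hypothetical, and the real instruction processes many such lanes at once across a vector register.

```python
def to_int32(value):
    """Wrap a Python integer to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def vpdpbusd_lane(accumulator, unsigned_bytes, signed_bytes):
    """Emulate one 32-bit lane: acc += sum(u8[i] * s8[i]) for i in 0..3."""
    if len(unsigned_bytes) != 4 or len(signed_bytes) != 4:
        raise ValueError("each lane takes exactly four byte pairs")
    if any(not 0 <= u <= 255 for u in unsigned_bytes):
        raise ValueError("first operand bytes must be unsigned 8-bit values")
    if any(not -128 <= s <= 127 for s in signed_bytes):
        raise ValueError("second operand bytes must be signed 8-bit values")
    dot = sum(u * s for u, s in zip(unsigned_bytes, signed_bytes))
    return to_int32(accumulator + dot)  # non-saturating form wraps around


if __name__ == "__main__":
    # 10 + (1*5 + 2*-6 + 3*7 + 4*-8) = 10 + (-18)
    print(vpdpbusd_lane(10, [1, 2, 3, 4], [5, -6, 7, -8]))  # -8
```

Folding the multiply, the horizontal add, and the accumulate into one instruction is what makes this useful for 8-bit quantized inference, where dot products of this shape dominate the work.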

It is easy to argue that although AVX-512 has been around for a number of years, particularly in the server space, we haven’t yet seen it propagate into the consumer ecosystem in any large way. Most efforts for AVX-512 have been primarily by software companies in close collaboration with Intel, taking advantage of Intel’s own vector gurus and ninja programmers. Out of the 19 or 20 software tools that Intel likes to promote as being AI accelerated, only a handful focus on the AVX-512 unit, and some of those tools are within the same software title (e.g. Adobe CC).

There has been a famous ruckus recently, with Linux creator Linus Torvalds suggesting that ‘AVX-512 should die a painful death’. His argument is that AVX-512, due to the compute density it provides, reduces the frequency of the core and takes die area and power budget away from the rest of the processor that could be spent on better things. Intel stands by its decision to migrate AVX-512 across to its mobile processors, stating that its key customers are accustomed to seeing instructions supported across its processor portfolio from Server to Mobile. Intel implied that AVX-512 has been a win in its HPC business, but it will take time for the consumer platform to leverage the benefits. Some of the biggest uses so far for consumer AVX-512 acceleration have been for specific functions in Adobe Creative Cloud, or AI image upscaling with Topaz.

Intel has enabled new AI instruction functionality in Tiger Lake, such as DP4a, which is an Xe-LP addition. Tiger Lake also sports an updated Gaussian & Neural Accelerator 2.0 (GNA 2.0), which Intel states can offer 1 Giga-OP of inference within one milliwatt of power, scaling up to 38 Giga-OPs at 38 mW. The GNA is mostly used for natural language processing, or wake words. In order to enable AI acceleration through the AVX-512 units, the Xe-LP graphics, and the GNA, Tiger Lake supports Intel’s latest DL Boost package and the upcoming oneAPI toolkit.
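The two GNA figures Intel quotes lie on a straight line, which the short sketch below makes explicit. It assumes the scaling is linear between the two quoted points and that "Giga-OP" means giga-operations per second; neither assumption is confirmed by Intel's material, and the function names are illustrative.

```python
GOPS_PER_MW = 1.0     # Intel's quoted figure: 1 Giga-OP within one milliwatt
MAX_POWER_MW = 38.0   # Intel's quoted upper point: 38 Giga-OPs at 38 mW


def gna_throughput_gops(power_mw):
    """Estimated GNA throughput, assuming linear scaling up to 38 mW."""
    if not 0 <= power_mw <= MAX_POWER_MW:
        raise ValueError("outside the range Intel quoted (0 to 38 mW)")
    return power_mw * GOPS_PER_MW


def efficiency_tops_per_watt(gops, power_mw):
    """Giga-ops per milliwatt is numerically the same as tera-ops per watt."""
    return gops / power_mw


if __name__ == "__main__":
    for power in (1.0, 38.0):
        gops = gna_throughput_gops(power)
        tops_w = efficiency_tops_per_watt(gops, power)
        print(f"{power:>4.0f} mW -> {gops:>4.0f} GOPs ({tops_w:.1f} TOPS/W)")
```

Under those assumptions, both quoted points work out to the same efficiency of 1 TOPS per watt.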

253 Comments

MDD1963 - Saturday, September 19, 2020
Although equaling/exceeding 7700K-level of performance within a 50W envelope in a laptop is impressive, the 4c/8t design is going to cause at least one or two frowns/raised eyebrows...
ballsystemlord - Saturday, September 19, 2020
@Ian why do these companies always seem to have the worst timing on sending you stuff? Do you tell them when you'll be on vacation?

Thanks for the review!
Ian Cutress - Sunday, September 20, 2020
It's happened a lot these past couple of years. The more segments of the tech industry you cover, the less downtime you have - my wife obviously has to book holiday months in advance, but companies very rarely tell you when launches are, or they offer surprise review samples a few days before you are set to leave. We do our best to predict when the downtime is - last year we had hands on with the Ice Lake Development system before the announcement of the hardware, and so with TGL CPUs being announced first on Sep 2nd, we weren't sure when the first units were coming in. We mistimed it. Of course with only two/three of us on staff, each with our own segments, it's hard to get substitutes in. It can be done, Gavin helped a lot with TR3 for example. But it depends on the segment.

And thanks :)
qwertymac93 - Sunday, September 20, 2020
Finally a decent product from Intel. It's been a while. Those AVX512 numbers were impressive. Intel is also now able to compete toe to toe with AMD integrated graphics, trading blows. I feel that won't last, though. AMD is likely to at least double the GPU horsepower next gen with the move from a tweaked GCN5 to RDNA2 and I don't know if Intel will be able to keep up. Next year will be exciting in any case.
Spunjji - Sunday, September 20, 2020
It'll be a while before we get RDNA2 at the high end - looks like late 2021 or early 2022. Before that, it's only slated to arrive with Van Gogh at 7-15W
efferz - Monday, September 21, 2020
It is very interesting to see that the Intel compiler makes the SPECint2017 scores 52% higher than other compilers without 462.libquantum.
helpMeImDying - Thursday, September 24, 2020
Hello, before ranting I want to know if the scores of spec2006 and spec2017 were adjusted/changed based on processor frequency (I read something like that in the article)? Because you can't do that. Frequencies should be off the table here unless comparing same-generation CPUs, and even then there are some nuances. What matters when comparing low-power notebooks is the performance per watt. It can be done mathematically, if the TDP can't be capped at the same level all the time, like you did in the first few pages. I'm interested in scores at 15W and 25W. So you should have monitored, and in the future should monitor and publish, the power consumption numbers alongside the scores.
And if you are adjusting scores based on CPU frequencies, then they are void and incorrect.
helpMeImDying - Thursday, September 24, 2020
Btw, same with iGPUs.
beggerking@yahoo.com - Friday, September 25, 2020
none of the tests seem valid... some are Intel based, others are AMD based... I don't see a single test where Ryzen beats 10th gen but loses to 11th gen on the standard 15 watt profile...

the speed difference between 10th and 11th gen Intel is approx 10-15%.. it's good, but probably not worth the price premium since Ryzen is already cheaper than 10th gen, I don't see how 11th gen would go cheaper than Ryzen...
legokangpalla - Monday, September 28, 2020
I always thought AVX-512 was a direct standoff against heterogeneous computing.
I mean, isn't it a better idea to develop better integrations for GPGPU like SYCL, higher versions of OpenCL etc? Programming with vector instructions IMO is a lot more painful compared to writing GPU kernels, and tasks like SIMD should be offloaded to the GPU instead of being handled by CPU instructions (CPU instructions with poor portability).
