Dissecting Intel's EPYC Benchmarks: Performance Through the Lens of Competitive Analysis
by Johan De Gelas & Ian Cutress on November 28, 2017 9:00 AM EST
Posted in: Xeon Platinum, EPYC 7601
Although AMD's EPYC is definitely a worthy contender in the server space, AMD's technical marketing of the new CPU has been surprisingly absent, as the company has not published any real server benchmarks. The only benchmarks published were SPEC CPU and Stream, with AMD preferring to let its partners and third parties promote performance. And, as our long-time readers know, while the SPEC CPU benchmarks have their merits and many people value them, they are a very poor proxy for most server workloads.
In every launch, we expect companies to offer an element of competitive analysis, often to show how their platform is as good as or better than the rest. At the launch of Intel's latest Xeon-SP platform, the comparison with EPYC was limited to a high level, as EPYC systems were not as freely available as expected. AMD was able to benchmark against Broadwell-EP Xeons at the time of the EPYC announcement because those parts were out and available; Intel wasn't able to do the same with EPYC because AMD was still several months away from moving it beyond a cloud-only ramp-up program. This is partly the effect of AMD's server market implementation and announcement roadmap, although it didn't stop Intel from hypothesising about EPYC's performance deficits in ways that caught the attention of a number of online media outlets.
Throughout all of this, AMD could not resist continuing to tell the world that the "EPYC SoC Sets World Records on SPEC CPU Benchmarks". In a field as profitable as server hardware, Intel could not leave this unanswered, and it responded that the Intel Xeon Scalable family has great "momentum", with no less than 110 performance records to date.
Jumping to the present, in order to prove Xeon-SP dominance over the competition, Intel's data center engineering group has obtained a few EPYC systems and started benchmarking them. The results, along with supporting notes on third-party verification, were distributed to the small set of Xeon-SP launch reviewers as a guide, following up on the high-level discussion from some time ago. The Intel benchmarking document we received had a good amount of detail, and the conference call we had about it was filled with some good technical tidbits.
Our own benchmarks showed that EPYC was a very attractive alternative in some workloads (Java applications), while the superior mesh architecture makes Intel's Xeon the best choice in others (databases, for example).
A Side Note About SPEC
A number of these records were achieved through SPEC. As mentioned above, SPEC is a handy tool for comparing the absolute best tweaked peak performance of the hardware underneath, or for analysing a system close to the metal because the code base is so well known, but its results have trouble transferring exactly to the real world. A lot of the time, the software running on a system will only vaguely know what hardware it is being run on, especially if that system is virtualised. Sending AVX-512 instructions down the pipe is one thing, but SPEC compilation can also be tweaked to make sure that cache locality is maintained, whereas in the real world that might not be possible. SPEC says a lot about the system, but ultimately most buyers of these high-end systems are probing real-world workloads on development kits to see what their performance (and subsequent scale-out performance) might be.
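To illustrate why tuning for cache locality can matter so much, here is a toy Python model of our own (not SPEC's or any compiler's actual method). It assumes a fully associative, 64-line LRU cache with 64-byte lines, and walks a 256 x 256 array of 8-byte values stored in row-major order, once along rows and once along columns. A compiler that can legally interchange the two loops turns the second access pattern into the first.

```python
from collections import OrderedDict


def count_misses(addresses, line_bytes=64, capacity_lines=64):
    """Count misses in a toy fully associative cache with LRU replacement (assumed model)."""
    cache = OrderedDict()  # cache line number -> None, ordered from least to most recently used
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)  # hit: mark this line as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def traversal(n, elem_bytes=8, row_major=True):
    """Yield byte addresses for visiting an n x n row-major array of 8-byte values."""
    for outer in range(n):
        for inner in range(n):
            # Row order walks consecutive addresses; column order strides by a full row.
            i, j = (outer, inner) if row_major else (inner, outer)
            yield (i * n + j) * elem_bytes


row_misses = count_misses(traversal(256, row_major=True))
col_misses = count_misses(traversal(256, row_major=False))
print("row order:   ", row_misses)
print("column order:", col_misses)
```

Under these assumptions, the row-order walk misses 8192 times (once per cache line), while the column-order walk misses on all 65536 accesses. Real caches are set associative and have hardware prefetchers, so the gap on actual silicon will differ, but the direction is the point: a benchmark binary tuned this way may not reflect a production workload whose access pattern cannot be rearranged.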
For the purposes of this discussion, we have glossed over Intel's reported (and verified over at SPEC.org) results.
Pricing Up A System For Comparison
Professionals and the enterprise market will point out, quite rightly, that Intel has been charging heavy premiums with the latest generation, with some analysts reporting a multiple-fold jump in pricing even for large customers, making it clear that the Xeon enterprise CPU line is Intel's bread and butter. Although Intel's top-end Xeon Platinum 8180 should give the latest EPYC CPU a lot of trouble thanks to its 28 Skylake-SP cores running at 2.5 to 3.8 GHz, its massive price tag ($10009 for the standard version, $13011 for the high-memory model) meant that Intel's benchmarking team had no choice but to also throw in the much more modest Xeon Platinum 8160 (24 cores at 2.1-3.7 GHz, $4702) as well as the Xeon Gold 6148 (20 cores at 2.4-3.7 GHz, $3072).
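A quick way to see the size of that premium is list price per core. The short Python sketch below uses only the Intel list prices quoted above (EPYC pricing is not listed here, so it is left out), assumes the high-memory model has the same 28 cores as the standard 8180, and ignores frequency, cache and memory differences entirely.

```python
# Intel list prices quoted in the text (USD) and core counts from the spec table.
# The high-memory 8180 is assumed to have the same 28 cores as the standard part.
LIST_PRICES = {
    "Xeon Platinum 8180": (10009, 28),
    "Xeon Platinum 8180 (high-memory)": (13011, 28),
    "Xeon Platinum 8160": (4702, 24),
    "Xeon Gold 6148": (3072, 20),
}


def price_per_core(price_usd, cores):
    """List price divided by physical core count."""
    return price_usd / cores


for name, (price, cores) in LIST_PRICES.items():
    print(f"{name}: ${price_per_core(price, cores):.2f} per core")
```

This works out to roughly $357 per core for the 8180, $465 for the high-memory version, $196 for the 8160 and $154 for the 6148, which is why the two cheaper SKUs were needed for any price-matched comparison.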
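| | Xeon Platinum 8180 | Xeon Platinum 8160 | Xeon Gold 6148 | EPYC 7601 |
|---|---|---|---|---|
| Release Date | Early Q3, 2017 | Early Q3, 2017 | Early Q3, 2017 | Late Q2, 2017* |
| Microarchitecture | Skylake-SP with AVX-512 | Skylake-SP with AVX-512 | Skylake-SP with AVX-512 | Zen |
| Process Node | Intel 14nm (14+) | Intel 14nm (14+) | Intel 14nm (14+) | GloFo 14nm |
| Cores / Threads | 28 / 56 | 24 / 48 | 20 / 40 | 32 / 64 |
| Base Frequency | 2.5 GHz | 2.1 GHz | 2.4 GHz | 2.2 GHz |
| Turbo | 3.8 GHz | 3.7 GHz | 3.7 GHz | 3.2 GHz |
| L2 Cache | 28 MB | 24 MB | 20 MB | 16 MB |
| L3 Cache | 38.5 MB | 33.0 MB | 27.5 MB | 64 MB |
| TDP | 205 W | 150 W | 150 W | 180 W |
| PCIe Lanes | 48 (technically 64 w/ Omni-Path versions) | 48 (technically 64 w/ Omni-Path versions) | 48 (technically 64 w/ Omni-Path versions) | 128 |
| DRAM | 6-channel DDR4 | 6-channel DDR4 | 6-channel DDR4 | 8-channel DDR4 |
| Max Memory | 768 GB | 768 GB | 768 GB | 2048 GB |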
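The DRAM row is worth a back-of-the-envelope calculation, since memory bandwidth is what Stream measures. The sketch below computes theoretical peak bandwidth per socket as channels x transfer rate x 8 bytes per 64-bit channel. The DDR4-2666 speed used for both platforms is our assumption for illustration, not a figure from the table, and measured Stream results will land below these peaks.

```python
def peak_dram_bandwidth_gbs(channels, mt_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth per socket in GB/s (decimal units).

    mt_per_s * bus_bytes gives MB/s per channel; dividing by 1000 converts to GB/s.
    """
    return channels * mt_per_s * bus_bytes / 1000


ASSUMED_MT_S = 2666  # assumption: DDR4-2666 on both platforms, for illustration only

xeon_sp = peak_dram_bandwidth_gbs(6, ASSUMED_MT_S)  # 6-channel Xeon-SP
epyc = peak_dram_bandwidth_gbs(8, ASSUMED_MT_S)     # 8-channel EPYC
print(f"6-channel: {xeon_sp:.1f} GB/s, 8-channel: {epyc:.1f} GB/s, ratio {epyc / xeon_sp:.2f}")
```

With equal DIMM speeds, the eight-channel EPYC has a one-third higher theoretical ceiling per socket (about 171 GB/s versus 128 GB/s).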
As a result of this pricing, one of the major hurdles for Intel in any comparison will be performance per dollar. To demonstrate that the systems can be equivalent, Intel offered up a price comparison from a single retailer. Ideally Intel should have offered multiple configuration options for this comparison, given that a single retailer may apply different margins to different sets of products (or have different levels of partnership or ecosystem ties with each manufacturer).
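To show why a single retailer's quote can tilt a performance-per-dollar comparison, here is a minimal sketch with entirely hypothetical numbers: two systems with identical scores and identical underlying cost, where the retailer simply applies a different margin to each. The cost-plus-margin pricing model is an assumption for illustration, not how any particular retailer prices systems.

```python
def perf_per_dollar(score, component_cost, retailer_margin):
    """Score divided by the quoted price, under a hypothetical cost-plus-margin model."""
    quoted_price = component_cost * (1 + retailer_margin)
    return score / quoted_price


# Hypothetical inputs: same score, same underlying cost, different retailer margins.
a = perf_per_dollar(score=100.0, component_cost=20_000, retailer_margin=0.10)
b = perf_per_dollar(score=100.0, component_cost=20_000, retailer_margin=0.20)
print(f"apparent perf/$ advantage of A over B: {a / b:.3f}x")
```

Under these made-up inputs, system A appears about 9% better in performance per dollar purely because of the retailer's pricing, which is why quotes from several vendors make for a fairer comparison.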
Even then, price parity could only be reached by giving the Intel system less DRAM. Luckily this was the best way to configure the Intel based system anyway. We can only guess how much the benchmarking engineers swore at the people who set the price tags: "this could have been so much easier...". All joking apart, the document we received had a good amount of detail, and similar to how we looked into AMD's benchmarking numbers at their launch, we investigated Intel's newest benchmark numbers as well.
lefty2 - Tuesday, November 28, 2017
When Skylake runs AVX-512 and AVX2 instructions, it causes both the clock frequency to go down *and* the voltage to go up. (https://www.intel.com/content/dam/www/public/us/en... However, it can only bring the voltage back down within 1ms. If you get a mix of AVX2 and regular instructions, like you do in the POV ray test, then it's going to be running at a higher voltage the whole time. That probably explains why the Xeon 8176 drew so much more power than the EPYC in your energy consumption test.
The guys at Cloudflare also observed a similar effect (although they only noticed the performance degradation): https://blog.cloudflare.com/on-the-dangers-of-inte...
Kevin G - Tuesday, November 28, 2017
In the HPC section, the article indicates that NAMD is faster on the Epyc system but the accompanying graphic points toward a draw with the Xeon Gold 6148 and a win for the Xeon Platinum 8160. Epyc does win a few benchmarks in the list prior to NAMD though.
Frank_han - Tuesday, November 28, 2017
When you ran those tests, did you bind CPU threads? How did you take care of the different layers of NUMA domains?
UpSpin - Tuesday, November 28, 2017
Highly questionable article:
"A lot of time the software within a system will only vaguely know what system it is being run on, especially if that system is virtualised". Why do you say this if you publish HPC results? There the software knows exactly what type of processor in what kind of configuration it is running.
"The second is the compiler situation: in each benchmark, Intel used the Intel compiler for Intel CPUs, but compiled the AMD code on GCC, LLVM and the Intel compiler, choosing the best result" More importantly, what type of math library did they use? The Intel MKL has unmatched optimization; have they used the same for the AMD system?
"Firstly is that these are single node measurements: One 32-core EPYC vs 20/24-core Intel processors." Why don't you make it clear that, by doing this, the benchmark became useless!!! Performance doesn't scale linearly with core count: http://www.gromacs.org/@api/deki/files/240/=gromac...
So it makes a huge difference if I compare a simulation which runs on 32 cores with a simulation which runs on 20 cores. If I calculate the performance per core then, I always see that the lower core CPU is much much faster, because of scaling issues of the simulation software. You haven't disclosed how Intel got their 'relative performance' value.
Elstar - Tuesday, November 28, 2017
Do we know for sure that the Omni-Path Skylake CPUs actually use PCIe internally for the fabric port? If you look at Intel's "ark" database, all of the "F" parts have one fewer UPI link, which seems weird.
HStewart - Tuesday, November 28, 2017
I think this was a realistic analysis of the two systems. And it does point out, importantly, that the Intel system is a more mature system than the AMD EPYC system. My personal feeling is that AMD's design was thrown together so that it could claim a high core count, without realistically thinking about the design.
But it does give Intel a good shot in the arm with competition, and I expect Intel's next revision to be a significant leap in technology.
I did like that the systems were similarly configured. As for the cost: 10 years ago I built myself a dual Xeon 5160 system that was about $8000, but it was a serious machine at the time, significantly faster than a normal desktop, and it lasted much longer. It was also from Supermicro and a fine machine; for the longest time it was still faster than a lot of machines you could get at BestBuy. It has Windows 10 on it now and still runs today, but I rarely use it because I like the portability of laptops.
gescom - Tuesday, November 28, 2017
https://www.servethehome.com/wp-content/uploads/20...
And suddenly - 8 core 6134 Skylake-SP - equals - 32 core Epyc 7601.
Amazing. Really amazing.
gescom - Tuesday, November 28, 2017
Huh, I forgot - and that is Skylake at 130W vs Epyc at 180W.
ddriver - Tuesday, November 28, 2017
Gromacs is a very narrow niche product and also very biased - they heavily optimize for intel and nvidia and push amd products to take an inefficient code path.
HStewart - Tuesday, November 28, 2017
This is a comparison with AVX2 / AVX-512.
AVX-512 is twice as wide as AVX2 and draws significantly more power than AVX2, so yes, it's very possible in this test that a CPU with 1/4 the cores can deliver more performance because of AVX-512.
Also, I heard AMD's implementation of AVX2 is actually two 128-bit units working together; these results could show that is true.