General Code Compile - Who's it For?

As I had mentioned, one big advantage of having an Arm system like this is the fact that it enables your native software development, without having to worry about cross-compiling code and all of the kerfuffle that that entrails.

For me personally, one issue had been the fact that we need to compile our SPEC test suite for Arm architectures, which isn’t all that evident when you don’t have a native machine to test things on.

Still, I had been able to configure a proper cross-compile setup- in this case GCC9 on an x86 machine (AMD Ryzen 3700X) cross-compiling for an AArch64 target, versus natively compiling the same target with the same compiler on the eMAG workstation.

Native vs Cross-Compile (Andrei's SPEC setup)

 

The first thing I noted when compiling things on the eMAG system, is that it took quite an enormous amount of time for linking together the libraries and executables. Unfortunately, this isn’t something that can be parallelised, and it ends up being a mostly single-thread performance bottlenecked part of software development.

Although the eMAG system does have more processing power than your average consumer system, it’s unlikely for this to actually materialise in the average development environment due to the massive singe-threaded performance disadvantage of the system.

In my case, my personal desktop machine outperformed the eMAG system in this one use-case by a factor of >2x.

My personal view on this is that if I were to be trying to port a piece of software to Arm, assuming that the act of cross-compiling itself isn’t an inherent issue, then I would probably prefer to simply cross-compile things on my regular x86 machine and deploy and test it on a lower-end embedded Arm board.

The only real audience who could rationalise the system performance deficit for its architectural flexibility would be developers who actually work on hardware enablement – using the SBSA system to its fullest for things like Arm driver development and making full use of the peripherals and PCIe capabilities of the system.

SPEC2017: OK-ish MT Performance Conclusion - All Eyes on an Altra System
Comments Locked

35 Comments

View All Comments

  • SarahKerrigan - Friday, May 22, 2020 - link

    The X-Gene microarchitecture was never particularly stellar and by the time eMag rolled around it was woefully obsolete. I did some testing on eMag a few months back and it was pretty dire. When I spent some time on Graviton2 last week, it was like night and day compared to eMag (frequently 2+ times the single-thread perf despite a much lower clock), so I have high hopes for Altra.
  • SarahKerrigan - Friday, May 22, 2020 - link

    By the way, Andrei, you may want to correct the ST SPECFP subtest result graph - it looks like you used Graviton as a template and forgot to change the labels to eMag, because right now it only mentions Graviton1, and Graviton2, and Intel, not eMag.
  • Andrei Frumusanu - Friday, May 22, 2020 - link

    Thanks, good catch.
  • Flunk - Friday, May 22, 2020 - link

    Interesting to see even if this hardware only makes sense for very specialized purposes. ARM processors have gone from only applicable to mobile devices to something that would have made sense in a server a few years ago.
  • SarahKerrigan - Friday, May 22, 2020 - link

    This isn't exactly a good representative of ARM processors; chips like Graviton2 are competitive for server workloads today, and make eMag look like a toy by comparison.
  • eastcoast_pete - Friday, May 22, 2020 - link

    Thanks Andrei, good and in-depth review! You and others here have already commented on the great difference of this legacy CPU to Ampere's Altra or Amazon's Graviton 2. What I am also very curious about is Fujitsu's ARM-based multicore CPU (A64FX). Amongst other features, it supports 512-bit scalable vector extensions (SVEs), so same width as Intel's AVX512. I wonder if someone at Fujitsu reads Anandtech, and maybe send you a setup for review, although a PRIMEHPC might be out of the scope here. Still, that's an ARM v8 design that should beat the Graviton 2 and the Altra, especially if applications can make use of the wide SVEs.
  • anonomouse - Friday, May 22, 2020 - link

    Based on what we know of the A64FX, it’ll almost certainly *only* beat Graviton 2/Altra in cases where it can heavily utilize wide vectors. In all other scenarios it really doesn’t have a lot of execution width, and only runs at 2.2Ghz. The disclosures in their Microarchitecture guide also don’t showcase anything impressive looking on the branch predictor, which is fine for the typical HPC workloads it will run. That thing is very heavily purpose designed for HPC, and it’s clear they focused on that and not general performance.
  • SarahKerrigan - Friday, May 22, 2020 - link

    Indeed. It's a specialized chip. I would expect no miracles from it on general-purpose loads.
  • eastcoast_pete - Friday, May 22, 2020 - link

    Agree with you and anonomouse on general purpose loads; my interest in wide vectors is mainly due to their utility for video processing and encoding, if (!) the software supports it. For those applications, AVX512 is what keeps Intel competitive with EPYCs in the x64 space. As a question, is anything like an AV1 encoder even available for ARM v8, and specifically to use wide SVEs?
  • Wilco1 - Saturday, May 23, 2020 - link

    There are many AV1 codecs which have AArch64 optimizations, but most focus on older mobile phone cores (eg. http://www.jbkempf.com/blog/post/2019/dav1d-0.5.0-... ), so likely need further work on latest microarchitectures with up to 4 128-bit Neon pipes.

    It's early days for SVE, the first version (as in A64FX) is aimed at HPC. Video codecs will be optimized for SVE2 when hardware becomes available.

Log in

Don't have an account? Sign up now