29 Comments
philehidiot - Thursday, July 15, 2021 - link
It'll be used to sell CPUs for gaming, I'm sure. It may not be relevant to gaming, but it's very pretty.
If I ever create a benchmark, it'll be based on the Hypnotoad.
Allll glllooory to the Hypnotoad.
RSAUser - Thursday, July 15, 2021 - link
This test seems to correlate with L1 cache (and then also cross-chiplet latency), since it needs to do comparisons with other threads? This doesn't seem like something that will correlate well with games, as work like this would have a central thread orchestrating all the others since it's all dependent on the user input, and then also, as mentioned, the bottleneck is usually draw calls etc., not this.
I would guess that AMD's next gen (4) will then render this test a bit worthless as they're intending to stack memory; I don't know whether that will bring an increase in L1 though.
philehidiot - Thursday, July 15, 2021 - link
That cross chiplet latency is making me want a new CPU that I don't need. It's interesting that the raw scores will probably be used in reviews yet in the majority of games, something cheaper *may* be as good or better. It's about buying the right CPU for the job, not what scores highest on raw benchmarks.
mode_13h - Thursday, July 22, 2021 - link
The AMD CPUs with a single compute die seem to hold up well, until the # of threads exceeds the physical core count. Specifically, the R7 2700X, the R9 4900HS, and the R7 3700X. That supports the notion that core-to-core latency is important.
To be honest, that seems odd to me. If I were implementing something like this, I'd compute the new position of each boid based on the results of the previous iteration. So, those previous results would be read-only and therefore wouldn't require any locking or synchronization to access. Then, you'd just batch the boids and enqueue the batches for a bunch of worker threads to consume. If the batch size is sensible, there shouldn't be much lock contention on the work queue, and that lock contention is likely where core-to-core latency becomes important.
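A minimal sketch of the scheme described above, not of how the benchmark itself is implemented. The names `step_batch` and `step`, the `(x, y, vx, vy)` tuple layout, and the toy cohesion rule are all assumptions made for illustration. Each iteration reads only the previous state and hands fixed-size batches to a pool of workers, so the only shared structure that needs locking is the pool's internal work queue.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical boid layout: (x, y, vx, vy).
Boid = Tuple[float, float, float, float]


def step_batch(prev: List[Boid], start: int, stop: int,
               centre: Tuple[float, float], dt: float) -> List[Boid]:
    """Compute new states for prev[start:stop], reading only from prev."""
    cx, cy = centre
    out = []
    for x, y, vx, vy in prev[start:stop]:
        # Toy cohesion rule only: steer slightly toward the flock centre.
        vx += 0.01 * (cx - x)
        vy += 0.01 * (cy - y)
        out.append((x + vx * dt, y + vy * dt, vx, vy))
    return out


def step(prev: List[Boid], batch_size: int = 256, workers: int = 4,
         dt: float = 1.0) -> List[Boid]:
    """One iteration: previous state is read-only, output is a new list."""
    if not prev:
        return []
    n = len(prev)
    centre = (sum(b[0] for b in prev) / n, sum(b[1] for b in prev) / n)
    ranges = [(i, min(i + batch_size, n)) for i in range(0, n, batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The executor's work queue is the only locked, shared structure;
        # map() returns the batch results in submission order.
        parts = pool.map(lambda r: step_batch(prev, r[0], r[1], centre, dt),
                         ranges)
        return [b for part in parts for b in part]
```

In CPython the global interpreter lock means this shows the structure rather than a real speedup; the same shape in a compiled language would use native threads. Larger values of `batch_size` mean fewer trips to the queue, which is the "sensible batch size" point made above.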
Apart from lock-contention, the only other reason I can see why core-to-core latency would matter is if you have false-sharing (i.e. the datastructures for the different boids aren't aligned on 64-byte boundaries).
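To make the 64-byte point concrete, here is a small layout sketch using Python's `ctypes`. The `BoidState` payload (position and velocity as six 32-bit floats), the `PaddedBoid` wrapper, and the `padded_stride` helper are hypothetical, and the 64-byte line size is the figure assumed in the comment above. It only illustrates the padding arithmetic: each record is rounded up to a full cache line so that neighbouring records never share one.

```python
import ctypes

CACHE_LINE = 64  # bytes; assumed cache-line size


class BoidState(ctypes.Structure):
    # Hypothetical payload: 3 floats of position + 3 floats of velocity.
    _fields_ = [("pos", ctypes.c_float * 3), ("vel", ctypes.c_float * 3)]


def padded_stride(payload_size: int, line: int = CACHE_LINE) -> int:
    """Round payload_size up to the next multiple of the cache-line size."""
    return -(-payload_size // line) * line


_PAD = padded_stride(ctypes.sizeof(BoidState)) - ctypes.sizeof(BoidState)


class PaddedBoid(ctypes.Structure):
    # Payload followed by filler bytes so that sizeof is a whole cache line.
    _fields_ = [("state", BoidState), ("_pad", ctypes.c_char * _PAD)]


flock = (PaddedBoid * 4)()
# Offsets of each record relative to the start of the array.
offsets = [ctypes.addressof(flock[i]) - ctypes.addressof(flock)
           for i in range(4)]
```

Note that `ctypes` does not guarantee the array's base address is itself 64-byte aligned, so this shows the stride only; in C++ one would typically reach for `alignas(64)` on the type. Padding every boid to a full line also wastes space, so in practice it may be enough to align only the regions that different threads write to.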
Arbie - Thursday, July 15, 2021 - link
Typo: after "You might be forgiven" you forgot "for thinking that".
I was hoping it was a blanket dispensation, but don't think you really meant that.
Ryan Smith - Thursday, July 15, 2021 - link
Thanks!
(And I forgive you)
AusMatt - Thursday, July 15, 2021 - link
Your results graph (https://images.anandtech.com/doci/16817/Graph1.jpg...) is mischaracterising the AMD R5 5600X as "Zen2" instead of "Zen3".
HarryVoyager - Friday, July 16, 2021 - link
Interesting. The only games I know of that are typically CPU bound are flight sims, probably through a combination of high numbers of draw calls, and all the physics simulation going on in the background. And, as near as I can tell, they are all using AVX2 vector math extensions. It was enough that Zen 1/1+ chips just weren't usable, because they didn't support AVX.
Unfortunately, there aren't very many good benchmarks for flight sims. The closest thing I've found is the Il-2 SYN_Vander Benchmark v6 from the forums. While the results are repeatable, it is not a quick and easy benchmark to run, and it does seem to be very sensitive to memory latency. (Going from a 3800X to a 5800X was something like a 40% boost on my CPU performance).
It will be interesting to see if the single thread numbers correlate at all with the Vander results.
Slash3 - Sunday, July 18, 2021 - link
Zen / Zen+ both feature full AVX and AVX2 feature compatibility. The older Phenom CPUs, however, do not, which has contributed (further) to making them age rather poorly.
Sims love IPC, as most rely on a single core physics thread which the rest of the engine piggy backs onto. The changes to instruction pipelining and cache improvements on Zen 3 are fantastic for that, as you say. Some other games that really stretched their legs are Starcraft 2 and Guild Wars 2. Both of these saw a near 50% improvement gen on gen. Good stuff.
mode_13h - Thursday, July 22, 2021 - link
Phenoms performed on par with Core2 and aged just about as well. Instruction-wise, I think Phenom only went up to SSE3, which is not quite as far as Core2 went.
Oxford Guy - Monday, July 26, 2021 - link
‘Zen / Zen+ both feature full AVX and AVX2 feature compatibility. The older Phenom CPUs, however, do not, which has contributed (further) to making them age rather poorly.’
Piledriver ‘improved’ upon Bulldozer by making AVX slower than SSE-2. In other words, it broke AVX. Yet, AMD chose to keep Piledriver on the market as the only ‘high-performance’ CPU for many many years.
HarryVoyager - Tuesday, July 27, 2021 - link
I'll have to go check that then. I know the Zen 1 and Zen 1+ chips had reputations of very poor performance in flight sims and were specifically advised against, and I recalled it was tied to their AVX2 performance, but it's been a few years, so will need to go dig up the tests and see.
The Zen 2 to Zen 3 transition was also pretty significant, with my CPU bound numbers going up by about 40%. I think that was more due to the memory latency improvements. Il-2 appears to be super sensitive to memory latency.
Oxford Guy - Wednesday, July 28, 2021 - link
All I know is that AVX was broken when AMD released the Piledriver revision of the Bulldozer architecture. Zen was supposed to do AVX vastly better, although perhaps you’re correct about AVX-2. My hazy recollection is that Zen 1 only could do 128-bit at a time. To do 256-bit required more latency. I used to know all these details but age is catching up with me quickly.
CZEfanyz - Tuesday, August 3, 2021 - link
Zen/Zen+ cores do support AVX/AVX2 indeed, but in case of AVX2, the execution takes two cycles instead of one on these generations ( https://www.anandtech.com/show/14525/amd-zen-2-mic... ). Full AVX2 support (meaning AVX2 instruction takes one cycle) is present from Zen 2 onwards.
Thunder 57 - Monday, July 19, 2021 - link
What are you talking about? Even Bulldozer supports AVX.
mode_13h - Thursday, July 22, 2021 - link
Yeah, but only Excavator supported AVX2.
WaltC - Friday, July 16, 2021 - link
This is just another 3dMK synthetic benchmark which may or may not depend on the game code itself to define whether this particular bench is relevant to a particular game, or it isn't, in the sense of evaluating the performance value of particular hardware combinations. Often, synthetic benchmarks may have little if any actual relevance to a particular game engine (or CPU/GPU), and 3DMk is no stranger to such benchmarks, imo! This looks to be the case here as a matter of fact. It's never going to happen that the majority of 3dMk's benches are going to be terribly useful for judging the performance and/or the usefulness of specific hardware or specific games, etc. But, then, 3dMK's best use is a sort of general synthetic benchmark grab-bag that most people seem to run for itself, every now and then! The history of 3dMK's benchmark base and development is a long and convoluted one, imo, with no particular design agenda as its "guiding hand."
mode_13h - Thursday, July 22, 2021 - link
Is it that hard to type out "3DMark"? At first, I didn't even know what you meant by 3DMk.
I just think they're cool demos to watch. It'd be interesting to see how well their different benchmarks correlate with actual game performance.
Alexvrb - Sunday, July 18, 2021 - link
Oh thank goodness, we were running low on useless synthetic benchmarks that spit out nearly meaningless numbers.
boozed - Sunday, July 18, 2021 - link
But pretty colours!
Oxford Guy - Monday, July 26, 2021 - link
One would hope Anandtech will boycott it until it is designed to be useful/relevant/meaningful.
FLORIDAMAN85 - Wednesday, July 21, 2021 - link
A Flock of Sea Gulls. A CPU bench that I ran, I ran so far away.
mode_13h - Thursday, July 22, 2021 - link
: )
andrewaggb - Wednesday, July 21, 2021 - link
To be honest it looks to me like CPUs in the graph end up where you'd expect.
mode_13h - Thursday, July 22, 2021 - link
There were definitely some surprises in there. If you'd asked me to predict it, my graph would've looked a fair bit different.
Oxford Guy - Monday, July 26, 2021 - link
Finally, a benchmark designed for the FX 9590!
Yes, you can experience the thrill of the two seconds it hits the 5 GHz clock on the box.
schleems18 - Tuesday, August 10, 2021 - link
What are your thoughts on the SiSoftware Sandra benchmarks?