Notperk - Monday, November 5, 2012 - link
Wouldn't it be better to compare these CPUs to Intel's E7 series enterprise server CPUs? I ask because an Opteron 6386 SE is technically two CPUs in one. Two of these would therefore effectively be four CPUs and would be a direct competitor (at least in terms of class) to four E7-4870s. Going even further, four of those Opterons would compete with eight E7-8870s. I understand that, performance-wise, these are more similar to the E5s, but it just makes more sense to me to place them higher, as enterprise server CPUs.
MrSpadge - Monday, November 5, 2012 - link
It's actually the other way around: there may be 2 dies inside each CPU, but even combined they get less work done than the Intel chips in most situations. However, comparing a 4-socket Opteron system with a 2-socket Intel system that costs approximately the same to purchase can get very interesting: massive memory capacity and bandwidth, lots of threads for integer throughput, and quite a few FPUs. With the drawback of much higher running costs due to electricity, of course.
leexgx - Tuesday, November 6, 2012 - link
Happy that the reviewer got the module/core distinction right (the integer cores are more like Hyper-Threading, but not quite). In any case, AMD's module count should be compared to Intel's core count.
(AMD should be marketing them the same way: 4-module CPUs with core assist that are slower than, or the same as, a Phenom X4 in real-world use. It's like calling an i7 an 8-core CPU when it's about the same speed as an i5 that lacks HT.)
thebluephoenix - Tuesday, November 6, 2012 - link
E7 is Nehalem, old technology. The E5-2687W and E5-2690 are the actual competition (~double a 2600K vs. ~double an FX-8350).
JohanAnandtech - Tuesday, November 6, 2012 - link
Minor nitpick: E7 is Westmere, improved Nehalem.
http://www.anandtech.com/show/4285/westmereex-inte...
But the E5 is indeed the real competition. The E7 is less about performance/watt and more about RAS and high scalability (core counts of 40, up to 80 threads).
alpha754293 - Monday, November 5, 2012 - link
I don't know if I would say that. Of course, I'm biased because I'm somewhat in HPC. But I think that HPC will also give an idea of how well (or how poorly) a highly multi-threaded, multi-processor-aware application is going to perform.
In some HPC cases, having more integer cores is probably going to be WORSE, since they're still fighting for FPU resources. And running it on more processors isn't always necessarily better either (higher inter-core communication traffic).
MrSpadge - Monday, November 5, 2012 - link
If you compare a 4-socket Opteron to a 2-socket Intel (comparable purchase cost) you can get massive memory bandwidth, which might be enough to tip the scale in the Opteron's favor in some HPC applications. They need to profit from many cores and require this bandwidth, though.
Personally, for general HPC jobs I prefer fewer cores with higher IPC and clock speed (i.e., Intel), as they're more generally useful.
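To put rough numbers on the bandwidth argument (an illustrative back-of-the-envelope only, assuming four DDR3-1600 channels per socket on both platforms, at 12.8 GB/s per channel):

```latex
B_{\text{4S Opteron}} = 4 \times 4 \times 12.8 \approx 205\ \text{GB/s}
\qquad
B_{\text{2S Xeon E5}} = 2 \times 4 \times 12.8 \approx 102\ \text{GB/s}
```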
alpha754293 - Friday, November 9, 2012 - link
I can tell you from experience that it really depends on the type of HPC workload.
For FEA, if you can store the large matrices in the massive amount of memory that the Opterons can handle (up to 512 GB for a quad-socket system), it can potentially help*.
*You have to disable swap so that you don't get bottlenecked by the swap I/O performance.
Then you'd really be able to rock 'n' roll, solving the matrices entirely in-core.
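A minimal sketch of forcing that in-core behavior on Linux (mlockall() pins the process's pages in RAM so the kernel can't swap them out; the 8 GB matrix size is hypothetical):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void) {
    /* Pin all current and future pages of this process in RAM so the
       solver never touches swap (needs CAP_IPC_LOCK or a generous
       RLIMIT_MEMLOCK). */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }

    /* Hypothetical in-core matrix: 8 GB of doubles, sized to fit in
       physical RAM rather than spill to swap. */
    size_t n = (8UL << 30) / sizeof(double);
    double *a = malloc(n * sizeof(double));
    if (!a) { perror("malloc"); return 1; }

    for (size_t i = 0; i < n; i++)  /* touch every page */
        a[i] = 0.0;

    puts("matrix resident in RAM; solve in-core here");
    free(a);
    return 0;
}
```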
For molecular dynamics though - it's not really that memory intensive (compared to structural mechanics FEA) but it's CPU intensive.
For CFD, that's also very CPU intensive.
And even then, it also depends on how the solvers are written and what you're doing.
CFD: you need to pass the pressure, velocity, and basically the state information about a fluid parcel from one cell to another. So if you partition the model at the junction, and you need to transfer information from a cell on one core to a cell on a core sitting in another physical socket, then it's memory-I/O limited. And most commercial CFD codes I know of that enable MPI/MPP processing actually do somewhat of a local remesh at the partition boundaries, creating extra elements just to facilitate the data transfer/exchange (and to make sure the data stays stable).
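A minimal sketch of that boundary exchange, assuming a simple 1-D domain decomposition with one ghost (halo) cell per side (the field and sizes are made up for illustration):

```c
#include <mpi.h>

#define NLOCAL 1024  /* interior cells owned by this rank */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* u[0] and u[NLOCAL+1] are ghost cells that mirror neighbor state. */
    double u[NLOCAL + 2];
    for (int i = 1; i <= NLOCAL; i++) u[i] = rank;  /* dummy field */

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Swap boundary cells with both neighbors; this is the traffic that
       crosses sockets (or the network) every time step. */
    MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                 &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 1,
                 &u[0], 1, MPI_DOUBLE, left, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* ...update interior cells using the refreshed ghost values... */
    MPI_Finalize();
    return 0;
}
```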
So there's a LOT that goes into it.
Same with crash-safety simulations and explicit-dynamics structural mechanics (like LS-DYNA), because that's an energy solver: what happens to one element influences what happens at your current element, which in turn influences what happens at the next element. And for LS-DYNA, *MPP_DECOMPOSITION_<OPTION> can further tell it how you want the problem to be broken down (and you can do some pretty neat stuff with it) to make the MPI/MPP solver even more efficient.
If you have a problem where what happens to one element doesn't really have much of an impact on another (such as fatigue analysis, done at the finite-element level), you can process all of the elements individually, so having lots of cores means you can run it a lot faster.
But for problems where there's a lot of "bi-directional" data/communication (hypersonic flow/shock waves, for example), then I THINK (if I remember correctly) the communication penalty is something like O(n^2) or O(n^3). So the CS side of an HPC problem is trying to optimize between these two: run on as many cores as possible, with as little communication as possible (so it doesn't slow you down), as fast and as independently as possible; pass ONLY the information you NEED to pass along, WHEN you need to pass it along; and try to do as much of it in-core as possible.
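A toy model of that trade-off (purely illustrative: W is the per-step work, c a per-pair communication cost, and the n^2 term the all-to-all worst case mentioned above):

```latex
T(n) \approx \underbrace{\frac{W}{n}}_{\text{compute}} + \underbrace{c\,n^{2}}_{\text{communicate}},
\qquad
\frac{dT}{dn} = 0 \;\Rightarrow\; n^{*} = \left(\frac{W}{2c}\right)^{1/3}
```

Past n*, adding cores makes the run slower, not faster.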
And then, to throw a wrench into that whole thing, the physics of the simulation basically rains a freakin' hurricane on that whole parade (the physics makes a lot of that very difficult, near impossible, or outright impossible).
JohanAnandtech - Monday, November 5, 2012 - link
I would not even dream of writing that! HPC software can be so much more fun than other enterprise software: no worries about backup or complex high-availability setups, just focusing on performance and enjoying the power of your newest chip.
I was talking about the HPC benchmarks that AMD reports. Not all HPC benchmarks can be recompiled, and not all of them will show such good results with FMA. Those are very interesting, but they only give a very limited view.
For the rest: I agree with you.
gamoniac - Tuesday, November 6, 2012 - link
On top of that, there are licensing costs. Windows Server 2012, for example, can be licensed per processor rather than by core count. When that comes into play, it can quickly inflate the TCO when comparing 4-socket vs. 2-socket servers: a per-socket license simply doubles going from two sockets to four, even if the total core count is similar.
alpha754293 - Friday, November 9, 2012 - link
There are a lot of programs that have different licensing methods.
Ansys is per core.
Windows licensing actually makes it potentially quite cost effective - especially if you're running a virtualization server, because you can throw a lot of VM tiles on an 8-module Opteron 6300. So while you might have to pay more for the additional sockets, it might save you money because you don't have to run twice the number of servers to handle the same number of VM tiles. It really depends on what you're doing with it.
(I think Enterprise Linux is also licensed the same way, per socket.)
alpha754293 - Friday, November 9, 2012 - link
Uhh... it depends.
For some of our larger runs (both at my work and in my CFD runs at home, and also in the research that I used to do for the university), we had to write restart files at a regular interval in case something went wrong or the power went out.
That's our kind of "backup". Unlike, say, the financial sector, where they want five-nines uptime (99.999%), our requirement isn't THAT strict, but the professional HPC centers will have HA of some kind implemented.
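A minimal sketch of that restart-file pattern (the state struct and interval are hypothetical; writing to a temp file and renaming keeps a crash mid-write from corrupting the previous checkpoint):

```c
#include <stdio.h>

/* Hypothetical solver state; a real code would dump its full field data. */
struct state { long step; double t; double field[1024]; };

/* Write the checkpoint to a temp file, then atomically rename it over the
   old one, so the last good restart file always survives a crash. */
static int checkpoint(const struct state *s, const char *path) {
    char tmp[256];
    snprintf(tmp, sizeof tmp, "%s.tmp", path);
    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;
    if (fwrite(s, sizeof *s, 1, f) != 1) { fclose(f); return -1; }
    if (fclose(f) != 0) return -1;
    return rename(tmp, path);
}

int main(void) {
    struct state s = {0};
    for (s.step = 1; s.step <= 100000; s.step++) {
        /* ...advance the simulation one step... */
        if (s.step % 1000 == 0 && checkpoint(&s, "run.restart") != 0) {
            perror("checkpoint");
            return 1;
        }
    }
    return 0;
}
```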
I think you saw, the last time you ran the LS-DYNA benchmarks on the Opteron 6274, that the way AMD counts the cores (integer cores, not FP cores) means there was only about a 7-8% performance benefit for HPC applications (which isn't much given twice the "core" count).
The FPU itself runs into something akin to thread contention issues. (It still boils down to fighting for FPU resources).
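A hedged way to see that contention on Linux: pin two FP-heavy threads either to the two cores of one module or to cores in different modules and compare wall-clock times (the core numbering below is an assumption - check the real topology with hwloc/lstopo first):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* FP-heavy loop: two of these threads will hammer a shared FPU. */
static void *fpu_work(void *arg) {
    volatile double x = 1.0000001;
    for (long i = 0; i < 500000000L; i++)
        x = x * 1.0000001 + 1e-9;  /* dependent multiply-add chain */
    return NULL;
}

/* Run two worker threads, each pinned to one given logical CPU. */
static void run_pinned(int cpu_a, int cpu_b) {
    pthread_t t[2];
    int cpus[2] = { cpu_a, cpu_b };
    for (int i = 0; i < 2; i++) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpus[i], &set);
        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setaffinity_np(&attr, sizeof set, &set);
        pthread_create(&t[i], &attr, fpu_work, NULL);
    }
    for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);
}

int main(void) {
    /* Assumption: CPUs 0 and 1 are the two integer cores of one module
       (one shared FPU); CPUs 0 and 2 sit in different modules. Time each
       case (e.g. with `time`) and compare. */
    run_pinned(0, 1);        /* shared FPU: expect the slower run */
    /* run_pinned(0, 2); */  /* separate FPUs: expect the faster run */
    return 0;
}
```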
But if, say, you have a properly, well-coded Photoshop - and they are learning how to write MPP code from HPC - it can take what it already does quite well and make it run even better. Fewer cores perhaps, but if the cores ARE available, it will know how best to break up the problem so that it can run the same task truly in parallel, vs. the quasi-parallel (multi-threaded) approach that a lot of these programs use nowadays.
(Imagine if you're batch processing images and it's able to spawn multiple instances of the batch solver/processor, so that you work on multiple images at the same time rather than one at a time in a merely multi-threaded manner.)
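A rough sketch of that process-level batch parallelism on a POSIX system (the ImageMagick `convert` command is just an example workload; the file names are made up):

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* Hypothetical batch: one child process per image, so all four images
       are processed simultaneously instead of one at a time. */
    const char *images[] = { "img1.png", "img2.png", "img3.png", "img4.png" };
    int n = sizeof images / sizeof images[0];

    for (int i = 0; i < n; i++) {
        char out[64];
        snprintf(out, sizeof out, "out%d.png", i);
        pid_t pid = fork();
        if (pid == 0) {                 /* child: one solver instance */
            execlp("convert", "convert", images[i], "-resize", "50%",
                   out, (char *)NULL);
            perror("execlp");           /* reached only if exec fails */
            _exit(1);
        }
    }
    while (wait(NULL) > 0)  /* parent: reap every child */
        ;
    return 0;
}
```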
Or imagine if the Flash plugin were multi-core aware, so that when you have 146 tabs open it doesn't crash your browser session. ;o) (Oh, the joys of being a researcher.)
Kevin G - Monday, November 5, 2012 - link
Any idea when these will hit e-tail? I have a dual-socket G34 board for which two Opteron 6320s or two 6374s would be a good match. Still haven't decided between high clocks and high core count. When you get up to 32 simultaneous threads, things really start to hit diminishing returns.
MySchizoBuddy - Monday, November 5, 2012 - link
All of the new Opteron chips can be used in 4P configurations, while none of the listed Xeons can. Can you add the Xeons that work in 4P configurations as well?
Stuka87 - Monday, November 5, 2012 - link
Xeons that work in quad-socket configs cost significantly more and do not really compete with the Opterons.
But it would be interesting to see the cost-to-performance difference between the two.
Kjella - Monday, November 5, 2012 - link
Right under "AMD Opteron 6300 versus 6200 SKUs" the leftmost column says Xeon E5, where it should say Opteron 6300. Anyway, now AMD can't even get a review sample out the door? Seriously? Either they're too incompetent or the benchmarks would be too embarrassing, either way it's not good.PsiAmp - Tuesday, November 6, 2012 - link
Why are you comparing two CPUs that have a 64% price difference and saying the cheaper one has 12% less performance and is not attractive? You need to compare products at similar price points, or take the price difference into account, which you didn't mention at all.
JohanAnandtech - Tuesday, November 6, 2012 - link
Can you be more specific and tell me which CPU comparison you are talking about? The CPUs I compared had a 4 to 15% price difference (6386 SE vs. 2665, or 6366 HE vs. 2630L).
DeaDSOuLz - Monday, November 12, 2012 - link
Strange - I have had 2 Opteron 6376s for about 3 weeks, so getting them out early shouldn't have been an issue. Of course, I bought about 2 thousand of the 6274s over the last 12 months; that may have something to do with it.
dig23 - Wednesday, December 12, 2012 - link
Can anybody tell me the family and model number for Piledriver "Abu Dhabi" (Opteron 6300), and how to find them for other models? I could not find the BKDG for it either. Any help is appreciated.
Rabman - Thursday, December 20, 2012 - link
Like "Bulldozer" before it, "Piledriver" based cores are Family 15h but are in a different range of model numbers.The BKDG you're looking for is here:
http://support.amd.com/us/Processor_TechDocs/42300...
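If it helps, here's a small sketch of reading the family/model on any x86 CPU via CPUID (using GCC's cpuid.h; the extended-family/model arithmetic below is the standard encoding, which for Bulldozer/Piledriver parts yields family 15h):

```c
#include <cpuid.h>
#include <stdio.h>

int main(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        fprintf(stderr, "CPUID leaf 1 not supported\n");
        return 1;
    }

    unsigned int base_family = (eax >> 8) & 0xF;
    unsigned int base_model  = (eax >> 4) & 0xF;
    unsigned int ext_family  = (eax >> 20) & 0xFF;
    unsigned int ext_model   = (eax >> 16) & 0xF;

    /* The extended fields only apply when the base family is 0xF; that's
       how Bulldozer/Piledriver parts report Family 15h. */
    unsigned int family = base_family;
    unsigned int model  = base_model;
    if (base_family == 0xF) {
        family += ext_family;
        model  |= ext_model << 4;
    }

    printf("Family %Xh, Model %Xh\n", family, model);
    return 0;
}
```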