The revenge of AMD Barcelona's TLB?
by Johan De Gelas on March 17, 2008 11:00 AM EST
The TLBs of AMD's K10 (Barcelona):
- Low-latency L1 TLB (data and instructions): 48 entries, supporting all page sizes
- L2 TLB (data and instructions): 512 entries for 4 KB pages, or 128 entries for 2 MB pages
If you compare this with the Intel Penryn family:
- One instruction TLB: 128 entries for 4 KB pages, but only 8 entries for 2 MB pages
- The data TLB has two levels:
– First level: 16 entries (4 KB)
– Second level: 256 entries (4 KB), but only 32 for larger pages (2 MB)
You can see that AMD’s K10 family has really massive TLBs compared to the Penryn and previous Intel CPUs, especially if you want to run with large pages. So while this will certainly not affect desktop or mobile users, it may well have an impact in the server world.
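One rough way to compare these figures is TLB reach: the number of entries multiplied by the page size, i.e. how much memory can be touched before a miss becomes unavoidable. The sketch below only does that arithmetic on the entry counts quoted above. The helper names are made up for illustration, and it ignores associativity and how entries are split between instructions and data.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries, page_size):
    """Bytes of address space covered if every entry maps a distinct page."""
    return entries * page_size


def format_size(num_bytes):
    """Render a byte count as whole KB or MB for readability."""
    if num_bytes >= MB:
        return f"{num_bytes // MB} MB"
    return f"{num_bytes // KB} KB"


# (label, entries, page size), using the entry counts quoted in the article
TLBS = [
    ("K10 L1 TLB, 4 KB pages", 48, 4 * KB),
    ("K10 L1 TLB, 2 MB pages", 48, 2 * MB),
    ("K10 L2 TLB, 4 KB pages", 512, 4 * KB),
    ("K10 L2 TLB, 2 MB pages", 128, 2 * MB),
    ("Penryn ITLB, 4 KB pages", 128, 4 * KB),
    ("Penryn ITLB, 2 MB pages", 8, 2 * MB),
    ("Penryn DTLB level 1, 4 KB pages", 16, 4 * KB),
    ("Penryn DTLB level 2, 4 KB pages", 256, 4 * KB),
    ("Penryn DTLB level 2, 2 MB pages", 32, 2 * MB),
]

if __name__ == "__main__":
    for label, entries, page_size in TLBS:
        print(f"{label}: {format_size(tlb_reach(entries, page_size))}")
```

With 2 MB pages, this works out to 256 MB of reach for K10's L2 TLB against 64 MB for Penryn's second-level data TLB.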
VMware ESX 3.5 does not yet support Nested Paging; it will be present in an upcoming update. This kind of paging requires really massive TLBs, as translations from the page tables of each guest OS are cached in the TLB. But even with shadow paging, having big TLBs should help when you have a lot of VMs running.
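To see why nested paging leans so heavily on the TLB, it helps to count the memory references needed on a TLB miss. This is a minimal sketch, assuming a plain two-dimensional page walk with no page-walk caches or nested TLB. Real hardware has such caches, so treat the results as worst-case counts. The function names are illustrative only.

```python
def native_walk_refs(levels):
    """Memory references for a native page walk: one per page-table level."""
    return levels


def nested_walk_refs(guest_levels, host_levels):
    """Memory references for a two-dimensional (nested) page walk.

    Every guest page-table pointer is a guest-physical address, so it needs
    a full host walk (host_levels references) before the guest entry itself
    can be read (1 more reference). The final guest-physical address of the
    data then needs one last host walk.
    """
    per_guest_level = host_levels + 1
    final_translation = host_levels
    return guest_levels * per_guest_level + final_translation


if __name__ == "__main__":
    # 4-level x86-64 page tables with 4 KB pages in both guest and host
    print("native:", native_walk_refs(4))
    print("nested:", nested_walk_refs(4, 4))
    # 2 MB pages skip the last level on both sides
    print("nested, 2 MB pages:", nested_walk_refs(3, 3))
```

With four-level tables on both sides, a miss costs up to 24 references instead of 4, so every translation that stays in the TLB saves far more work than it does natively.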
We still have to do quite a bit of benchmarking, but it is clear that the TLB architecture of Barcelona deserves some positive light too. It will be very interesting to see what kind of TLB architecture Nehalem will have, as Nehalem will be the first to support Intel’s Extended Page Tables (EPT, Intel’s version of Nested Pages).
It is interesting to note that Nehalem has a new second-level, 512-entry TLB…
flicker180 - Monday, March 17, 2008
Johan,
NPT is already present in ESX 3.5. You may have read some old slide somewhere, but in ALL VMware documentation (both internal and external) VMware acknowledges the implementation of NPT in ESX 3.5. I'm staring at a slide deck from VMware right now that states as much. We're running it in our lab right now on B3 Barcelonas with no issue whatsoever.
Visual - Tuesday, March 18, 2008
Sorry for the completely off-topic post.
But I'm really curious, Mr. Dave Graham, did you eventually get Barcelona running on your quad-fx board?
I know quad-fx is officially dead now, but maybe if it's finally working with quad-cores it's still worth getting one of these ancient boards... and who knows, some mobo maker might even do a quad-fx mobo with a more recent chipset despite AMD giving up on the platform.
Alternatively, is it ever possible that a server board will work with normal unbuffered ram?
flicker180 - Tuesday, March 18, 2008
Visual,
hey, never got around to it...got tied into testing Tyan's GT28 systems (the dual twin 1U Johan talked about with regards to CeBit.) Have spent most of my time trying to regress BIOS issues for production units there. However, ESX 3.5 is running quite happily on Barcelona B3 2352s using Tyan S3992-E boards. ;)
JohanAnandtech - Tuesday, March 18, 2008
I am searching for a way to verify this, but I heard at VMWorld 2008 (March 2008), in a session by VMware architect Richard Brunner, that it was going to be enabled in one of the upcoming updates of 3.5.
Do you have proof? :-) No issues doesn't mean that NPT is enabled.
flicker180 - Tuesday, March 18, 2008
Johan,
I can't give you access to VMware internal documentation, so I'll present what I can to you. You can email me directly if you wish. Talk to Kris Kubicki for my EMC email address.
OndrejSc - Monday, March 17, 2008
In theory a marginal design advantage like this could result in a tangible performance benefit. But there are far greater design advantages (integrated memory controller) that still aren't able to redeem the lackluster overall speed.
JohanAnandtech - Tuesday, March 18, 2008
It is too little in most apps, but not in virtualized apps. NPT gives us a 10-20% performance difference, which is far from marginal. If you consider that page table updates can cost from 4 to 1,000 times more cycles in a virtualized environment than in a native one, it is clear that TLB flushes are a lot more costly.
You are right in a native environment, but wrong about virtualized servers: TLB size does matter there!