That Darn Memory Bus

Among the entire GTX 600 family, the GTX 660 Ti’s one unique feature is its memory controller layout. NVIDIA built GK104 with 4 memory controllers, each 64 bits wide, giving the entire GPU a combined memory bus width of 256 bits. These memory controllers are tied into the ROPs and L2 cache, with each controller forming part of a ROP partition containing 8 ROPs (or rather 1 ROP unit capable of processing 8 operations), 128KB of L2 cache, and the memory controller. Disabling any one of those things means taking out a whole ROP partition, which is exactly what NVIDIA has done.

The impact on the ROPs and the L2 cache is rather straightforward – render operation throughput is reduced by 25% and there’s 25% less L2 cache to store data in – but the loss of the memory controller is a much tougher concept to deal with. This goes for both NVIDIA on the design end and for consumers on the usage end.
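
The partition arithmetic from the last two paragraphs can be sketched in a few lines of Python. This is only an illustration: the dictionary and function names are hypothetical, and the code simply scales the per-partition figures given above (64-bit controller, 8 ROPs, 128KB of L2) by the number of enabled partitions.

```python
# Resources contained in one GK104 ROP partition, per the description above.
PARTITION = {"bus_bits": 64, "rops": 8, "l2_kb": 128}

def gpu_totals(partitions: int) -> dict:
    """Scale the per-partition resources by the number of enabled partitions."""
    return {name: value * partitions for name, value in PARTITION.items()}

full_gk104 = gpu_totals(4)  # all 4 partitions enabled
gtx_660_ti = gpu_totals(3)  # one partition disabled

# Fraction of ROP throughput and L2 cache lost by disabling one partition
rop_loss = 1 - gtx_660_ti["rops"] / full_gk104["rops"]
l2_loss = 1 - gtx_660_ti["l2_kb"] / full_gk104["l2_kb"]

print(full_gk104)         # {'bus_bits': 256, 'rops': 32, 'l2_kb': 512}
print(gtx_660_ti)         # {'bus_bits': 192, 'rops': 24, 'l2_kb': 384}
print(rop_loss, l2_loss)  # 0.25 0.25
```

Because everything scales together, dropping one of four partitions removes exactly a quarter of the ROPs, the L2 cache, and the bus width at once.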

256 is a nice power-of-two number. For video cards with power-of-two memory bus widths, it’s very easy to equip them with a similarly power-of-two memory capacity such as 1GB, 2GB, or 4GB of memory. For various minor technical reasons (mostly the sanity of the engineers), GPU manufacturers like sticking to power-of-two memory buses. And while this is by no means a true design constraint in video card manufacturing, there are ramifications for straying from it.

The biggest consequence of deviating from a power-of-two memory bus is that, under normal circumstances, a card’s memory capacity no longer lines up with the bulk of the cards on the market. To use the GTX 500 series as an example, NVIDIA had 1.5GB of memory on the GTX 580 at a time when the common Radeon HD 5870 had 1GB, giving NVIDIA a 512MB advantage. Later on, however, the common Radeon HD 6970 had 2GB of memory, leaving NVIDIA behind by 512MB. This also had one additional consequence for NVIDIA: they needed 12 memory chips where AMD needed 8, which generally inflates the bill of materials (BoM) more than the price of higher speed memory in a narrower design does. This ended up not being a problem for the GTX 580 since 1.5GB was still plenty of memory for 2010/2011 and the high price tag could easily absorb the BoM hit, but this is not always the case.
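
The chip counts above follow from bus width, assuming each GDDR5 chip presents a 32-bit interface. The sketch below is a hedged illustration: the 384bit width of the GTX 580’s bus is not stated in the text, and the per-chip densities (1Gb and 2Gb) are simply what you get by dividing each card’s capacity by its chip count. The helper names are hypothetical.

```python
MB_PER_GBIT = 128  # 1 Gbit = 128 MB

def chip_count(bus_bits: int, chip_width_bits: int = 32) -> int:
    """Chips needed to fill a bus, assuming each chip has a 32-bit interface."""
    return bus_bits // chip_width_bits

def capacity_mb(chips: int, density_gbit: int) -> int:
    """Total memory for a number of chips of a given density."""
    return chips * density_gbit * MB_PER_GBIT

gtx_580 = (chip_count(384), capacity_mb(chip_count(384), 1))  # 384bit bus, 1Gb chips
hd_6970 = (chip_count(256), capacity_mb(chip_count(256), 2))  # 256bit bus, 2Gb chips

print(gtx_580)  # (12, 1536) -> 1.5GB from 12 chips
print(hd_6970)  # (8, 2048)  -> 2GB from 8 chips
```

The wider bus forces more chips regardless of capacity, which is where the BoM penalty comes from.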

Because NVIDIA has disabled a ROP partition on GK104 in order to make the GTX 660 Ti, they’re dropping from a power-of-two 256bit bus to an off-size 192bit bus. Under normal circumstances this means that they’d need to either reduce the amount of memory on the card from 2GB to 1.5GB, or increase it to 3GB. The former is undesirable for competitive reasons (AMD has 2GB cards below the 660 Ti and 3GB cards above), not to mention that 1.5GB is too small for a $300 card in 2012. The latter, on the other hand, incurs the BoM hit as NVIDIA moves from 8 memory chips to 12 memory chips, a scenario that the lower margin GTX 660 Ti can’t as easily absorb, not to mention how silly it would be for a GTX 680 to have less memory than a GTX 660 Ti.
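
To see why 2GB is out of reach with a conventional layout, the sketch below enumerates symmetric configurations, assuming 2Gb (256MB) chips and the same number of chips on every 64-bit controller. The function name is hypothetical; the arithmetic is just controllers × chips × 256MB.

```python
CHIP_MB = 256  # one 2Gb memory chip holds 256MB

def symmetric_capacity_mb(controllers: int, chips_per_controller: int) -> int:
    """Total memory when every 64-bit controller carries the same number of chips."""
    return controllers * chips_per_controller * CHIP_MB

print(symmetric_capacity_mb(4, 2))  # 2048 -> full 256bit GK104: 2GB from 8 chips
print(symmetric_capacity_mb(3, 2))  # 1536 -> 192bit: 1.5GB from 6 chips
print(symmetric_capacity_mb(3, 4))  # 3072 -> 192bit: 3GB from 12 chips

# No symmetric population of three controllers lands on exactly 2GB
print(any(symmetric_capacity_mb(3, n) == 2048 for n in range(1, 9)))  # False
```

With three controllers the total is always a multiple of 768MB, so 2048MB cannot be hit symmetrically.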

Rather than take the usual route, NVIDIA is going to take a third route of its own: put 2GB of memory on the GTX 660 Ti anyhow. By putting more memory on one controller than on the other two – in effect breaking the symmetry of the memory banks – NVIDIA can have 2GB of memory attached to a 192bit memory bus. This is a technique that NVIDIA has had available to them for quite some time, but it’s also something they rarely pull out, using it only when necessary.

We were first introduced to this technique with the GTX 550 Ti in 2011, which likewise had a 192bit memory bus. By using a mix of 2Gb and 1Gb modules, NVIDIA could outfit the card with 1GB of memory rather than the 1.5GB/768MB that a 192bit memory bus would typically dictate.
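
With 6 chips (the GTX 550 Ti’s chip count, given in the next paragraph) that are each either 1Gb or 2Gb, the mix needed for 1GB (8Gb) is fully determined. The sketch below, using a hypothetical helper name, solves only for the totals; which controllers actually carried the 2Gb chips is not stated in the text, so it makes no assignment.

```python
def split_densities(total_chips: int, target_gbit: int) -> tuple:
    """Return (number of 2Gb chips, number of 1Gb chips) summing to target_gbit.

    With n chips at 2Gb and the rest at 1Gb, the total is
    2*n + (total_chips - n) = total_chips + n gigabits, so n is fixed.
    """
    n_2gb = target_gbit - total_chips
    if not 0 <= n_2gb <= total_chips:
        raise ValueError("target capacity not reachable with 1Gb/2Gb chips")
    return n_2gb, total_chips - n_2gb

print(split_densities(6, 6))   # (0, 6) -> all 1Gb: 768MB
print(split_densities(6, 12))  # (6, 0) -> all 2Gb: 1.5GB
print(split_densities(6, 8))   # (2, 4) -> 1GB (8Gb), the GTX 550 Ti case
```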

For the GTX 660 Ti in 2012, NVIDIA is once again going to use their asymmetrical memory technique in order to outfit the card with 2GB of memory on a 192bit bus, but they’re going to implement it slightly differently. Whereas the GTX 550 Ti mixed memory chip densities in order to get 1GB out of 6 chips, the GTX 660 Ti will vary the number of chips attached to each controller in order to get 2GB out of 8 chips. Specifically, one of the memory controllers will have 4 chips attached instead of 2, while the other two controllers will continue to have 2 chips each. Doing it this way allows NVIDIA to use the same Hynix 2Gb chips they already use in the rest of the GTX 600 series, with the only high-level difference being the width of the bus connecting them.
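
The GTX 660 Ti layout described above can be tallied directly. In this sketch the list ordering is an arbitrary, hypothetical choice: the text says one controller gets 4 chips but not which one.

```python
CHIP_MB = 256  # Hynix 2Gb chip = 256MB

# Chips attached to each 64-bit controller on the GTX 660 Ti, per the text:
# one controller gets 4 chips, the other two keep 2 each.
# (Putting the 4-chip controller first is an arbitrary choice.)
chips_per_controller = [4, 2, 2]

per_controller_mb = [n * CHIP_MB for n in chips_per_controller]

print(sum(chips_per_controller))  # 8 chips
print(per_controller_mb)          # [1024, 512, 512]
print(sum(per_controller_mb))     # 2048 -> 2GB on a 192bit bus
```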

Of course, at a low level it’s more complex than that. In a symmetrical design with an equal amount of RAM on each controller, it’s rather easy to interleave memory operations across all of the controllers, which maximizes the performance of the memory subsystem as a whole. However, complete interleaving requires that kind of symmetrical design, which means it’s not quite suitable for NVIDIA’s asymmetrical memory designs. Instead NVIDIA must start playing tricks. And when tricks are involved, there’s always a downside.
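
A minimal sketch of what interleaving means in a symmetrical design, assuming simple round-robin striping. The 256-byte granule and the plain modulo mapping are hypothetical stand-ins; NVIDIA has not disclosed its actual granularity or address hashing, which may well be more elaborate.

```python
GRANULE = 256  # hypothetical interleave granularity in bytes (not disclosed by NVIDIA)

def controller_for(addr: int, controllers: int) -> int:
    """Round-robin interleave: consecutive granules go to consecutive controllers."""
    return (addr // GRANULE) % controllers

# A sequential 2KB read on a symmetric 4-controller, 256bit design
touched = [controller_for(addr, 4) for addr in range(0, 2048, GRANULE)]
print(touched)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Because every controller holds the same amount of memory, this striping fills them all at the same rate and covers the whole address space; with unequal capacities, one controller is left with memory the stripe never reaches.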

The best case scenario is always going to be that the entire 192bit bus is in use, with a memory operation interleaved across all 3 controllers, giving the card 144GB/sec of memory bandwidth (192bit * 6GHz / 8). But that can only be done for up to 1.5GB of memory; the final 512MB of memory is attached to a single memory controller. This invokes the worst case scenario, where only one 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 48GB/sec (64bit * 6GHz / 8).
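
The two bandwidth figures come from the same formula, bus width times effective data rate divided by 8 bits per byte. The function name below is hypothetical; the 6GHz effective data rate is the one used above.

```python
def peak_bandwidth_gb_s(bus_bits: int, data_rate_ghz: float) -> float:
    """Peak bandwidth in GB/sec: bus width x effective data rate / 8 bits per byte."""
    return bus_bits * data_rate_ghz / 8

print(peak_bandwidth_gb_s(192, 6.0))  # 144.0 -> all 3 controllers interleaved
print(peak_bandwidth_gb_s(64, 6.0))   # 48.0  -> a single 64-bit controller
```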

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios. In the past we’ve tried to divine how NVIDIA is accomplishing this, but even with the compute capability of CUDA, memory appears to be too far abstracted for us to test any specific theories. And because NVIDIA continues to regard the internal details of their memory bus as a competitive advantage, they’re unwilling to share the details of its operation with us. Thus we’re largely dealing with a black box here, one where poking and prodding doesn’t produce much in the way of meaningful results.

As with the GTX 550 Ti, all we can really say at this time is that the performance we get in our benchmarks is the performance we get. Our best guess remains that NVIDIA is interleaving the lower 1.5GB of address space while pushing the last 512MB into the larger memory bank, but we don’t have any hard data to back it up. For most users this shouldn’t be a problem (especially since GK104 is so wishy-washy at compute), but the fact remains that there’s always a downside to an asymmetrical memory design. With any luck, one day we’ll find that downside and be able to better understand the GTX 660 Ti’s performance in the process.
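
Our best guess can be written down as an address map, purely as an illustration of the hypothesis and not as NVIDIA’s confirmed behavior. The 256-byte granule, the round-robin striping, and the index of the 4-chip controller are all hypothetical; only the 1.5GB/512MB split, the 64-bit controllers, and the 6GHz data rate come from the text.

```python
MB = 1024 * 1024
TOTAL = 2048 * MB              # 2GB on the card
INTERLEAVED_LIMIT = 1536 * MB  # guessed: lower 1.5GB striped over all 3 controllers
GRANULE = 256                  # hypothetical interleave granularity in bytes
BIG_CONTROLLER = 0             # hypothetical index of the controller with 4 chips

def _check(addr: int) -> None:
    if not 0 <= addr < TOTAL:
        raise ValueError("address outside the 2GB of memory")

def controller_for(addr: int) -> int:
    """Map an address to a controller under the unconfirmed guess above.

    The lower 1.5GB gives each controller 512MB; the top 512MB fills the
    remaining half of the 1GB controller.
    """
    _check(addr)
    if addr < INTERLEAVED_LIMIT:
        return (addr // GRANULE) % 3
    return BIG_CONTROLLER

def peak_bandwidth_gb_s(addr: int, data_rate_ghz: float = 6.0) -> float:
    """Peak bandwidth for a stream at addr: 64 bits per active controller."""
    _check(addr)
    active = 3 if addr < INTERLEAVED_LIMIT else 1
    return active * 64 * data_rate_ghz / 8

print([controller_for(a) for a in (0, 256, 512, 768)])  # [0, 1, 2, 0]
print(controller_for(1800 * MB))                        # 0
print(peak_bandwidth_gb_s(100 * MB), peak_bandwidth_gb_s(1800 * MB))  # 144.0 48.0
```

Under this guess, any workload whose hot data lands in the top 512MB would see one third of the card’s peak bandwidth, which is the kind of downside we have not yet been able to pin down in testing.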


  • CeriseCogburn - Thursday, August 23, 2012

    I really didn't read your rant, just skimmed your crybaby whine.
    So who cares you had an emotional blowout. Take some midol.
  • Galidou - Thursday, August 23, 2012

    Attacking and attacking again, you have so much respect it's almost admirable. Respect is the most important thing in the world; if you can't have some even for people you don't know, I'm sorry, but you're missing out on something here.
  • Galidou - Thursday, August 23, 2012

    I love it when people state their disrespectful opinion as a fact. Really drives their point home, yep.
  • CeriseCogburn - Thursday, August 23, 2012

    Take a look at your 7950 SKYRIM LOSS in triple monitor to the 660Ti and the 660Ti also beats the 7950 boost and the 7970 !

    5760x1080 4x aa 16x af


    YES, YOU DID YOUR "RESEARCH"... now you've lost every stupid argument you started. Stupid.
  • Galidou - Tuesday, September 4, 2012

    Every review shows the 660 Ti under EVEN the 7870, and your review shows the 660 Ti performing at the level of a 7970: flawed bullscrap. Your website has a problem, the same one you have: it has a chosen side, aka fanboyism.

    I have both right now. My wife uses the 660 Ti in her PC for Guild Wars 2 at 1080p, and I bought the 7950. I overclocked both in my PC to test, and the 7950 hands down tramples over the GTX 660 Ti, even with both fully overclocked. I tested with Skyrim on 3 monitors at 5760*1080, and that's the only game I play.

    Now don't get MAD, I never said the GTX 660 Ti is a bad card; it works wonders. But it gets trampled at 5760*1080 in Skyrim, end of the line...
  • TheJian - Monday, August 20, 2012

    Actually I think they need to raise the clocks and charge more, accepting the fact that they will run hotter and use more watts. At least they can get more for the product, rather than having people say you can OC them to 1100. Clock the normals at 900/1000 and the 7970 at 1050/1100 or so. Then charge more. Of course NV is putting pricing pressure on them at the same time, but this move would make the cards worth more out of the box, so it wouldn't be as unreasonable. Out of the box right now, you can't charge more because they perform so poorly against what is being sold (and benchmarked) in the stores.

    With NV/Intel chewing them from both ends AMD isn't making money. But I think that's their fault with the mhz/pricing they're doing to themselves. They haven't ripped us off since the Athlon won for 3 years straight. Even then, they weren't getting real rich. Just making the profits they should have deserved. Check their 10yr profit summary and you'll see, they have lost 6bil. So I'd have to say they are NOT pricing/clocking their chips correctly, at least for this generation. These guys need to start making more money or they're going to be in bankruptcy by 2014 xmas.
    Last 12 months= sales 6.38bil = PROFITS= - 629 million! They aren't gouging us...They are losing their collective A$$es :(
    That's a LOSS of 629 million. Go back 10yrs its about a 6.x billion loss.

    While I hate the way Ryan did his review, AMD needs all the help they can get I guess... :) But Ryan needs to redo his recommendation (or lack of one) because he just looks like a buffoon when no monitors sell at 2560x1600 (30inchers? only 11, and less than this res), and shows less than 2% use this res also. He looks foolish at best not recommending based on 1920x1200 results which 98% of us use. He also needs to admit that Warhead is from 2008, and should have used Crysis 2 which is using an engine based on 27 games instead of CryEngine 2 from 2007 and only 7 games based on it. It's useless.
  • Galidou - Tuesday, August 21, 2012

    ''profits they should have deserved''

    You speak as if overcoming Intel's and Nvidia's performance were easy and it's all their fault because they do bad work. AMD has a wonderful team. You speak as if you had worked there and they don't do shit, they just sit on their chairs and that's the result of their work.

    Well, it isn't. If you wanna speak like that about AMD, do it if you work there. No one is better placed to say whether a company is really good or bad than the employees themselves. So just stop speaking as if designing these 3-billion-plus-transistor things were as easy as saying ''hello, my name is Nvidia fanboy and AMD is crap''.
  • CeriseCogburn - Thursday, August 23, 2012

    AMD is crap. It's crap man, no getting around it.
  • Galidou - Thursday, August 23, 2012

    Too late Cerise, you lost all credibility by not being able to hold an objective (it means undistorted by emotions) opinion, and you rather proved you're way too emotional to speak about video card manufacturers.

    You too speak as if you had ever worked at AMD, and surely that is not the case. Just visiting their headquarters would make your eyes bleed, because in your world this place is related to hell, with an ambient temperature averaging 200 degrees Celsius, surrounded by walls of flesh, where torture is a common thing. And in the end, the demons poop video cards and force you to buy them or kill your family.
  • CeriseCogburn - Thursday, August 23, 2012

    Your opinion - " i'm did my research ima getting my 7950 for my triple monitor SKYRIM..."

    Take a look at your 7950 SKYRIM LOSS in triple monitor to the 660Ti and the 660Ti also beats the 7950 boost and the 7970 !

    5760x1080 4x aa 16x af


    There isn't a palm big enough in the world to cover your face.
