19 Comments
lemurbutton - Wednesday, October 20, 2021 - link
Nice. Might go into the upcoming Mac Pro 40 CPU core / 128 GPU core monster SoC.
web2dot0 - Wednesday, October 20, 2021 - link
Guarantee it will ... 8/16-channel HBM3 ...
sonny73n - Thursday, October 21, 2021 - link
Keep dreaming, sheeple!
GC2:CS - Wednesday, October 20, 2021 - link
I do not think Apple is even considering HBM at all. They just want to go to some absurd lengths with standard memory and use it for the GPU as well. They will just scale up what they have with the A14. At a premium, it must be said.

Also, when will we see some HBM in phones? Like a 256-bit interface TSV'd directly to the SoC? That would probably be a big benefit from an efficiency standpoint.
Diogene7 - Wednesday, October 20, 2021 - link
@GC2:CS: I am wondering the same thing about smartphones:
1. Would there be any significant power-efficiency advantage to using HBM memory in a smartphone?
2. I dream of having persistent memory like SOT-MRAM in an HBM-style stack in smartphones and Internet of Things (IoT) devices: it would provide a large, fast memory that is non-volatile, and it could unlock new and tremendously improved user experiences!
Ex: a smartphone with 128GB of low-power, non-volatile HBM: no LPDDR DRAM, and no flash storage…
DanNeely - Wednesday, October 20, 2021 - link
IIRC HBM is significantly higher in power than standard DDR memory. It's better than GDDR, but that's a relatively low bar for power efficiency.
TanjB - Wednesday, October 20, 2021 - link
HBM is about 10x more efficient per bit transferred than DDR. It varies by vendor and by chip, but it is roughly 4 pJ/bit for HBM2e vs. 40 pJ/bit for DDR4 or 5. HBM3 might be a bit higher energy than 2e due to the higher transfer rate: we will need to see the data sheets. But there is no way it is as inefficient as DDR4 or 5.

Even if you measure by capacity rather than by activity, it is not clear which would win. Internally they use pretty much the same DRAM technology, and when they are not in use the interfaces idle very efficiently, so they probably have roughly the same idle power per GB. Overall, HBM is more efficient when it comes to getting work done, much more efficient.
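A back-of-the-envelope check of those figures (a sketch; the pJ/bit numbers are the rough ones quoted above, not datasheet values):

```python
# Interface power = energy per bit (J) x bits transferred per second.
PJ = 1e-12
HBM2E_E_PER_BIT = 4 * PJ    # ~4 pJ/bit, HBM2e (approximate, quoted above)
DDR_E_PER_BIT = 40 * PJ     # ~40 pJ/bit, DDR4/DDR5 (approximate, quoted above)

traffic_gb_s = 100                       # 100 GB/s of sustained traffic
bits_per_s = traffic_gb_s * 1e9 * 8

print(f"HBM2e: {HBM2E_E_PER_BIT * bits_per_s:.1f} W")  # ~3.2 W
print(f"DDR:   {DDR_E_PER_BIT * bits_per_s:.1f} W")    # ~32.0 W
```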
Samus - Thursday, October 21, 2021 - link
Even RDRAM had power-efficiency gains over SDRAM/DDR at the time, but that came down to power states. At full throttle, Rambus (and HBMx) will use more power, but as TanjB said, at full power it provides so much more bandwidth that the power efficiency is much better. It just isn't designed for power-sensitive applications.

BUT it could be. If the power states were tweaked and throttled, it could theoretically provide more bandwidth per watt than DDR, and especially GDDR, while probably undercutting the cost of GDDR, assuming the memory controller is a cost-effective implementation. (That's a rabbit hole I'm not jumping into; memory-controller implementations are an even more complex topic. Just ask AMD why their Ryzen architecture is so picky about memory latency, especially rank configuration.)
Wereweeb - Wednesday, October 20, 2021 - link
Yes, HBM uses less power for an equivalent amount of non-HBM bandwidth. However, a single stack of HBM3 will deliver around 800GB/s. Four 16-bit channels of LPDDR5 deliver up to 51GB/s. Smartphones simply don't need HBM levels of bandwidth.

Maybe in the future, when they develop low-cost HBM, and if manufacturers chipletize smartphone SoCs, it might make sense to just put everything on the same interposer (i.e., to use HBM). Today's not that day.
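Those bandwidth figures fall straight out of bus width × transfer rate (a sketch; it assumes LPDDR5 at 6400 MT/s and the standard 1024-bit HBM3 stack interface):

```python
# Peak bandwidth (GB/s) = bus width (bits) x transfer rate (GT/s) / 8 bits per byte
def peak_gb_s(width_bits: int, rate_gts: float) -> float:
    return width_bits * rate_gts / 8

print(peak_gb_s(4 * 16, 6.4))  # four 16-bit LPDDR5-6400 channels: 51.2 GB/s
print(peak_gb_s(1024, 6.4))    # one 1024-bit HBM3 stack at 6.4 GT/s: 819.2 (~800) GB/s
```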
As for MRAM: it isn't and won't be cost-competitive with current planar DRAM for *at the very least* a decade, and if we get 3D-DRAM and/or capacitorless DRAM, you can pretty much forget it.
8GB of SOT-MRAM would already be prohibitively expensive; 128GB would only be in a consumer product for a Saudi oil prince. (It would cost me 154k USD from Mouser, disregarding the probable bulk discount.)
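Scaling that quoted figure gives a rough sense of the gap (a sketch; the $4/GB DRAM street price is an assumption for contrast, not a quoted number):

```python
mram_usd_128gb = 154_000                # quoted Mouser figure above, 128 GB of SOT-MRAM
mram_usd_per_gb = mram_usd_128gb / 128  # ~$1,203/GB
dram_usd_per_gb = 4                     # assumed ballpark DRAM street price

print(f"SOT-MRAM: ~${mram_usd_per_gb:,.0f}/GB")
print(f"~{mram_usd_per_gb / dram_usd_per_gb:,.0f}x the cost of DRAM")  # ~300x
```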
Don't get me wrong, MRAM is very interesting, maybe even close to a theoretically "perfect memory", but today it is better as a replacement for embedded SRAM, NOR flash, etc. (where you only need 1MB of it anyway), and in one or two decades it could possibly replace L3 SRAM caches. It's otherwise simply too large and too complex a cell (read: very costly) to be used as main memory, let alone storage.
I'm more interested in seeing capacitorless DRAM, 3D DRAM, 3D PCM, and, in a few years, maybe some non-Intel competitor to Optane based on something like CeRAM, which is promised to be easier to manufacture. But these other technologies will come at most as a complement to DRAM for at least two more decades, partly because real R&D (the lengthy work of material and process optimization) is tied down in whatever makes companies money in the short term. And I doubt anything is replacing NAND in the coming decades.
TanjB - Wednesday, October 20, 2021 - link
Not all ECC is the same. It appears the built-in ECC is just SEC, same as in LPDDR5 and DDR5 chips, which can only correct single bits and cannot report when errors are not corrected. The industry does not publish data on the expected error rates of their chips, so we have to guess: it might be reasonable to guess that SEC catches around 90% of the actual errors.

This work shows that additional, external ECC will still be needed to get high reliability with HBM3:
https://ieeexplore.ieee.org/abstract/document/9556...
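To make the SEC limitation concrete, here is a toy Hamming(7,4) single-error-correcting code (purely illustrative; the actual on-die codes use wider words, but the failure mode is the same: a double-bit error gets silently mis-corrected instead of reported):

```python
# Toy Hamming(7,4): 4 data bits, 3 parity bits at codeword positions 1, 2, 4.
def encode(d):                       # d: four data bits
    c = [0] * 8                      # positions 1..7; index 0 unused
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]

def decode(cw):                      # returns corrected data bits
    c = [0] + list(cw)
    s = ((c[4] ^ c[5] ^ c[6] ^ c[7]) << 2 |
         (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1 |
         (c[1] ^ c[3] ^ c[5] ^ c[7]))
    if s:                            # nonzero syndrome points at the "bad" bit...
        c[s] ^= 1                    # ...but a double error also lands here, silently
    return [c[3], c[5], c[6], c[7]]

data = [1, 0, 1, 1]
cw = encode(data)

one = list(cw); one[2] ^= 1                 # single-bit error
print(decode(one) == data)                  # True: corrected

two = list(cw); two[2] ^= 1; two[5] ^= 1    # double-bit error
print(decode(two) == data)                  # False: mis-corrected, and nothing flags it
```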
Shmee - Wednesday, October 20, 2021 - link
Cool news, I hope they put this in video cards. Fury and Vega were pretty great with it.
blanarahul - Thursday, October 21, 2021 - link
384-bit GDDR6X @ 21 GT/s = 1008 GB/sec @ 24 GB capacity
2048-bit HBM3 @ 6.4 GT/s = 1638 GB/sec @ 32/48 GB capacity
They'll have to put it in RTX 4080/4090. It would be stupid not to considering the absurd power requirements of 80/90 series cards. GDDR6X is here to stay for mainstream products though.
TheinsanegamerN - Thursday, October 21, 2021 - link
Both got beaten easily by GDDR5- and GDDR5X-equipped cards at lower prices and lower power consumption. What made them so great?
Oxford Guy - Sunday, October 24, 2021 - link
The small size of Fiji made the Nano possible. It was pretty neat, I suppose.
Oxford Guy - Sunday, October 24, 2021 - link
'Minimizing stack height is beneficial regardless of standards'

Unless it reduces yield too much and/or worsens the cost/benefit (competitiveness) ratio too much in some other respect.
FLORIDAMAN85 - Monday, October 25, 2021 - link
Me: Oh, new HBM3, we'll be getting that in our next graphics cards, right?
Tech Industry: .......
Me:... We'll be getting that in our next graphics cards, right?
Oxford Guy - Monday, October 25, 2021 - link
AMD: 'How do you like our latest PolarisForever™ cards?'
Nvidia: 'Hey hey... How about that swell new OneKidney™ plan?'
Intel: 'Well... uh... We're supply constrained. Cool, right?'
FLORIDAMAN85 - Monday, October 25, 2021 - link
What can't we do with a terabyte per second of bandwidth?
Can't wait to make a RAMdisk with this.
Vitor - Thursday, October 28, 2021 - link
Imagine a Mac workstation with 256GB of this memory and an SoC built on TSMC 2nm. Technically feasible in 3 years.