20nm Manufacturing Process

Both Samsung Semiconductor and TSMC delivered their first 20nm products in Q3 2014, but the two don't represent the same jump in efficiency. Samsung's 28nm HKMG process differed considerably from TSMC's 28nm HPM process. Samsung initially held a process lead with its gate-first approach when it introduced 32nm HKMG and the subsequent 28nm shrink, whereas TSMC took the gate-last route. The advantage of the gate-last approach is that it allows for lower variance in the manufacturing process and for better power characteristics. We've seen this as TSMC introduced the highly optimized HPM process in mobile. Qualcomm has been the biggest beneficiary, taking full advantage of this process jump with the Snapdragon 800 series as it moved on from the 28nm LP process used in previous SoCs.

In practical terms, 20nm brings Samsung back level with TSMC in theoretical power consumption. In fact, 28nm HPM still has the same nominal transistor voltage as Samsung's new 20nm process.

Luckily Samsung provides useful power modeling values as part of the new Intelligent Power Allocation driver for the 5422 and 5430 so we can get a rough theoretical apples-to-apples comparison as to what their 20nm process brings over the 28nm one used in their previous SoCs.

I took the median chip bin for both SoCs to extract the voltage tables in the comparison and used the P=C*f*V² formula to compute the theoretical power figure, just as Samsung does in their IPA driver for the power allocation figures. The C coefficient values are also provided by the platform tables.
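As a rough illustration of the P=C*f*V² calculation described above, here is a minimal Python sketch. The function names, the coefficient, and the frequency and voltage values are all hypothetical placeholders chosen for the example; they are not figures from Samsung's IPA tables.

```python
def dynamic_power(c_coeff, freq_mhz, volt_mv):
    """Theoretical dynamic power, P = C * f * V^2, in arbitrary units."""
    volts = volt_mv / 1000.0
    return c_coeff * freq_mhz * volts * volts


def power_reduction(old_power, new_power):
    """Fractional power saving of the new figure relative to the old one."""
    return 1.0 - new_power / old_power


# Hypothetical operating points, NOT values from the actual voltage tables:
# same coefficient and frequency, with the voltage lowered by 125 mV.
old = dynamic_power(c_coeff=1.0, freq_mhz=1000, volt_mv=1000)
new = dynamic_power(c_coeff=1.0, freq_mhz=1000, volt_mv=875)

print(f"Theoretical power reduction: {power_reduction(old, new):.1%}")
```

Because voltage enters the formula squared, a voltage drop at a fixed frequency cuts power more than proportionally. A real comparison would use each SoC's own C coefficient and its per-frequency voltage table rather than these placeholders.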

We can see that for the A15 cores, there's an average 24% power reduction over all frequencies, with the top frequencies achieving a good 29% reduction. The A7 cores see the biggest overall voltage drop, averaging around -125mV, resulting in an overall 40% power reduction and even 56% at the top frequency. It's also very likely that Samsung has been tweaking the layout of the cores for either power or die size; we've seen this as the block sizes of the CPUs have varied a lot between the 5410, 5420 and 5422, even though they were on the same process node.

While these figures represent quite a significant power reduction by themselves, they must be put into perspective with what Qualcomm is publishing for their Krait cores. The Snapdragon 805 on a median speed bin declares a power consumption of 965mW at 2.65GHz, going down to 57mW at 300MHz. While keeping in mind that these figures ignore L2 cache power consumption, as Qualcomm feeds the L2 from a dedicated voltage rail, it still gives us a good representation of how efficient the HPM process is. The highest voltages on the S805 are still lower than the voltages at the top few frequencies on both the 5430 and the 5433.

20nm does bring with it a big improvement in die size. If we take the 5420 as the 28nm comparison part and match it against the 5430, we see a big 31% decrease in the A7 core size, and an even bigger 39% reduction in the A15 core size. The total cluster sizes remain relatively conservative in their scaling, shrinking only about 12-13%; this is due to SRAM in the caches having a lower shrinking factor than pure logic blocks. One must keep in mind that auxiliary logic such as PLLs, bus interfaces, and various other small blocks are part of a CPU cluster and may also impact the effective scalability. Samsung also takes advantage of artificially scaling CPU core sizes to control power consumption, so we might not be looking at an apples-to-apples comparison, especially when considering that the 5430 is employing a newer major IP revision of the CPU cores.

Exynos 5420 vs Exynos 5430 block sizes

Block         Exynos 5420   Exynos 5430   Scaling Factor
A7 core       0.58mm²       0.4mm²        0.690
A7 cluster    3.8mm²        3.3mm²        0.868
A15 core      2.74mm²       1.67mm²       0.609
A15 cluster   16.49mm²      14.5mm²       0.879
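
The scaling factors in the table are the 5430 block area divided by the corresponding 5420 block area. The short Python sketch below recomputes them from the listed block sizes; the variable and function names are illustrative only.

```python
# Block areas in mm², taken from the table: (Exynos 5420, Exynos 5430)
BLOCK_SIZES_MM2 = {
    "A7 core": (0.58, 0.40),
    "A7 cluster": (3.8, 3.3),
    "A15 core": (2.74, 1.67),
    "A15 cluster": (16.49, 14.5),
}


def scaling_factor(old_area, new_area):
    """Ratio of the new block area to the old one (lower means a bigger shrink)."""
    return new_area / old_area


for block, (area_5420, area_5430) in BLOCK_SIZES_MM2.items():
    factor = scaling_factor(area_5420, area_5430)
    shrink_pct = (1.0 - factor) * 100.0
    print(f"{block}: scaling factor {factor:.3f}, area reduction {shrink_pct:.0f}%")
```

Expressing the change as one minus the scaling factor gives the area reduction relative to the 28nm part, which makes the gap between core-level and cluster-level scaling easy to see.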

The Mali T628 between the 5420 and the 5430 actually had an increase in die size despite the process shrink, but this is due to a big increase in the cache sizes.

Samsung regards their 20nm node as very short-lived, and the 5430 and 5433 look to be the only high-volume chips that will be coming out on the process, as their attention is focused on shipping 14nm FinFET devices in the next few months. In fact, at the Samsung Investor Forum 2014 they announced that mass production of a new high-end SoC had already begun in mid-November and would be ramping up to full volume in early 2015. I suspect this to be the Exynos 7420, as that is the successor SoC to the 5433.

All in all, the argument that this 20nm chip should be more power efficient than the competitors' 28nm is not completely factual and doesn't seem to hold up in practice. The process still seems young and unoptimized compared to what TSMC offers on 28nm.

Before we get to the performance and power figures, I'm handing things over to Ryan as we take a look at the architectural changes, starting with an analysis of the Cortex A53.

135 Comments

  • aryonoco - Wednesday, February 11, 2015 - link

    Everyone is aware that developing a power-aware scheduler is a VERY hard problem. The Linux kernel doesn't have it, but neither does anyone else really.

    The problem is that when ARM developed big.LITTLE, they would have known that for it to work, it requires a well-designed power-aware scheduler. And they should have known that that's a very hard problem to solve in software. History is littered with great hardware architectures that should have performed a lot better, if only the software was up to it, e.g., Intel's Itanium or Transmeta's Crusoe. But history has time and time again shown that architectures that require too clever a software solution around them just don't work (perhaps one should add AMD's Bulldozer to this list as well, seeing as AMD expected everyone to rewrite their software to become GPU aware).

    I remember back in the days KISS was a big mantra of Unix sysadmins, and for a good reason: you can optimize simple things very well. Witness the simple (by comparison) dual core Apple A8 that doesn't require any magical scheduler or a binary translator (Denver) and yet beats everyone else in practical tests. It's disheartening that the likes of ARM and Nvidia don't seem to have learnt this.
  • tuxRoller - Thursday, February 12, 2015 - link

    The article suggests that this (the scheduler) is work that Samsung (alone) should've done. I don't recall the author indicating that it's actually an unsolved problem in computer science (again, in a general purpose environment), as I indicated, BTW.
    big.LITTLE will certainly work best with such a scheduler, but even without you should expect to approach some efficiency that lies between the big and little cores. Even this half-hearted attempt isn't terribly worse than the android competition.
  • Andrei Frumusanu - Thursday, February 12, 2015 - link

    Power collapse is proper power gating on the individual cores in their respective CPUIdle states, your link is outdated and does not apply to new generation ARM cores.
  • tuxRoller - Thursday, February 12, 2015 - link

    Do you have a reference?
    To the best of my knowledge linaro are still working on hotplug.
    Also, for Linux, cpuidle refers to a specific governor that doesn't actually power down, but handles the c states, but it looks like arm uses it for suspension (http://events.linuxfoundation.org/sites/events/fil... slides 10 and 13).
  • Andrei Frumusanu - Friday, February 13, 2015 - link

    CPUIdle is the kernel framework that manages the CPU's idle states, such as WFI (Clock gating), core power collapse, cluster power collapse. CPUIdle states are C-states, but "C-states" is an ACPI denomination that is rarely used on ARM CPUs.

    Hotplug has been left for dead for a long time, it hasn't been used for PM in ARM CPUs since the A15/A7 generation. Today it's only used for like forcing cores off when in screen-off states or rare coarse power management for like battery savings modes for some vendors.
  • sgmuser - Thursday, February 12, 2015 - link


    I have Exynos and wanted that specifically for the Wolfson Audio. Sad to note that, its not exploited enough by Samsung.

    Did someone notice the RAM bandwidth. How much impact that it makes?
    Also, from personal experience, I find exynos seems to be smooth and never noticed lag for an user like me. Graphics performance could be better but again no visible issues for what I have been playing so far with such dense display.
  • giaf - Saturday, February 14, 2015 - link

    Very interesting and detailed article, thank you.

    I have been working with ARMv7A cores, and I am interested in the floating-point capabilities of new 64-bit ARM processors. What is the throughput of the main floating-point instructions (add, mul, MAC) for Cortex A53 and A57? Something similar to the test done in http://www.anandtech.com/show/6971/exploring-the-f...
  • thegeneral2010 - Wednesday, February 18, 2015 - link

    i just dont get it wat do u mean by "however it remains unclear whether we'll see this on the Note 4. My personal opinion remains that we won't be seeing this overhaul in Samsung's 5.0 Lollipop update." does this mean that exynos 5433 could be upgraded to 64bit on android 5.1 or later updates??
  • Andrei Frumusanu - Friday, February 20, 2015 - link

    At the time of writing, the Lollipop update was not yet released. Now it's out and it's not 64-bit, as I suspected. If they didn't update it now, they won't ever update it, and it will stay on AArch32.
  • thegeneral2010 - Sunday, February 22, 2015 - link

    so wat about that official patches in upstream linux?