Last week, Apple unveiled its new generation of MacBook Pro laptops, a new range of flagship devices that brings significant updates for the company’s professional and power-user oriented user base. The new devices particularly differentiate themselves in that they’re now powered by two new entries in Apple’s own silicon line-up, the M1 Pro and the M1 Max. We covered the initial reveal of the two new chips in last week’s overview article, and today we’re getting the first glimpses of the performance we can expect from the new silicon.

The M1 Pro: 10-core CPU, 16-core GPU, 33.7bn Transistors

Starting off with the M1 Pro, the smaller sibling of the two, the design appears to be a new implementation of the first-generation M1 chip, but this time designed from the ground up to scale to a larger configuration and higher performance. In our view the M1 Pro is the more interesting of the two designs, as it offers nearly everything that power users will deem generationally important in terms of upgrades.

At the heart of the SoC we find a new 10-core CPU setup in an 8+2 configuration, with 8 performance Firestorm cores and 2 efficiency Icestorm cores. In our initial coverage we had indicated that Apple’s new M1 Pro and Max chips appear to use similar, if not the same, generation CPU IP as the M1, rather than the newer-generation cores used in the A15. We can seemingly confirm this, as we see no apparent changes in the cores compared to what we discovered on the M1 chips.

The CPU cores clock up to 3228MHz peak, but vary in frequency depending on how many cores are active within a cluster, clocking down to 3132MHz with 2 cores active and 3036MHz with 3 or 4 cores active. I say “per cluster” because the 8 performance cores in the M1 Pro and M1 Max consist of two 4-core clusters, each with its own 12MB L2 cache and each able to clock its CPUs independently of the other, so it’s actually possible to have four active cores in one cluster at 3036MHz and one active core in the other cluster running at 3.23GHz.
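The per-cluster clocking described above can be modeled as a simple lookup. This is a hypothetical illustration, not an Apple interface: the table holds the peak frequencies quoted in this article, the function names are invented for the example, and reporting an idle cluster as 0MHz is a simplification.

```python
# Hypothetical model of the per-cluster P-core clocking described above.
# The MHz values are the ones quoted in the article; the names are illustrative,
# not an Apple API.

# Peak clock (MHz) of one 4-core P-cluster, keyed by active cores in that cluster.
P_CLUSTER_PEAK_MHZ = {1: 3228, 2: 3132, 3: 3036, 4: 3036}


def cluster_clock_mhz(active_cores: int) -> int:
    """Return the peak clock of a single P-cluster for its own active-core count."""
    if active_cores == 0:
        return 0  # simplification: an idle cluster is reported as 0 MHz
    if active_cores not in P_CLUSTER_PEAK_MHZ:
        raise ValueError("a P-cluster has between 0 and 4 active cores")
    return P_CLUSTER_PEAK_MHZ[active_cores]


def p_cluster_clocks_mhz(cluster0_active: int, cluster1_active: int) -> tuple:
    """Each cluster is clocked independently, so each is looked up on its own."""
    return cluster_clock_mhz(cluster0_active), cluster_clock_mhz(cluster1_active)


if __name__ == "__main__":
    # Four busy cores in one cluster, one busy core in the other.
    print(p_cluster_clocks_mhz(4, 1))  # (3036, 3228)
```

Because the lookup is done per cluster, a lightly loaded cluster keeps its higher clock regardless of how busy the other cluster is.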

The two E-cores in the system clock at up to 2064MHz. As opposed to the M1, there are only two of them this time around; however, Apple still gives them the full 4MB of L2 cache, the same as on the M1 and the A-series derived chips.

One large feature of both chips is their much-increased memory bandwidth and wider memory interfaces: the M1 Pro features 256-bit LPDDR5 memory at 6400MT/s, corresponding to 204GB/s of bandwidth. This is significantly higher than the M1’s 68GB/s, and also generally higher than competing laptop platforms, which still rely on 128-bit interfaces.
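The bandwidth figures quoted above follow from simple arithmetic: bus width in bytes multiplied by the transfer rate. The sketch below assumes decimal units (1GB/s = 10^9 bytes/s) and the M1’s 128-bit LPDDR4X-4266 interface, a detail not stated in this article. It ignores real-world efficiency losses, so the results are theoretical peaks only.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak DRAM bandwidth in decimal GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8               # bits -> bytes per transfer
    return bytes_per_transfer * transfer_rate_mts / 1000  # MB/s -> GB/s


# M1: 128-bit LPDDR4X-4266 (assumption; the article only gives 68GB/s).
m1 = peak_bandwidth_gbs(128, 4266)
# M1 Pro: 256-bit LPDDR5 at 6400MT/s, as stated in the article.
m1_pro = peak_bandwidth_gbs(256, 6400)

print(f"M1:     {m1:.1f} GB/s")      # M1:     68.3 GB/s
print(f"M1 Pro: {m1_pro:.1f} GB/s")  # M1 Pro: 204.8 GB/s
```

The same function gives 409.6GB/s for a 512-bit LPDDR5-6400 interface, in line with the 408GB/s quoted for the M1 Max.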

We’ve been able to identify the “SLC”, or system level cache as we call it, as 24MB on the M1 Pro and 48MB on the M1 Max. That is a bit smaller than we initially speculated, but it makes sense given the SRAM die area, and it represents a 50% increase over the per-block SLC on the M1.


The M1 Max: A 32-Core GPU Monstrosity at 57bn Transistors

Above the M1 Pro we have Apple’s second new M1 chip, the M1 Max. The M1 Max is essentially identical to the M1 Pro in terms of architecture and in many of its functional blocks – but what sets the Max apart is that Apple has equipped it with much larger GPU and media encode/decode complexes. Overall, Apple has doubled the number of GPU cores and media blocks, giving the M1 Max virtually twice the GPU and media performance.

The GPU and memory interfaces are by far the most differentiated aspects of the chip: instead of a 16-core GPU, Apple doubles things up to a 32-core unit. On the M1 Max we tested today, the GPU runs at up to 1296MHz, which is quite fast for what we consider mobile IP, but still significantly slower than what we’ve seen in the conventional PC and console space, where GPUs can now run at up to around 2.5GHz.

Apple also doubles up on the memory interfaces, using a whopping 512-bit wide LPDDR5 memory subsystem – unheard of in an SoC and even rare amongst historical discrete GPU designs. This gives the chip a massive 408GB/s of bandwidth – how this bandwidth is accessible to the various IP blocks on the chip is one of the things we’ll be investigating today.

The memory controller caches total 48MB on this chip, theoretically amplifying the memory bandwidth available to the various SoC blocks while also reducing off-chip DRAM traffic, and thus the power and energy usage of the chip.
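To illustrate how a large system level cache can amplify effective bandwidth and cut DRAM traffic, here is a deliberately simple model. It assumes a single fixed hit rate for all requests and that the cache itself never becomes the bottleneck, neither of which holds on real hardware; the 25% hit rate and the function names are purely hypothetical, not measured values or Apple interfaces.

```python
def dram_traffic_gbs(demand_gbs: float, slc_hit_rate: float) -> float:
    """Off-chip DRAM traffic left after the system level cache serves its hits."""
    if not 0.0 <= slc_hit_rate < 1.0:
        raise ValueError("hit rate must be in [0, 1)")
    return demand_gbs * (1.0 - slc_hit_rate)


def max_serviceable_demand_gbs(dram_peak_gbs: float, slc_hit_rate: float) -> float:
    """Request bandwidth the SoC blocks could issue before DRAM saturates.

    Assumes the SLC itself is never the bottleneck (a simplification).
    """
    if not 0.0 <= slc_hit_rate < 1.0:
        raise ValueError("hit rate must be in [0, 1)")
    return dram_peak_gbs / (1.0 - slc_hit_rate)


# Hypothetical 25% hit rate against the M1 Max's quoted 408GB/s DRAM peak.
print(max_serviceable_demand_gbs(408, 0.25))  # 544.0
print(dram_traffic_gbs(400, 0.25))            # 300.0
```

Every cache hit is a request that never reaches the LPDDR5 interface, which is where both the bandwidth amplification and the power savings in this toy model come from.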

Apple’s die shot of the M1 Max was initially a bit puzzling, as we weren’t sure whether it actually represents physical reality. Especially at the bottom of the chip, we noted what appears to be a doubled-up NPU, something Apple doesn’t officially disclose. A doubled-up media engine makes sense, as that’s part of the chip’s features; however, until we can get a third-party die shot confirming that this is indeed what the chip looks like, we’ll refrain from speculating further in this regard.


492 Comments


  • vlad42 - Monday, October 25, 2021

    And there you go making purely speculative claims without any factual basis for the quality of the ports. I could similarly make absurd claims such as every benchmark Intel's CPU loses is because that is just a bad port. Provide documented evidence it is a bad port as you are the one making that claim (and not bad Apple drivers, thermal throttling because they would not turn on the fans until the chip hit 85C, etc.).

    Face it, in the real world benchmarks this article provides, AMD's and Nvidia's GPUs are roughly 50% faster than Apple's M1 Max GPU.

    Also, a full node shrink and integrating a dGPU into the SOC would make it much more energy efficient. The node shrink should be obvious and this site has repeatedly demonstrated the significant energy efficiency benefits of integrating discrete components, such as GPUs, into the SOCs.
  • jospoortvliet - Wednesday, October 27, 2021

    Well, they are 100% sure to be bad ports, as this GPU didn't exist when they were made. The games were written for a different platform, different GPUs and different drivers. That they perform far from optimal must be obvious as fsck: driver optimization for specific games and game optimization for specific cards, vendors and even drivers usually make the difference between AMD and Nvidia, and 20-50% between entirely unoptimized (this) and final is not even remotely rare. So yeah, this is an absolute worst case. And Aztec Ruins shows the potential when (mildly?) optimized: nearly 3080 levels of performance.
  • Blastdoor - Monday, October 25, 2021

    Apple's GPU isn't magic, but the advantage is real and it's not just the node. Apple has made a design choice to achieve a given performance level through more transistors rather than more Hz. This is true of both their CPU and GPU designs, actually. PC OEMs would rather pay less for a smaller, hotter chip and let their customers eat the electricity costs and inconvenience of shorter battery life and hotter devices. Apple's customers aren't PC OEMs, though, they're real people. And not just any real people, real people with $$ to spend and good taste.
  • markiz - Tuesday, October 26, 2021

    When you say "Apple has made a design choice", who did in fact make that choice? Can it be attributed to an individual?
    Also, why is nobody else making this choice? Simply economics, or other reasons?
  • markiz - Tuesday, October 26, 2021

    Apple customers having $$ and taste cannot exactly be true at a time when 60% of the USA has an iPhone. Every loser these days has an iPhone.

    I know you were likely being specific with regard to MacBook Pros, so I guess both COULD be true, but it does sound very bad to say it.
  • michael2k - Monday, October 25, 2021

    That would be true if there were an AMD or NVIDIA GPU manufactured on TSMC's N5P node.

    Since there isn't, a 65W Apple GPU will perform like a 93W AMD GPU at N7, and slightly higher still for an NVIDIA GPU at Samsung 8nm.

    That is probably the biggest reason they're so competitive. At 5nm they can fit far more transistors and clock them far lower than AMD or NVIDIA. In a desktop you can imagine they could clock higher than 1.3GHz to push performance even higher: 2x perf at 2.6GHz, and power usage would only go up from 57W to 114W if there were no need to increase voltage when driving the GPU that fast.
  • Wrs - Monday, October 25, 2021

    All the evidence says M1 Max has more resources and outperforms the RTX 3060 mobile. But throw crappy/Rosetta code at the former and performance can very well turn into a wash. I don't expect that to change as Macs are mainly mobile and AAA gaming doesn't originate on mobile because of the restrictive thermals. It's just that Windows laptops are optimized for the exact same code as the desktops, so they have an easy time outperforming the M1's on games originating on Windows.

    When I wanna game seriously, I use a Windows desktop or a console, which outperforms any laptop by the same margin as Windows beats Mac OS/Rosetta in game efficiency. TDP is 250-600W (the consoles are more efficient because of Apple-like integration). Any gaming I'd do on a Windows laptop or an M1 is just casual. There are plenty of games already optimized for M1 btw - they started on iOS. /shrug
  • Blastdoor - Tuesday, October 26, 2021

    As things stand now, the Windows advantage in gaming is huge, no doubt.

    But any doubt about Apple's commitment to the Mac must surely be gone now. Apple has invested serious resources in the Mac, from top to bottom. If they've gone to all the work of creating Metal and these killer SOCs, why not take one more step and invest some money+time in getting optimized AAA games available on these machines? At this point, with so many pieces in place, it almost seems silly not to make that effort.
  • techconc - Monday, October 25, 2021

    It's hard to speak about these GPUs for gaming performance when the games you choose to run for your benchmark are Intel native and have to run under emulation. That's not exactly a showcase for native gaming performance.
  • sean8102 - Tuesday, October 26, 2021

    What games could they have used? The only two somewhat demanding ARM native macOS games are WoW, and Baldur's Gate 3.
