Apple Announces M1 Pro & M1 Max: Giant New Arm SoCs with All-Out Performance

Author: Andrei Frumusanu

by Andrei Frumusanu on October 18, 2021 4:00 PM EST

Today’s Apple Mac keynote has been very eventful, with the company announcing a new line-up of MacBook Pro devices, powered by two new SoCs in the Apple Silicon line-up: the M1 Pro and the M1 Max.

The M1 Pro and Max both follow up on last year’s M1, Apple’s first-generation Mac silicon, which began Apple’s journey to replace x86-based chips with its own in-house designs. The M1 has been wildly successful for Apple, showcasing fantastic performance at never-before-seen power efficiency in the laptop market. Although the M1 was fast, it was still a relatively small SoC, one that also powers devices such as the iPad Pro line-up, with a correspondingly lower TDP, so it naturally still lost out to larger, more power-hungry chips from the competition.

Today’s two new chips look to change that situation, with Apple going all-out for performance, with more CPU cores, more GPU cores, much more silicon investment, and Apple now also increasing their power budget far past anything they’ve ever done in the smartphone or tablet space.

The M1 Pro: 10-core CPU, 16-core GPU, 33.7bn Transistors in 245mm²

The first of the two chips announced was the M1 Pro, laying the groundwork for what Apple calls no-compromise laptop SoCs.

Apple started off the presentation with a showcase of the packaging, where the M1 Pro is shown to continue to feature very custom packaging. This includes the still-unique characteristic that Apple packages the SoC die along with the memory dies on a single organic PCB, in contrast to traditional chips from AMD or Intel, whose DRAM sits either in DIMM slots or soldered onto the motherboard. Apple’s approach here likely improves power efficiency by a notable amount.

The company divulges that they’ve doubled up on the memory bus for the M1 Pro compared to the M1, moving from a 128-bit LPDDR4X interface to a new much wider and faster 256-bit LPDDR5 interface, promising system bandwidth of up to 200GB/s. We don’t know if that figure is exact or rounded, but an LPDDR5-6400 interface of that width would achieve 204.8GB/s.
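The 204.8GB/s figure follows directly from the bus width and the transfer rate. A minimal Python sketch of that arithmetic, assuming LPDDR5-6400 as above (Apple has not confirmed the memory speed, and the function name is purely illustrative):

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8  # bits moved per transfer -> bytes
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# 256-bit interface at an assumed 6400 MT/s
print(peak_bandwidth_gbs(256, 6400))  # 204.8
```

This is a theoretical peak; sustained bandwidth seen by the CPU or GPU would be lower.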

In a much-appreciated presentation move, Apple actually showcased the die shots of both the M1 Pro and M1 Max, so we can have an immediate look at the chips’ block layout and how things are partitioned. Let’s start off with the memory interfaces, which are now consolidated onto two corners of the SoC, rather than spread out along two edges as on the M1. Because of the increased interface width, a considerably larger portion of the SoC is taken up by the memory controllers. What’s even more interesting, however, is that Apple now apparently employs two system level cache (SLC) blocks directly behind the memory controllers.

Apple’s system level cache blocks have been notable as they serve the whole SoC, able to amplify bandwidth, reduce latency, or simply save power by avoiding memory transactions going off-chip, greatly improving power efficiency. This new generation SLC block looks quite a bit different to what we’ve seen on the M1. The SRAM cell areas look to be larger than those of the M1, so while we can’t exactly confirm this right now, it could signify that each SLC block has 16MB of cache in it, which for the M1 Pro would mean 32MB of total SLC.

On the CPU side of things, Apple has shrunk the number of efficiency cores from 4 to 2. We don’t know if these cores are similar to the M1 generation’s efficiency cores, or if Apple adopted the newer generation IP from the A15 SoC – we had noted that the new iPhone SoC had some larger microarchitectural changes in that regard.

On the performance core side, Apple has doubled things up to 8 cores now. Apple’s performance cores were extremely impressive on the M1, but with only four of them the chip lagged behind other 8-core SoCs in terms of multi-threaded performance. This doubling up of the cores should showcase immense MT performance boosts.

On the die shot, we’re seeing that Apple is seemingly mirroring two 4-core blocks, with the L2 caches also being mirrored. Although Apple quotes 24MB of L2 here, I think it’s rather a 2x12MB setup, with an AMD core-complex-like setup being used. This would mean that the coherency of the two performance clusters is going over the fabric and SLC instead. Naturally, this is speculation for now, but it’s what makes most sense given the presented layout.

In terms of CPU performance metrics, Apple made some comparisons to the competition – in particular the SKUs being compared here were Intel’s Core i7-1185G7, and the Core i7-11800H, 4-core and 8-core variants of Intel’s latest Tiger Lake 10nm 'SuperFin' CPUs.

Apple here claims that in multi-threaded performance, the new chips both vastly outperform anything Intel has to offer, at vastly lower power consumption. The presented performance/power curves showcase that at equal power usage of 30W, the new M1 Pro and Max are 1.7x faster in CPU throughput than the 11800H, whose power curve is extremely steep. At equal performance levels, in this case using the 11800H's peak performance, Apple says that the new M1 Pro/Max achieve the same performance with 70% lower power consumption. Both figures are massive discrepancies and a leap ahead of what Intel is currently achieving.
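The two claims describe different points on the curves, and they translate into different performance-per-watt advantages. A rough Python sketch of the conversion, using only Apple's claimed ratios (these are vendor figures, not measurements, and the function names are illustrative):

```python
def efficiency_gain_at_equal_power(speedup: float) -> float:
    """At the same power, perf/W improves by exactly the speedup."""
    return speedup


def efficiency_gain_at_equal_perf(power_reduction: float) -> float:
    """At the same performance, perf/W improves by 1 / (remaining power share)."""
    return 1.0 / (1.0 - power_reduction)


# Claimed: 1.7x faster at 30W, and 70% less power at the 11800H's peak performance
print(efficiency_gain_at_equal_power(1.7))            # 1.7
print(round(efficiency_gain_at_equal_perf(0.70), 2))  # 3.33
```

The larger gap at the 11800H's peak is what a steep power curve implies: the last bit of Intel's performance is bought at a disproportionate power cost.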

Alongside the powerful CPU complexes, Apple is also supersizing their custom GPU architecture. The M1 Pro now features a 16-core GPU, with an advertised compute throughput performance of 5.2 TFLOPs. What’s interesting here, is that this new much larger GPU would be supported by the much wider memory bus, as well as the presumably 32MB of SLC – this latter essentially acting similarly to what AMD is now achieving with their GPU Infinity Cache.

Apple’s GPU performance is claimed to vastly outclass any previous generation competitor integrated graphics performance, so the company opted to make direct comparisons to medium-end discrete laptop graphics. In this case, pitting the M1 Pro against a GeForce RTX 3050 Ti 4GB, with the Apple chip achieving similar performance at 70% less power. The power levels here are showcased as being at around 30W – it’s not clear if this is total SoC or system power or Apple just comparing the GPU block itself.

Alongside the GPU and CPUs, Apple also noted their much-improved media engine, which can now handle hardware accelerated decoding and encoding of ProRes and ProRes RAW, something that’s going to be extremely interesting to content creators and professional videographers. Apple Macs have generally held a good reputation for video editing, but hardware accelerated engines for RAW formats would be a killer feature that would be an immediate selling point for this audience, and something I’m sure we’ll hear many people talk about.

The M1 Max: A 32-Core GPU Monstrosity at 57bn Transistors & 432mm²

Alongside the M1 Pro, Apple also announced a bigger brother – the M1 Max. While the M1 Pro catches up and outpaces the laptop competition in terms of performance, the M1 Max is aiming at delivering something never-before-seen: supercharging the GPU to a total of 32 cores. Essentially it’s no longer an SoC with an integrated GPU, rather it’s a GPU with an SoC around it.

The packaging for the M1 Max changes slightly in that it’s bigger – the most obvious change is the increase of DRAM chips from 2 to 4, which also corresponds to the increase in memory interface width from 256-bit to 512-bit. Apple is advertising a massive 400GB/s of bandwidth, which if it’s LPDDR5-6400, would possibly be more exact at 409.6GB/s. This kind of bandwidth is unheard of in an SoC, but quite the norm in very high-end GPUs.

On the die shot of the M1 Max, things look quite peculiar – first of all, the whole top part of the chip above the GPU essentially looks identical to the M1 Pro, indicating that Apple is reusing most of the design, and that the Max variant simply grows downwards in the block layout.

The additional two 128-bit LPDDR5 blocks are evident, and again it’s interesting to see here that they’re also increasing the number of SLC blocks along with them. If indeed at 16MB per block, this would represent 64MB of on-chip generic cache for the whole SoC to make use of. Beyond the obvious GPU uses, I do wonder what the CPUs are able to achieve with such gigantic memory bandwidth resources.

The M1 Max is truly immense – Apple disclosed the M1 Pro transistor count to be at 33.7 billion, while the M1 Max bloats that up to 57 billion transistors. AMD advertises 26.8bn transistors for the Navi 21 GPU design at 520mm² on TSMC's 7nm process; Apple here has over double the transistors at a lower die size thanks to their use of TSMC's leading-edge 5nm process. Even compared to NVIDIA's biggest 7nm chip, the 54 billion transistor server-focused GA100, the M1 Max still has the greater transistor count.

In terms of die sizes, Apple presented a slide of the M1, M1 Pro and M1 Max alongside each other, and they do seem to be 1:1 in scale. Given that we already know the M1 to be 120mm², that would make the M1 Pro 245mm² and the M1 Max about 432mm².
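To put the transistor counts and die sizes side by side, here is a small Python sketch that computes transistor density. The transistor counts are the vendors' advertised figures; the Apple die areas are the slide-based estimates above, so the resulting densities should be read as approximate:

```python
# (advertised transistor count, die area in mm^2)
# Apple areas are estimates scaled from a presentation slide.
CHIPS = {
    "M1 Pro": (33.7e9, 245),
    "M1 Max": (57e9, 432),
    "Navi 21": (26.8e9, 520),
}


def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


for name, (transistors, area) in CHIPS.items():
    print(f"{name}: {density_mtr_per_mm2(transistors, area):.1f} MTr/mm^2")
# M1 Pro: 137.6 MTr/mm^2
# M1 Max: 131.9 MTr/mm^2
# Navi 21: 51.5 MTr/mm^2
```

Density comparisons across vendors are only indicative, since transistor counting methods and the mix of logic, SRAM and analog blocks differ between designs.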

Most of the die size is taken up by the 32-core GPU, which Apple advertises as reaching 10.4TFLOPs. Going back to the die shot, it looks like Apple here has basically mirrored their 16-core GPU layout. The first thing that came to mind here was the idea that these would be 2 GPUs working in unison, but there does appear to be some shared logic between the two halves of the GPU. We might get more clarity on this once we see software behavior of the system.
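As a quick sanity check on the advertised numbers, the sketch below divides Apple's quoted throughput by core count for both chips. It uses only the two advertised figures (5.2 TFLOPs for 16 cores, 10.4 TFLOPs for 32 cores); per-core ALU counts and clocks are not disclosed in the presentation, so none are assumed here:

```python
import math

# Advertised peak compute throughput: (GPU cores, TFLOPs)
ADVERTISED = {
    "M1 Pro": (16, 5.2),
    "M1 Max": (32, 10.4),
}


def tflops_per_core(cores: int, tflops: float) -> float:
    """Advertised throughput divided evenly across GPU cores."""
    return tflops / cores


pro = tflops_per_core(*ADVERTISED["M1 Pro"])
mx = tflops_per_core(*ADVERTISED["M1 Max"])
print(pro, mx)
print(math.isclose(pro, mx))  # True: the advertised figures scale linearly
```

Identical per-core throughput suggests the same core design at the same peak clock on both chips, though real workloads will not necessarily scale this cleanly.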

In terms of performance, Apple is battling it out with the very best available in the market, comparing the performance of the M1 Max to that of a mobile GeForce RTX 3080, at 100W less power (60W vs 160W). Apple also includes a 100W TDP variant of the RTX 3080 for comparison, which the M1 Max outperforms while still using 40% less power.
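The two power claims are consistent with a single M1 Max operating point of roughly 60W. A short Python sketch of the arithmetic, using the wattages as read off Apple's charts (approximate vendor figures; the helper names are illustrative):

```python
def power_after_reduction(baseline_w: float, reduction: float) -> float:
    """Power draw after cutting a baseline by the given fraction."""
    return baseline_w * (1.0 - reduction)


def reduction_fraction(baseline_w: float, new_w: float) -> float:
    """Fractional power saving relative to the baseline."""
    return (baseline_w - new_w) / baseline_w


# "40% less power" than the 100W RTX 3080 variant
print(round(power_after_reduction(100, 0.40), 1))  # 60.0

# 60W vs the 160W RTX 3080: 100W less, as a fraction of the baseline
print(reduction_fraction(160, 60))  # 0.625
```

In other words, the same 60W figure is a 40% saving against the 100W part and a 62.5% saving against the 160W part.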

Today's reveal of the new generation Apple Silicon has been something we’ve been expecting for over a year now, and I think Apple has managed to not only meet those expectations, but also vastly surpass them. Both the M1 Pro and M1 Max look like incredibly differentiated designs, much different than anything we’ve ever seen in the laptop space. If the M1 was any indication of Apple’s success in their silicon endeavors, then the two new chips should also have no issues in laying incredible foundations for Apple’s Mac products, going far beyond what we’ve seen from any competitor.

373 Comments

Blark64 - Monday, October 18, 2021 - link
It’s clear from your comments here that you don’t know anything about modern high-end animation and vfx production. There’s a multiplicity of roles (animation, fx, lighting, layout, simulation, etc.), all with different workstation requirements. CUDA is mostly useless for an animator, say, who needs responsive viewport playback of animated characters, and which is CPU bound. An fx animator or simulation artist, on the other hand, could make use of CUDA. High end studios are mostly not using CUDA for rendering, as their huge scenes don’t fit in VRAM, and out of core memory reduces the GPU render advantage significantly. These new Mac laptops could render scenes in Octane or Redshift that are currently impractical on the vast majority of NVidia cards, due to their comparatively massive memory pool.
web2dot0 - Tuesday, October 19, 2021 - link
As others have pointed out. You are just talking out of your ass because your Anti-Apple fanboyism is showing.

If movie studios thinking Apple hardware is all shit, why do you think they are buying MacPro by the truckload? LOL.

Hope we have waken you up from your excessive koolaid drinking.

Specs are specs.

MBP16 64GB, M1Max 32GPU Cores, XDR Display 120Hz are gonna destroy any PC laptops you throw at it at their thermal envelope. Those are just facts buddy. Keep coping.

Competition is good. I thought almost everyone in this forum like a healthy dose of competitions.

Intel has been sleep on their wheels and AMD is coming late to the game. Apple is turning the table upside down.

Time for the industry to innovate. Apple took the lead, now it's time for the rest of the industry to wake up or be left behind.
R_Type - Monday, October 18, 2021 - link
Let's have some perspective here. This is their 2nd n5 SoC. Everybody else is playing this game on second class processes:

"AMD advertises 26.8bn transistors for the Navi 21 GPU design at 520mm², Apple here has over double the transistors at a lower die size."

The same goes for Intel (but that's their own fault).
Silver5urfer - Monday, October 18, 2021 - link
What are you talking ?

The laptops they are comparing to are crippled junk. If you want real PC Laptop look at Clevo X170SM which runs 10900K at 5.0GHz all core OC. That will destroy this overpriced garbage which beats a measly 11800H TGL 10nmSF with 48W PL1 and locked down clocks at 4.6GHz max Turbo. The GPU on this as per Apple claims is a 2080 Class not 3080. And Clevo X170SM has a closer to desktop 3080 MXM GPU.

Finally the NVMe Storage, PC laptops have replaceable SSDs, HDDs / 2.5" SATA SSDs. And replaceable components if anything goes wrong. This abomination is a soldered POS design at screaming $2500 cost for which I can build a real desktop powerhouse which can run Linux and Windows and run anything that I can throw at it. Or get a maxed out X170SM CLEVO.

AMD is going to have Zen 4 Raphael on TSMC 5N that will destroy these CPUs and once the Hopper and RDNA3 with MCM arrive in the pure performance aspect this will be obliterated, as Nvidia is having 150% CUDA core count increase over the GA102 on their new Hopper or whatever arch they call it.

Without power cable ? Did you even see M1 ? Once you run it at full speed the efficiency drops like a brick. It's basic Physics. There's no way this SoC is going to sip power and perform around RTX2070/2080 Class. The Clevo X170SM is going to wreak havoc if unleashed that is from a 2020 CPU and 2020 GPU performance.
robotManThingy - Monday, October 18, 2021 - link
And just think, this is only a mobile chip! They clearly have a high-end desktop chip coming for the MacPro. I'm guessing it will be called the M1 Extreme. Whatever they call it, it will completely redefine Apple's position in the market and you can bet that unified memory will play a huge role in its success.
Bp_968 - Wednesday, October 20, 2021 - link
This isn't really true. Just like you can't expect to turn a 3080 down to 100w and get exactly 1/3rd the performance you can't expect to take a 30w part and goose it up to 300w and expect 10 times the performance.

Apple is specifically targeting this power usage. Intel or AMD are designing an arch to perform from 15w up to 200w, apple is designing an arch to work at one main target power usage (from tablet to lightweight laptop). Nvidia does the same thing in reverse and so their laptop parts always seem to struggle on a per watt basis VS a integrated gpu (i mean, is that really a surprise considering the laptop 3080 requires pcie access, its own memory subsystem, etc all that require significant power budget).

I suspect if AMD decided to make a APU with similar specs it could reach similar power levels. The potential weak link with x86 is that they dont have complete control over the OS like apple does. Of course many of us here also consider that a feature, not a weakness.

My wife asked about the new apple hardware and what it might mean. I said its a technical achievement, but not one thats likely to affect us in any meaningful way other than potentially pushing forward more competitive hardware from PC suppliers. And its true. If you're not already an Apple "follower" its unlikely a faster, lower power laptop is suddenly going to turn you into one. Another poster said it would be a different thing altogether if they were releasing the hardware for direct sale, but apple has no desire or intention to do that.

Tldr: i can put the most powerful most efficient engine ever made in a truck and it doesn't really change much for you as a customer if what you need is a car.
Ppietra - Wednesday, October 20, 2021 - link
Apple doesn’t need to turn up the power consumption and decrease the efficiency of its individual GPU and CPU cores to increase performance.
The idea that is being talked about is that Apple will increase the number of cores by a factor of 4. 40 CPU cores, 128 GPU cores, in something similar to chiplet package.
It would lose some overall efficiency in performance per watt but Apple would continue to have an advantage in power consumption and would be extremely competitive in performance.
Tomatotech - Monday, October 18, 2021 - link
Silver5urfer: "PC laptops have replaceable SSDs, HDDs / 2.5" SATA SSDs."

Are you a comedian? Are you saying that proper laptops only have HDDs or SATA SSDs?

Do you realise how unbelievably slow a HDD is compared to the NVMe storage on these MBPs?

Let's take one of the best HDDs on the market, a Seagate Barracuda 7200.14, but feel free to use any other HDD, how does it do at 4K random read? 770KB/sec or so. Let's call it 0.8MB/sec to be generous.

Now take one of various NVMe PCIE4.0 SSDs, the WD Black SN850 or the Samsung 980 Pro. They all max around 1 million IOPs at 4k random read, which if I have my maths right, is about 4GB/sec. Or around 5,000 times faster than your 'fast HDD'. That's the kind of drive a proper modern performance laptop should have. Which is the kind of drive the Apple MBPs have.

What about your precious SATA SSDs? SATA tops out at 600MB/s for SATA III. The NVMe drives I mentioned above do 7+GB/s. That's 12 times faster.

As for the rest of your claims, you're making them before this new MBP has even been reviewed and independently benchmarked. And you're also comparing it to your imaginary Zen 4 Raphael which hasn't even been released yet.

You're full of the worst sort of FUD and vapourware combined with an extremely poor grasp of how tech is used in the real world.
Silver5urfer - Monday, October 18, 2021 - link
Dude what are you on ? I said SATA HDD or SSDs which is SATA III standard. PC laptops already have NVMe which I already mentioned. You are strawmanning to peak go and do your BS else where. A soldered POS is a soldered POS nothing is going to change that fact. 8TB of soldered junk if it goes kaput you and your precious $6000 laptop is a junk.
Henry 3 Dogg - Tuesday, October 19, 2021 - link
"8TB of soldered junk if it goes kaput you and your precious $6000 laptop is a junk."

No. An Apple motherboard repair is the same few hundred dollar price regardless of the amount of storage on your motherboard.

Yet again you don't know what you are talking about.
