The first M1 Macs have been a huge success on a number of levels, and the tech industry is keen to see the performance that Apple Silicon will unlock as the whole Mac lineup gets the custom chips. Now a former Apple engineer has shared interesting details on the key ARM advancements Apple made starting around 10 years ago that led to the M1 Mac performance we have today. Notably, Apple’s work pushed the rest of the industry as it forged the leading edge with ARM.

Shac Ron, a former Apple kernel engineer, shared some fascinating details about Apple’s work on its ARM chips over the years and gave some perspective on why the M1 chip is so powerful (h/t Steve Troughton-Smith).

The thread was kicked off by a response to a tweet claiming that M1 Macs are impressive because of their cache, not because of ARM. Shac Ron disagreed and explained why.

Illustrating how far ahead of the curve Apple was, Ron notes that Apple’s first 64-bit ARM chip, the A7, launched in 2013 with a new instruction set architecture (ISA) that Apple had contracted ARM to design. That meant Apple was shipping ARM64 before ARM had its own “core design” ready to license to third parties.

Ron highlights that Apple started its work on ARM64 back in 2010, and when the A7 launched in 2013, it caught Qualcomm and Samsung off guard.

Arm64 didn’t appear out of nowhere, Apple contracted ARM to design a new ISA for its purposes. When Apple began selling iPhones containing arm64 chips, ARM hadn’t even finished their own core design to license to others.

— Shac Ron ₪ (@stuntpants) January 5, 2021

Going into more technical detail, Ron says that Apple’s bet on evolving ARM was to “go super-wide with low clocks” and “highly OoO.” “Super-wide” refers to cores that can decode and execute many instructions per clock cycle, which lets them do a lot of work even at relatively low clock speeds (speeds that have increased over time).
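To make the “wide with low clocks” trade-off concrete, here is a minimal sketch of the arithmetic: a core’s peak throughput is roughly its width (instructions per cycle) multiplied by its clock speed. The function name and every number below are hypothetical and chosen only for illustration; they are not measured figures for any Apple or competing chip, and real cores rarely sustain their peak width.

```python
# Hypothetical numbers for illustration only; not figures for any real chip.

def peak_instructions_per_second(width: int, clock_ghz: float) -> float:
    """Upper bound on throughput: instructions per cycle times cycles per second."""
    return width * clock_ghz * 1e9

# A wide core at a modest clock versus a narrower core at a higher clock.
wide_slow = peak_instructions_per_second(width=8, clock_ghz=3.0)
narrow_fast = peak_instructions_per_second(width=4, clock_ghz=5.0)

print(f"wide, lower clock:    {wide_slow / 1e9:.1f} billion instr/s")
print(f"narrow, higher clock: {narrow_fast / 1e9:.1f} billion instr/s")
```

In this made-up comparison the wider core comes out ahead despite the lower clock, which is the general shape of the bet Ron describes. Lower clocks also tend to cost less power, which matters for phones and laptops.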

ARM designed a standard that serves its clients and gets feedback from them on ISA evolution. In 2010 few cared about a 64-bit ARM core. Samsung & Qualcomm, the biggest mobile vendors, were certainly caught unaware by it when Apple shipped in 2013.

The A7 had 2 cores at around 1.3GHz. Now the A14 has a 6-core CPU at up to 2.99GHz, a 4-core GPU, and a 16-core Neural Engine. Meanwhile, M1 Macs have an 8-core CPU (4 high-efficiency and 4 performance cores) with a 3.2GHz clock speed, an 8-core GPU, and a 16-core Neural Engine.

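By going with a highly OoO (out-of-order) superscalar architecture, Apple was able to take advantage of chips with ever-increasing transistor counts (16 billion on M1!). Out-of-order execution decouples the order in which instructions arrive at the front end from the order in which the back end actually executes them, so a wide core can keep its execution units busy. And all of that was made possible by an ARM ISA designed to Apple’s requirements.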

Wrapping up, Ron believes that the incredible M1 Mac performance isn’t thanks to the ARM ISA; rather, the ARM ISA is the way it is because of the core performance plans Apple started on back in 2010.

Apple planned to go super-wide with low clocks, highly OoO, highly speculative. They needed an ISA to enable that, which ARM provided.

M1 performance is not so because of the ARM ISA, the ARM ISA is so because of Apple core performance plans a decade ago.