Deep Dive into why Apple M1 beat Intel/AMD

Although decoding x86 is a bit trickier than ARM, the decoder hasn't been a principal limiter to performance in x86 land in a looong while. It's basically less than 4% of the overall power/transistor budget for most newer Intel/AMD cores.

Intel requires more bandwidth internally when decoding the instructions that fall back to microcode, which the µop cache (the modern successor to the trace cache) helps absorb. M1, on the other hand, requires far larger external instruction bandwidth, which is why they put just about the largest on-die L1 instruction cache ever.

The principal performance difference between the M1 and the latest Intel/AMD parts lines up with the M1's significantly larger out-of-order resources. That makes sense, since Apple designed the uArch to take advantage of the density that sub-7nm processes were going to give them (and they bet correctly).
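
To see why bigger out-of-order resources matter, here's a minimal microbenchmark sketch (mine, not from the thread; sizes and chain counts are arbitrary assumptions). A single pointer chase serializes every cache miss, so a deeper OoO window buys nothing; several independent chases let the core overlap misses, and the speedup is bounded by how many loads the machine can keep in flight.

```c
/* Memory-level-parallelism sketch: dependent vs. independent load chains.
 * Compile with something like: cc -O2 mlp.c -o mlp */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NODES  (1u << 22)   /* ~4M entries (~32 MB), well past L2 */
#define STEPS  (1u << 24)   /* total loads per experiment */
#define CHAINS 8            /* independent dependency chains (assumed) */

static size_t nxt[NODES];

/* Sattolo's algorithm: build one big random cycle so every load
 * is an unpredictable miss and the chase never short-circuits. */
static void build_chain(void)
{
    for (size_t i = 0; i < NODES; i++)
        nxt[i] = i;
    for (size_t i = NODES - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = nxt[i];
        nxt[i] = nxt[j];
        nxt[j] = tmp;
    }
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    build_chain();

    /* One chain: each load depends on the previous one, so misses
     * can't overlap no matter how large the OoO window is. */
    size_t p = 0;
    double t0 = now_sec();
    for (size_t i = 0; i < STEPS; i++)
        p = nxt[p];
    double one = now_sec() - t0;

    /* CHAINS independent chains: the OoO window can overlap the misses,
     * up to the limit of its load queue / reorder buffer capacity. */
    size_t q[CHAINS];
    for (int c = 0; c < CHAINS; c++)
        q[c] = (size_t)c * (NODES / CHAINS);
    t0 = now_sec();
    for (size_t i = 0; i < STEPS / CHAINS; i++)
        for (int c = 0; c < CHAINS; c++)
            q[c] = nxt[q[c]];
    double many = now_sec() - t0;

    /* Print the sinks so the compiler can't discard the loops. */
    printf("sink: %zu %zu\n", p, q[0]);
    printf("1 chain : %.3f s\n", one);
    printf("%d chains: %.3f s (same total loads)\n", CHAINS, many);
    return 0;
}
```

The ratio between the two timings approximates how much miss overlap the core can sustain; a part with more in-flight loads and a larger reorder buffer keeps closing that gap as you raise CHAINS, which is the kind of headroom the M1's wider OoO machinery is spending transistors on.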

/r/hardware thread