Linus' reply on spinlocks vs mutexes

I've puzzled over this comment a bit and I don't get what you're hinting at. In the normal case of a "fast" mutex, which is the default (PTHREAD_MUTEX_TIMED_NP), you have a comparison which will fail, followed by a comparison which will succeed.

In the default case you have lock elision enabled, which will call into the lock elision macro. My gut says that using TSX for a completely uncontended lock is slower than the single cmpxchg of the traditional spinlock, but for argument's sake I'll say they're about the same.

If you disable lock elision it calls into the normal lock macro, which is identical to what Malte called the "AMD Recommended" example. So you've got either the exact same instructions or TSX, plus an unnecessary atomic load and two comparisons. How could that possibly be better than just using the spin lock in the first place and skipping that overhead?
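To make the comparison concrete, here's a rough sketch in C11 of the two fast paths I'm describing. This is not glibc's code: every `sketch_` name, the kind values, and the struct layout are made up for illustration, and the elision path isn't modeled at all. It only shows the shape of the argument: the mutex fast path does a load of the kind field and two comparisons, then ends up at the same compare-and-swap the spinlock does on its own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical kind values standing in for the mutex type field. */
enum sketch_kind { SKETCH_KIND_DEFAULT = 0, SKETCH_KIND_SPECIAL = 1 };

typedef struct {
    atomic_int state;  /* 0 = unlocked, 1 = locked */
} sketch_spinlock;

typedef struct {
    atomic_int kind;   /* loaded on every lock attempt */
    atomic_int state;  /* 0 = unlocked, 1 = locked */
} sketch_mutex;

/* Spinlock fast path: one compare-and-swap and nothing else. */
static bool sketch_spin_trylock(sketch_spinlock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void sketch_spin_unlock(sketch_spinlock *l) {
    /* Releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->state, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->state, 0, memory_order_release);
}

/* Mutex fast path as described above: an atomic load of the kind,
 * a comparison that fails, a comparison that succeeds, and only
 * then the same compare-and-swap the spinlock uses. */
static bool sketch_mutex_trylock(sketch_mutex *m) {
    int kind = atomic_load_explicit(&m->kind, memory_order_relaxed);

    if (kind == SKETCH_KIND_SPECIAL) {
        /* First comparison: fails for a default mutex.
         * The slow path for other kinds is not modeled here. */
        return false;
    }
    if (kind == SKETCH_KIND_DEFAULT) {
        /* Second comparison: succeeds for a default mutex. */
        int expected = 0;
        return atomic_compare_exchange_strong_explicit(
            &m->state, &expected, 1,
            memory_order_acquire, memory_order_relaxed);
    }
    return false;
}
```

In the uncontended case both functions succeed with a single CAS, so whatever the mutex does before it (the load and the two branches) is pure extra work on that path. Whether that cost is measurable is a separate question, and it would need benchmarking rather than a sketch like this.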
