The Windows Shutdown crapfest

Stability of main is a huge issue. It's an operating system, and almost all software running on your system transitively depends on it. A small break in the OS translates to a huge break in the user experience once it's been multiplied through a stack of 20 different layers of software.

Let's take a real-world example. You make a change in the graphics stack which changes the size of an internal data structure. This internal data structure is used only by the OS and is not exported or documented anywhere. But the Intel integrated driver grovels through private OS memory and flips bits on your internal data structure because they wanted to do something special for their bizarre hybrid-GPU setups. Your seemingly innocuous change, which you tested and validated locally, is now causing the Intel kernel driver to crash on startup, and an entire class of laptops refuses to boot. And no, you can't just break the Intel graphics driver. Even though it's their fault. There are business requirements to keep software compatible - that's part of the value of the OS.
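The failure mode is easy to reproduce in miniature. In the sketch below, all structure and field names are invented for illustration (nothing here is a real Windows structure), and Python's ctypes merely stands in for a C-style memory layout. Inserting one private field silently moves every field after it, so a driver that cached the old byte offset now pokes the wrong memory:

```python
import ctypes

# Hypothetical v1 layout of a private graphics-stack structure
# (illustrative names only):
class GfxStateV1(ctypes.Structure):
    _fields_ = [
        ("flags", ctypes.c_uint),
        ("framebuffer", ctypes.c_void_p),
    ]

# v2 inserts one internal field.  Harmless on paper: the structure is
# not exported or documented, so "nobody" depends on its layout.
class GfxStateV2(ctypes.Structure):
    _fields_ = [
        ("flags", ctypes.c_uint),
        ("scanout", ctypes.c_void_p),      # new internal field
        ("framebuffer", ctypes.c_void_p),  # silently moved
    ]

# A driver that grovels through raw memory using the v1 offset now
# reads `scanout` where it expects `framebuffer`:
assumed = GfxStateV1.framebuffer.offset
actual = GfxStateV2.framebuffer.offset
print(f"driver assumes offset {assumed}, field is now at {actual}")
```

The exact offsets depend on the platform's alignment rules, but on any common ABI the two disagree - which is precisely the kind of breakage no local test of the graphics change would ever catch.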

This sort of thing happens All. The. Time. It demonstrates just one of the many ways a seemingly harmless change that was tested and validated locally can have catastrophic consequences elsewhere. Raymond Chen has been writing about this sort of thing on his blog The Old New Thing for years and years.

Windows is big, and extremely interconnected. The source tree is hundreds of gigabytes in size. It takes 24 hours to perform a full build and produce installable media. Once a build is produced, it takes another 8-48 hours to run tests over the OS, depending on which set of tests is run. And the test results are so large and complex that interpreting them requires a team of dev and quality managers.

In other words, a developer can only run isolated and local tests and can't run tests across the entire product. The only entities with enough resources to even build and test your change across the product are the official nightly build servers and test labs.

The builds are spun and the tests are run once per day. If everyone checked in directly to main, there would be a thousand untested checkins in mainline every day. If the build breaks or the tests fail, how do you investigate the regression in a sea of untested changes?
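To put numbers on that: even a binary search over one day's checkins is crippled when each probe costs a full build-and-test cycle. A back-of-the-envelope calculation using the figures above (1000 checkins per day, 24 hours to build, 8 hours for even the shortest test pass):

```python
import math

# Figures from the post; the scenario (bisecting one regression out of
# a day's untested checkins) is a hypothetical worst case.
checkins_per_day = 1000
build_hours = 24
test_hours_min = 8

steps = math.ceil(math.log2(checkins_per_day))  # bisection steps needed
hours = steps * (build_hours + test_hours_min)  # cost per step: build + fastest tests
print(steps, hours)  # 10 steps, 320 hours
```

That's roughly two weeks of lab time to isolate a single regression - in the best case, with the cheapest test pass, and assuming only one thing broke that day.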

So a deep hierarchy of branches is used. Perhaps 10-50 developers will share a branch 3 or 4 layers removed from main. Every night the builds happen, tests are run, and the results are interpreted by quality managers. If the results are green, the changes from that branch are integrated upward one level on a regular cadence. The process repeats, over and over again, until the code finally reaches main.

Can the branching structure be improved? Of course. But Windows is not only huge, it's used by literally billions of people on an enormous diversity of hardware and environments. It's hard.

/r/programming Thread Parent Link - moishelettvin.blogspot.com