It's Black Friday. If you're an IT admin in retail and knee-deep in a crisis right now, please read this.

Can confirm. I work at AWS, and within a month of joining my current team, I managed to break the same canary three times in a single week. Each time it triggered alarms suggesting a service might be having a major, customer-facing outage. They were all false alarms, but every single one of them got me attention from director-level people.

I got promoted six or seven months later. As far as I know, the canary incidents were on my promo doc. I fucked up, but I owned it: I wrote the documentation to prevent it from happening again, I wrote the COE, and I learned from it. Failure is expected at the big shops, and handling it gracefully can be just as good for your career as avoiding it entirely.

/r/sysadmin