Thickheaded Thursday - July 12, 2018

Well /r/sysadmin, TIFU. We always talk about causing big production outages and how it's almost a rite of passage, but it sure doesn't feel that way when you're in the shit.

While I won't bore anyone with the details, I'm in the middle of a migration from a VMware cluster to a new Hyper-V cluster. Last night, in my infinite wisdom, I decided to start one more migration at about 11 PM. And not just any migration: a domain controller. I had already transferred a secondary DC, so I didn't think much of it.

Experienced folks among you probably already see the mistake I made: I transferred the FSMO-holding domain controller using Microsoft Virtual Machine Converter. And then replication broke, ADFS broke, DNS broke, and the DC isolated itself because of a USN rollback.
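For anyone who hasn't run into a USN rollback before, here is a rough toy model in Python of how I understand the detection to work. This is not how AD is actually implemented; the class, the method, and the invocation ID strings are all made up for illustration. The idea is that each replication partner remembers the highest USN it has seen from a given DC, keyed by that DC's invocation ID. If an older copy of the database comes back with the same invocation ID and a lower USN, the partner notices the mismatch.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationPartner:
    """Toy model of a replication partner (hypothetical, not a real AD API)."""

    # Highest USN already received, keyed by the source DC's invocation ID.
    high_watermarks: dict = field(default_factory=dict)

    def receive(self, invocation_id: str, source_highest_usn: int) -> str:
        known = self.high_watermarks.get(invocation_id, 0)
        if source_highest_usn < known:
            # Same database identity, but it claims to be older than what
            # we have already seen: treat this as a rollback.
            return "rollback"
        self.high_watermarks[invocation_id] = source_highest_usn
        return "ok"


partner = ReplicationPartner()

# Normal replication before the conversion.
before = partner.receive("dc1-invocation-id", 5000)

# A stale copy of the DC shows up with the same invocation ID and a lower USN.
after = partner.receive("dc1-invocation-id", 4200)

# A properly restored DC gets a new invocation ID, so its USNs are tracked afresh.
restored = partner.receive("dc1-new-invocation-id", 4200)

print(before, after, restored)
```

As far as I can tell, the real thing reacts to the "rollback" case by having the affected DC quarantine itself (replication disabled, Netlogon paused), which matches what I saw.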

A long night, a few hours of sleep, and a long day later, the last thing to finally come back online was ADFS. Everything is replicating again and on an operational level everything seems fine, but I still see things below the surface that don't look right. This is going to be a long weekend.

I learned a hard lesson about respecting Active Directory and domain controllers: think twice and do some research before making changes to them. While I'm definitely overworked, there's no excuse for blindly migrating critical pieces of infrastructure without doing my due diligence.

While I'm not superstitious, I don't think I'll be doing any major changes on Friday the 13th anymore.
