I need advice... Took over a horrible infrastructure...

I've done a few clean up operations. It usually takes a while and gets better in small increments.

Disclaimer: it usually gets worse before it gets better, depending on how broken things are. You'll make mistakes, and you need your management on board and committed to covering your ass when it happens.

A good place to start is documenting the way things are right now: inventories, network diagrams, and descriptive notes on the various setups. This baseline will help you set up metrics to track your own progress and justify keeping people involved in the cleanup.
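One way to turn that inventory into a progress metric is to measure how much of it is actually documented, then re-run the numbers as the cleanup goes on. The sketch below assumes a hypothetical inventory where each device records a role, an owner, and whether it appears on a network diagram. These fields are illustrative, not from any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    hostname: str
    role: Optional[str]   # hypothetical field: what the device does
    owner: Optional[str]  # hypothetical field: team responsible for it
    in_diagram: bool      # hypothetical field: shown on a network diagram?


def documentation_coverage(inventory: list[Device]) -> dict[str, float]:
    """Return the fraction of devices with each piece of documentation."""
    total = len(inventory)
    if total == 0:
        return {"role": 0.0, "owner": 0.0, "diagram": 0.0}
    return {
        "role": sum(d.role is not None for d in inventory) / total,
        "owner": sum(d.owner is not None for d in inventory) / total,
        "diagram": sum(d.in_diagram for d in inventory) / total,
    }


inventory = [
    Device("core1", "core router", "netops", True),
    Device("sw-old", None, None, False),
    Device("fw1", "firewall", "secops", True),
    Device("sw2", "access switch", None, False),
]
coverage = documentation_coverage(inventory)
```

Tracking these ratios over time gives management a concrete number that should climb as the cleanup progresses.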

Next, figure out where you want to take things.

Then start planning your cleanup operations in small increments:

  • identify and remove retired devices
  • identify and address single points of failure
  • identify what is and isn't a business critical asset
  • clean up your IPAM, plan re-IP operations if needed
  • clean up racks, including racking, cabling, airflow and power distribution
  • define and implement standardized platforms with config templates
  • implement config management tools
  • implement log management and reporting tools
  • set up a wiki for the documentation
  • create document templates (VPN request forms, RCA reports, network diagrams, design documents, ...)
  • set up a credential/password repository
  • clean up the ticketing system backlog, figure out what works and
  • audit the monitoring tools, create check profiles for the different devices, add missing checks
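For the IPAM cleanup step, a quick first pass is to look for overlapping subnets and addresses assigned to more than one host. The sketch below assumes you can export your subnets as a list of CIDR strings and your assignments as a hostname-to-IP mapping; the function names are hypothetical, and only Python's standard `ipaddress` module is used.

```python
import ipaddress
from collections import defaultdict
from itertools import combinations


def overlapping_subnets(prefixes: list[str]) -> list[tuple[str, str]]:
    """Return every pair of subnets (same IP version) that overlap."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [
        (str(a), str(b))
        for a, b in combinations(nets, 2)
        # only compare IPv4 with IPv4 and IPv6 with IPv6
        if a.version == b.version and a.overlaps(b)
    ]


def duplicate_addresses(assignments: dict[str, str]) -> dict[str, list[str]]:
    """Map each IP assigned to more than one host to those hostnames."""
    by_ip: dict[str, list[str]] = defaultdict(list)
    for host, ip in assignments.items():
        # normalise the address so "10.0.0.010"-style typos fail loudly
        by_ip[str(ipaddress.ip_address(ip))].append(host)
    return {ip: sorted(hosts) for ip, hosts in by_ip.items() if len(hosts) > 1}


overlaps = overlapping_subnets(["10.0.0.0/24", "10.0.0.128/25", "10.0.1.0/24"])
dupes = duplicate_addresses(
    {"sw1": "10.0.0.10", "sw2": "10.0.0.10", "fw1": "10.0.0.1"}
)
```

The pairwise comparison is quadratic, which is fine for a few thousand prefixes; anything it flags is a candidate for the re-IP planning mentioned above.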

It's a lot of work and it will take you a while, but you can make a lot of improvements even on a budget.

Good luck.

/r/sysadmin Thread