So I’ve been thinking about, and working on, applying a test-driven approach to systems administration for a little while.
It was prompted mainly by the title of a paper Geoff Halprin presented at last year’s SAGE-AU conference. Since I didn’t have time to attend the conference, I have no idea whether my approach bears any resemblance at all to Geoff’s.
Broadly speaking, systems administration consists of two main tasks: managing planned change to systems, and managing unplanned incidents on systems. Everything else we do is just arranging affairs so that change is simpler and more deterministic, and incidents are shorter and less frequent.
How, then, can a test-driven approach help with this? It seems to me that we need two things:
- A test-first workflow that makes sense for systems management
- A language and toolkit for expressing tests and collecting the output
Put like that, it seems a pretty simple task. I’ll explore it over the next few weeks.