Looking around for examples of test-driven systems administration, all I can find is people recommending, rather sensibly, that you test systems changes before deploying them to production. I’m interested in both a broader and a narrower view of a systems testing toolkit.
Broader in the sense that I want to test many more things than just planned change. If we’re talking about a test-driven approach (and, ultimately, a behaviour-driven approach), then we should apply a testing mindset to all the activities of systems administration: incident management, problem management, change management. That includes more specific, repeated activities such as simple health checks, verifying the correctness of data changes (as opposed to configuration changes), and so on.
Narrower in the sense that I want a simple, flexible toolkit that lets me express tests with a common language and collect results in a common format.
In all of this, simplicity and ubiquity are key considerations. They influence choice of language (and even of coding style), choice of file formats, and infrastructure design (hint: there isn’t any). I’ll go into more detail in my next post.
Anyway, developers have long had unit testing toolkits available to them: JUnit, Test::Harness, RSpec, the list goes on. While none of them are a great fit for systems testing, there is plenty of inspiration we can take from looking at them.
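To make "a common language and a common format" a little more concrete, here is a minimal sketch of what it might look like. It is only an illustration under stated assumptions, not the toolkit itself: the function names (`check_path_exists`, `run_checks`) are hypothetical, Python is an arbitrary choice, and the output is loosely modelled on the TAP-style lines that Test::Harness consumes (a plan line, then `ok` or `not ok` per test).

```python
import os


def check_path_exists(path):
    """Hypothetical health check: return (passed, description)."""
    return os.path.exists(path), f"path {path} exists"


def run_checks(checks):
    """Run each check and return TAP-style result lines.

    `checks` is a list of zero-argument callables, each returning
    a (passed, description) tuple. The first line is the plan
    ("1..N"); every following line reports one check.
    """
    lines = [f"1..{len(checks)}"]
    for number, check in enumerate(checks, start=1):
        passed, description = check()
        status = "ok" if passed else "not ok"
        lines.append(f"{status} {number} - {description}")
    return lines


if __name__ == "__main__":
    # Example run: one check against the current directory.
    checks = [lambda: check_path_exists(os.getcwd())]
    print("\n".join(run_checks(checks)))
```

The point of the sketch is that every check, whether it is a health check, a post-change verification or a data check, reduces to the same shape: a description and a pass or fail, written out as plain text that needs no infrastructure to collect or read.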