This article is part of a series on Test-Driven Systems Administration.
Having decided that we’re going to test the hell out of everything, we need to settle on a language for both the tests we want to run and the data we want to collect. In building a systems test tool, a series of design decisions flows from taking a systems admin perspective.
Basic Design Principles
- Low friction: it should be simple to write tests.
- Didactic: Examining existing tests should yield something meaningful. They should serve as a knowledge-sharing channel.
- Language independence: it should be easy to use your language of choice to write tests.
- Ubiquity: run-time dependencies should not be an obstacle to deploying and using a tool.
- Safety: it will be common for tests to run with elevated privileges.
This post is about all of these things, but principally about the testing language. At the highest level, we can divide language into vocabulary and grammar. It’s worth considering these separately when we look at systems testing.
Vocabulary
It occurs to me that most sysadmins already have a perfectly good vocabulary for systems testing: shell one-liners and scripts. There are a few … ah, let’s say … improvement opportunities there. I’ll start by confessing that I love one-liners. I would reckon that 80% of what I want to do to assess a Unix system can be done with one-liners. Preliminary health checks on most servers are done with a handful of standard commands. Putting together a 6-command pipeline usually gets me cackling with glee at my own ingenuity. So one-liners are probably going to be a big part of my testing toolkit.
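To make the one-liner point concrete, here’s the kind of pipeline I have in mind: summing %MEM per user from a process listing. The sample data is canned (a hypothetical USER PID %CPU %MEM COMMAND listing) so the sketch is self-contained; on a live box you’d feed it `ps aux` instead.

```shell
# A typical assessment pipeline: total %MEM per user, biggest first.
# The canned listing below stands in for `ps aux` output.
sample='root 101 0.0 1.2 sshd
web 202 0.3 4.5 httpd
web 203 0.2 4.4 httpd
db 301 1.0 22.7 mysqld'

out=$(printf '%s\n' "$sample" \
    | awk '{mem[$1] += $4} END {for (u in mem) printf "%s %.1f\n", u, mem[u]}' \
    | sort -rn -k2)
printf '%s\n' "$out"
```

Three commands, no control flow, and the intent is readable straight off the pipeline.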
As for shell scripts, there are a few classic shell scripting patterns. In general, they violate my “Low Friction” principle.
- One-offs, ranging from a simple for-loop at the shell prompt to single-purpose throwaway scripts (say, up to a couple of dozen lines).
- Write a 20-line shell script that basically just builds and executes one command. By the time you get to the bit that executes the command, it looks something like “$DO_CMD $CMD_OPTS $ARGS”, so you need to trace the script logic, include debug code, or insert “set -x” all over the place to figure out what it’s doing. My personal favourite is a 65-line shell script that does an rsync in a for-loop and then a Subversion commit. This violates my “Didactic” principle.
- Write a 1000-line shell script that is effectively 100 little scripts in one, and sends you blind if you try to maintain it. Because it’s a shell script interpreted top-to-bottom, you have to put all your functions at the top and no one can find where the main loop starts. This violates the “Sanity” principle.
- Come to your senses, abandon the shell script, and rewrite it in Perl (Python, Ruby, whatever your systems scripting language of choice is).
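For the avoidance of doubt, the “$DO_CMD $CMD_OPTS $ARGS” anti-pattern looks roughly like this (all the variable names here are hypothetical, and `echo` stands in for the real command so the sketch is harmless to run):

```shell
# The anti-pattern: by the time the command finally runs, you can't
# tell what it is just by reading the line that runs it.
DO_CMD="echo rsync"          # echo stands in for the real rsync
CMD_OPTS="-av --delete"
SRC="/srv/www/"; DEST="backup:/srv/www/"
DRY_RUN=${DRY_RUN:-}
[ -n "$DRY_RUN" ] && CMD_OPTS="$CMD_OPTS -n"
# ...imagine fifteen more lines of conditional option-building here...
out=$($DO_CMD $CMD_OPTS "$SRC" "$DEST")
printf '%s\n' "$out"
```

The final line tells you nothing; everything interesting happened somewhere above it.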
In practice, shell scripting tends to involve reusing a relatively small set of common idioms over and over. If you’re lucky, you’ll have a set of common libraries. But shell libraries have a tendency to be a bit opaque and non-portable in their own right (for example, useful as some of the things in Red Hat’s /etc/init.d/functions might be, nobody in their right mind is going to use them for portable shell scripts). If you’re less lucky, you’ll have some skeleton scripts that you can plug your specifics into. If you’re less lucky still, you do it from scratch every time, so no two scripts work quite alike (or automating something costs you more of a productivity hit than it should). There are a few well-known problems with shell scripts in general, but I’m not necessarily going to attack them head-on just yet.
- Portability is awful. You have GNU and POSIX variants of utilities, varying directory locations, and no guarantees about the output format (see, for example, what Red Hat aliases “ls” to by default). GNU systems tend to let you get away with Bash-isms even when called as /bin/sh. Linux doesn’t have a “real” Korn shell. Solaris paths can be ugly. The list goes on.
- Shell quoting rules can get ugly.
There is another useful idiom for shell scripts, and that’s the “foo.d/” directory that gets executed by a “run-parts” or similar calling mechanism. That’s good, and goes a long way to solving the 1000-line script problem, but doesn’t solve the problem of writing the same 20-line script with minor variations over and over.
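The run-parts idiom reduces to a very small loop: every executable in the directory is a test, and its exit code is the verdict. Here’s a minimal sketch (the directory and the two toy test scripts are created inline so it’s self-contained; a real tests.d/ would be populated by hand or by configuration management):

```shell
# Minimal run-parts-style runner: each executable in tests.d/ is a test,
# exit 0 means pass. The toy tests are created here for illustration.
dir=$(mktemp -d)
mkdir "$dir/tests.d"
cat > "$dir/tests.d/10-resolv" <<'EOF'
#!/bin/sh
exit 0
EOF
cat > "$dir/tests.d/20-ntp" <<'EOF'
#!/bin/sh
exit 1
EOF
chmod +x "$dir/tests.d/"*

pass=0 fail=0
for t in "$dir/tests.d/"*; do
    if "$t"; then
        echo "PASS ${t##*/}"; pass=$((pass + 1))
    else
        echo "FAIL ${t##*/}"; fail=$((fail + 1))
    fi
done
echo "$pass passed, $fail failed"
rm -rf "$dir"
```

Adding a test is just dropping a file in the directory, which is about as low-friction as it gets.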
The general idea of putting a bunch of scripts in a directory and pointing a test-runner at it is goodness. Which leads me to another design decision: use the file system as a database. This satisfies the “Ubiquity” principle, and besides, I’m generally in the Databases-Are-Evil camp. That’s not to say that storing tests in a database couldn’t come later, but it’s certainly not necessary.
The second aspect of this vocabulary is how to interpret the results of running some command. The most obvious is exit codes. This is not without its problems (quick, what exit code does /usr/bin/host return for NXDOMAIN replies on your system?), but in a controlled environment it’s a good place to start. It’s also as applicable to more serious systems glue languages as it is to shell. There’s also the presence or absence of output, the contents of the output, whether anything gets written to STDERR, or the reply codes for application protocols. We should be able to deal with all of these.
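The three channels mentioned above — exit status, stdout, stderr — can all be captured independently from the same command. A sketch, using grep as the test subject (grep’s documented convention: 0 = match, 1 = no match, 2 = error) against a temp file standing in for /etc/resolv.conf:

```shell
# Capturing the three result channels separately.
tmp=$(mktemp)
echo "nameserver 192.0.2.53" > "$tmp"   # stand-in for /etc/resolv.conf

out=$(grep 'nameserver' "$tmp" 2>/dev/null); rc=$?
# 2>&1 before >/dev/null: stderr goes to the capture, stdout is discarded.
err=$(grep 'nameserver' /no/such/file 2>&1 >/dev/null); rc2=$?

echo "rc=$rc rc2=$rc2"
rm -f "$tmp"
```

A harness only needs to record all three and let the test description say which ones matter.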
So to summarize the “vocabulary”, we have a pretty standard set of building blocks: One-liners, shell/perl/python/whatever scripts, exit codes, command output, and protocol reply codes.
Grammar
By “grammar” in a systems testing language, I really mean the file formats and system APIs we expect to deal with.
If we’re going to move our most common logic idioms back from the test cases into the harness, we need to settle on some sort of format to describe the specifics of a test. An example might be:
command: /bin/grep 'my\.dns\.server' /etc/resolv.conf
return-codes:
  0: OK I have the right nameserver
  1: FAIL my.dns.server missing from resolv.conf
  2: FAIL something went wrong with grep
This example encapsulates pretty much everything we want to check, with no program logic or environment variables getting in the way.
The most likely candidate for the test description format is a data-serialisation format, like YAML, JSON, Perl’s Storable or Data::Dumper, or, God Forbid, XML. Remember, since we’re abstracting all the logic out into the harness, all we need in the test description is the thing to be tested (usually a command to run) and a little bit of metadata, such as how to interpret the results. For concise representation of not-too-deeply-nested data structures, YAML seems like the best fit to me as a starting point:
- It’s better for human consumption than XML, so it’s a good choice for human-editable inputs.
- It’s not executable code (unlike the native Perl serialisation formats, or JSON carelessly eval’d as JavaScript), which is in line with the “Safety” principle.
- It’s relatively ubiquitous; there are good-quality YAML libraries for all the popular systems glue languages.
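To show how little logic the harness actually needs, here’s a toy harness for a stripped-down, flat variant of the test description above (a real harness would parse proper YAML with a library; the flat key-colon-value format and sed parsing here are just to keep the sketch in plain shell, and the spec and data files are created inline):

```shell
# Toy harness: read the command from the spec, run it, look up the
# verdict message keyed by the exit code. Flat format for simplicity.
spec=$(mktemp); data=$(mktemp)
echo "nameserver 192.0.2.53" > "$data"   # stand-in for /etc/resolv.conf
cat > "$spec" <<EOF
command: grep -q nameserver $data
0: OK I have the right nameserver
1: FAIL nameserver missing
2: FAIL something went wrong with grep
EOF

cmd=$(sed -n 's/^command: //p' "$spec")
sh -c "$cmd"; rc=$?
verdict=$(sed -n "s/^$rc: //p" "$spec")
echo "$verdict"
rm -f "$spec" "$data"
```

All the program logic lives in the harness; the test file is pure data, which is exactly the property we’re after.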
The other thing a test harness needs to do is produce output you can use. Pass/fail counters on STDOUT are fine, but not so useful for examining the test output in more detail, for audit trails, for capturing trends, or for tarting up into web pages. So I want something that can produce different styles of output for the different presentation contexts. The same set of data serialisation formats I mentioned above would be a good start, along with syslog and pretty(-ish) STDOUT output. There are also specific testing protocols like TAP, TET, or others which would be useful to implement.
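Of those protocols, TAP is probably the cheapest to emit: a plan line (“1..N”) followed by one “ok”/“not ok” line per test. A sketch of a runner producing it, using `true` and `false` as stand-ins for real test commands:

```shell
# Emitting TAP: a plan line, then one "ok"/"not ok" line per test.
# TAP consumers (prove, etc.) can aggregate or pretty-print this.
tap=$(
    echo "1..2"
    n=0
    for cmd in "true" "false"; do
        n=$((n + 1))
        if sh -c "$cmd" >/dev/null 2>&1; then
            echo "ok $n - $cmd"
        else
            echo "not ok $n - $cmd"
        fi
    done
)
printf '%s\n' "$tap"
```

Because TAP is line-oriented text on STDOUT, the same runner output can feed a terminal, a log file, or a downstream formatter without modification.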
Next post: test formats.