2010-01-27

Fuzz Testing

Fuzz Where can I find a free image that represents "fuzz"? This will have to suffice!

Hands up everyone who likes writing tests for their code!

Hello...?

Yeah. For the record, I'm a big fan of test-driven software creation... in theory. In practice, it doesn't always happen.

But let me tell you one of the main benefits of having tests in place for your software: peace of mind. It's easy to rerun the tests, and you have that much more security in the knowledge that, when you make a change, the software is correct.

One approach is to hand-code the tests you want. For example, if you have a circular queue, you could put stuff in it, test it for fullness, empty it, then test it for emptiness, put a value on, pull it off and make sure it's the same value, and so on. And when you have "exercised" the queue with enough tests, you can feel confident that it's working properly.

But instead of talking about that, let's talk about another approach: Fuzz Testing. With this kind of testing, you throw every type of input you possibly can at the code, often random data. And you check to make sure that the code is handling it properly. (E.g. if your software crashes, it's probably not handling it properly.)

When writing code for old PalmOS devices, you'd run the software on an emulator for debugging purposes. The emulator came with a fuzz tester that would randomly tap the screen for as long as you wanted. And it would randomly enter data in any text fields it clicked on, and randomly select any GUI elements it found. The idea was that eventually all this random tapping would eventually uncover some bugs, even weird and unlikely corner cases.

So what about that circular queue? Can we fuzz-test that? First a quick review: a queue is a FIFO data structure; the first thing added to the queue will be the first thing removed from the queue, just like people waiting in a line (or a queue, for that matter). A circular queue is one in which the items in the queue are arranged in an array with the special property that an item added after the item at the end of the array will wrap around to the front of the array. Once the head of the queue catches up to the tail, there is no more room in the queue and it is full.

With fuzz-testing, we want to carpet-bomb the queue with random inputs that cover the spectrum of what can be done to it. And this, for a queue, basically means "enqueue" items and "dequeue" items; these are really the only manipulative operations at our disposal for this data structure.

But queues often have routines that allow you to interrogate them. So we can test if a queue is empty or full, or look to see how many items are enqueued.

And finally, the operations on queues can return error codes, and we can test if those are returning as expected. For instance, if a queue is full, we'd expect a subsequent enqueue operation to fail.

So we do this stuff, and we do a lot of it. At random. Say we go for _n _million rounds, and each round we randomly choose to enqueue a value or dequeue a value. We should run it long enough that the queue probably completely fills and empties many many times.

And we can test the return values to make sure enqueues when the queue is full fail, and dequeues when the queue is empty also fail. But this means we should independently track the fullness of the queue so we can verify that the queue is tracking it correctly, as well. (That is, why should we trust the queue to know when it's full? Maybe its queue-is-full detection code is busted!) When we enqueue a value successfully, we'll add one to our count, and when we dequeue a value successfully, we'll subtract one from our count. Each round we might as well also check our recorded queue count against the queue's—they should match.

And the values we dequeue should be what we expect. We can be a little bit tricky here and enqueue data in a predictable order, so that it dequeues in an predictable order. (This way we don't have to have a queue to store our test values that we're using to test our queue!) For instance, we could first enqueue 0, then 1, then 2, etc., in order. And that way we know that when we dequeue 17, the next number we dequeue should be 18. If it's not, our test reports a failure.

Also we can interrogate the queue each round to see if it is reporting accurate data about itself. If our count is equal to the queue capacity, the queue should tell us it is full when asked. And if our count is 0, the queue should tell us it's empty when asked. And if our count is non-zero, the queue should tell us it's not empty, and if our count is less than the queue capacity, it should tell us it's not full. That covers that!

Now compare this approach with a standard set of human-created corner-case tests. Is it as good? What's missing? Well, if I were a human, I'd put in a bunch of tests that verified that enqueues and dequeues behaved properly at the circular queue wraparound point, since it's the obvious edge case. But in the fuzz tests, above, we didn't even consider it! But that's OK, because we're counting on the fact that during our 10 million rounds of fuzz testing, the behavior of the queue at the wraparound point will probably be tested many times over.

Enough human-talk! Let's have some code that does exactly this for a circular queue implementation! What follows is the fuzz-testing code only, written in C:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define FUZZ_TEST_ROUNDS (10000000)

void test_queue(int cond, char *str)
{
    if (!cond) {
        // this code could certainly be improved
        fprintf(stderr, "TEST FAIL: %s\n", str); fflush(stderr);
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    int count;
    int tailval;
    int headval;
    struct queue q1;
    int i;
    int rv;
    int val;
    int enqueue_count, dequeue_count, full_count, empty_count;

    srand(time(NULL));

    queue_init(&q1);

    count = 0;
    tailval = headval = 0;
    enqueue_count = dequeue_count = full_count = empty_count = 0;

    for (i = 0; i < FUZZ_TEST_ROUNDS; i++) {

        // perform a random operation on the queue:
        switch (rand()%2) {
            case 0: // enqueue
                rv = queue_enqueue(&q1, headval);

                // make sure enqueue failed if it was supposed to:
                if (count == QUEUE_SIZE) {
                    test_queue(rv == -1, "enqueue 1");
                } else {
                    test_queue(rv != -1, "enqueue 2");
                }

                if (rv == 0) {
                    enqueue_count++;
                    headval++;
                    count++;
                }
                break;

            case 1: // dequeue
                rv = queue_dequeue(&q1, &val);

                // make sure dequeue failed if it was supposed to:
                if (count == 0) {
                    test_queue(rv == -1, "dequeue 1");
                } else {
                    test_queue(rv != -1, "dequeue 2");
                }

                if (rv == 0) {
                    dequeue_count++;

                    // make sure we dequeued the expected value:
                    test_queue(val == tailval, "dequeue val");

                    tailval++;
                    count--;
                }
                break;
        }

        // make sure queue_full() is reporting properly:
        if (count == QUEUE_SIZE) {
            test_queue(queue_full(&q1), "queue full 1");
            full_count++;
        } else {
            test_queue(!queue_full(&q1), "queue full 2");
        }

        // make sure queue_empty() is reporting properly:
        if (count == 0) {
            test_queue(queue_empty(&q1), "queue empty 1");
            empty_count++;
        } else {
            test_queue(!queue_empty(&q1), "queue empty 2");
        }

        // make sure queue_count() is reporting properly:
        test_queue(queue_count(&q1) == count, "queue count");
    }

    // print summary statistics for fun
    // (and to verify stuff actually happened)
    printf("  Fuzz rounds: %10d\n", FUZZ_TEST_ROUNDS);
    printf("Enqueue count: %10d\n", enqueue_count);
    printf("Dequeue count: %10d\n", dequeue_count);
    printf("   Full count: %10d\n", full_count);
    printf("  Empty count: %10d\n", empty_count);

    return EXIT_SUCCESS;
}

Update: one thing I did not do here is write out a log file that would allow us to exactly reproduce any crashes. As Peep pointed out in the comments, you'll want to do this to help actually fix any issues that arise. You can also replay the commands by choosing the same seed for your random number generator, but if the bug didn't come up until after 12 hours of running, that might not be a feasible solution.

And for your viewing pleasure, here is a ZIP file that contains the complete queue and fuzz test code.

Enjoy!

Beej's Bit Bucket

⚡ Tech and Programming Fun

Fuzz Testing