fuzzing

Discovering errors in a program that takes some form of text input is usually a quite challenging task. The errors can often be seemingly random, and as such, it is not easy to come up with test cases that would discover them. So it might be a good idea to employ a machine that can generate (pseudo-)random inputs really quickly to examine the program.
One such technique of generating test inputs is fuzzing. A fuzzer is then a program that provides the random inputs and feeds them to the program we want to examine. This technique is often used for software testing. It's also used in some cybersecurity applications and hacking. In this blog, I will try to walk through a thought process for creating a fuzzer intended for software testing. Let's begin.

what do we want?

So at this point, we should probably think about what would be the desired properties of our fuzzer:
  1. It should be pretty easy to use - we want to reduce the complexity from which testing suffers so that the heavy lifting is offloaded to the tool
  2. It should be lightweight and fast - as fuzzing as a technique relies on randomness, it does not provide guarantees about the program that more complex tools might. So we should be able to generate inputs quickly to provide a "good enough" guarantee faster than the heavier tools achieve their goals.
  3. It should examine as many paths as possible - to give the "good enough" guarantee, we might want to measure how well have we explored the program so far. Quite often, some coverage metric is used to guide the fuzzer when generating the inputs but more about that later.
So we have outlined that we want a tool that gives us as much information possible as fast as possible. And all of that with minimal effort from the user. Great, what a revolutionary set of thoughts! But it is important to think about how they are related. In this case, we are willing to sacrifice the strength of the guarantees for speed and simplicity of use. Maybe to integrate it into a CI? I will leave that to the opinion of the reader.

crafting a fuzzer

After laying out what we want from our fuzzer, we can move on to the drawing board to design one. We should first think about what information will be available when running the fuzzer. Do we have some set of interesting inputs available? Do we have a formal specification of how the input can look? Do we have some knowledge of the structure of the program we are examining?
As we are designing a fuzzer for software testing, it is reasonable to expect that has some knowledge of the program we want to test. So let's say we have the program's source code, and we can easily create some initial set of inputs - often called corpus in this context.
The initial set of inputs helps us satisfy the first requirement, as it should be enough to provide some valid inputs to the fuzzer. Those are not usually that hard to create, although the quality of the corpus directly affects the results the fuzzer can provide.
Using the source code, we can measure some form of coverage achieved by a single input as well as coverage achieved by our tool so far during the examination. We can use the code coverage metric to guide our fuzzer and satisfy the third requirement. I will explain this in more detail later.
Satisfying the second requirement is now our job when creating the tool itself. The main loop of our outlined fuzzer would look something like this:
Of course, this loop is a bit simplified. Ideally, we would like to collect inputs that caused an error in a separate set.
As you might have noticed, the fuzzer on its own doesn't have a condition that says it should end and that the algorithm is done. So we have to set an arbitrary condition, usually a time limit. The resulting set of inputs is then stored in the maintained set.
Also, there is a magical function called mutate that takes a selected input and (potentially) the set of all inputs. So what does it do? Well, that depends. Generally, it creates a new input by modifying the selected input. There are some heuristics on how to generate the interesting inputs, but we won't go there. These heuristics often use some information from the total set of inputs. For our purpose, it is enough that we consider random mutations like bit flips and replacement or insertion of some interesting values - for example, a NULL or something like that.
What that gives us is a new input slightly different from the original, but even if we begin with a set of valid inputs, we most likely get inputs that are not syntactically valid. But they are invalid only "by little" if that makes sense. Meaning the new input might find some unexpected behaviour that will occur in a really specific situation. Of course, this also means that we mainly explore the parts where the program handles the syntactic validity of the input.
Earlier I talked about coverage. Measuring coverage is useful because of the random nature of a fuzzer. If left unguided, it would most likely quickly deteriorate our inputs into absolute garbage by repeated mutations. We don't really want that, so we track how many coverage points have we already covered and if the new input covered some additional place. If yes, we can save this input for future mutations, as it might help us to get to other interesting places. It could also do a better job of covering parts already covered, so we save it or replace some older inputs with it, even though it didn't help us discover anything new.
Not letting the size of the set of candidate inputs explode also provides some obvious performance benefits. This can be further optimised by reducing input to a shorter one that still provides the same behaviour of the program.

does it work well?

You might be wondering how thorough can a fuzzer be or how "deep" it can go when examining the program. Well, usually, it goes more into width than depth. The usual behaviour is that it examines most of the paths pretty early in its operation, and then the progress begins to die off. This makes sense as examining deeper parts of the program requires many "choices" or mutations to align correctly. This is, of course, a consequence of our requirements.
A more concrete answer is that the approach described here quite resembles how the state of the art fuzzer American Fuzzy Lop[https://github.com/google/AFL], or AFL for short, works.

but what if I am more interested in the semantics of the input?

So we have created a fuzzer that examines the syntactic part of the input handling. But we can also want to look into how the program handles various inputs that at least satisfy the input grammar (although they might still be completely nonsensical). As you might have already guessed, we can also create a fuzzer that helps us to do this.
First, let's take a quick detour to random input generators like QuickCheck.

generator based tools

These tools take a specification of the input language in a grammar-like specification. Based on that, they randomly generate the inputs. They also often have additional features like minimising the error inducing input, but those are not relevant to us.
We are more interested in the random generation of inputs and how that works. These generators use some source of random data to "navigate" through the grammar and return the generated input. We can look at the random data as a stream of "untyped" bits. This observation is interesting as it will allow us to bend the tool to our image.

back to fuzzing

Random generating is a viable approach - as can be witnessed by how many tools of such sort exist - but we want to do better. The approach we will take is based on working with the stream of untyped bits, but instead of never ending stream, we will have fixed sequences of the "untyped" bits.
We will use the sequences to navigate through the grammar in the same way as the random generator does. But instead of throwing these bits away and going forward, we will save the sequence and apply our fuzzing algorithm to it.
We can basically use the random generator as it is, with the only difference is that we need to force it to read the random data from our sequence instead of the usual source of random bytes.
The main difference from our previous fuzzer is that we are not actually handling the inputs in the "fuzzing loop", as those are created just before they are fed to the program. Instead, we are performing the mutations on the random sequence, which leads to high level changes in the resulting input.
Maybe an example would come in handy. Let's have a sequence of bytes:
    0000 1011 … 0000 0101 | 0100 1000 | 0110 0101 | 0110 1100
    | 0110 1100 | 0110 1111 | …
And have some generator that uses the colour sequence to determine the length of a string (in this case 5) and then asks for the corresponding characters which translates to a string 'Hello' appearing in the sequence at the corresponding place.
Now, if we flip a single bit in the sequence, we may get quite a different result but it should still be a syntactically valid input as it was created by the generator. Observe:
    0000 1011 … 0000 0«0»01 | 0100 1000 | 0110 0101 | 0110 1100
    | 0110 1100 |  0110 1111 | …
This generates a shorter string 'H', and the rest of the sequence is used by different parts of the generator.
If we had flipped a bit that represents a boolean value, it might change the structure in an even bigger fashion. There are also many mutations that would have little impact on the result For example, if we had flipped a bit inside one of the bytes used for characters, it probably wouldn't result in interesting semantic changes.
You might be thinking, "what happens if the sequence has already ended, but the generator needs to make another choice based on a random value?" We can easily solve this by appending more random bits to the sequence and saving the prolonged sequence into our set of inputs. The opposite case, when the sequence is too long, but the generator has already reached the final state, can simply be solved by ignoring the unused part of the sequence.
Actually, this is an approach taken by a tool called Zest on which I recently read up in a paper[https://dl.acm.org/doi/abs/10.1145/3293882.3330576]. The authors have a slightly more complex approach which they hypothesise that it biases the fuzzer towards generating valid inputs more often. It also provides better examples of how mutations affect the output (although a bit more drawn out) if you are interested.

other fuzzers

I have shown two variants of fuzzing I personally found interesting, but there are of course more variants of fuzzers as the term is quite benevolent. For example, there are black-box fuzzers that do not have any knowledge of the program and its structure. Some of those even try to analyse the program based on how it responds to the inputs and use that gained knowledge to bias the fuzzer towards more interesting inputs.
There are also other variants of how the mutation and generator based approaches can be done, and some might even combine both of them.