Much has been written about fuzzing compilers already, but I could not find much about fuzzing compilers with more modern, coverage-guided techniques, where coverage information from each run is fed back into the fuzzer to help it find more bugs.
If you know me at all, you know I’ll throw anything I can get my hands on at AFL. So I tried gcc. (And clang, and rustc — but more about Rust in a later post.)
Levels of fuzzing
First let me summarise a post by John Regehr called Levels of Fuzzing, which my approach builds heavily on. Regehr presents a very important idea (which builds on earlier research by others), namely that fuzzing can operate at different “levels”. These levels correspond somewhat loosely to the different stages of compilation, i.e. lexing, parsing, type checking, code generation, and optimisation. In terms of fuzzing, the source code that you pass to the compiler has to “pass” one stage before it can enter the next; if you give the compiler a completely random binary file, it is unlikely to even get past the lexing stage, let alone reach the point where the compiler is actually generating code. So it is in our interest (assuming we want to fuzz more than just the lexer) to generate test cases more intelligently than just using random binary data.
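To make the “levels” idea concrete, here is a small illustration of inputs that die at successive stages of a C compiler (the file names and inputs are my own, for illustration only, not taken from Regehr's post):

    # Toy inputs that are rejected at successive stages:
    echo 'int main(void) { return 0; }'    > ok.c   # accepted; reaches code generation
    echo 'int main(void) { return }'       > syn.c  # lexes fine, but fails parsing
    echo 'int f(void) { return "s" * 2; }' > sem.c  # parses fine, but fails type checking
    head -c 64 /dev/urandom                > rnd.c  # random bytes rarely even get past the lexer

Feeding each of these to gcc produces the corresponding diagnostic; the last kind of input is what a purely random fuzzer would produce most of the time.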
…In a “naïve” approach, we simply compile gcc with AFL instrumentation and run afl-fuzz on it as usual. If we give it a reasonable corpus of existing C code, it is possible that the fuzzer will find something interesting by randomly mutating the test cases. But more likely than not, it will mostly end up with random garbage like what we see above, and never actually progress to the more interesting stages of compilation. I did try this — and the results were as expected. It takes a long time before the fuzzer hits anything interesting at all.
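For reference, here is a minimal sketch of that naive setup, assuming AFL's stock workflow (the configure flags, paths, and corpus location are assumptions for the sake of the example, not copied from my actual runs):

    # Build gcc's C compiler proper (cc1) with AFL instrumentation.
    # --disable-bootstrap keeps the afl-gcc-built stage1 compiler
    # instead of rebuilding gcc with itself, which would drop the
    # instrumentation.
    mkdir build && cd build
    ../gcc/configure CC=afl-gcc --disable-bootstrap --enable-languages=c
    make -j"$(nproc)"

    # Fuzz cc1 directly; afl-fuzz substitutes @@ with the path of
    # each mutated test case.
    afl-fuzz -i ~/corpus-of-c-files -o findings -- ./gcc/cc1 @@

Targeting cc1 rather than the gcc driver is deliberate: the driver itself just forks and execs cc1, the assembler, and the linker, so fuzzing the driver would spend most of its cycles outside the code we care about.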