Coverage analysis
Gaining confidence in the code coverage achieved during fuzzing is essential for two reasons. Firstly, you want to assess which parts of your application your fuzzing harnesses execute.
For example, a magic value check, like the one shown in the following figure, may be hard for a fuzzer to overcome. Discovering such a check is important so that the values can be provided to the fuzzer through a dictionary or test cases in the seed corpus.
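To illustrate the shape of such a check, here is a minimal Rust sketch; the `parse` function and the `"OggS"` magic bytes are hypothetical stand-ins for this example, not a real crate's API:

```rust
// A magic-value check that a fuzzer struggles to pass by chance:
// without "OggS" in a dictionary or the seed corpus, random mutations
// almost never produce these exact four bytes, so the code after the
// check stays uncovered.
fn parse(data: &[u8]) -> Option<&[u8]> {
    if data.len() < 4 || &data[..4] != b"OggS" {
        return None; // most fuzzer-generated inputs are rejected here
    }
    Some(&data[4..]) // only inputs with the magic reach deeper parsing code
}

fn main() {
    assert!(parse(b"AAAAAAAA").is_none()); // random input stops at the check
    assert!(parse(b"OggSrest").is_some()); // correct magic reaches deeper code
}
```

Coverage reports make such a barrier visible: the lines behind the check show zero hits until the magic value is supplied to the fuzzer.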
Secondly, when switching your fuzzer or updating your harness or SUT, you want to see whether coverage changes. If coverage decreases, then you may need to extend your harness because new features were introduced. If coverage increases, then you probably improved your harness or the SUT became easier to fuzz.
Fuzzing coverage is a proxy for the capability and performance of the fuzzer. Even though it is widely accepted that coverage is not an ideal measure of a fuzzing engine's performance, it can tell you whether your harness works in a given setup.
The following flow chart shows an ideal coverage analysis workflow. The workflow uses the corpus generated after each fuzzing campaign to calculate the coverage, which is the preferred method.
PRO TIP: You should not use the statistics returned by your specific fuzzer to track fuzzing performance over a long time. For example, AFL++ outputs a value that indicates the code coverage. However, this value is non-comparable with other fuzzers because they may calculate this value differently.
The most comparable data is generated by tools specifically made for measuring coverage.
The cargo-fuzz tool can output coverage for your corpus. Based on the acquired coverage data, we can use standard Rust tools to generate an HTML report.
The cargo-fuzz tool requires the llvm-tools-preview component to be installed for the nightly toolchain in use:
rustup toolchain install nightly --component llvm-tools-preview
Now, use cargo-fuzz to output coverage data. The cargo-fuzz tool will recompile your project with coverage instrumentation and then run through all test cases in the corpus.
cargo +nightly fuzz coverage fuzz_target_1
The cargo-fuzz tool will report that merged coverage has been written to a .profdata file. Next, we need the cargo-binutils and rustfilt tools installed.
cargo install cargo-binutils
cargo install rustfilt
Create the following script and make it executable. We create a script because the manual invocation of the coverage generation is quite complex.
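As a sketch, such a script might look like the following, written here as a shell function for clarity; the `TARGET` triple is an assumption (x86_64 Linux) that must match your platform, and the paths follow cargo-fuzz's default layout:

```shell
# generate_html -- sketch of a coverage-report helper.
# ASSUMPTION: TARGET is set for an x86_64 Linux host; adjust it for
# your platform (e.g. aarch64-apple-darwin).
generate_html() {
    FUZZ_TARGET="$1"   # e.g. fuzz_target_1
    shift              # remaining arguments: optional source-file filters
    TARGET="x86_64-unknown-linux-gnu"

    # Invoke llvm-cov via cargo-binutils to render an HTML report from
    # the merged coverage data produced by `cargo fuzz coverage`.
    cargo +nightly cov -- show \
        -Xdemangler=rustfilt \
        "target/$TARGET/coverage/$TARGET/release/$FUZZ_TARGET" \
        -instr-profile="fuzz/coverage/$FUZZ_TARGET/coverage.profdata" \
        -show-line-counts-or-regions \
        -show-instantiations \
        -format=html -o fuzz_html/ \
        "$@"
}
```

Save the function body as an executable file named `generate_html` that takes the fuzz target as its first argument and source-file filters as the remaining ones.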
Finally, we can generate an HTML report and save it to fuzz_html/.
./generate_html fuzz_target_1 src/lib.rs
PRO TIP: The following table lists the invoked cargo command in more detail:
| Command part | Explanation |
| --- | --- |
| `cargo +nightly cov --` | Invokes the llvm-cov tool from the Rust toolchain |
| `show` | This subcommand can generate HTML reports |
| `-Xdemangler=rustfilt` | Use the rustfilt demangler for better function names |
| `"target/$TARGET/coverage/$TARGET/release/$FUZZ_TARGET"` | Path to the instrumented Rust binary or object file |
| `-instr-profile="fuzz/coverage/$FUZZ_TARGET/coverage.profdata"` | Path to the merged coverage data |
| `-show-line-counts-or-regions -show-instantiations` | Options for Rust to make the output easier to understand |
| `-format=html -o fuzz_html/` | Sets the format to HTML and outputs the HTML files to a directory |
| `src/lib.rs` | Optional paths to source files to filter the output; in this case, we are interested only in the lib.rs file |
Real-world examples
Cargo crate: ogg
In a previous section, we fuzzed the ogg crate. Now we want to evaluate the coverage of our fuzzing campaign to verify that we achieved good coverage.
First, we inspect the corpus and verify that we found test cases.
ls fuzz/corpus/fuzz_target_1/ | wc -l
Then we generate merged coverage data from the corpus:
cargo +nightly fuzz coverage fuzz_target_1
Finally, we generate an HTML report and use domain knowledge to assess the fuzzing performance, using the generate_html script introduced in the Coverage analysis section.
If the code coverage is unexpectedly low, we may need to find more diverse seeds or fix bugs in our harness. However, no single number defines bad coverage; it depends significantly on how the crate is written and how difficult certain code paths are to reach.