Intro to OSS-Fuzz-Gen

A Framework for Fuzz Target Generation and Evaluation

Konstantinos Chousos
sdi2000215@di.uoa.gr

Department of Informatics & Telecommunications, University of Athens

April 11, 2025

Overview

  1. Intro to fuzzing
  2. OSS-Fuzz
  3. OSS-Fuzz-Gen
    1. from_scratch branch
  4. Future work

  1. Introduction to fuzzing
  2. The platform that OSS-Fuzz-Gen builds on
  3. More details and the problems it has
  4. My plan for addressing these problems

Fuzzing

What is fuzzing?

Fuzzing is the execution of a Program Under Test (PUT) using input(s) sampled from an input space (the “fuzz input space”) that protrudes the expected input space of the PUT [1].

Figure: Overview of a fuzz campaign.

A type of testing where we run the Program Under Test (PUT) with “random” inputs. The goal is to make the program crash, and thereby uncover an error.

  • Start with a corpus -> fuzz -> if the program crashed: add the input to the corpus
  • The inputs get mutated

  • These inputs are often generated or mutated automatically.

    Generational fuzzing: inputs are generated randomly from a model of the input format, e.g. a BNF grammar.
    Mutational fuzzing: inputs result from mutating inputs taken from a pre-existing corpus.
  • Goal: trigger unexpected behavior (e.g., crashes, hangs, memory errors).
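
To make the mutational approach concrete, here is a minimal sketch of a mutation step and a fuzz loop. Everything in it (`Mutate`, `ToyTargetCrashes`, `FuzzLoop`, and the artificial "crash" condition) is hypothetical and purely illustrative; real fuzzers such as libFuzzer or AFL++ add coverage feedback and far richer mutation strategies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using Input = std::vector<uint8_t>;

// Hypothetical mutation: overwrite, insert, or delete one random byte.
Input Mutate(Input in, std::mt19937 &rng) {
  std::uniform_int_distribution<int> byte(0, 255);
  if (in.empty()) {
    in.push_back(static_cast<uint8_t>(byte(rng)));
    return in;
  }
  std::uniform_int_distribution<std::size_t> pos(0, in.size() - 1);
  std::uniform_int_distribution<int> op(0, 2);
  const auto i = static_cast<std::ptrdiff_t>(pos(rng));
  const auto b = static_cast<uint8_t>(byte(rng));
  switch (op(rng)) {
    case 0:  // overwrite a byte
      in[static_cast<std::size_t>(i)] = b;
      break;
    case 1:  // insert a byte
      in.insert(in.begin() + i, b);
      break;
    default:  // delete a byte
      in.erase(in.begin() + i);
      break;
  }
  return in;
}

// Toy PUT: pretends to "crash" on any input that starts with "FUZ".
bool ToyTargetCrashes(const Input &in) {
  return in.size() >= 3 && in[0] == 'F' && in[1] == 'U' && in[2] == 'Z';
}

// Mutates random corpus entries up to max_iters times.
// Returns the first crashing input, or an empty input if none was found.
Input FuzzLoop(const std::vector<Input> &corpus, std::mt19937 &rng,
               int max_iters) {
  if (corpus.empty()) return {};
  std::uniform_int_distribution<std::size_t> pick(0, corpus.size() - 1);
  for (int it = 0; it < max_iters; ++it) {
    Input candidate = Mutate(corpus[pick(rng)], rng);
    if (ToyTargetCrashes(candidate)) return candidate;
  }
  return {};
}
```

Whether the loop finds the "crash" depends on the random seed and the iteration budget, which is exactly why practical fuzzers rely on feedback rather than blind mutation.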

Why fuzz?

The purpose of fuzzing relies on the assumption that there are bugs within every program, which are waiting to be discovered. Therefore, a systematic approach should find them sooner or later.

— OWASP Foundation

  • Open Worldwide Application Security Project (OWASP)
  • nonprofit foundation, 2001

Fuzz testing is valuable for:

  • Software that receives inputs from untrusted sources (security);
  • Sanity checking the equivalence of two complex algorithms (correctness);
  • Verifying the stability of a high-volume API that takes complex inputs (stability), e.g. a decompressor, even if all the inputs are trusted.

— Google

  1. Inputs from untrusted sources
  2. Verifying implementations
  3. Complex projects/APIs
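
The correctness use case is often called differential fuzzing. A minimal sketch, assuming two hypothetical implementations that should always agree (`PopcountNaive` and `PopcountFast`, both invented for this example): the fuzz target feeds the same fuzzer-provided bytes to both and aborts on any mismatch, which a fuzzer would report as a crash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Reference implementation: test each of the 8 bits in turn.
static unsigned PopcountNaive(uint8_t b) {
  unsigned n = 0;
  for (int i = 0; i < 8; ++i) n += (b >> i) & 1u;
  return n;
}

// "Optimized" implementation: clear the lowest set bit until none remain.
static unsigned PopcountFast(uint8_t b) {
  unsigned n = 0;
  while (b != 0) {
    b = static_cast<uint8_t>(b & (b - 1));
    ++n;
  }
  return n;
}

// Fuzz target: any disagreement between the two implementations aborts.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  for (size_t i = 0; i < Size; ++i) {
    if (PopcountNaive(Data[i]) != PopcountFast(Data[i])) std::abort();
  }
  return 0;
}
```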

Success stories

  • Heartbleed vulnerability, OpenSSL [2] (CVE-2014-0160)
    • Easily found with fuzzing ⇒ Preventable
  • Shellshock vulnerabilities, Bash (CVE-2014-6271)
  • Mayhem (FKA ForAllSecure) [3]
    1. Cloudflare
    2. OpenWRT

Shellshock allowed attackers to execute arbitrary commands and gain unauthorized access.

Fuzzer implementations

  • LibFuzzer [4].
    • In-process, coverage-guided, mutation-based fuzzer.
  • American Fuzzy Lop (AFL) [5].
    • Instruments binaries for edge coverage.
    • Superseded by AFL++ [6], which adds more fuzzing strategies, better speed, and QEMU/Unicorn support.

LibFuzzer

LibFuzzer is an in-process, coverage-guided, evolutionary fuzzing engine. LibFuzzer is linked with the library under test, and feeds fuzzed inputs to the library via a specific fuzzing entrypoint (fuzz target).

Used to fuzz library functions. The programmer writes a fuzz target to test their implementation.

Fuzz target

A function that accepts an array of bytes and does something interesting with these bytes using the API under test [4].

AKA fuzz driver, fuzzer entry point, harness.

Fuzz target structure

  • Entry point called repeatedly with mutated inputs.
  • Feedback-driven: uses coverage to guide mutations.
  • Best for libraries, not full programs.

#include <cstddef>
#include <cstdint>

// libFuzzer calls this entry point repeatedly, each time with a new mutated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  DoSomethingWithData(Data, Size);  // exercise the API under test
  return 0;  // values other than 0 and -1 are reserved
}

libFuzzer requires this signature, since it builds its own binary around the target. AFL++ runs the original binary.

AFL++

AFL++ fuzzes whole programs/binaries. Inputs come from the seed files in seeds_dir and from mutations of them.

$ ./afl-fuzz -i seeds_dir -o output_dir -- /path/to/tested/program
  • Works on black-box or instrumented binaries.
  • Uses fork-server model for speed.
  • Supports persistent mode, QEMU, and Unicorn modes.

It can also be used to fuzz libraries and the like; in that case the entry point is main instead of LLVMFuzzerTestOneInput.

It can also use LLVMFuzzerTestOneInput harnesses.
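
As a rough sketch of how the two entry-point styles relate, a stdin-driven program can reuse a libFuzzer-style harness by reading all input bytes and passing them to `LLVMFuzzerTestOneInput`. The helper `RunOneInputFromStream`, the global `g_last_size`, and the toy harness body are hypothetical; AFL++ provides its own driver for such harnesses, which this sketch does not reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Recorded only so that the effect of the toy harness is observable.
static size_t g_last_size = 0;

// Toy harness standing in for a real fuzz target.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  (void)Data;  // a real harness would pass Data to the API under test
  g_last_size = Size;
  return 0;
}

// Reads the whole stream into memory and runs the harness once on it.
int RunOneInputFromStream(std::istream &in) {
  const std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                              std::istreambuf_iterator<char>());
  return LLVMFuzzerTestOneInput(
      reinterpret_cast<const uint8_t *>(buf.data()), buf.size());
}
```

A `main` that simply returns `RunOneInputFromStream(std::cin)` would turn the harness into an ordinary program that takes its test case on standard input, which is how afl-fuzz delivers inputs when no file placeholder is given.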

OSS-Fuzz

Continuous fuzzing for open source software

Scalable, distributed, CI fuzzing solution for open-source projects [7].

  • Supports LibFuzzer, AFL++, Honggfuzz and Centipede fuzzing engines.
  • Supports C/C++, Rust, Go, Python and Java/JVM projects.
  • Based on ClusterFuzz [8].
  • Started in 2016, in response to the Heartbleed vulnerability [2].

The vulnerability had the potential to affect almost every internet user, yet was caused by a relatively simple memory buffer overflow bug that could have been detected by fuzzing [9].

Problems

  • Upfront cost of writing fuzz targets.
  • Integration requirements (the project must be adapted to ClusterFuzz’s [8] structure):
    • project.yaml
    • Dockerfile
    • build.sh
  • Only “big” projects (by stars/LoC).
  • Requires a Google developer account.

The developer has to write the fuzz targets and set the project up for integration with OSS-Fuzz.

A Google account is needed to access the ClusterFuzz web interface.

OSS-Fuzz-Gen

This framework generates fuzz targets for real-world C/C++, Java, Python projects with various Large Language Models (LLM) and benchmarks them via the OSS-Fuzz platform [10].

  • Goal: Take as input a GitHub repository and output an OSS-Fuzz project as well as a ClusterFuzzLite project with a meaningful fuzz harness [11].

Architecture

Warning

The project must come with preexisting fuzz targets. Fuzz-Introspector gives the LLM info about the harnesses, not the main program/functions.

Given a link to a GitHub repository, the following steps take place:

  1. The project is compiled using predefined generic scripts and other “build heuristics”.
  2. It is compiled again with Fuzz Introspector for program analysis, producing a JSON report file with statistics for each function, as well as information about its signature, arguments, etc.
  3. The report is used in a prompt given to the LLM, which produces a harness for a specific function.
  4. Each harness is tested to check that it works and does not crash immediately. The harnesses are then integrated into OSS-Fuzz/ClusterFuzzLite projects.

LLM Prompting

  1. Input: Fuzz-Introspector JSON code reports.
  2. Include the above in prompt templates → send to LLM.
  3. Result: Harness returned from LLM.

we have implemented several “harness-generators” that take as input the introspector reports and use this to create human-readable (LLM-readable) prompts which direct the LLM towards creating fuzz harnesses. The high-level idea is to generate textual descriptions of the target functions that are likely to produce a good harness by the LLM.

  • Description of the target function’s signature, with complete types, of the target program
  • Description of specifically which header files are available in the target project.
  • Examples of cross-references that use the target function to present sample code patterns involving the target function.
  • The actual source code of the target function.
  • Provide basic guidance to the LLM, such as the need for wrapping it in LLVMFuzzerTestOneInput.
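
The ingredients listed above can be sketched as a small prompt builder. This is not OSS-Fuzz-Gen's actual code; the `FunctionInfo` struct, its fields, `BuildPrompt`, and the prompt wording are all hypothetical and only illustrate how per-function report data could be turned into text for an LLM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subset of the per-function data a report could provide.
struct FunctionInfo {
  std::string signature;                // full signature with complete types
  std::vector<std::string> headers;     // header files available in the project
  std::vector<std::string> cross_refs;  // sample call sites of the function
  std::string source;                   // source code of the function
};

// Assembles a plain-text prompt from the pieces listed on the slide.
std::string BuildPrompt(const FunctionInfo &fn) {
  std::string p = "Write a fuzz harness for this function:\n";
  p += fn.signature + "\n\n";
  p += "Available header files:\n";
  for (const auto &h : fn.headers) p += "- " + h + "\n";
  p += "\nExample usages:\n";
  for (const auto &x : fn.cross_refs) p += x + "\n";
  p += "\nFunction source:\n" + fn.source + "\n\n";
  p += "Wrap the harness in LLVMFuzzerTestOneInput.\n";
  return p;
}
```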

Results

One of our sample projects, tinyxml2, went from 38% line coverage to 69% without any interventions from our team.

Problems

  • The project needs to already be part of OSS-Fuzz to use OSS-Fuzz-Gen’s capabilities.
    • Same hindrances as OSS-Fuzz.
  • The project needs preexisting harnesses.
  • The quality of the generated harnesses varies, from good to bad.

from_scratch Branch

Future plans for OSS-Fuzz-Gen include bootstrapping a project fuzz-wise, meaning generating harnesses for a codebase without harnesses.

The work for this feature is located in https://github.com/google/oss-fuzz-gen/blob/main/experimental/from_scratch. The latest commits do not work. Known working commit: 171aac2.

Demo Time

  1. Clone and install Fuzz-Introspector.
  2. Clone and set up OSS-Fuzz-Gen.
    1. Check out the known working commit: $ git checkout 171aac2.
    2. Export the API key for the LLM.
  3. Prepare a target project. The README uses dvhar/dateparse.
  4. Execute the script:

$ python3 -m experimental.from_scratch.generate \
    --language c++ \
    --model gpt-4 \
    --function dateparse \
    --target-dir ../../dvhar/dateparse/ \
    --out-dir out

Result

// out/01.rawoutput
<code>
#include <stdint.h>
#include <stddef.h>

typedef struct {
    int sec;     /* seconds after the minute - [ 0 to 59 ] */
    int min;     /* minutes after the hour - [ 0 to 59 ] */
    int hour;    /* hours since midnight - [ 0 to 23] */
    int mday;    /* day of the month - [ 1 to 31 ] */
    int mon;     /* months since January - [ 0 to 11 ] */
    int year;    /* years */
    int wday;    /* days since Sunday - [ 0 to 6 ] */
    int yday;    /* days since January 1 - [ 0 to 365 ] */
} date_t;

extern int dateparse(const char* datestr, date_t* t, int *offset, int stringlen);

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Ignore input if it is less than 1
    if (size < 1) {
        return 0;
    }

    // Convert data to string
    char *datestr = (char *)data;

    // Initialize a date_t struct and an offset integer
    date_t t;
    int offset = 0;

    // Call the function-under-test
    dateparse(datestr, &t, &offset, (int)size);

    return 0;
}
</code>

Problems

  1. The response is wrapped in <code> tags.
  2. Even without them, the harness does not compile.
  3. Headers are missing.
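
The first problem is mechanical and could be handled in post-processing. A minimal sketch, assuming the model wraps its answer in a single `<code>...</code>` pair as in the raw output shown earlier; `StripCodeTags` is a hypothetical helper, not part of OSS-Fuzz-Gen. It does nothing about the other two problems.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the text between the first "<code>" and the last "</code>".
// If either tag is missing or they are out of order, the input is
// returned unchanged.
std::string StripCodeTags(const std::string &raw) {
  const std::string open = "<code>";
  const std::string close = "</code>";
  const std::size_t begin = raw.find(open);
  const std::size_t end = raw.rfind(close);
  if (begin == std::string::npos || end == std::string::npos ||
      end < begin + open.size()) {
    return raw;
  }
  std::size_t start = begin + open.size();
  // Drop the single newline that typically follows the opening tag.
  if (start < end && raw[start] == '\n') ++start;
  return raw.substr(start, end - start);
}
```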

Where do we go from here?

Future work

High-level goal

A GitHub Action that, when integrated into a C/C++ project, will:

  1. Use LLMs to create fuzz targets from scratch.
  2. Build and run them, and evaluate them based on runtime, coverage, etc.
  3. Create PRs to integrate them into the project.

“Good to have” features

  1. No strict prerequisites.
    • E.g. project structure, build system.
  2. Support for Python projects using the Atheris [12] fuzzer.

Flowchart

Figure: Start → Add action → Project info → LLM → Gen harnesses → Pass? (True/False branches) → PR → End

References

[1] V. J. M. Manes et al., “The Art, Science, and Engineering of Fuzzing: A Survey.” [Online]. Available: http://arxiv.org/abs/1812.00140
[2] “Heartbleed Bug.” [Online]. Available: https://heartbleed.com/
[3] T. Simonite, “This Bot Hunts Software Bugs for the Pentagon,” Wired, Jun. 01, 2020. Available: https://www.wired.com/story/bot-hunts-software-bugs-pentagon/
[4] “libFuzzer – a library for coverage-guided fuzz testing. LLVM 21.0.0git documentation.” [Online]. Available: https://llvm.org/docs/LibFuzzer.html
[5] “American fuzzy lop.” [Online]. Available: https://lcamtuf.coredump.cx/afl/
[6] M. Heuse, H. Eißfeldt, A. Fioraldi, and D. Maier, AFL++. (Jan. 2022). Available: https://github.com/AFLplusplus/AFLplusplus
[7] A. Arya, O. Chang, J. Metzman, K. Serebryany, and D. Liu, OSS-Fuzz. (Apr. 08, 2025). Available: https://github.com/google/oss-fuzz
[8] Google/clusterfuzz. (Apr. 09, 2025). Google. Available: https://github.com/google/clusterfuzz
[9] “OSS-Fuzz Documentation.” [Online]. Available: https://google.github.io/oss-fuzz/
[10] D. Liu, O. Chang, J. Metzman, M. Sablotny, and M. Maruseac, OSS-Fuzz-Gen: Automated fuzz target generation. (May 2024). Available: https://github.com/google/oss-fuzz-gen
[11] OSS-Fuzz Maintainers, “Introducing LLM-based harness synthesis for unfuzzed projects.” [Online]. Available: https://blog.oss-fuzz.com/posts/introducing-llm-based-harness-synthesis-for-unfuzzed-projects/
[12] Google/atheris. (Apr. 09, 2025). Google. Available: https://github.com/google/atheris

These slides can be found at: https://kchousos.github.io/ofg-presentation/

Thank you!
