OverHAuL

Harnessing Automation for C Libraries with Large Language Models

Published

July 27, 2025

Abstract

Software vulnerabilities remain pervasive and challenging to detect, making robust testing approaches imperative. Fuzzing is an established software testing method that uncovers such vulnerabilities by executing the target program on randomly generated inputs. Recent research has leveraged Large Language Models (LLMs) to enhance fuzz driver generation. However, most contemporary tools rely on additional resources beyond the target code, such as client programs or preexisting harnesses, which limits their scalability and applicability. In this thesis, we present OverHAuL, a neurosymbolic AI system that employs LLM agents to automatically generate fuzzing harnesses directly from library code, eliminating the need for auxiliary artifacts. To comprehensively evaluate OverHAuL, we construct a benchmark suite consisting of ten open-source C libraries. Our empirical analysis demonstrates that OverHAuL achieves an 81.25% success rate in harness generation across the evaluated projects, underscoring its effectiveness and potential to facilitate more efficient vulnerability discovery.

Keywords

LLMs, Fuzzing, Automation, Security, Neurosymbolic AI

Preface

This thesis was prepared in Athens, Greece, during the academic year 2024–2025, fulfilling a requirement for the Bachelor of Science degree at the Department of Informatics and Telecommunications of the National and Kapodistrian University of Athens. The research presented herein was carried out under the supervision of Prof. Thanassis Avgerinos and in accordance with the guidelines stipulated by the department. All processes and methodologies adopted during the research adhere to the academic and ethical standards of the university. The final version of this thesis is hosted online and is also archived in the department’s records, where it is publicly accessible through the university’s digital repository, Pergamos.

Acknowledgments

I would like to express my gratitude to my supervisor, Prof. Thanassis Avgerinos, for his insightful guidance, patience, and unwavering encouragement throughout this journey. His openness and our shared passion for the subject greatly enhanced my enjoyment of the thesis process.

I am also thankful to my fellow group members in Prof. Avgerinos’ weekly meetings, whose willingness to exchange ideas and offer support was invaluable. My appreciation extends to Jorgen and Phaedon, friends who provided thoughtful input and advice along the way.

A special thank you goes to my parents Giannis and Gianna, Christina, and my friends for their constant support and understanding. Their patience and encouragement helped me persevere through this challenging period.

Citation

BibTeX citation:
@thesis{chousos2025,
  type = {BSc Thesis},
  title = {{{OverHAuL}}: {{Harnessing}} Automation for {{C}} Libraries with Large Language Models},
  shorttitle = {{{OverHAuL}}},
  author = {Chousos, Konstantinos},
  date = {2025-07-27},
  institution = {{National and Kapodistrian University of Athens}},
  location = {Athens, Greece},
  url = {https://pergamos.lib.uoa.gr/uoa/dl/object/5300250},
  langid = {english},
  pagetotal = {79},
  note = {Also available at: \href{https://kchousos.github.io/BSc-Thesis/}{https://kchousos.github.io/BSc-Thesis/}}
}
For attribution, please cite this work as:
K. Chousos, “OverHAuL: Harnessing automation for C libraries with large language models,” BSc Thesis, National and Kapodistrian University of Athens, Athens, Greece, 2025. [Online]. Available: https://pergamos.lib.uoa.gr/uoa/dl/object/5300250