Sakkas, Georgios

Neurosymbolic Tools for Effective Coding and Debugging

2024

Advisor(s): Jhala, Ranjit

Abstract

This dissertation presents neurosymbolic approaches for developing tools that enhance programming and debugging by combining symbolic reasoning with machine learning techniques. As modern programming languages grow more complex, the need for automated tools that can efficiently identify and fix errors becomes more critical. By integrating traditional program analysis methods with the predictive power of machine learning models, neurosymbolic approaches offer a robust solution for automated program repair and synthesis. This work focuses on creating tools that target common errors in OCaml and Python and that reduce manual intervention in Haskell program verification, while improving the accuracy and efficiency of error detection and correction.

The first contribution of this research is Rite, a tool that provides type error feedback in OCaml programs through a data-driven approach to program repair. Rite uses a training dataset of ill-typed programs and their fixes to predict and generate repairs for new errors. The second contribution is Seq2Parse, a neurosymbolic tool that addresses syntax errors in Python by combining neural sequence models with symbolic error-correcting parsers. This hybrid method can efficiently pinpoint relevant corrections and generate accurate fixes. Lastly, this dissertation introduces LHC, a tool that uses large language models (LLMs) to automatically generate refinement type annotations in Haskell programs. LHC drastically reduces the time and expertise needed to perform formal verification by leveraging LLMs and symbolic refinement type checking.

Each tool demonstrates the effectiveness of neurosymbolic approaches in simplifying the programming and debugging process. Evaluations show that these methods not only improve the accuracy of repairs but also provide users with clear and useful feedback. This work concludes with an exploration of future directions for neurosymbolic tools, particularly their potential to scale automated program repair techniques across different programming languages and development environments. These findings highlight the promise of neurosymbolic methods in optimizing software development and improving the overall efficiency of programming tasks.


UC San Diego
