Bug Detection and Localization using Pre-trained Code Language Models

Campos, Viola

Conference paper

Document type

Text/Conference Paper

Files

Campos_Bug_Detection_and_Localization.pdf (825.65 KB)

Date

2024

Authors

Campos, Viola

Source

INFORMATIK 2024

AI@WORK

Publisher

Gesellschaft für Informatik e.V.

Abstract

Language models for source code have improved significantly with the emergence of Transformer-based Large Language Models (LLMs). These models are trained on large amounts of code in which defects are relatively rare, so they tend to treat correct code as more 'natural' and assign it a higher likelihood than faulty code. We hypothesize that the likelihood scores generated by an LLM can be used directly as a lightweight approach to detecting and localizing bugs in source code. In this study, we evaluate various methods of constructing a suspiciousness score for faulty code segments from LLM likelihoods. Our results demonstrate that these methods can detect buggy methods in a common benchmark with up to 78% accuracy. However, using LLMs directly for fault localization raises concerns about training data leakage, because common benchmarks are often already part of the training data of such models and have thus been learned. By additionally running our experiments on a small, non-public dataset of student submissions to programming exercises, we show that leakage is indeed an issue: the evaluation results on the two datasets differ significantly.
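The abstract does not say how the suspiciousness scores are constructed, so the following is only an illustrative sketch of one plausible variant, not the paper's method. It assumes that per-token log-probabilities have already been obtained from some code language model and tagged with the source line of each token. The function names, the mean-negative-log-likelihood aggregation, and the threshold are all hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each entry is (line_number, token_log_probability). How these values are
# produced (model, tokenizer, context) is outside this sketch and assumed.
TokenLogProbs = List[Tuple[int, float]]


def line_suspiciousness(token_logprobs: TokenLogProbs) -> Dict[int, float]:
    """Return the mean negative log-likelihood of the tokens on each line.

    A higher value means the model found the line less likely, which this
    sketch interprets as more suspicious.
    """
    neg_sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for line_no, logprob in token_logprobs:
        neg_sums[line_no] -= logprob
        counts[line_no] += 1
    return {line: neg_sums[line] / counts[line] for line in neg_sums}


def rank_lines(scores: Dict[int, float]) -> List[int]:
    """Localization: order line numbers from most to least suspicious."""
    return sorted(scores, key=lambda line: scores[line], reverse=True)


def is_flagged(scores: Dict[int, float], threshold: float) -> bool:
    """Detection: flag the snippet if any line exceeds the (assumed) threshold."""
    return max(scores.values()) > threshold


if __name__ == "__main__":
    # Made-up log-probabilities for a three-line snippet.
    example = [(1, -0.1), (1, -0.3), (2, -2.0), (2, -4.0), (3, -0.5)]
    scores = line_suspiciousness(example)
    print(rank_lines(scores))
    print(is_flagged(scores, threshold=1.0))
```

Averaging per line keeps long lines from scoring higher merely because they contain more tokens. Other aggregations, such as the maximum token surprisal, are equally conceivable under the same assumptions.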

Campos, Viola (2024): Bug Detection and Localization using Pre-trained Code Language Models. INFORMATIK 2024. DOI: 10.18420/inf2024_124. Bonn: Gesellschaft für Informatik e.V. ISSN: 2944-7682. PISSN: 1617-5468. EISSN: 2944-7682. ISBN: 978-3-88579-746-3. pp. 1419-1429. AI@WORK. Wiesbaden. 24-26 September 2024

Keywords

Fault Detection, Fault Localization, AI4SE, LLM4SE

DOI

10.18420/inf2024_124

Collections

P352 - INFORMATIK 2024 - Lock in or log out? Wie digitale Souveränität gelingt
