Bug Detection and Localization using Pre-trained Code Language Models

dc.contributor.author: Campos, Viola
dc.contributor.editor: Klein, Maike
dc.contributor.editor: Krupka, Daniel
dc.contributor.editor: Winter, Cornelia
dc.contributor.editor: Gergeleit, Martin
dc.contributor.editor: Martin, Ludger
dc.date.accessioned: 2024-10-21T18:24:13Z
dc.date.available: 2024-10-21T18:24:13Z
dc.date.issued: 2024
dc.description.abstract: Language models for source code have improved significantly with the emergence of Transformer-based Large Language Models (LLMs). These models are trained on large amounts of code in which defects are relatively rare, causing them to perceive faulty code as unlikely and correct code as more 'natural,' thus assigning it a higher likelihood. We hypothesize that the likelihood scores generated by an LLM can be directly used as a lightweight approach to detect and localize bugs in source code. In this study, we evaluate various methods to construct a suspiciousness score for faulty code segments based on LLM likelihoods. Our results demonstrate that these methods can detect buggy methods in a common benchmark with up to 78% accuracy. However, using LLMs directly for fault localization raises concerns about training data leakage, as common benchmarks are often already incorporated into the training data of such models and thus learned. By additionally evaluating our experiments on a small, non-public dataset of student submissions to programming exercises, we show that leakage is indeed an issue, as the evaluation results on both datasets differ significantly. [en]
dc.identifier.doi: 10.18420/inf2024_124
dc.identifier.eissn: 2944-7682
dc.identifier.isbn: 978-3-88579-746-3
dc.identifier.issn: 2944-7682
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/45097
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: INFORMATIK 2024
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-352
dc.subject: Fault Detection
dc.subject: Fault Localization
dc.subject: AI4SE
dc.subject: LLM4SE
dc.title: Bug Detection and Localization using Pre-trained Code Language Models [en]
dc.type: Text/Conference Paper
gi.citation.endPage: 1429
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 1419
gi.conference.date: 24-26 September 2024
gi.conference.location: Wiesbaden
gi.conference.sessiontitle: AI@WORK
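
The abstract describes building a suspiciousness score from LLM likelihoods but does not say how the scores are constructed. The following is only an illustrative sketch of one plausible construction, not the paper's method: it assumes that per-token log-probabilities have already been obtained from some code language model and are supplied as `(line_number, log_probability)` pairs. The function names, the mean negative log-likelihood aggregation, and the threshold rule are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def line_suspiciousness(token_logprobs: List[Tuple[int, float]]) -> Dict[int, float]:
    """Aggregate token log-probabilities into one score per source line.

    Assumption: the score of a line is the mean negative log-likelihood
    of its tokens, so lines the model finds unlikely get higher scores.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for line_no, logprob in token_logprobs:
        sums[line_no] -= logprob  # negative log-likelihood
        counts[line_no] += 1
    return {line_no: sums[line_no] / counts[line_no] for line_no in sums}


def rank_lines(scores: Dict[int, float]) -> List[int]:
    """Localization: line numbers ordered from most to least suspicious."""
    return sorted(scores, key=lambda line_no: scores[line_no], reverse=True)


def is_suspicious(scores: Dict[int, float], threshold: float) -> bool:
    """Detection: flag a method if its most suspicious line exceeds a threshold.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    return max(scores.values()) > threshold


# Hypothetical log-probabilities for a three-line method.
example = [(1, -0.1), (1, -0.3), (2, -2.0), (2, -4.0), (3, -0.5)]
scores = line_suspiciousness(example)
ranking = rank_lines(scores)
```

In this toy input, line 2 has the lowest token likelihoods and is therefore ranked first. Other aggregations (for example, the maximum token surprisal per line) would fit the same interface.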

Files

Original bundle
Name: Campos_Bug_Detection_and_Localization.pdf
Size: 825.65 KB
Format: Adobe Portable Document Format