Keywords
Bayesian optimization, ChatGPT, dimensionality-scaled priors, large language models (LLMs), Llama, random projections
We consider the task of generating functionally correct code using large language models (LLMs). The correctness of generated code depends critically on the prompt used to query the base LLM. We formulate the problem of finding an appropriate prompt as a combinatorial search and propose a novel Bayesian optimization (BO) approach referred to as BO for Code GENeration (BODE-GEN). BODE-GEN performs an adaptive, data-driven search over prompts, guided by training data consisting of previously tried prompts and the functional accuracy of the code they produce on a given set of test cases. The key insight is to perform BO in a continuous embedding space, using an auxiliary LLM to bridge the gap between the discrete prompt space and the continuous embedding space. We leverage two synergistic ideas, namely random projections and dimensionality-scaled priors, to build effective Gaussian process surrogate models over the high-dimensional embedding space. Our extensive experiments on the HumanEval+ benchmark with multiple base LLMs show that BODE-GEN significantly improves code generation accuracy over fixed prompts and manual prompt engineering. We also demonstrate that BODE-GEN is sample-efficient, requiring relatively few BO iterations to achieve substantial gains in code accuracy.
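Since the abstract names its two surrogate-modeling ideas concretely, a minimal sketch may help make them tangible. The code below is an assumption-laden illustration, not the thesis's implementation: it performs BO in a low-dimensional space linked to the embedding space by a fixed Gaussian random projection, fits an exact GP whose kernel lengthscale grows as sqrt(d) (one common form of a dimensionality-scaled prior), and selects the next candidate by expected improvement. EMBED_DIM, PROJ_DIM, bo_step, and the candidate-sampling scheme are all hypothetical choices.

```python
# Sketch of BO over a randomly projected embedding space with a
# dimensionality-scaled GP lengthscale. Illustrative only; all constants
# and helper names are assumptions, not BODE-GEN's actual code.
import numpy as np
from scipy.stats import norm

EMBED_DIM = 768   # assumed dimensionality of the auxiliary LLM's embeddings
PROJ_DIM = 16     # assumed low-dimensional search space for BO

rng = np.random.default_rng(0)
# Fixed random projection matrix with entries ~ N(0, 1/PROJ_DIM).
A = rng.normal(scale=1.0 / np.sqrt(PROJ_DIM), size=(EMBED_DIM, PROJ_DIM))

def project_up(z):
    """Map a low-dimensional BO candidate back into embedding space."""
    return A @ z

# Dimensionality-scaled lengthscale: growing like sqrt(d) keeps the GP from
# treating all high-dimensional points as equally far apart.
LENGTHSCALE = np.sqrt(PROJ_DIM)

def rbf_kernel(X1, X2):
    """Squared-exponential kernel with the scaled lengthscale above."""
    d2 = (np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-0.5 * d2 / LENGTHSCALE**2)

def gp_posterior(Z_train, y_train, Z_test, noise=1e-4):
    """Exact GP posterior mean and standard deviation at test points."""
    K = rbf_kernel(Z_train, Z_train) + noise * np.eye(len(Z_train))
    Ks = rbf_kernel(Z_train, Z_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance is 1.0 here
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI for maximization of code accuracy."""
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def bo_step(Z_train, y_train, n_candidates=256):
    """Pick the next low-dimensional candidate by expected improvement."""
    Z_cand = rng.normal(size=(n_candidates, PROJ_DIM))
    mu, sigma = gp_posterior(Z_train, y_train, Z_cand)
    return Z_cand[np.argmax(expected_improvement(mu, sigma, y_train.max()))]
```

In the full pipeline the abstract describes, each selected low-dimensional point would be mapped into embedding space via project_up, decoded back into a discrete prompt by the auxiliary LLM, and scored by the fraction of test cases the generated code passes; that (prompt, accuracy) pair then extends the GP's training data for the next iteration.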
Details
Title
LARGE LANGUAGE MODEL DRIVEN PROGRAM SYNTHESIS VIA BAYESIAN OPTIMIZATION
Creators
Shlok Tomar
Contributors
Janardhan Rao Doppa (Co-Chair)
Haipeng Cai (Co-Chair)
Ganapati Bhat (Committee Member)
Awarding Institution
Washington State University
Academic Unit
School of Electrical Engineering and Computer Science
Resource Type
Theses and Dissertations
Degree
Master of Science (MS), Washington State University