Paper citation: Dhulipala, Hridya, Aashish Yadavally, and Tien N. Nguyen. “Planning to Guide LLM for Code Coverage Prediction.” In Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering, pp. 24–34. 2024.

Summary

CodePilot is a prompting approach that uses program semantics to improve code coverage prediction with a Large Language Model (LLM). It has the LLM plan out a program's execution first and then predict coverage from that plan. The reported experiments show significant accuracy improvements over existing models.

Key Points:

  • Existing coverage profilers fall short when the full codebase is unavailable.
  • CodePilot uses a planning mechanism to help the LLM understand the program.
  • It achieved up to 55% exact-match accuracy and 89% statement-match accuracy.
  • It outperformed baseline models and identified statements with low coverage.

Approach

CodePilot integrates reasoning and action through the following steps:

Input Preparation

Necessary inputs include:

  • A set of instructions
  • Code snippets (an exemplar and the code under analysis)
  • Exemplar plans
  • Coverage data for the exemplar

# Preparing the inputs for CodePilot (illustrative values, not taken from the paper)
instructions = "For the given code snippet, create a step-by-step plan to predict code coverage."
exemplar_code = "def example_function(x):\n    return x + 1"
exemplar_plan = ["Step 1: Read the input value.", "Step 2: Return the incremented value."]
exemplar_coverage = "Lines 1 and 2 covered."
test_code = "def test_function(y):\n    return y * 2"
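
One way these inputs might be assembled into a single few-shot prompt is sketched below. The build_prompt helper and the section labels are hypothetical, chosen only for illustration; the paper's actual prompt template may differ.

```python
# Hypothetical helper: the prompt layout is an assumption, not the paper's template.
def build_prompt(instructions, exemplar_code, exemplar_plan, exemplar_coverage, test_code):
    """Combine instructions, one worked exemplar, and the target code into a prompt string."""
    plan_text = "\n".join(exemplar_plan)
    sections = [
        instructions,
        "### Example code\n" + exemplar_code,
        "### Example plan\n" + plan_text,
        "### Example coverage\n" + exemplar_coverage,
        "### Code to analyze\n" + test_code,
        "### Plan",  # the model is expected to continue from here
    ]
    return "\n\n".join(sections)


prompt = build_prompt(
    instructions="For the given code snippet, create a step-by-step plan to predict code coverage.",
    exemplar_code="def example_function(x):\n    return x + 1",
    exemplar_plan=["Step 1: Read the input value.", "Step 2: Return the incremented value."],
    exemplar_coverage="Lines 1 and 2 covered.",
    test_code="def test_function(y):\n    return y * 2",
)
print(prompt)
```

Ending the prompt on an open "Plan" section nudges the model to write the plan before it commits to a coverage prediction.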

CodePilot Planning

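The LLM then generates a step-by-step plan for the test code, guided by the instructions and the exemplar. The function below hard-codes the plan as a stand-in for that LLM call.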

# Stand-in for the LLM call: returns a hard-coded plan for the sample test code
def generate_plan(test_code):
    # A real implementation would send the instructions, exemplar, and test_code to an LLM
    plan = [
        "Step 1: Check if 'y' is valid.",
        "Step 2: Multiply 'y' by 2 and return the result.",
    ]
    return plan
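
To connect this step to an actual model, the hard-coded list can be swapped for a call to an LLM client. The sketch below assumes a hypothetical llm callable that takes a prompt string and returns text. The names generate_plan_with_llm, parse_plan, and fake_llm are illustrative and not part of CodePilot.

```python
import re


def parse_plan(response_text):
    """Extract lines that look like 'Step N: ...' from the model's raw response."""
    steps = []
    for line in response_text.splitlines():
        line = line.strip()
        if re.match(r"^Step \d+:", line):
            steps.append(line)
    return steps


def generate_plan_with_llm(test_code, llm):
    """Ask the (hypothetical) llm callable for a plan and parse it into a list of steps."""
    prompt = "Create a step-by-step plan to predict code coverage for:\n" + test_code
    return parse_plan(llm(prompt))


def fake_llm(prompt):
    # Canned response so the sketch runs offline; a real client call would go here.
    return (
        "Here is the plan:\n"
        "Step 1: Check if 'y' is valid.\n"
        "Step 2: Multiply 'y' by 2 and return the result.\n"
    )


plan = generate_plan_with_llm("def test_function(y): return y * 2", fake_llm)
print(plan)
```

Passing the model in as a parameter keeps the planning logic testable without network access, and parse_plan drops any preamble the model adds around the steps.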

Predicting Code Coverage

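The generated plan is then used to predict which parts of the code execute. The function below is a simplified keyword check that stands in for the LLM's prediction.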

# Toy stand-in for the prediction step: marks a plan step as covered when its keyword appears
def predict_coverage(plan):
    coverage = {}
    for step in plan:
        if "valid" in step:
            coverage["Step 1"] = "Covered"
        if "Multiply" in step:
            coverage["Step 2"] = "Covered"
    return coverage
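
Coverage is usually reported per statement rather than per plan step. As a rough sketch of that representation, the hypothetical annotate_coverage helper below prefixes each source line with ">" if it is predicted to run and "!" if not. The markers, the sample function, and the predicted line set are all assumptions for illustration.

```python
def annotate_coverage(source, executed_lines):
    """Prefix each source line with '>' (predicted executed) or '!' (predicted not executed).

    executed_lines holds 1-based line numbers; the markers are an illustrative convention.
    """
    annotated = []
    for number, line in enumerate(source.splitlines(), start=1):
        marker = ">" if number in executed_lines else "!"
        annotated.append(f"{marker} {line}")
    return "\n".join(annotated)


source = "def safe_double(y):\n    if y is None:\n        return 0\n    return y * 2"
# Assumed prediction for the call safe_double(3): the body of the None branch is skipped.
annotated = annotate_coverage(source, executed_lines={1, 2, 4})
print(annotated)
```

Lines marked "!" are the ones a tester would look at first, since no predicted execution path reaches them.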

Running CodePilot

Chain the two steps to run the pipeline end to end.

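# Running the pipeline: generate the plan first, then predict coverage from it
plan = generate_plan(test_code)
coverage = predict_coverage(plan)
print(f"Code Coverage Prediction: {coverage}")
# Prints: Code Coverage Prediction: {'Step 1': 'Covered', 'Step 2': 'Covered'}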

Summary of the Evaluation

CodePilot was evaluated through experiments against existing models. Results showed:

  • Up to 55% exact-match accuracy
  • Up to 89% statement-match accuracy
  • Gains over the baselines, attributed to the planning step
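
This summary does not spell out how the two metrics are computed. A plausible reading, used in the sketch below as an assumption, is that exact match counts a program as correct only when every statement's predicted coverage matches the ground truth, while statement match is the share of individual statements predicted correctly. The function names and sample data are made up for illustration.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of programs whose whole predicted coverage sequence equals the reference."""
    matches = sum(1 for pred, ref in zip(predictions, references) if pred == ref)
    return matches / len(references)


def statement_match_accuracy(predictions, references):
    """Fraction of individual statements whose predicted coverage equals the reference."""
    correct = 0
    total = 0
    for pred, ref in zip(predictions, references):
        total += len(ref)
        correct += sum(1 for p, r in zip(pred, ref) if p == r)
    return correct / total


# Made-up data: True means the statement is executed.
references = [[True, True, False], [True, False]]
predictions = [[True, True, False], [True, True]]

exact = exact_match_accuracy(predictions, references)
statement = statement_match_accuracy(predictions, references)
print(exact, statement)
```

Under this reading a single wrong statement fails the whole program on exact match but costs only one statement on statement match, which would explain why the second figure is much higher than the first.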

Analysis: Pros

  • High Accuracy: Up to 55% exact-match and 89% statement-match accuracy, ahead of the baseline models.
  • Practical Utility: Flags statements with low coverage, which helps prioritize testing effort.
  • Autonomous Planning: The LLM builds its own plan to reason through complex execution flows.

Analysis: Cons

  • Handling Complex Cases: Struggles when invalid inputs trigger runtime errors.
  • Limited Scope: Focuses on Python, so the results may not carry over to other languages.
  • Challenges with Recursion: Has difficulty assessing coverage in recursive functions.

In conclusion, CodePilot is a promising approach to code coverage prediction. By pairing an explicit planning step with an LLM, it shows strong potential for improving software testing.


This article is also published on Medium

