Paper citation: Dhulipala, Hridya, Aashish Yadavally, and Tien N. Nguyen. “Planning to Guide LLM for Code Coverage Prediction.” In Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering, pp. 24–34. 2024.
Summary
CodePilot is a plan-based prompting approach that uses program semantics to guide a Large Language Model (LLM) in predicting code coverage, that is, which statements a given input will execute, without running the program. In the paper's experiments it outperformed the baseline models.
Key Points:
- Existing coverage profilers falter when the full codebase is unavailable.
- CodePilot uses a planning mechanism for better program understanding.
- Achieved up to 55% exact-match accuracy and 89% statement-match accuracy.
- Outperformed baseline models and identified statements with low coverage.
Approach
CodePilot integrates reasoning and action through the following steps:
Input Preparation
Necessary inputs include:
- A set of instructions
- Code snippets
- Exemplary plans
- Coverage data
# Preparing the inputs for CodePilot
instructions = "For the given code snippet, create a step-by-step plan to predict code coverage."
exemplar_code = "def example_function(x): return x + 1"
exemplar_plan = ["Step 1: Check input value.", "Step 2: Return incremented value."]
exemplar_coverage = "Lines 1 and 2 covered."
test_code = "def test_function(y): return y * 2"
CodePilot Planning
The LLM is prompted to generate a step-by-step plan for the test code. The function below hard-codes a plan as a stand-in for that LLM call.
# Function to generate a plan from the LLM for the test code
def generate_plan(test_code):
    plan = []
    plan.append("Step 1: Check if 'y' is valid.")
    plan.append("Step 2: Multiply 'y' by 2 and return the result.")
    return plan
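To move from a hard-coded plan to one produced by a model, the LLM can be passed in as a plain function. The sketch below assumes nothing about any particular LLM client: `generate_plan_with_llm`, the `llm` parameter, and `fake_llm` are hypothetical names, and the canned reply only stands in for a real model response.

```python
from typing import Callable, List


def generate_plan_with_llm(prompt: str, llm: Callable[[str], str]) -> List[str]:
    """Send the prompt to an LLM and split the reply into plan steps.

    `llm` is any function mapping a prompt string to a completion string.
    It is a placeholder for a real client, which this sketch does not assume.
    """
    reply = llm(prompt)
    # Keep non-empty lines; each one is treated as a single plan step.
    return [line.strip() for line in reply.splitlines() if line.strip()]


def fake_llm(prompt: str) -> str:
    # Canned response so the example runs without network access.
    return "Step 1: Check if 'y' is valid.\n\nStep 2: Multiply 'y' by 2 and return the result.\n"


plan_from_llm = generate_plan_with_llm(
    "Plan the coverage for: def test_function(y): return y * 2", fake_llm
)
print(plan_from_llm)
```

Injecting the model as a callable keeps the planning step testable, since a stub like `fake_llm` can replace the real model in unit tests.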
Predicting Code Coverage
Predict code coverage using the generated plan.
# Function that uses the generated plan to predict code coverage
def predict_coverage(plan):
    coverage = {}
    for step in plan:
        if "valid" in step:
            coverage['Step 1'] = "Covered"
        if "Multiply" in step:
            coverage['Step 2'] = "Covered"
    return coverage
Running CodePilot
Chain the two functions to run the full pipeline.
# Running CodePilot
plan = generate_plan(test_code)
coverage = predict_coverage(plan)
print(f"Code Coverage Prediction: {coverage}")
Summary of the Evaluation
CodePilot was evaluated experimentally against existing models. The results showed:
- Up to 55% exact-match accuracy
- Up to 89% statement-match accuracy
- Better performance than the baselines, attributed to its planning capabilities
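The two metrics can be made concrete with a small sketch. It assumes the usual reading of the terms: exact match means every statement's predicted coverage agrees with the ground truth, and statement match is the fraction of statements predicted correctly. The function names and sample data are hypothetical, and averaging statement match per program is a simplification that may differ from how the paper aggregates it.

```python
from typing import List, Sequence, Tuple


def statement_match(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of statements whose predicted coverage equals the ground truth."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 1.0  # nothing to get wrong in an empty program
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def exact_match(predicted: Sequence[bool], actual: Sequence[bool]) -> bool:
    """True only when every statement is predicted correctly."""
    return list(predicted) == list(actual)


def evaluate(pairs: List[Tuple[Sequence[bool], Sequence[bool]]]) -> Tuple[float, float]:
    """Return (exact-match accuracy, mean statement-match accuracy)."""
    n = len(pairs)
    exact = sum(exact_match(p, a) for p, a in pairs) / n
    stmt = sum(statement_match(p, a) for p, a in pairs) / n
    return exact, stmt


# Made-up samples: True means the statement is covered.
samples = [
    ([True, True, False], [True, True, False]),              # fully correct
    ([True, False, True, True], [True, True, True, True]),   # one of four wrong
]
exact_acc, stmt_acc = evaluate(samples)
print(exact_acc, stmt_acc)  # prints: 0.5 0.875
```

The example shows why exact match is the stricter metric: a single wrong statement drops a program's exact-match score to zero while its statement-match score stays at 0.75.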
Analysis: Pros
- High Accuracy: Up to 55% exact-match and 89% statement-match accuracy, ahead of the baseline models.
- Practical Utility: Helps prioritize testing efforts effectively.
- Autonomous Planning: Manages complex execution flows efficiently.
Analysis: Cons
- Handling Complex Cases: Struggles with runtime errors due to input validity issues.
- Limited Scope: Focused on Python, limiting application to other languages.
- Challenges with Recursion: Difficulty in assessing recursive functions.
In conclusion, CodePilot offers a promising approach for enhancing code coverage prediction. It combines advanced planning techniques with LLMs, showing strong potential for improving software testing.