Hello World to LLM Agent Code Generation

Multi-Agent LLM Framework using LangGraph

Xin Cheng
5 min read · Apr 28, 2024

There are lots of coding agents on the market. How can you build one? You can ask an LLM to generate some code, but the first attempt is usually not good enough. How can you iterate automatically until the code is good? Let’s review the approach with a popular multi-agent framework called LangGraph.

At a high level, think of the following worker/supervisor pattern. The worker produces output for the supervisor. The supervisor evaluates the output and provides feedback to the worker. The worker adjusts the output and the supervisor evaluates it again. This cycle continues until the supervisor has no more feedback. In the agent world, there is an edge case where the worker’s result never becomes good enough (maybe the task is too complex for the LLM), so you need to cap the number of iterations. Let’s review how you can implement this pattern in LangGraph.
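As a plain-Python sketch of that control flow (the worker_* and supervisor_review helpers here are hypothetical placeholders, not LangGraph APIs):

MAX_ITERATIONS = 5  # cap the loop in case the task is too hard for the LLM

def run_until_approved(task):
    output = worker_generate(task)                      # hypothetical: worker produces a first draft
    for _ in range(MAX_ITERATIONS):
        feedback = supervisor_review(task, output)      # hypothetical: supervisor critiques the draft
        if feedback is None:                            # no more feedback -> accept the output
            return output
        output = worker_revise(task, output, feedback)  # worker adjusts based on the feedback
    return output                                       # give up (or flag for review) once the cap is reached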

Good community example

Agent programmer: writes code based on the requirement

Agent tester: generates input test cases and expected outputs based on the requirement and the code to test

Agent executor: executes the code to test in a Python environment against the input test cases and expected outputs

Agent debugger: debugs code errors using LLM knowledge and sends the result back to the executor

The core piece is reflection: the executor runs the generated code against the test cases and reports the result from the execution environment (success or error). The debugger then uses the error message to improve the code.

Let’s review the agent code to understand the pattern and its internals.

from langchain.chains.openai_functions import create_structured_output_runnable
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

class Code(BaseModel):
    """Generated code for the provided requirement"""
    code: str = Field(
        description="Detailed optimized error-free Python code on the provided requirements"
    )

code_gen_prompt = ChatPromptTemplate.from_template(
    '''**Role**: You are an expert Python software programmer. You need to develop Python code
**Task**: As a programmer, you are required to complete the function. Use a Chain-of-Thought approach to break
down the problem, create pseudocode, and then write the code in Python language. Ensure that your code is
efficient, readable, and well-commented.
**Instructions**:
1. **Understand and Clarify**: Make sure you understand the task.
2. **Algorithm/Method Selection**: Decide on the most efficient way.
3. **Pseudocode Creation**: Write down the steps you will follow in pseudocode.
4. **Code Generation**: Translate your pseudocode into executable Python code
*REQUIREMENT*
{requirement}'''
)

# llm is assumed to be an already-initialized chat model (e.g. ChatOpenAI)
coder = create_structured_output_runnable(
    Code, llm, code_gen_prompt
)
code_ = coder.invoke({'requirement': 'Generate fibonacci series'})

The Code schema has a single attribute, code, with a description. The description matters: after the LLM generates its output, the result is parsed (via OpenAI function calling) into the structured field “code”. We then call coder.invoke with the requirement, and the code field holds the generated code.
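Note that create_structured_output_runnable has since been deprecated in LangChain; on newer versions the same schema can be bound with with_structured_output instead (a sketch, assuming an OpenAI chat model; the model name is only illustrative):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # illustrative model name
coder = code_gen_prompt | llm.with_structured_output(Code)
code_ = coder.invoke({'requirement': 'Generate fibonacci series'})
print(code_.code)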

# print(code_.code)
def fibonacci(n):
    fib = [0, 1]
    for i in range(2, n+1):
        fib.append(fib[i-1] + fib[i-2])
    return fib

n = int(input('Enter the number of terms: '))
result = fibonacci(n)
print(result)

The tester agent’s schema has two fields: Input and Output.

from typing import List

from langchain.chains.openai_functions import create_structured_output_runnable
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

class Test(BaseModel):
    """Test cases for the provided code"""
    Input: List[List] = Field(
        description="Input for Test cases to evaluate the provided code"
    )
    Output: List[List] = Field(
        description="Expected Output for Test cases to evaluate the provided code"
    )

test_gen_prompt = ChatPromptTemplate.from_template(
    '''**Role**: As a tester, your task is to create Basic and Simple test cases based on provided Requirement and Python Code.
These test cases should encompass Basic, Edge scenarios to ensure the code's robustness, reliability, and scalability.
**1. Basic Test Cases**:
- **Objective**: Basic and Small scale test cases to validate basic functioning
**2. Edge Test Cases**:
- **Objective**: To evaluate the function's behavior under extreme or unusual conditions.
**Instructions**:
- Implement a comprehensive set of test cases based on requirements.
- Pay special attention to edge cases as they often reveal hidden bugs.
- Only generate Basic and Edge cases which are small
- Avoid generating Large scale and Medium scale test cases. Focus only on small, basic test cases
*REQUIREMENT*
{requirement}
**Code**
{code}
'''
)
tester_agent = create_structured_output_runnable(
    Test, llm, test_gen_prompt
)

The parameters are passed in as a dictionary:

test_ = tester_agent.invoke({'requirement': 'Generate fibonacci series', 'code': code_.code})

Result

# print(test_)
Input=[[0], [1], [2], [5], [10], [-1], [1.5], ['three'], [None]], Output=[[[]], [[0]], [[0, 1]], [[0, 1, 1, 2, 3]], [[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]], [[]], [[]], [[]], [[]]]

The executor and debugger agents have similar structures.
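The article excerpt doesn’t include them, but based on how they are invoked below, the executor and debugger runnables can be sketched roughly as follows (the prompts here are abbreviated paraphrases, not the original notebook’s wording):

class ExecutableCode(BaseModel):
    """Code merged with the test cases so it can be run directly with exec()"""
    code: str = Field(
        description="Executable Python code that runs the provided test cases against the solution"
    )

class RefineCode(BaseModel):
    """Corrected code produced by the debugger"""
    code: str = Field(
        description="Refined Python code that fixes the reported execution error"
    )

execution_prompt = ChatPromptTemplate.from_template(
    '''You are an expert software engineer. Combine the code with the test cases into a single
executable Python script that checks each expected output.
*CODE*
{code}
*INPUT TEST CASES*
{input}
*EXPECTED OUTPUT*
{output}'''
)
execution = create_structured_output_runnable(ExecutableCode, llm, execution_prompt)

refine_prompt = ChatPromptTemplate.from_template(
    '''You are an expert Python debugger. Rewrite the code so the reported error no longer occurs.
*CODE*
{code}
*ERROR*
{error}'''
)
refine_code = create_structured_output_runnable(RefineCode, llm, refine_prompt)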

Now it is time to define the state graph that connects the agents. Agents communicate with each other through shared state: for example, the executor agent reads “code”, “tests.input”, and “tests.output”, and returns “code” and “errors”, which are stored in the state and later fetched by the debugger agent.

from typing import Any, Dict, Optional
from typing_extensions import TypedDict

class AgentCoder(TypedDict):
    requirement: str
    code: str
    tests: Dict[str, Any]
    errors: Optional[str]

def executer(state):
    print('Entering in Executer')
    tests = state['tests']
    input_ = tests['input']
    output_ = tests['output']
    code = state['code']
    # execution merges the code and test cases into a single executable script
    executable_code = execution.invoke({"code": code, "input": input_, 'output': output_})
    # print(f"Executable Code - {executable_code.code}")
    error = None
    try:
        exec(executable_code.code)
        print("Code Execution Successful")
    except Exception as e:
        print('Found Error While Running')
        error = f"Execution Error : {e}"
    return {'code': executable_code.code, 'errors': error}

def debugger(state):
    print('Entering in Debugger')
    errors = state['errors']
    code = state['code']
    # refine_code rewrites the code based on the error message
    refine_code_ = refine_code.invoke({'code': code, 'error': errors})
    return {'code': refine_code_.code, 'errors': None}
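The programmer and tester nodes are not shown in the excerpt above; based on how the state keys are consumed, they look roughly like this (a sketch, reusing the coder and tester_agent runnables defined earlier):

def programmer(state):
    print('Entering in Programmer')
    requirement = state['requirement']
    code_ = coder.invoke({'requirement': requirement})
    return {'code': code_.code}

def tester(state):
    print('Entering in Tester')
    requirement = state['requirement']
    code = state['code']
    tests = tester_agent.invoke({'requirement': requirement, 'code': code})
    # store the generated cases under the keys the executer node reads
    return {'tests': {'input': tests.Input, 'output': tests.Output}}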

The workflow begins with the programmer agent, then the tester, then the executor.

from langgraph.graph import END, StateGraph

workflow = StateGraph(AgentCoder)
workflow.set_entry_point("programmer")
workflow.add_edge("programmer", "tester")
workflow.add_edge("tester", "executer")

The main reflection piece is between the debugger and the executor. When the workflow reaches the executor agent, it executes the code and returns an error if there is one. The workflow then checks the state: if there is no error, the workflow ends; if there is an error, the conditional edge routes to the debugger agent, which generates new code based on the current code and the error, and the workflow goes back to the executor.

def decide_to_end(state):
    print('Entering in Decide to End')
    if state['errors']:
        return 'debugger'
    else:
        return 'end'

workflow.add_edge("debugger", "executer")
workflow.add_conditional_edges(
    "executer",
    decide_to_end,
    {
        "end": END,
        "debugger": "debugger",
    },
)

# Compile
app = workflow.compile()
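Optionally, you can print the compiled graph’s wiring to verify the edges before running it (assuming a reasonably recent langgraph release):

# Mermaid source describing nodes and edges; render it in a Markdown cell or at mermaid.live
print(app.get_graph().draw_mermaid())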

You should set a recursion limit so the graph won’t keep debugging forever (sometimes the LLM gets stuck trying to fix bugs).

# the same requirement passed to the coder and tester above
requirement = 'Generate fibonacci series'

config = {"recursion_limit": 50}
inputs = {"requirement": requirement}
running_dict = {}

# run inside an async context (e.g. a notebook cell)
async for event in app.astream(inputs, config=config):
    for k, v in event.items():
        running_dict[k] = v
        if k != "__end__":
            print(v)
            print('----------'*20)
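If the limit is hit, recent langgraph releases raise a GraphRecursionError that you can catch (a sketch; the exception import follows current langgraph docs and may differ in older versions):

from langgraph.errors import GraphRecursionError

try:
    final_state = app.invoke(inputs, config=config)
    print(final_state['code'])
except GraphRecursionError:
    print('Recursion limit reached: the debugger could not produce working code in time')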

LangGraph official sample

The idea is similar to the above, but with the following differences:

  1. No test case generation. It only checks whether the code can run.
  2. The code generation agent separates imports and code into different attributes.
  3. The import check and the code execution check are combined into a single “code check” node: an imports test followed by a code execution test (sketched after this list).
  4. The reflect step provides suggestions based on the code and error messages instead of generating new code; it is up to the code generation agent to fix the code.
  5. You can also skip the reflect agent, in which case the error message is sent directly to the code generation agent.
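As a rough sketch of item 3, the combined “code check” node looks roughly like this (the field and key names here follow the description above and may differ from the official notebook):

from langchain_core.pydantic_v1 import BaseModel, Field

class GeneratedCode(BaseModel):
    """Schema with imports and code body kept as separate fields."""
    prefix: str = Field(description="Description of the approach")
    imports: str = Field(description="Import statements only")
    code: str = Field(description="Code body, excluding imports")

def code_check(state):
    solution = state['generation']
    # 1. imports test: do the imports alone resolve?
    try:
        exec(solution.imports)
    except Exception as e:
        return {'error': f'Import check failed: {e}'}
    # 2. execution test: does imports + code run?
    try:
        exec(solution.imports + "\n" + solution.code)
    except Exception as e:
        return {'error': f'Execution check failed: {e}'}
    return {'error': None}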
