How to Build a Multi-Agent System with LangGraph and Tool Use

Building production-grade multi-agent systems has become increasingly accessible with the release of LangGraph, a library designed specifically for creating stateful, cyclical agent architectures. In this tutorial, we'll construct a complete multi-agent system that coordinates specialized agents for research, analysis, and summarization tasks using real tool integrations.

Why Multi-Agent Systems Matter in Production

Single-agent systems often struggle with complex workflows that require diverse expertise. A research agent might excel at gathering information but fail at data analysis, while a summarization agent might miss critical details. By decomposing tasks into specialized agents coordinated through a shared state graph, we achieve:

Fault isolation: One agent's failure doesn't crash the entire system
Specialized optimization: Each agent uses tools and prompts tailored to its domain
Scalable parallelism: Independent agents can execute concurrently
Auditable decision paths: The graph structure provides clear execution traces

According to LangChain's documentation, LangGraph extends LangChain with the ability to create cyclic graphs, enabling agent loops, human-in-the-loop workflows, and persistent state management across multiple turns.

Prerequisites and Environment Setup

Before diving into implementation, ensure you have the following installed:

# Python 3.10+ required
python --version  # Should show 3.10.x or higher

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install core dependencies
pip install langgraph==0.2.0 langchain==0.3.0 langchain-openai==0.2.0
pip install langchain-community  # Provides the TavilySearchResults wrapper imported in Step 2
pip install tavily-python==0.5.0  # For web search tool
pip install python-dotenv==1.0.0
pip install pydantic==2.9.0

Create a .env file with your API keys:

OPENAI_API_KEY=sk-your-key-here
TAVILY_API_KEY=tvly-your-key-here
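The later scripts read these keys from the process environment, so they have to be loaded before any agent is created. The following is a minimal sketch of one way to do that, assuming python-dotenv's load_dotenv (installed above); missing_api_keys and REQUIRED_KEYS are hypothetical names introduced here for illustration.

```python
import os
from typing import Iterable, List, Mapping

try:
    # load_dotenv comes from the python-dotenv package installed above;
    # it copies entries from a local .env file into os.environ.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # Without python-dotenv, the keys must already be exported in the shell.
    pass

REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_api_keys(
    env: Mapping[str, str] = os.environ,
    required: Iterable[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the names of required keys that are unset or blank (hypothetical helper)."""
    return [name for name in required if not env.get(name, "").strip()]
```

Calling missing_api_keys() at startup and exiting when the list is non-empty surfaces a configuration problem before the first API call fails.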

The Tavily search API provides real-time web search results optimized for AI agents. As of 2026, Tavily offers a free tier with 1,000 API calls per month, making it suitable for development and testing.

Architecture Overview: The Research Analysis Pipeline

Our multi-agent system will consist of three specialized agents coordinated through a LangGraph state machine:

Research Agent: Gathers information using web search and document retrieval tools
Analysis Agent: Processes raw data, identifies patterns, and generates insights
Summary Agent: Synthesizes findings into coherent, actionable summaries

The graph structure allows each agent to pass its output to the next, with the ability to loop back for refinement if needed.

[User Query] → Research Agent → Analysis Agent → Summary Agent → [Final Output]
                     ↑                |                |
                     └─── Refinement Loop (optional) ───┘
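Before introducing LangGraph, the control flow in the diagram can be sketched in plain Python. This is only an illustration of the data flow under simplifying assumptions: the three stage functions are toy stand-ins for the agents, and needs_refinement is a hypothetical rule, not the quality gate used later.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


def research(state: State) -> State:
    # Stand-in for the research agent: records one "finding" per pass.
    findings = list(state.get("findings", []))
    findings.append(f"finding {len(findings) + 1} for {state['query']}")
    return {**state, "findings": findings}


def analyze(state: State) -> State:
    # Stand-in for the analysis agent.
    return {**state, "analysis": f"{len(state['findings'])} findings analyzed"}


def summarize(state: State) -> State:
    # Stand-in for the summary agent.
    return {**state, "summary": f"Summary: {state['analysis']}"}


def needs_refinement(state: State) -> bool:
    # Toy rule: ask for another research pass until there are two findings.
    return len(state["findings"]) < 2


def run_pipeline(query: str, max_passes: int = 3) -> State:
    """Run research -> analysis -> summary, looping back while refinement is needed."""
    state: State = {"query": query, "findings": []}
    stages: List[Callable[[State], State]] = [research, analyze, summarize]
    for _ in range(max_passes):
        for stage in stages:
            state = stage(state)
        if not needs_refinement(state):
            break
    return state
```

Each stage receives the whole state and returns an updated copy, which is the same contract the graph nodes follow in the implementation below.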

Core Implementation: Building the Multi-Agent Graph

Step 1: Define the Shared State Schema

LangGraph requires a typed state schema that all agents can read and write to. We'll use Pydantic for validation:

from typing import List, Dict, Optional, Any
from pydantic import BaseModel, Field
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
import operator

class AgentState(BaseModel):
    """Shared state for the multi-agent system."""
    query: str = Field(description="Original user query")
    research_results: List[Dict[str, Any]] = Field(
        default_factory=list,
        description="Raw research data from web search"
    )
    analysis_results: Optional[str] = Field(
        default=None,
        description="Processed analysis output"
    )
    summary: Optional[str] = Field(
        default=None,
        description="Final summary for user"
    )
    iteration_count: int = Field(
        default=0,
        description="Number of refinement iterations"
    )
    errors: List[str] = Field(
        default_factory=list,
        description="Error messages from agent failures"
    )
    metadata: Dict[str, Any] = Field(
        default_factory=dict,
        description="Additional context for debugging"
    )

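Using default_factory gives every state instance its own fresh list or dict instead of a shared default. The iteration_count field lets the routing function cap refinement cycles, but only if a node increments it on each pass; nothing updates it automatically.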
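Nodes return partial dictionaries that are merged into the shared state. By default a returned value overwrites the field, while a field declared with a reducer (for example, typing.Annotated with operator.add) combines the old and new values. The following plain-Python emulation, which is not LangGraph's own code and uses the hypothetical helper apply_update, shows the difference for the errors field.

```python
import operator
from typing import Any, Callable, Dict

Reducer = Callable[[Any, Any], Any]


def apply_update(
    state: Dict[str, Any],
    update: Dict[str, Any],
    reducers: Dict[str, Reducer],
) -> Dict[str, Any]:
    """Emulate merging a node's partial update into the shared state."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            # A reducer combines the existing value with the update.
            merged[key] = reducers[key](state[key], value)
        else:
            # Without a reducer, the update replaces the existing value.
            merged[key] = value
    return merged


state = {"query": "q", "errors": ["first failure"], "summary": None}
update = {"errors": ["second failure"], "summary": "done"}

overwritten = apply_update(state, update, reducers={})
appended = apply_update(state, update, reducers={"errors": operator.add})
```

With no reducer, a node that wants to keep earlier errors has to return the old list plus the new entry itself, which is the approach the node functions in Step 4 take.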

Step 2: Create Tool Definitions

Tools are the interface between agents and external systems. We'll implement two tools: web search and a simple calculator for numerical analysis:

from langchain.tools import tool
from langchain_community.tools.tavily_search import TavilySearchResults
from typing import Dict, List, Union

@tool
def web_search(query: str, max_results: int = 5) -> List[Dict[str, str]]:
    """
    Search the web for current information on a topic.
    Returns a list of dictionaries with 'title', 'url', and 'content' keys.
    """
    search = TavilySearchResults(
        max_results=max_results,
        search_depth="advanced"  # Deeper but slower than the "basic" mode
    )
    results = search.invoke(query)
    return [
        {
            "title": r.get("title", ""),
            "url": r.get("url", ""),
            "content": r.get("content", "")
        }
        for r in results
    ]

@tool
def calculate(expression: str) -> Union[float, str]:
    """
    Safely evaluate a mathematical expression.
    Supports basic arithmetic, exponents, and trigonometric functions.
    """
    import math
    safe_dict = {
        "abs": abs, "round": round, "min": min, "max": max,
        "sum": sum, "pow": pow, "sqrt": math.sqrt,
        "sin": math.sin, "cos": math.cos, "tan": math.tan,
        "pi": math.pi, "e": math.e
    }
    try:
        # Use eval with restricted globals for safety
        result = eval(expression, {"__builtins__": {}}, safe_dict)
        return float(result)
    except Exception as e:
        return f"Calculation error: {str(e)}"

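The web_search tool requests Tavily's advanced search depth, which is slower than the basic mode but returns more relevant and fuller content, giving the analysis agent richer context. The calculate tool limits the names available to eval, which blocks casual misuse but is not a sandbox: an expression can still reach Python internals through attribute access on literals, so the tool should not receive untrusted input as written.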
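A stricter alternative to a restricted eval is to parse the expression with Python's ast module and evaluate only whitelisted node types. The following is a minimal sketch under the assumption that only arithmetic is needed; safe_calculate and its lookup tables are hypothetical names, not part of the original tool.

```python
import ast
import math
import operator
from typing import Union

Number = Union[int, float]

_BINARY_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
              "tan": math.tan, "abs": abs, "round": round}
_CONSTANTS = {"pi": math.pi, "e": math.e}


def safe_calculate(expression: str) -> float:
    """Evaluate arithmetic by walking the AST; anything not whitelisted is rejected."""
    def visit(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        if isinstance(node, ast.Name) and node.id in _CONSTANTS:
            return _CONSTANTS[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCTIONS and not node.keywords):
            return _FUNCTIONS[node.func.id](*[visit(arg) for arg in node.args])
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return float(visit(ast.parse(expression, mode="eval")))
```

Attribute access, subscripts, and calls to unlisted names all raise ValueError. The sketch does not bound computation cost, so a very large exponent can still consume significant CPU time and memory; a production tool would also need limits on operand size.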

Step 3: Implement Agent Nodes

Each agent is a LangChain Runnable that processes the shared state and returns updates. We'll use OpenAI's GPT-4o-mini [6] for cost efficiency while maintaining quality:

from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.schema import SystemMessage, HumanMessage

# Initialize the LLM with temperature control
llm = ChatOpenAI(
    model="gpt-4o-mini",  # Cost-effective for multi-agent workflows
    temperature=0.3,       # Low temperature for consistent outputs
    max_tokens=4096
)

def create_research_agent():
    """Creates the research agent with web search capability."""
    system_prompt = """You are a research specialist. Your job is to:
1. Analyze the user's query to identify key search terms
2. Use the web_search tool to find relevant, current information
3. Extract and organize the most important facts and data points
4. Return structured research results with source URLs

Focus on authoritative sources and recent information (last 2 years).
If the search returns insufficient results, try alternative search terms."""

    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        MessagesPlaceholder(variable_name="chat_history", optional=True),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad")
    ])

    agent = create_openai_functions_agent(
        llm=llm,
        tools=[web_search],
        prompt=prompt
    )

    return AgentExecutor(
        agent=agent,
        tools=[web_search],
        verbose=True,
        max_iterations=3,  # Prevent infinite search loops
        early_stopping_method="force"  # "generate" raises an error with runnable-based agents
    )

def create_analysis_agent():
    """Creates the analysis agent with calculation capability."""
    system_prompt = """You are a data analyst. Given research results, you must:
1. Identify patterns, trends, and correlations in the data
2. Use the calculate tool for any numerical analysis
3. Highlight contradictions or gaps in the research
4. Provide actionable insights based on the evidence

Structure your analysis with clear sections:
- Key Findings
- Data Patterns
- Contradictions/Gaps
- Recommendations"""

    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        MessagesPlaceholder(variable_name="chat_history", optional=True),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad")
    ])

    agent = create_openai_functions_agent(
        llm=llm,
        tools=[calculate],
        prompt=prompt
    )

    return AgentExecutor(
        agent=agent,
        tools=[calculate],
        verbose=True,
        max_iterations=2,
        early_stopping_method="force"  # "generate" raises an error with runnable-based agents
    )

def create_summary_agent():
    """Creates the summary agent (no tools needed)."""
    system_prompt = """You are a professional summarizer. Given research and analysis:
1. Synthesize the most important findings into a coherent narrative
2. Use clear, non-technical language suitable for a general audience
3. Include specific data points and source citations where relevant
4. End with a concise conclusion that answers the original query

Format your summary as:
- Executive Summary (2-3 sentences)
- Key Findings (bullet points)
- Detailed Analysis (paragraphs)
- Conclusion"""

    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        ("human", "{input}")
    ])

    # Summary agent doesn't need tools, just the LLM
    chain = prompt | llm
    return chain

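Each agent has a specialized system prompt that defines its role and output format. The max_iterations parameter prevents agents from getting stuck in tool loops. The early_stopping_method parameter controls what the executor returns once that limit is reached: "force" returns a fixed stop message, while "generate" asks the model for one last answer. As far as we are aware, "generate" is only supported by the legacy agent classes and raises an error for runnable-based agents such as those built with create_openai_functions_agent.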
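To make the iteration cap concrete, the following is a simplified emulation of a tool-calling loop. It is not AgentExecutor's actual code: decide is a hypothetical stand-in for the model, and the stop message is illustrative.

```python
from typing import Callable, Dict, List, Tuple

# A decision is either ("call", tool_input) or ("finish", final_answer).
Decision = Tuple[str, str]


def run_tool_loop(
    decide: Callable[[List[str]], Decision],
    tool: Callable[[str], str],
    max_iterations: int = 3,
) -> Dict[str, object]:
    """Alternate between the model stand-in and a tool until it finishes or the cap is hit."""
    observations: List[str] = []
    for step in range(max_iterations):
        action, payload = decide(observations)
        if action == "finish":
            return {"output": payload, "steps": step, "stopped_early": False}
        observations.append(tool(payload))
    # Cap reached: return a fixed message instead of looping forever.
    return {"output": "Agent stopped due to iteration limit.",
            "steps": max_iterations, "stopped_early": True}


def never_satisfied(observations: List[str]) -> Decision:
    # Always asks for another search, so only the cap can stop it.
    return ("call", f"search attempt {len(observations) + 1}")


def satisfied_after_one(observations: List[str]) -> Decision:
    # Finishes as soon as one tool result is available.
    if observations:
        return ("finish", f"answer based on: {observations[0]}")
    return ("call", "first search")


def echo_tool(text: str) -> str:
    return f"results for {text}"
```

With never_satisfied the loop ends only because of the cap, which is the situation max_iterations guards against in the agents above.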

Step 4: Define Graph Nodes and Edges

Now we wire the agents into a LangGraph state machine:

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import Dict, Any

# Initialize the graph with our state schema
workflow = StateGraph(AgentState)

# Create agent instances
research_agent = create_research_agent()
analysis_agent = create_analysis_agent()
summary_agent = create_summary_agent()

def research_node(state: AgentState) -> Dict[str, Any]:
    """Execute research agent and update state."""
    try:
        result = research_agent.invoke({
            "input": state.query,
            "chat_history": []
        })
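        return {
            # research_results is typed as a list of dicts, so wrap the text output
            "research_results": [{"content": result.get("output", "")}],
            "metadata": {**state.metadata, "research_agent_output": result},
            # Count each research pass so should_continue can cap refinement loops
            "iteration_count": state.iteration_count + 1,
        }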
    except Exception as e:
        return {
            "errors": state.errors + [f"Research agent failed: {str(e)}"],
            "research_results": []
        }

def analysis_node(state: AgentState) -> Dict[str, Any]:
    """Execute analysis agent on research results."""
    if not state.research_results:
        return {"analysis_results": "No research data available for analysis."}

    try:
        # Format research results for the analysis agent
        input_text = f"Query: {state.query}\n\nResearch Results:\n{state.research_results}"
        result = analysis_agent.invoke({
            "input": input_text,
            "chat_history": []
        })
        return {
            "analysis_results": result.get("output", ""),
            "metadata": {**state.metadata, "analysis_agent_output": result}
        }
    except Exception as e:
        return {
            "errors": state.errors + [f"Analysis agent failed: {str(e)}"],
            "analysis_results": "Analysis failed due to an error."
        }

def summary_node(state: AgentState) -> Dict[str, Any]:
    """Execute summary agent on analysis results."""
    if not state.analysis_results:
        return {"summary": "No analysis available to summarize."}

    try:
        input_text = f"Original Query: {state.query}\n\nAnalysis:\n{state.analysis_results}"
        result = summary_agent.invoke({"input": input_text})
        return {
            "summary": result.content,
            "metadata": {**state.metadata, "summary_agent_output": result}
        }
    except Exception as e:
        return {
            "errors": state.errors + [f"Summary agent failed: {str(e)}"],
            "summary": "Summary generation failed."
        }

def should_continue(state: AgentState) -> str:
    """Determine if we should refine or end the workflow."""
    # Check for errors
    if len(state.errors) > 2:
        return "end"

    # Check iteration limit
    if state.iteration_count >= 3:
        return "end"

    # Check if summary is satisfactory (simple heuristic)
    if state.summary and len(state.summary) > 100:
        return "end"

    return "continue"

# Add nodes to the graph
workflow.add_node("research", research_node)
workflow.add_node("analysis", analysis_node)
workflow.add_node("summary", summary_node)

# Add edges
workflow.set_entry_point("research")
workflow.add_edge("research", "analysis")
workflow.add_edge("analysis", "summary")

# Add conditional edge for refinement
workflow.add_conditional_edges(
    "summary",
    should_continue,
    {
        "continue": "research",  # Loop back for refinement
        "end": END
    }
)

# Compile the graph with checkpointing
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

The should_continue function implements a simple quality gate. In production, you might use more sophisticated metrics like semantic similarity scores or user feedback signals.
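As one illustration of a slightly richer gate, the sketch below scores how many of the query's content words appear in the summary. It is a crude lexical proxy rather than a semantic similarity measure, and the names coverage_score and route_after_summary, the stopword list, and the 0.5 threshold are all assumptions made for this example.

```python
import re
from typing import Set

_STOPWORDS = {"the", "a", "an", "of", "in", "for", "and", "to", "is", "are", "what"}


def _terms(text: str) -> Set[str]:
    # Lowercase alphanumeric tokens, minus a small stopword list.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in _STOPWORDS}


def coverage_score(query: str, summary: str) -> float:
    """Fraction of the query's content words that also appear in the summary."""
    query_terms = _terms(query)
    if not query_terms:
        return 1.0
    return len(query_terms & _terms(summary)) / len(query_terms)


def route_after_summary(
    query: str,
    summary: str,
    iteration_count: int,
    max_iterations: int = 3,
    min_coverage: float = 0.5,
) -> str:
    """Return "end" or "continue", mirroring the labels used by should_continue."""
    if iteration_count >= max_iterations:
        return "end"
    if summary and coverage_score(query, summary) >= min_coverage:
        return "end"
    return "continue"
```

A gate like this still ends the run once the iteration limit is reached, so a summary that never passes the check cannot loop indefinitely.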

Step 5: Execute the Multi-Agent System

Let's test our system with a real-world query:

import json
from uuid import uuid4

# Create a unique thread ID for state persistence
config = {"configurable": {"thread_id": str(uuid4())}}

# Initial state
initial_state = AgentState(
    query="What are the latest advancements in solid-state battery technology for electric vehicles in 2025-2026?"
)

# Run the workflow (the initial state is passed as a plain dict of field values)
for event in app.stream(initial_state.model_dump(), config):
    for node_name, output in event.items():
        print(f"\n{'='*50}")
        print(f"Node: {node_name}")
        print(f"{'='*50}")

        if node_name == "research":
            print(f"Research completed. Results length: {len(str(output))} chars")
        elif node_name == "analysis":
            print(f"Analysis completed. Output length: {len(str(output))} chars")
        elif node_name == "summary":
            print(f"\nFinal Summary:\n{output.get('summary', 'No summary generated')}")

# Retrieve final state
final_state = app.get_state(config)
print(f"\nFinal state errors: {final_state.values.get('errors', [])}")
print(f"Iterations: {final_state.values.get('iteration_count', 0)}")

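The stream method yields an event as each node completes, allowing real-time monitoring. The MemorySaver checkpointer keeps a snapshot of the state after each step under the given thread_id, so a run can be inspected or resumed within the same process. It holds checkpoints in memory only, so use a persistent checkpointer if state must survive a restart.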

Edge Cases and Production Considerations

Error Handling and Retry Logic

In production, agents will encounter API rate limits, network timeouts, and malformed responses. Implement exponential backoff:

import time
from functools import wraps

def retry_with_backoff(max_retries=3, base_delay=1.0):
    """Decorator for retrying agent calls with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_retries - 1:
                        raise
                    delay = base_delay * (2 ** attempt)
                    print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay}s...")
                    time.sleep(delay)
            return None
        return wrapper
    return decorator

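# Apply to the raw agent call rather than to research_node: that node catches
# every exception itself, so a decorator around it would never see a failure.
@retry_with_backoff(max_retries=3)
def invoke_research_agent(query: str) -> Dict[str, Any]:
    return research_agent.invoke({"input": query, "chat_history": []})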
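The decorator above retries on every exception type and waits a fixed exponential delay. A common variation, sketched here with hypothetical names, retries only errors judged transient and adds random jitter so that concurrent agents do not retry in lockstep. Which exception types count as transient depends on the client libraries in use; the defaults below are assumptions.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying only the listed exception types with jittered backoff."""
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            # Full jitter: wait a random time up to the exponential ceiling.
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
    raise ValueError("max_retries must be at least 1")
```

Errors outside retry_on, such as a KeyError caused by a bug, propagate immediately instead of being retried. The sleep parameter exists so tests can replace time.sleep with a recorder.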

Memory Management

LangGraph's checkpointing stores the entire state history. For long-running workflows, implement state pruning:

def prune_state(state: AgentState, max_history: int = 5) -> AgentState:
    """Remove old research results to limit memory usage."""
    if len(state.research_results) > max_history:
        state.research_results = state.research_results[-max_history:]
    return state

Token Budget Management

Each agent call consumes tokens. Track usage to avoid unexpected costs:

from langchain_community.callbacks import get_openai_callback

def tracked_agent_call(agent, input_data):
    """Track token usage for an agent call."""
    with get_openai_callback() as cb:
        result = agent.invoke(input_data)
        print(f"Tokens used: {cb.total_tokens} (Prompt: {cb.prompt_tokens}, Completion: {cb.completion_tokens})")
        print(f"Cost: ${cb.total_cost:.4f}")
    return result
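Printing usage helps with visibility, but it does not stop a runaway loop. One option is to accumulate the reported totals against a per-run budget, as in the sketch below. TokenBudget and BudgetExceeded are hypothetical names, and the token counts would come from a usage tracker such as the callback above.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run has spent more tokens than allowed."""


@dataclass
class TokenBudget:
    limit: int
    used: int = 0

    def charge(self, tokens: int) -> int:
        """Record usage, then raise if the limit is passed; return the remaining budget."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        # The tokens are already spent when this is called, so always record them.
        self.used += tokens
        if self.used > self.limit:
            raise BudgetExceeded(f"Used {self.used} tokens, limit is {self.limit}")
        return self.limit - self.used
```

Because the check runs after a call has completed, it prevents the next call rather than the one that crossed the limit, so the limit should leave some headroom.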

Testing and Validation

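Write unit tests for each agent in isolation. Note that the first and last tests below call the live OpenAI and Tavily APIs, so they need valid keys, and the async test needs the pytest-asyncio plugin: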

import pytest
from unittest.mock import patch

def test_research_agent_empty_query():
    """Research agent should handle empty queries gracefully."""
    agent = create_research_agent()
    result = agent.invoke({"input": "", "chat_history": []})
    assert "output" in result
    assert len(result["output"]) > 0  # Should return a helpful message

def test_analysis_agent_no_data():
    """Analysis agent should handle missing research data."""
    state = AgentState(query="test", research_results=[])
    result = analysis_node(state)
    assert "No research data" in result["analysis_results"]

@pytest.mark.asyncio
async def test_full_workflow():
    """End-to-end test with a simple query."""
    state = AgentState(query="What is the capital of France?")
    config = {"configurable": {"thread_id": "test-123"}}

    async for event in app.astream(state, config):
        pass

    final = app.get_state(config)
    assert "Paris" in final.values.get("summary", "")
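Tests that depend on live APIs are slow and can fail for reasons unrelated to the code. A possible complement is to replace the agent with a mock so the node logic can be checked offline. The sketch below uses unittest.mock from the standard library; run_analysis is a hypothetical, simplified variant of analysis_node that takes the agent as a parameter to make substitution easy.

```python
from typing import Any, Dict
from unittest.mock import MagicMock


def run_analysis(agent: Any, query: str, research_text: str) -> Dict[str, Any]:
    """Simplified stand-in for analysis_node that takes the agent as a parameter."""
    if not research_text:
        return {"analysis_results": "No research data available for analysis."}
    try:
        result = agent.invoke({
            "input": f"Query: {query}\n\nResearch Results:\n{research_text}",
            "chat_history": [],
        })
        return {"analysis_results": result.get("output", "")}
    except Exception as exc:
        return {"errors": [f"Analysis agent failed: {exc}"],
                "analysis_results": "Analysis failed due to an error."}


def test_analysis_uses_agent_output():
    agent = MagicMock()
    agent.invoke.return_value = {"output": "Key Findings: ..."}
    result = run_analysis(agent, "test", "some research")
    assert result["analysis_results"] == "Key Findings: ..."
    agent.invoke.assert_called_once()


def test_analysis_reports_agent_failure():
    agent = MagicMock()
    agent.invoke.side_effect = TimeoutError("upstream timeout")
    result = run_analysis(agent, "test", "some research")
    assert result["errors"] == ["Analysis agent failed: upstream timeout"]


def test_analysis_skips_agent_without_data():
    agent = MagicMock()
    run_analysis(agent, "test", "")
    agent.invoke.assert_not_called()
```

These tests run without network access or API keys, which makes them suitable for continuous integration.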

Performance Optimization

For production deployments, consider these optimizations:

Parallel agent execution: Use asyncio.gather for independent agents
Caching: Cache web search results for identical queries using Redis
Model selection: Use smaller models (e.g., GPT-4o-mini) for routine tasks
Batch processing: Process multiple queries in a single graph execution

import asyncio

async def parallel_research(queries: List[str]) -> List[Dict]:
    """Execute multiple research queries in parallel."""
    tasks = [research_agent.ainvoke({"input": q, "chat_history": []}) for q in queries]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]
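The function above launches every query at once and silently drops failures, which can trigger rate limits and hide errors. The following sketch shows one way to cap concurrency with asyncio.Semaphore and report failures alongside successes. gather_bounded is a hypothetical helper, and fake_research stands in for research_agent.ainvoke so the example runs without API access.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def gather_bounded(
    worker: Callable[[str], Awaitable[str]],
    queries: List[str],
    max_concurrency: int = 3,
) -> Tuple[List[str], List[Tuple[str, Exception]]]:
    """Run worker over queries with a concurrency cap; return (successes, failures)."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(query: str) -> str:
        # At most max_concurrency workers run inside this block at once.
        async with semaphore:
            return await worker(query)

    outcomes = await asyncio.gather(
        *(guarded(q) for q in queries), return_exceptions=True
    )
    successes: List[str] = []
    failures: List[Tuple[str, Exception]] = []
    for query, outcome in zip(queries, outcomes):
        if isinstance(outcome, Exception):
            failures.append((query, outcome))
        else:
            successes.append(outcome)
    return successes, failures


async def fake_research(query: str) -> str:
    # Hypothetical stand-in for research_agent.ainvoke.
    await asyncio.sleep(0)
    if not query:
        raise ValueError("empty query")
    return f"results for {query}"
```

Returning the failed queries with their exceptions lets the caller log or retry them instead of losing them.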

What's Next

This multi-agent system provides a foundation for building complex AI workflows. To extend it:

Add human-in-the-loop: Use LangGraph's interrupt function to pause for human approval before critical decisions
Integrate vector databases: Store research results in Pinecone or Weaviate [9] for long-term memory
Implement agent routing: Use a router agent to dynamically assign tasks to specialized sub-agents
Add monitoring: Integrate with LangSmith for tracing and debugging agent behavior

The complete source code is available on GitHub. For more advanced patterns, explore LangGraph's official documentation which includes examples of multi-agent collaboration, tool delegation, and state persistence.

Remember that multi-agent systems introduce complexity in coordination and debugging. Start with a simple two-agent system, validate each component independently, then gradually add sophistication. The graph structure makes it easy to visualize and debug the decision flow, which is invaluable when things go wrong in production.

References

1. OpenAI. Wikipedia.

2. List of generation IV Pokémon. Wikipedia.

3. GPT. Wikipedia.

4. openai/openai-python. GitHub.

5. weaviate/weaviate. GitHub.

6. Significant-Gravitas/AutoGPT. GitHub.

7. pinecone-io/python-sdk. GitHub.

8. OpenAI Pricing.

9. Weaviate Pricing.
