How to Build a Multi-Agent System with LangGraph and Tool Use
Practical tutorial: Create a multi-agent system with LangGraph and tool use
Building production-grade multi-agent systems has become increasingly accessible with the release of LangGraph, a library designed specifically for creating stateful, cyclical agent architectures. In this tutorial, we'll construct a complete multi-agent system that coordinates specialized agents for research, analysis, and summarization tasks using real tool integrations.
Why Multi-Agent Systems Matter in Production
Single-agent systems often struggle with complex workflows that require diverse expertise. A research agent might excel at gathering information but fail at data analysis, while a summarization agent might miss critical details. By decomposing tasks into specialized agents coordinated through a shared state graph, we achieve:
- Fault isolation: One agent's failure doesn't crash the entire system
- Specialized optimization: Each agent uses tools and prompts tailored to its domain
- Scalable parallelism: Independent agents can execute concurrently
- Auditable decision paths: The graph structure provides clear execution traces
According to LangChain's documentation, LangGraph extends LangChain with the ability to create cyclic graphs, enabling agent loops, human-in-the-loop workflows, and persistent state management across multiple turns.
Prerequisites and Environment Setup
Before diving into implementation, ensure you have the following installed:
# Python 3.10+ required
python --version # Should show 3.10.x or higher
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install core dependencies
pip install langgraph==0.2.0 langchain==0.3.0 langchain-openai==0.2.0
pip install langchain-community # Provides TavilySearchResults, imported by the search tool below
pip install tavily-python==0.5.0 # For web search tool
pip install python-dotenv==1.0.0
pip install pydantic==2.9.0
Create a .env file with your API keys:
OPENAI_API_KEY=sk-your-key-here
TAVILY_API_KEY=tvly-your-key-here
The Tavily search API provides real-time web search results optimized for AI agents. As of 2026, Tavily offers a free tier with 1,000 API calls per month, making it suitable for development and testing.
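The keys in the .env file only help if they are loaded before any agent is created. A minimal sketch of a startup check follows; the helper name missing_api_keys is hypothetical, and the snippet assumes python-dotenv's load_dotenv is the loading mechanism you want (it falls back to already-exported shell variables if the package is absent).

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")

try:
    # python-dotenv copies entries from a local .env file into os.environ.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # Fall back to variables already exported in the shell


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

Calling missing_api_keys() at startup and exiting with a clear message when the list is non-empty tends to be easier to debug than an authentication error raised deep inside an agent call.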
Architecture Overview: The Research Analysis Pipeline
Our multi-agent system will consist of three specialized agents coordinated through a LangGraph state machine:
- Research Agent: Gathers information using web search and document retrieval tools
- Analysis Agent: Processes raw data, identifies patterns, and generates insights
- Summary Agent: Synthesizes findings into coherent, actionable summaries
The graph structure allows each agent to pass its output to the next, with the ability to loop back for refinement if needed.
[User Query] → Research Agent → Analysis Agent → Summary Agent → [Final Output]
                     ↑                                 |
                     └──── Refinement Loop (optional) ─┘
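Before introducing LangGraph, it can help to see the control flow in plain Python. The sketch below is not LangGraph code: run_pipeline and its callable parameters are hypothetical stand-ins for the three agents and the refinement check, and each step returns a partial update that is merged into the shared state.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Step = Callable[[State], State]


def run_pipeline(
    state: State,
    research: Step,
    analysis: Step,
    summary: Step,
    needs_refinement: Callable[[State], bool],
    max_iterations: int = 3,
) -> State:
    """Run research -> analysis -> summary, looping back while refinement is requested."""
    while True:
        for step in (research, analysis, summary):
            # Each step returns only the keys it changed; merge them into the state.
            state = {**state, **step(state)}
        state["iteration_count"] = state.get("iteration_count", 0) + 1
        # Stop when the cap is reached or the summary is judged good enough.
        if state["iteration_count"] >= max_iterations or not needs_refinement(state):
            return state
```

The LangGraph version built in the following steps expresses the same loop as nodes, edges, and one conditional edge, which adds checkpointing and streaming on top of this basic shape.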
Core Implementation: Building the Multi-Agent Graph
Step 1: Define the Shared State Schema
LangGraph requires a typed state schema that all agents can read and write to. We'll use Pydantic for validation:
from typing import List, Dict, Optional, Any
from pydantic import BaseModel, Field
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver  # MemorySaver lives in the .memory submodule

class AgentState(BaseModel):
    """Shared state for the multi-agent system."""
    query: str = Field(description="Original user query")
    research_results: List[Dict[str, Any]] = Field(
        default_factory=list,
        description="Raw research data from web search"
    )
    analysis_results: Optional[str] = Field(
        default=None,
        description="Processed analysis output"
    )
    summary: Optional[str] = Field(
        default=None,
        description="Final summary for user"
    )
    iteration_count: int = Field(
        default=0,
        description="Number of refinement iterations"
    )
    errors: List[str] = Field(
        default_factory=list,
        description="Error messages from agent failures"
    )
    metadata: Dict[str, Any] = Field(
        default_factory=dict,
        description="Additional context for debugging"
    )
Using default_factory gives each AgentState instance its own fresh list or dict instead of a shared mutable default. The iteration_count field is meant to cap refinement cycles, which only works if a node increments it on every pass through the graph.
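To illustrate why a factory is used for the list and dict fields, here is a small standard-library sketch. It uses dataclasses rather than Pydantic so it runs without extra packages; DemoState is a hypothetical stand-in for AgentState, and Pydantic's Field(default_factory=...) likewise calls the factory once per instance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DemoState:
    """Hypothetical, trimmed-down stand-in for AgentState."""
    query: str
    errors: List[str] = field(default_factory=list)  # A new list per instance
    iteration_count: int = 0


first = DemoState(query="first run")
second = DemoState(query="second run")

# Recording an error on one run must not leak into another run's state.
first.errors.append("Research agent failed: timeout")
```

After this snippet runs, second.errors is still empty, which is the isolation you want between independent workflow runs.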
Step 2: Create Tool Definitions
Tools are the interface between agents and external systems. We'll implement two tools: web search and a simple calculator for numerical analysis:
from langchain_core.tools import tool
from langchain_community.tools.tavily_search import TavilySearchResults
from typing import Dict, List, Union

@tool
def web_search(query: str, max_results: int = 5) -> List[Dict[str, str]]:
    """
    Search the web for current information on a topic.
    Returns a list of dictionaries with 'title', 'url', and 'content' keys.
    """
    search = TavilySearchResults(
        max_results=max_results,
        search_depth="advanced"  # Uses full web scraping
    )
    results = search.invoke(query)
    # Normalize each hit to a fixed set of keys
    return [
        {
            "title": r.get("title", ""),
            "url": r.get("url", ""),
            "content": r.get("content", "")
        }
        for r in results
    ]

@tool
def calculate(expression: str) -> Union[float, str]:
    """
    Evaluate a mathematical expression with a restricted namespace.
    Supports basic arithmetic, exponents, and trigonometric functions.
    """
    import math
    safe_dict = {
        "abs": abs, "round": round, "min": min, "max": max,
        "sum": sum, "pow": pow, "sqrt": math.sqrt,
        "sin": math.sin, "cos": math.cos, "tan": math.tan,
        "pi": math.pi, "e": math.e
    }
    try:
        # Restricted globals reduce the attack surface but are not a full sandbox
        result = eval(expression, {"__builtins__": {}}, safe_dict)
        return float(result)
    except Exception as e:
        return f"Calculation error: {str(e)}"
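The web_search tool uses Tavily's advanced search depth, which performs full page scraping rather than just extracting snippets. This provides richer context for the analysis agent. The calculate tool restricts eval by removing builtins and exposing only a whitelist of math functions. This narrows the attack surface but is not a complete sandbox: a restricted eval can still be escaped through attribute access on literals, so it should not receive untrusted input without further hardening.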
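One way to harden the calculator is to parse the expression and walk its syntax tree, allowing only whitelisted node types. The sketch below is an alternative to the eval-based tool above, not part of the original implementation; safe_calculate and the whitelist tables are hypothetical names, and it uses only the standard library.

```python
import ast
import math
import operator

_BINARY_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
              "tan": math.tan, "abs": abs}
_CONSTANTS = {"pi": math.pi, "e": math.e}


def safe_calculate(expression: str) -> float:
    """Evaluate arithmetic by walking the AST; anything not whitelisted raises ValueError."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        if isinstance(node, ast.Name) and node.id in _CONSTANTS:
            return _CONSTANTS[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCTIONS and not node.keywords):
            return _FUNCTIONS[node.func.id](*[_eval(arg) for arg in node.args])
        # Attribute access, subscripts, lambdas, and so on all end up here.
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return float(_eval(ast.parse(expression, mode="eval")))
```

Because attribute access and unknown names are rejected, inputs such as "().__class__" fail with ValueError instead of being evaluated. Very large exponents can still consume CPU and memory, so a production tool would likely also bound operand sizes.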
Step 3: Implement Agent Nodes
Each agent is a LangChain Runnable that processes the shared state and returns updates. We'll use OpenAI's GPT-4o-mini for cost efficiency while maintaining quality:
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Initialize the LLM with temperature control
llm = ChatOpenAI(
    model="gpt-4o-mini",  # Cost-effective for multi-agent workflows
    temperature=0.3,      # Low temperature for consistent outputs
    max_tokens=4096
)

def create_research_agent():
    """Creates the research agent with web search capability."""
    system_prompt = """You are a research specialist. Your job is to:
1. Analyze the user's query to identify key search terms
2. Use the web_search tool to find relevant, current information
3. Extract and organize the most important facts and data points
4. Return structured research results with source URLs
Focus on authoritative sources and recent information (last 2 years).
If the search returns insufficient results, try alternative search terms."""
    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        MessagesPlaceholder(variable_name="chat_history", optional=True),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad")
    ])
    agent = create_openai_functions_agent(
        llm=llm,
        tools=[web_search],
        prompt=prompt
    )
    return AgentExecutor(
        agent=agent,
        tools=[web_search],
        verbose=True,
        max_iterations=3,  # Prevent infinite search loops
        # Runnable-based agents support only "force"; "generate" raises at the limit
        early_stopping_method="force"
    )

def create_analysis_agent():
    """Creates the analysis agent with calculation capability."""
    system_prompt = """You are a data analyst. Given research results, you must:
1. Identify patterns, trends, and correlations in the data
2. Use the calculate tool for any numerical analysis
3. Highlight contradictions or gaps in the research
4. Provide actionable insights based on the evidence
Structure your analysis with clear sections:
- Key Findings
- Data Patterns
- Contradictions/Gaps
- Recommendations"""
    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        MessagesPlaceholder(variable_name="chat_history", optional=True),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad")
    ])
    agent = create_openai_functions_agent(
        llm=llm,
        tools=[calculate],
        prompt=prompt
    )
    return AgentExecutor(
        agent=agent,
        tools=[calculate],
        verbose=True,
        max_iterations=2,
        early_stopping_method="force"
    )

def create_summary_agent():
    """Creates the summary agent (no tools needed)."""
    system_prompt = """You are a professional summarizer. Given research and analysis:
1. Synthesize the most important findings into a coherent narrative
2. Use clear, non-technical language suitable for a general audience
3. Include specific data points and source citations where relevant
4. End with a concise conclusion that answers the original query
Format your summary as:
- Executive Summary (2-3 sentences)
- Key Findings (bullet points)
- Detailed Analysis (paragraphs)
- Conclusion"""
    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        ("human", "{input}")
    ])
    # Summary agent doesn't need tools, just the LLM
    chain = prompt | llm
    return chain
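Each agent has a specialized system prompt that defines its role and output format. The max_iterations parameter prevents agents from getting stuck in tool loops. Note that agents built with create_openai_functions_agent support only early_stopping_method="force" (the default), which returns a fixed "stopped" message once the limit is hit; the "generate" option belongs to the legacy agent classes and raises an error with these agents when the limit is reached.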
Step 4: Define Graph Nodes and Edges
Now we wire the agents into a LangGraph state machine:
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import Dict, Any

# Initialize the graph with our state schema
workflow = StateGraph(AgentState)

# Create agent instances
research_agent = create_research_agent()
analysis_agent = create_analysis_agent()
summary_agent = create_summary_agent()

def research_node(state: AgentState) -> Dict[str, Any]:
    """Execute research agent and update state."""
    try:
        result = research_agent.invoke({
            "input": state.query,
            "chat_history": []
        })
        return {
            # Wrap the text output so it matches List[Dict[str, Any]]
            "research_results": [{"content": result.get("output", "")}],
            "metadata": {**state.metadata, "research_agent_output": result}
        }
    except Exception as e:
        return {
            "errors": state.errors + [f"Research agent failed: {str(e)}"],
            "research_results": []
        }

def analysis_node(state: AgentState) -> Dict[str, Any]:
    """Execute analysis agent on research results."""
    if not state.research_results:
        return {"analysis_results": "No research data available for analysis."}
    try:
        # Format research results for the analysis agent
        input_text = f"Query: {state.query}\n\nResearch Results:\n{state.research_results}"
        result = analysis_agent.invoke({
            "input": input_text,
            "chat_history": []
        })
        return {
            "analysis_results": result.get("output", ""),
            "metadata": {**state.metadata, "analysis_agent_output": result}
        }
    except Exception as e:
        return {
            "errors": state.errors + [f"Analysis agent failed: {str(e)}"],
            "analysis_results": "Analysis failed due to an error."
        }

def summary_node(state: AgentState) -> Dict[str, Any]:
    """Execute summary agent on analysis results."""
    # Count every pass so should_continue can enforce the iteration cap
    iteration = state.iteration_count + 1
    if not state.analysis_results:
        return {
            "summary": "No analysis available to summarize.",
            "iteration_count": iteration
        }
    try:
        input_text = f"Original Query: {state.query}\n\nAnalysis:\n{state.analysis_results}"
        result = summary_agent.invoke({"input": input_text})
        return {
            "summary": result.content,
            "iteration_count": iteration,
            "metadata": {**state.metadata, "summary_agent_output": result}
        }
    except Exception as e:
        return {
            "errors": state.errors + [f"Summary agent failed: {str(e)}"],
            "summary": "Summary generation failed.",
            "iteration_count": iteration
        }

def should_continue(state: AgentState) -> str:
    """Determine if we should refine or end the workflow."""
    # Check for errors
    if len(state.errors) > 2:
        return "end"
    # Check iteration limit
    if state.iteration_count >= 3:
        return "end"
    # Check if summary is satisfactory (simple heuristic)
    if state.summary and len(state.summary) > 100:
        return "end"
    return "continue"

# Add nodes to the graph
workflow.add_node("research", research_node)
workflow.add_node("analysis", analysis_node)
workflow.add_node("summary", summary_node)

# Add edges
workflow.set_entry_point("research")
workflow.add_edge("research", "analysis")
workflow.add_edge("analysis", "summary")

# Add conditional edge for refinement
workflow.add_conditional_edges(
    "summary",
    should_continue,
    {
        "continue": "research",  # Loop back for refinement
        "end": END
    }
)

# Compile the graph with checkpointing
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
The should_continue function implements a simple quality gate. In production, you might use more sophisticated metrics like semantic similarity scores or user feedback signals.
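As one step up from a pure length check, the gate could verify that the summary contains the sections requested in the summary agent's prompt. The sketch below is a hypothetical helper (summary_quality is not part of the original code) and uses simple substring matching, which is an assumption about how the model labels its sections.

```python
from typing import List, Optional, Sequence, Tuple

# Section names taken from the summary agent's system prompt.
REQUIRED_SECTIONS = ("Executive Summary", "Key Findings", "Detailed Analysis", "Conclusion")


def summary_quality(
    summary: Optional[str],
    min_chars: int = 100,
    required: Sequence[str] = REQUIRED_SECTIONS,
) -> Tuple[bool, List[str]]:
    """Return (passed, problems) for a candidate summary."""
    text = summary or ""
    problems = []
    if len(text) < min_chars:
        problems.append(f"too short ({len(text)} < {min_chars} chars)")
    lowered = text.lower()
    for section in required:
        # Case-insensitive substring match; a stricter gate could parse headings.
        if section.lower() not in lowered:
            problems.append(f"missing section: {section}")
    return (not problems, problems)
```

Inside should_continue, the boolean could replace the len(state.summary) > 100 test, and the list of problems could be stored in metadata to explain why a refinement pass was triggered.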
Step 5: Execute the Multi-Agent System
Let's test our system with a real-world query:
from uuid import uuid4

# Create a unique thread ID for state persistence
config = {"configurable": {"thread_id": str(uuid4())}}

# Initial state
initial_state = AgentState(
    query="What are the latest advancements in solid-state battery technology for electric vehicles in 2025-2026?"
)

# Run the workflow
for event in app.stream(initial_state, config):
    for node_name, output in event.items():
        print(f"\n{'='*50}")
        print(f"Node: {node_name}")
        print(f"{'='*50}")
        if node_name == "research":
            print(f"Research completed. Results length: {len(str(output))} chars")
        elif node_name == "analysis":
            print(f"Analysis completed. Output length: {len(str(output))} chars")
        elif node_name == "summary":
            print(f"\nFinal Summary:\n{output.get('summary', 'No summary generated')}")

# Retrieve final state
final_state = app.get_state(config)
print(f"\nFinal state errors: {final_state.values.get('errors', [])}")
print(f"Iterations: {final_state.values.get('iteration_count', 0)}")
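The stream method yields events as each node completes, allowing real-time monitoring. The MemorySaver checkpointer stores intermediate states per thread_id in process memory, so a run can be inspected or resumed within the same process. That state is lost when the process exits, so a database-backed checkpointer is the usual choice when recovery across restarts matters.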
Edge Cases and Production Considerations
Error Handling and Retry Logic
In production, agents will encounter API rate limits, network timeouts, and malformed responses. Implement exponential backoff:
import time
from functools import wraps
from typing import Any, Dict

def retry_with_backoff(max_retries=3, base_delay=1.0):
    """Decorator for retrying agent calls with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries - 1:
                        raise  # Out of retries: surface the last error
                    delay = base_delay * (2 ** attempt)
                    print(f"Attempt {attempt + 1} failed. Retrying in {delay}s...")
                    time.sleep(delay)
        return wrapper
    return decorator

# Apply to the raw agent call rather than to research_node: the node catches
# exceptions itself, so a decorator around it would never see a failure.
@retry_with_backoff(max_retries=3)
def invoke_research_agent(query: str) -> Dict[str, Any]:
    return research_agent.invoke({"input": query, "chat_history": []})
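Two common refinements to the decorator above are retrying only errors that are likely to be transient and adding random jitter so that parallel agents do not retry in lockstep. The following is a sketch under those assumptions; retry_on and its injectable sleep parameter are hypothetical, and the exception types to retry depend on the client libraries you use.

```python
import random
import time
from functools import wraps


def retry_on(exceptions, max_retries=3, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Retry only the listed exception types with capped, jittered exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries - 1:
                        raise  # Out of retries: surface the last error
                    # Full jitter: wait a random time up to the exponential ceiling.
                    ceiling = min(max_delay, base_delay * (2 ** attempt))
                    sleep(random.uniform(0, ceiling))
        return wrapper
    return decorator
```

Errors outside the listed types, such as a ValueError from malformed input, propagate immediately instead of being retried. Passing sleep as a parameter keeps unit tests fast because a test can substitute a function that records the delay rather than waiting.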
Memory Management
LangGraph's checkpointing stores the entire state history. For long-running workflows, implement state pruning:
def prune_state(state: AgentState, max_history: int = 5) -> AgentState:
    """Remove old research results to limit memory usage."""
    if len(state.research_results) > max_history:
        # Keep only the most recent max_history entries
        state.research_results = state.research_results[-max_history:]
    return state
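Pruning by item count does not help when a single search result carries a very long page body. A complementary sketch that also caps the size of each entry is shown below; trim_results is a hypothetical helper, and it assumes each result stores its text under a 'content' key as the web_search tool does.

```python
from typing import Any, Dict, List


def trim_results(
    results: List[Dict[str, Any]],
    max_items: int = 5,
    max_chars: int = 2000,
) -> List[Dict[str, Any]]:
    """Keep the newest max_items results and cap each 'content' field at max_chars."""
    if max_items <= 0:
        return []
    trimmed = []
    for item in results[-max_items:]:
        entry = dict(item)  # Shallow copy so the caller's data is not mutated
        content = entry.get("content")
        if isinstance(content, str) and len(content) > max_chars:
            entry["content"] = content[:max_chars] + "...[truncated]"
        trimmed.append(entry)
    return trimmed
```

Returning new dictionaries rather than editing in place keeps earlier checkpoints intact, at the cost of a small amount of copying.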
Token Budget Management
Each agent call consumes tokens. Track usage to avoid unexpected costs:
from langchain_community.callbacks import get_openai_callback

def tracked_agent_call(agent, input_data):
    """Track token usage for an agent call."""
    # The callback accumulates usage for every OpenAI call made inside the block
    with get_openai_callback() as cb:
        result = agent.invoke(input_data)
    print(f"Tokens used: {cb.total_tokens} (Prompt: {cb.prompt_tokens}, Completion: {cb.completion_tokens})")
    print(f"Cost: ${cb.total_cost:.4f}")
    return result
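Printing usage is useful for monitoring, but a hard cap needs a running total that can stop the workflow. A minimal sketch of such a guard follows; TokenBudget and BudgetExceeded are hypothetical names, and the token counts are assumed to come from a tracker like the callback above.

```python
class BudgetExceeded(RuntimeError):
    """Raised when recorded usage goes past the configured limit."""


class TokenBudget:
    """Accumulates token usage across agent calls and enforces a hard cap."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    @property
    def remaining(self) -> int:
        return max(0, self.max_tokens - self.used)

    def record(self, tokens: int) -> None:
        """Add usage from one call; raise once the total exceeds the cap."""
        self.used += tokens
        if self.used > self.max_tokens:
            raise BudgetExceeded(
                f"Used {self.used} tokens, limit is {self.max_tokens}"
            )
```

One instance per workflow run could be shared by all nodes, with budget.record(cb.total_tokens) called after each agent invocation so that a runaway refinement loop stops early instead of accumulating cost.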
Testing and Validation
Write unit tests for each agent in isolation:
import pytest

def test_research_agent_empty_query():
    """Research agent should handle empty queries gracefully."""
    agent = create_research_agent()
    result = agent.invoke({"input": "", "chat_history": []})
    assert "output" in result
    assert len(result["output"]) > 0  # Should return a helpful message

def test_analysis_agent_no_data():
    """Analysis agent should handle missing research data."""
    state = AgentState(query="test", research_results=[])
    result = analysis_node(state)
    assert "No research data" in result["analysis_results"]

# Requires the pytest-asyncio plugin; this test calls the live APIs
@pytest.mark.asyncio
async def test_full_workflow():
    """End-to-end test with a simple query."""
    state = AgentState(query="What is the capital of France?")
    config = {"configurable": {"thread_id": "test-123"}}
    async for event in app.astream(state, config):
        pass
    final = app.get_state(config)
    assert "Paris" in final.values.get("summary", "")
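Tests that call the real APIs are slow and cost money, so node logic is often tested against a stub agent instead. The sketch below assumes a small refactor that is not in the original code: make_research_node is a hypothetical factory that receives the agent as a parameter, which lets a MagicMock stand in for it.

```python
from unittest.mock import MagicMock


def make_research_node(agent):
    """Build a node function around any object exposing invoke() (hypothetical helper)."""
    def node(state: dict) -> dict:
        try:
            result = agent.invoke({"input": state["query"], "chat_history": []})
            return {"research_results": [{"content": result.get("output", "")}]}
        except Exception as exc:
            return {
                "research_results": [],
                "errors": state.get("errors", []) + [f"Research agent failed: {exc}"],
            }
    return node


def test_research_node_success():
    agent = MagicMock()
    agent.invoke.return_value = {"output": "Paris is the capital of France."}
    update = make_research_node(agent)({"query": "capital of France"})
    assert update["research_results"][0]["content"].startswith("Paris")
    agent.invoke.assert_called_once()


def test_research_node_failure():
    agent = MagicMock()
    agent.invoke.side_effect = TimeoutError("rate limited")
    update = make_research_node(agent)({"query": "anything"})
    assert update["research_results"] == []
    assert "rate limited" in update["errors"][0]
```

Both tests run without network access or API keys, which makes them suitable for continuous integration, while the end-to-end test can be reserved for occasional manual runs.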
Performance Optimization
For production deployments, consider these optimizations:
- Parallel agent execution: Use asyncio.gather for independent agents
- Caching: Cache web search results for identical queries using Redis
- Model selection: Use smaller models (e.g., GPT-4o-mini) for routine tasks
- Batch processing: Process multiple queries in a single graph execution
import asyncio
from typing import Dict, List

async def parallel_research(queries: List[str]) -> List[Dict]:
    """Execute multiple research queries in parallel."""
    tasks = [research_agent.ainvoke({"input": q, "chat_history": []}) for q in queries]
    # return_exceptions=True keeps one failure from cancelling the rest
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]
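Launching every query at once can run into provider rate limits. A sketch of bounding concurrency with a semaphore is shown below; gather_bounded is a hypothetical helper that takes zero-argument coroutine factories so that each call only starts once a slot is free.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def gather_bounded(
    factories: List[Callable[[], Awaitable[Any]]],
    limit: int = 3,
) -> List[Any]:
    """Run coroutine factories with at most `limit` of them in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory: Callable[[], Awaitable[Any]]) -> Any:
        async with semaphore:
            # The coroutine is created only after a slot has been acquired.
            return await factory()

    # Results come back in input order; failures are returned as exception objects.
    return await asyncio.gather(*(run(f) for f in factories), return_exceptions=True)
```

With the research agent, each factory could be a lambda such as lambda q=q: research_agent.ainvoke({"input": q, "chat_history": []}), and limit would be tuned to the rate limits of the model and search providers.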
What's Next
This multi-agent system provides a foundation for building complex AI workflows. To extend it:
- Add human-in-the-loop: Use LangGraph's interrupt function to pause for human approval before critical decisions
- Integrate vector databases: Store research results in Pinecone or Weaviate for long-term memory
- Implement agent routing: Use a router agent to dynamically assign tasks to specialized sub-agents
- Add monitoring: Integrate with LangSmith for tracing and debugging agent behavior
The complete source code is available on GitHub. For more advanced patterns, explore LangGraph's official documentation which includes examples of multi-agent collaboration, tool delegation, and state persistence.
Remember that multi-agent systems introduce complexity in coordination and debugging. Start with a simple two-agent system, validate each component independently, then gradually add sophistication. The graph structure makes it easy to visualize and debug the decision flow, which is invaluable when things go wrong in production.