How to Build Your First Multi-Agent AI System: A Step-by-Step Guide for Developers
Learn how to build a multi-agent AI system from scratch using Python — research, write, and review agents working together in an orchestrated pipeline.
By CoddyKit · 5 min read · 995 wordsWhy Multi-Agent AI Is the Biggest Shift in 2026
If you have been following GitHub trending repositories this year, one pattern is impossible to miss: multi-agent AI frameworks are exploding. OpenAI's openai-agents-python crossed 21,000 stars, and over 73 percent of developers hired into AI-adjacent roles in late 2025 cited open-source agent repositories as their primary learning resource. The AI coding market passed 90 percent developer adoption in April 2026, and the frontier has moved past single-prompt assistants toward orchestrated teams of specialised agents.
This tutorial walks you through building your first multi-agent system from scratch using Python. By the end, you will have a working system where three agents collaborate to solve a real task.
What Is a Multi-Agent System?
Instead of asking one large language model to do everything, you assign different roles to different agents, each with its own instructions, tools, and output format. A simple example:
- Researcher Agent — gathers information and sources.
- Writer Agent — drafts content based on the research.
- Reviewer Agent — critiques and improves the draft.
The result is higher quality output, because each agent focuses on one thing and passes structured results to the next.
Prerequisites
- Python 3.11 or higher
- An API key from any LLM provider (OpenAI, Anthropic, or an open-weight model via a local endpoint)
- Basic familiarity with async Python (
asyncio)
Step 1: Install Dependencies
We will use the OpenAI Agents SDK for orchestration, though the same patterns work with any framework.
pip install openai-agents python-dotenv
Create a .env file with your API key:
OPENAI_API_KEY=sk-your-key-here
Step 2: Define the Agent Class
Each agent is an object with a name, a system prompt, and optionally a set of tools it can call.
import os
from openai_agents import Agent, Runner
from dotenv import load_dotenv
load_dotenv()
class MultiAgentPipeline:
def __init__(self):
self.researcher = Agent(
name="Researcher",
instructions=(
"You are a research specialist. Given a topic, "
"find and summarise the 3-5 most important facts. "
"Return only a numbered list of facts, nothing else."
),
model="gpt-4o",
)
self.writer = Agent(
name="Writer",
instructions=(
"You are a technical writer. Given a topic and a list "
"of research facts, write a clear, engaging blog post "
"of approximately 500 words. Use Markdown formatting."
),
model="gpt-4o",
)
self.reviewer = Agent(
name="Reviewer",
instructions=(
"You are an editor. Review the draft for clarity, "
"accuracy, and tone. Return the improved version "
"with a brief summary of changes at the top."
),
model="gpt-4o",
)
Step 3: Build the Orchestration Pipeline
The pipeline runs agents sequentially, passing each agent's output as context to the next.
async def run(self, topic: str):
# Phase 1: Research
research_result = await Runner.run(
self.researcher,
f"Research this topic: {topic}",
)
print(f"[Researcher done — {len(research_result.final_output)} chars]")
# Phase 2: Write
write_prompt = (
f"Topic: {topic}\n\n"
f"Research findings:\n{research_result.final_output}\n\n"
f"Write the blog post now."
)
draft_result = await Runner.run(self.writer, write_prompt)
print(f"[Writer done — {len(draft_result.final_output)} chars]")
# Phase 3: Review
review_prompt = (
f"Here is a draft blog post:\n\n{draft_result.final_output}\n\n"
f"Review and improve it."
)
final_result = await Runner.run(self.reviewer, review_prompt)
print(f"[Reviewer done — {len(final_result.final_output)} chars]")
return final_result.final_output
Step 4: Run the System
Add an entry point and watch three agents collaborate:
async def main():
pipeline = MultiAgentPipeline()
result = await pipeline.run("The rise of AI coding agents in 2026")
print("\n=== FINAL OUTPUT ===\n")
print(result)
if __name__ == "__main__":
import asyncio
asyncio.run(main())
Step 5: Add Tools to Agents
Agents become dramatically more useful when they can call external tools. Here is how to add a web-search tool to the Researcher:
from openai_agents import function_tool
@function_tool
async def web_search(query: str) -> str:
"""Search the web and return top results."""
import httpx
resp = await httpx.AsyncClient().get(
"https://api.search.com/search",
params={"q": query, "limit": 5},
)
return resp.json()["results"]
# Attach to the agent:
self.researcher = Agent(
name="Researcher",
instructions="...",
model="gpt-4o",
tools=[web_search],
)
Now the Researcher can perform real web searches instead of relying on training data alone.
Architecture Overview
| Phase | Agent | Input | Output |
|---|---|---|---|
| 1 | Researcher | Topic string | List of facts |
| 2 | Writer | Topic + facts | Draft article |
| 3 | Reviewer | Draft article | Final polished article |
Best Practices for Production
1. Keep Prompts Short and Specific
Long system prompts confuse models. Aim for 3-5 sentences per agent, describing the role, the expected output format, and any constraints.
2. Use Structured Output
Ask agents to return JSON or clearly delimited text. This makes it easy to parse results and pass them to the next agent without hallucination risk.
3. Add Retry Logic
LLM calls can fail or return malformed output. Wrap each pipeline step in a retry decorator with a maximum of 3 attempts:
from tenacity import retry, stop_after_attempt, wait_exponential
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=2))
async def safe_run(agent, prompt):
return await Runner.run(agent, prompt)
4. Monitor Costs
Multi-agent systems consume more tokens than single-prompt setups. Track token usage per agent and set budget limits. Most SDKs expose token counts in the result object.
When Should You Use Multi-Agent?
Not every task needs multiple agents. Use a multi-agent approach when:
- The task naturally breaks into distinct sub-tasks (research, write, review).
- Different agents need different tools (web search, code execution, database queries).
- Quality matters more than speed — the overhead of multiple LLM calls is justified.
For simple Q&A or code completion, a single well-prompted model is faster and cheaper.
What Comes Next
Once your sequential pipeline works, explore these patterns:
- Supervisor pattern — a manager agent that decides which worker agent to call next, enabling dynamic workflows.
- Parallel execution — run independent agents concurrently with
asyncio.gather()for speed. - Human-in-the-loop — pause the pipeline at review points for human approval before continuing.
- Memory layers — add a shared vector store so agents can reference previous conversations and build long-term context.
Conclusion
Multi-agent AI systems are not hype — they are the practical evolution of how developers interact with LLMs in 2026. With the OpenAI Agents SDK (or any equivalent framework), you can build a working pipeline in under 100 lines of code. Start with three agents, add tools incrementally, and measure output quality against your single-agent baseline. The results will speak for themselves.