Building Your First AI Agent: A Step-by-Step Guide to Tool Calling, Memory, and Skills
AI agents have moved from research papers to production systems in record time, and open-source agent frameworks have drawn large developer communities on GitHub. If you want to understand how they work, not just how to prompt them, this tutorial walks you through building a functional AI agent from scratch.
By the end, you will have a Python agent that can call external tools and keep conversation memory across sessions. It will also have a skills/ package where reusable skill modules can be plugged in.
What You Will Build
A command-line agent with three core capabilities:
- Tool calling: the agent decides when to use a calculator or a file reader (a web search tool is suggested under Extending the Agent)
- Persistent memory: conversation history survives restarts via JSON storage
- Skill system: a skills/ package for pluggable modules the agent can load at startup
Step 1: Set Up the Project
Create a new directory and initialize the project:
```bash
mkdir my-ai-agent
cd my-ai-agent
python3 -m venv venv
source venv/bin/activate    # on Windows: venv\Scripts\activate
pip install openai requests
```
Create the project structure:
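```text
my-ai-agent/
├── agent.py              # agent loop (Step 5)
├── memory.py             # AgentMemory (Step 4)
├── tools/
│   ├── __init__.py       # Tool base class (Step 2)
│   ├── calculator.py     # CalculatorTool (Step 3)
│   └── file_reader.py    # FileReaderTool (Step 3)
├── skills/
│   └── __init__.py
└── memory_store.json     # created automatically on first run
```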
Step 2: Define the Tool Interface
Every tool the agent can use follows a common interface. This is the foundation of tool calling:
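```python
# tools/__init__.py
from abc import ABC, abstractmethod
from typing import Any


class Tool(ABC):
    """Base class for all agent tools."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Unique identifier the LLM uses to request this tool."""

    @property
    @abstractmethod
    def description(self) -> str:
        """Plain-language summary that helps the LLM decide when to use the tool."""

    @property
    @abstractmethod
    def parameters(self) -> dict:
        """JSON Schema object describing the arguments execute() accepts."""

    @abstractmethod
    def execute(self, **kwargs) -> Any:
        """Run the tool with the keyword arguments supplied by the LLM."""

    def to_openai_schema(self) -> dict:
        """Wrap the definition in OpenAI's function-calling tool format."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }
```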
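This abstract base class makes every tool describe itself in a format the LLM can understand. The to_openai_schema() method wraps that definition in the tool format accepted by the tools parameter of OpenAI's Chat Completions API. Other providers use similar but not always identical shapes. The agent built in this tutorial asks for tool calls as plain-text JSON instead, so to_openai_schema() only comes into play if you switch to native function calling.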
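Because parameters is a JSON Schema object, the agent can also use it for a cheap sanity check before calling execute(). A malformed request from the model then becomes a readable error rather than a TypeError. The helper below is a minimal sketch. Its name, check_arguments, is hypothetical and not part of the tutorial's code. It checks only required keys, unexpected keys, and string types, so it is not a full JSON Schema validator. The third-party jsonschema package provides full validation if you need it.

```python
def check_arguments(parameters: dict, args: dict) -> list[str]:
    """Return problems with `args` relative to a tool's `parameters` schema.

    Hypothetical helper: checks required keys, unknown keys, and "string"
    types only; it does not implement full JSON Schema validation.
    """
    problems = []
    properties = parameters.get("properties", {})
    # Every key listed under "required" must be present.
    for key in parameters.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    # Every supplied key must be declared, and declared strings must be strings.
    for key, value in args.items():
        spec = properties.get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
        elif spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"argument {key} should be a string")
    return problems
```

For example, with the calculator's schema, check_arguments(schema, {"expr": "2 + 2"}) reports both a missing expression and an unexpected expr. Inside _execute_tool you could call it with tool.parameters and hand any problems back to the model instead of executing the tool.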
Step 3: Implement Concrete Tools
Here are two practical tools:
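```python
# tools/calculator.py
import re

from tools import Tool


class CalculatorTool(Tool):
    @property
    def name(self) -> str:
        return "calculator"

    @property
    def description(self) -> str:
        return "Evaluate a mathematical expression and return the result."

    @property
    def parameters(self) -> dict:
        return {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A mathematical expression, e.g. 2 + 3 * 4",
                }
            },
            "required": ["expression"],
        }

    def execute(self, expression: str) -> str:
        # Reject anything other than digits, + - * /, parentheses, decimal
        # points, and whitespace. Silently stripping other characters would
        # change the meaning, e.g. "2^3" would become "23".
        if re.search(r"[^0-9+\-*/().\s]", expression):
            return "Error: expression may only contain numbers, + - * / and parentheses"
        try:
            # Empty builtins as an extra guard. "**" still passes the filter,
            # so a huge power such as 9**9**9 can still hang the process.
            result = eval(expression, {"__builtins__": {}}, {})
            return f"Result: {result}"
        except Exception as e:
            return f"Error evaluating expression: {e}"
```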
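Even with a character filter, eval() is a blunt instrument. A common alternative is to parse the expression with Python's ast module and evaluate only an allowlist of node types. The sketch below assumes the calculator only needs + - * / and unary signs. Exponentiation is deliberately left out to avoid runaway computations. The helper name safe_eval is hypothetical.

```python
import ast
import operator

# Map each allowed AST operator type to the arithmetic it performs.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> float:
    """Evaluate + - * / arithmetic on numeric literals without calling eval().

    Raises ValueError for any other syntax (calls, names, **, etc.),
    SyntaxError for malformed input, and ZeroDivisionError for x / 0.
    """

    def visit(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return visit(node.body)
        # Only int/float literals; type() check also excludes bools.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))
```

To use it, CalculatorTool.execute would call safe_eval(expression) in place of eval and catch ValueError, SyntaxError, and ZeroDivisionError. Parentheses need no special handling because the parser has already turned them into the tree's structure.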
```python
# tools/file_reader.py
import os

from tools import Tool


class FileReaderTool(Tool):
    @property
    def name(self) -> str:
        return "file_reader"

    @property
    def description(self) -> str:
        return "Read the contents of a file at a given path."

    @property
    def parameters(self) -> dict:
        return {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "The file path to read",
                }
            },
            "required": ["path"],
        }

    def execute(self, path: str) -> str:
        if not os.path.exists(path):
            return f"Error: File not found: {path}"
        try:
            # Explicit encoding so behavior does not depend on the OS locale.
            with open(path, "r", encoding="utf-8") as f:
                return f.read()
        except Exception as e:
            return f"Error reading file: {e}"
```
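As written, FileReaderTool will read any path the model asks for, including files outside the project. It also returns arbitrarily large files in full. A minimal hardening sketch follows. It resolves the requested path against an allowed base directory and truncates long output. The function name read_within and the 20,000-character cap are hypothetical choices, and Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path

MAX_CHARS = 20_000  # hypothetical cap so one file cannot flood the context window


def read_within(base_dir: str, requested: str, max_chars: int = MAX_CHARS) -> str:
    """Read `requested` only if it resolves to a file inside `base_dir`."""
    base = Path(base_dir).resolve()
    # Joining with an absolute path discards `base`, and resolve() collapses
    # ".." and follows symlinks, so escapes of either kind fail the check below.
    target = (base / requested).resolve()
    if not target.is_relative_to(base):  # Python 3.9+
        return f"Error: {requested} is outside the allowed directory"
    if not target.is_file():
        return f"Error: File not found: {requested}"
    text = target.read_text(encoding="utf-8", errors="replace")
    if len(text) > max_chars:
        return text[:max_chars] + f"\n[truncated after {max_chars} characters]"
    return text
```

Inside FileReaderTool.execute, this could replace the open() call as read_within(os.getcwd(), path), which confines the agent to the directory it was started from.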
Step 4: Build Persistent Memory
Agents need to remember conversations between sessions. Here is a simple but effective memory system:
```python
# memory.py
import json
import os
from datetime import datetime


class AgentMemory:
    """Persistent conversation memory stored as JSON."""

    def __init__(self, store_path: str = "memory_store.json"):
        self.store_path = store_path
        self.messages = self._load()

    def _load(self) -> list:
        # A missing or empty file (e.g. one created with `touch`) means no history yet.
        if os.path.exists(self.store_path) and os.path.getsize(self.store_path) > 0:
            with open(self.store_path, "r", encoding="utf-8") as f:
                return json.load(f)
        return []

    def _save(self):
        with open(self.store_path, "w", encoding="utf-8") as f:
            json.dump(self.messages, f, indent=2)

    def add(self, role: str, content: str):
        self.messages.append({
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat(),
        })
        self._save()

    def get_messages(self, max_turns: int = 20) -> list:
        """Return the most recent max_turns messages, ready to send to the API.

        The timestamp stays on disk only; chat APIs expect role and content.
        """
        return [
            {"role": m["role"], "content": m["content"]}
            for m in self.messages[-max_turns:]
        ]

    def clear(self):
        self.messages = []
        self._save()
```
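The memory writes to disk after every message and reloads on startup, so history survives restarts. The max_turns parameter caps how many messages are sent with each request. It counts individual messages, not user/assistant pairs and not tokens. This makes it only a rough guard against overflowing the context window, because a single long message, such as a file the agent read, can still be very large.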
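If you want a limit based on size rather than message count, one option is to keep the newest messages until a budget is used up. The sketch below assumes character count is an acceptable rough proxy for tokens. The name trim_to_budget and its 12,000-character default are hypothetical. For exact counts on OpenAI models, a tokenizer such as tiktoken would be needed.

```python
def trim_to_budget(messages: list[dict], max_chars: int = 12_000) -> list[dict]:
    """Keep the newest messages whose combined content fits in max_chars.

    Characters stand in for tokens here (a hypothetical simplification).
    The newest message is always kept so the latest input is never dropped.
    """
    kept = []
    used = 0
    # Walk from newest to oldest, stopping once the budget would be exceeded.
    for message in reversed(messages):
        size = len(message["content"])
        if kept and used + size > max_chars:
            break
        kept.append(message)
        used += size
    kept.reverse()  # restore chronological order
    return kept
```

In Agent.chat you might request a generous window, for example trim_to_budget(self.memory.get_messages(max_turns=100)), and let the budget do the real trimming.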
Step 5: Assemble the Agent
Now bring everything together in the main agent loop:
```python
# agent.py
import json
import os
import sys
from typing import Optional

from openai import OpenAI

from memory import AgentMemory
from tools.calculator import CalculatorTool
from tools.file_reader import FileReaderTool


class Agent:
    def __init__(self, api_key: str):
        self.client = OpenAI(api_key=api_key)
        self.tools = [CalculatorTool(), FileReaderTool()]
        self.memory = AgentMemory()
        self._build_system_prompt()

    def _build_system_prompt(self):
        # Include each tool's argument schema so the model knows the exact
        # argument names (e.g. "expression", "path") to put in "args".
        tool_desc = "\n".join(
            f"- {t.name}: {t.description} Arguments: {json.dumps(t.parameters)}"
            for t in self.tools
        )
        self.system_prompt = (
            "You are a helpful AI assistant. "
            "You have access to these tools:\n"
            f"{tool_desc}\n\n"
            "When you need a tool, respond with ONLY JSON:\n"
            '{"tool": "tool_name", "args": {"key": "value"}}\n\n'
            "Otherwise, respond normally."
        )

    def _parse_tool_call(self, response: str) -> Optional[dict]:
        try:
            call = json.loads(response)
        except json.JSONDecodeError:
            return None
        # A plain answer such as "42" is also valid JSON, so only a dict
        # with a "tool" key counts as a tool request.
        if isinstance(call, dict) and "tool" in call:
            return call
        return None

    def _execute_tool(self, call: dict) -> str:
        tool_name = call.get("tool")
        args = call.get("args", {})
        for tool in self.tools:
            if tool.name == tool_name:
                try:
                    return str(tool.execute(**args))
                except TypeError as e:
                    # Raised when the model sends missing or unexpected argument names.
                    return f"Error: bad arguments for {tool_name}: {e}"
        return f"Unknown tool: {tool_name}"

    def chat(self, user_input: str) -> str:
        self.memory.add("user", user_input)
        messages = [
            {"role": "system", "content": self.system_prompt},
            *self.memory.get_messages(),
        ]
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=messages,
        )
        assistant_msg = response.choices[0].message.content or ""
        tool_call = self._parse_tool_call(assistant_msg)
        if tool_call:
            result = self._execute_tool(tool_call)
            # Keep the raw JSON request in history so later turns show the model
            # the same tool-call format it was asked to use.
            self.memory.add("assistant", assistant_msg)
            self.memory.add("system", f"Tool result: {result}")
            return result
        self.memory.add("assistant", assistant_msg)
        return assistant_msg


if __name__ == "__main__":
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        print("Set OPENAI_API_KEY environment variable")
        sys.exit(1)
    agent = Agent(api_key)
    print("Agent ready. Type quit to exit.")
    while True:
        user = input("You: ")
        if user.strip().lower() == "quit":
            break
        reply = agent.chat(user)
        print(f"Agent: {reply}")
```
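The agent's parser only recognizes a reply that consists of nothing but JSON. In practice, models sometimes wrap the JSON in a Markdown code fence or put a sentence in front of it. When that happens, the user sees raw JSON instead of a tool result. The sketch below shows a more forgiving parser. The name extract_tool_call is hypothetical and not part of the tutorial. It tries the whole reply first, then any fenced block, then the outermost brace span.

```python
import json
import re
from typing import Optional

# Matches a Markdown code fence (three backticks, optional "json" tag) and captures its body.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_tool_call(text: str) -> Optional[dict]:
    """Find a {"tool": ..., "args": {...}} object in a model reply, if any."""
    candidates = [text.strip()]
    candidates += [m.group(1).strip() for m in _FENCE.finditer(text)]
    # Fall back to the outermost {...} span for replies with prose around the JSON.
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        # Only a dict naming a tool counts; plain JSON like "42" is ignored.
        if isinstance(data, dict) and isinstance(data.get("tool"), str):
            data.setdefault("args", {})
            return data
    return None
```

The trade-off is that looser parsing also means a reply that merely quotes an example call will be executed. For that reason, it is worth keeping the "respond with ONLY JSON" instruction in the system prompt.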
Step 6: Run It
Set your API key and start chatting:
```bash
export OPENAI_API_KEY="your-key-here"
python agent.py
```
Try asking:
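- "What is 42 * 17 + 8?" → uses the calculator tool; because the agent returns tool output directly, the reply should be Result: 722
- "Read the contents of README.md" → uses the file reader tool (create a README.md first, since the project structure above does not include one)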
- "Hello, how are you?" → normal conversation, no tool needed
Comparison: Manual Tools vs Native Function Calling
| Aspect | JSON-in-Text (This Tutorial) | Native Function Calling |
|---|---|---|
| Setup complexity | Simple | Requires API support |
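| Reliability | Good with clear prompts, but the model may wrap or malform the JSON | Higher; tool calls arrive in a dedicated structured field |
| Multi-provider | Works with any LLM that follows instructions | Supported by most major providers, but request and response formats differ |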
| Learning value | High — you see the mechanics | Lower — abstracted away |
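The native column corresponds to passing tool schemas through the tools parameter of the Chat Completions API and reading message.tool_calls from the response. Below is a minimal sketch using the OpenAI Python SDK's v1-style client. The helper run_with_native_tools and its max_rounds cap are hypothetical. Unlike the tutorial's agent, which returns raw tool output, this loop feeds each result back to the model so the model can phrase the final answer.

```python
import json
from typing import Callable

# Same shape that Tool.to_openai_schema() produces for the calculator.
CALCULATOR_SCHEMA = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a mathematical expression and return the result.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}


def run_with_native_tools(client, model: str, messages: list, schemas: list,
                          handlers: dict[str, Callable[..., str]],
                          max_rounds: int = 5) -> str:
    """Call the model with tools=..., run any tool_calls, and repeat until
    it answers in plain text or max_rounds is reached."""
    messages = list(messages)  # don't mutate the caller's list
    for _ in range(max_rounds):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=schemas
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content or ""
        # Echo the assistant's tool request back into the conversation...
        messages.append({
            "role": "assistant",
            "content": msg.content,
            "tool_calls": [
                {"id": c.id, "type": "function",
                 "function": {"name": c.function.name, "arguments": c.function.arguments}}
                for c in msg.tool_calls
            ],
        })
        # ...then answer each call with a role="tool" message carrying its id.
        for call in msg.tool_calls:
            handler = handlers.get(call.function.name)
            try:
                args = json.loads(call.function.arguments or "{}")
                output = handler(**args) if handler else f"Unknown tool: {call.function.name}"
            except Exception as e:  # bad JSON or bad arguments from the model
                output = f"Error: {e}"
            messages.append({"role": "tool", "tool_call_id": call.id, "content": str(output)})
    return "Stopped after too many tool rounds."
```

With the tutorial's classes, schemas would be [t.to_openai_schema() for t in agent.tools], handlers would be {t.name: t.execute for t in agent.tools}, and client would be an OpenAI() instance.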
Extending the Agent
From here you can add more tools and capabilities:
- Web search tool: integrate the DuckDuckGo or Serper API for real-time information
- Code execution: use the `subprocess` module to run commands with a timeout. `subprocess` alone does not sandbox anything, so isolate model-generated code (for example in a container) before running it
- Skill modules: create a `skills/` directory where each file defines a capability the agent loads at startup
- Multi-agent coordination: run multiple agents that communicate via a message bus
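The skill-module idea can be sketched with the standard library's pkgutil and importlib. The sketch discovers every module in the skills package at startup and collects what each one exports. It relies on a hypothetical convention: each skill module defines a module-level SKILL dict with "name" and "instructions" keys. The agent could then append those instructions to its system prompt. The name load_skills is also hypothetical.

```python
import importlib
import pkgutil

# Example skill file, skills/summarize.py (hypothetical convention):
# SKILL = {
#     "name": "summarize",
#     "instructions": "When asked to summarize, reply with at most three bullet points.",
# }


def load_skills(package: str = "skills") -> dict[str, dict]:
    """Import every module in `package` and collect its SKILL dict by name.

    Modules without a SKILL dict (e.g. shared helpers) are skipped.
    """
    pkg = importlib.import_module(package)
    skills = {}
    for info in pkgutil.iter_modules(pkg.__path__):
        module = importlib.import_module(f"{package}.{info.name}")
        skill = getattr(module, "SKILL", None)
        if isinstance(skill, dict) and "name" in skill:
            skills[skill["name"]] = skill
    return skills
```

In Agent._build_system_prompt, something like "\n\n".join(s["instructions"] for s in load_skills().values()) could be appended to the prompt. A skill could also export Tool instances for the agent to add to self.tools.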
Key Takeaways
- Tool calling is fundamentally about giving the LLM structured choices and parsing its decisions
- Persistent memory is just message history saved between turns — keep it simple first
- Every tool follows the same interface: name, description, parameters, and an execute method
- Start with text-based JSON tool calling, then migrate to native function-calling APIs when ready
The core pattern here (tool schemas, a parse-and-dispatch loop, and stored message history) also underlies production frameworks like LangChain, AutoGen, and CrewAI, which layer considerably more on top. Understanding the mechanics at this level makes you a better agent developer regardless of which framework you eventually choose.