Building Your First AI Agent: A Step-by-Step Guide to Tool Calling, Memory, and Skills

AI agents have moved quickly from research papers to production systems, and open-source agent frameworks on GitHub now draw large developer communities. If you want to understand how they work, not just how to prompt them, this tutorial walks you through building a functional AI agent from scratch.

By the end, you will have a Python-based agent that can call external tools and maintain conversation memory across sessions, with a clear path to loading reusable skill modules.

What You Will Build

A command-line agent with three core capabilities:

  • Tool calling: the agent decides when to use a calculator or a file reader (web search is left as an extension)
  • Persistent memory: conversation history survives restarts via JSON storage
  • Skill system: pluggable modules the agent can load on demand (outlined under "Extending the Agent")

Step 1: Set Up the Project

Create a new directory and initialize the project. The code below uses the dict | None annotation syntax, so it needs Python 3.10 or later:

mkdir my-ai-agent
cd my-ai-agent
python3 -m venv venv
source venv/bin/activate
pip install openai requests

Create the project structure:

my-ai-agent/
├── agent.py
├── memory.py
├── tools/
│   ├── __init__.py
│   ├── calculator.py
│   └── file_reader.py
├── skills/
│   └── __init__.py
└── memory_store.json

Step 2: Define the Tool Interface

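Every tool the agent can use follows a common interface. This is the foundation of tool calling. The project layout does not name a file for the base class; tools/__init__.py is one reasonable home, since both tool modules can then import it: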

from abc import ABC, abstractmethod
from typing import Any

class Tool(ABC):
    """Base class for all agent tools."""

    @property
    @abstractmethod
    def name(self) -> str:
        pass

    @property
    @abstractmethod
    def description(self) -> str:
        pass

    @property
    @abstractmethod
    def parameters(self) -> dict:
        pass

    @abstractmethod
    def execute(self, **kwargs) -> Any:
        pass

    def to_openai_schema(self) -> dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            }
        }

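This abstract base class ensures every tool describes itself in a format the LLM can understand. The to_openai_schema() method translates the tool definition into the function-calling schema used by the OpenAI Chat Completions API and accepted by many OpenAI-compatible APIs. The agent in Step 5 uses JSON-in-text instead, so this method only comes into play if you later move to native function calling.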
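
To make the schema concrete, here is a minimal sketch of what to_openai_schema() produces for a small tool. EchoTool is a hypothetical example rather than one of the tools built in this tutorial, and the base class is repeated in condensed form only so the snippet runs on its own.

```python
import json
from abc import ABC, abstractmethod
from typing import Any


class Tool(ABC):
    """Condensed copy of the base class, repeated so this runs standalone."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def description(self) -> str: ...

    @property
    @abstractmethod
    def parameters(self) -> dict: ...

    @abstractmethod
    def execute(self, **kwargs) -> Any: ...

    def to_openai_schema(self) -> dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


class EchoTool(Tool):
    """Hypothetical tool used only to illustrate the schema."""

    @property
    def name(self) -> str:
        return "echo"

    @property
    def description(self) -> str:
        return "Return the given text unchanged."

    @property
    def parameters(self) -> dict:
        # A JSON Schema object describing the keyword arguments of execute()
        return {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        }

    def execute(self, text: str) -> str:
        return text


schema = EchoTool().to_openai_schema()
print(json.dumps(schema, indent=2))
```

Because every abstract member must be overridden, instantiating Tool directly (or a subclass that forgets one of the four members) raises a TypeError, which catches incomplete tools early.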

Step 3: Implement Concrete Tools

Here are two practical tools:

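# tools/calculator.py
import re

from tools import Tool  # assumes the Tool base class from Step 2 lives in tools/__init__.py

class CalculatorTool(Tool):
    @property
    def name(self):
        return "calculator"

    @property
    def description(self):
        return "Evaluate a mathematical expression and return the result."

    @property
    def parameters(self):
        return {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A mathematical expression, e.g. 2 + 3 * 4"
                }
            },
            "required": ["expression"]
        }

    def execute(self, expression: str) -> str:
        # Reject, rather than silently strip, anything outside basic arithmetic.
        # "**" is blocked as well: an input like 9**9**9**9 would hang the process.
        if not re.fullmatch(r"[0-9+\-*/().\s]+", expression) or "**" in expression:
            return "Error: only numbers, + - * / and parentheses are supported"
        try:
            # Empty builtins act as a second line of defense around eval
            result = eval(expression, {"__builtins__": {}}, {})
            return f"Result: {result}"
        except Exception as e:
            return f"Error evaluating expression: {e}"

# tools/file_reader.py
import os

from tools import Tool  # same assumption as in calculator.py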

class FileReaderTool(Tool):
    @property
    def name(self):
        return "file_reader"

    @property
    def description(self):
        return "Read the contents of a file at a given path."

    @property
    def parameters(self):
        return {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "The file path to read"
                }
            },
            "required": ["path"]
        }

    def execute(self, path: str) -> str:
        if not os.path.exists(path):
            return f"Error: File not found: {path}"
        try:
            with open(path, "r", encoding="utf-8") as f:
                return f.read()
        except Exception as e:
            return f"Error reading file: {e}"
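
The calculator above relies on eval, which is hard to make safe even with input filtering. One alternative, sketched here as an option rather than as the tutorial's method, is to parse the expression with the standard ast module and evaluate only the node types you explicitly allow. safe_eval is a hypothetical helper name.

```python
import ast
import operator
from typing import Union

# Map AST operator node types to the arithmetic they are allowed to perform.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> Union[int, float]:
    """Evaluate +, -, *, / and parentheses over numeric literals only."""
    tree = ast.parse(expression.strip(), mode="eval")

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # type(...) in (int, float) deliberately excludes bool, str, and so on
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, ** and everything else end up here
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(tree)


print(safe_eval("42 * 17 + 8"))  # 722
```

Inside execute(), a call to safe_eval would sit in the same try/except as before, since malformed input raises SyntaxError, disallowed constructs raise ValueError, and division by zero raises ZeroDivisionError.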

Step 4: Build Persistent Memory

Agents need to remember conversations between sessions. Here is a simple but effective memory system:

import json
import os
from datetime import datetime

class AgentMemory:
    """Persistent conversation memory stored as JSON."""

    def __init__(self, store_path="memory_store.json"):
        self.store_path = store_path
        self.messages = self._load()

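    def _load(self) -> list:
        if os.path.exists(self.store_path):
            try:
                with open(self.store_path, "r", encoding="utf-8") as f:
                    data = json.load(f)
            except json.JSONDecodeError:
                # An empty or corrupt store (e.g. a freshly created file) starts fresh
                return []
            return data if isinstance(data, list) else []
        return []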

    def _save(self):
        with open(self.store_path, "w") as f:
            json.dump(self.messages, f, indent=2)

    def add(self, role: str, content: str):
        self.messages.append({
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat()
        })
        self._save()

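    def get_messages(self, max_turns=20) -> list:
        """Return the most recent messages, without the timestamp field."""
        # max_turns counts individual messages. The chat API expects role and
        # content, and may reject unknown keys such as timestamp.
        return [
            {"role": m["role"], "content": m["content"]}
            for m in self.messages[-max_turns:]
        ]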

    def clear(self):
        self.messages = []
        self._save()

The memory saves after every message and loads on startup. The max_turns parameter caps how many recent messages are sent to the model. Because it counts messages rather than tokens, it limits context usage but does not strictly bound it.
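
A message count alone does not bound the prompt size, since a single message (the contents of a file, for example) can be arbitrarily long. Below is a minimal sketch of a size-aware trim that uses a character budget as a rough stand-in for tokens. trim_to_budget and max_chars are hypothetical names, the sample history is made up for illustration, and a real implementation would count tokens with the model's tokenizer.

```python
def trim_to_budget(messages: list, max_chars: int = 8000) -> list:
    """Keep the newest messages whose combined content fits within max_chars.

    The newest message is always kept, even if it alone exceeds the budget.
    """
    kept = []
    used = 0
    # Walk backwards so the most recent messages are considered first.
    for msg in reversed(messages):
        size = len(msg["content"])
        if kept and used + size > max_chars:
            break
        # Copy only role and content, dropping storage-only fields like timestamp
        kept.append({"role": msg["role"], "content": msg["content"]})
        used += size
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "a" * 50, "timestamp": "2024-01-01T10:00:00"},
    {"role": "assistant", "content": "b" * 30, "timestamp": "2024-01-01T10:00:05"},
    {"role": "user", "content": "c" * 40, "timestamp": "2024-01-01T10:01:00"},
]
trimmed = trim_to_budget(history, max_chars=80)
print([m["role"] for m in trimmed])  # ['assistant', 'user']
```

The oldest message is dropped here because adding its 50 characters to the 70 already kept would exceed the budget of 80.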

Step 5: Assemble the Agent

Now bring everything together in the main agent loop:

import json
import os
from openai import OpenAI
from memory import AgentMemory
from tools.calculator import CalculatorTool
from tools.file_reader import FileReaderTool

class Agent:
    def __init__(self, api_key: str):
        self.client = OpenAI(api_key=api_key)
        self.tools = [CalculatorTool(), FileReaderTool()]
        self.memory = AgentMemory()
        self._build_system_prompt()

    def _build_system_prompt(self):
        tool_desc = "\n".join(
            f"- {t.name}: {t.description}" for t in self.tools
        )
        self.system_prompt = (
            "You are a helpful AI assistant. "
            "You have access to these tools:\n"
            f"{tool_desc}\n\n"
            "When you need a tool, respond with ONLY JSON:\n"
            '{"tool": "tool_name", "args": {"key": "value"}}\n\n'
            "Otherwise, respond normally."
        )

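    def _parse_tool_call(self, response: str) -> dict | None:
        try:
            call = json.loads(response)
        except (json.JSONDecodeError, TypeError):
            return None
        # Valid JSON such as 42 or "ok" is not a tool call; require an object with a "tool" key
        if isinstance(call, dict) and "tool" in call:
            return call
        return None

    def _execute_tool(self, call: dict) -> str:
        tool_name = call.get("tool")
        args = call.get("args", {})
        for tool in self.tools:
            if tool.name == tool_name:
                try:
                    return tool.execute(**args)
                except TypeError as e:
                    # Raised when the model supplies missing or unexpected arguments
                    return f"Invalid arguments for {tool_name}: {e}"
        return f"Unknown tool: {tool_name}"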

    def chat(self, user_input: str) -> str:
        self.memory.add("user", user_input)

        messages = [
            {"role": "system", "content": self.system_prompt},
            *self.memory.get_messages(),
        ]

        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=messages,
        )
        assistant_msg = response.choices[0].message.content

        tool_call = self._parse_tool_call(assistant_msg)
        if tool_call:
            result = self._execute_tool(tool_call)
            self.memory.add("assistant", f"Used tool: {tool_call.get('tool')}")
            self.memory.add("system", f"Tool result: {result}")
            return result

        self.memory.add("assistant", assistant_msg)
        return assistant_msg

if __name__ == "__main__":
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        print("Set OPENAI_API_KEY environment variable")
        raise SystemExit(1)

    agent = Agent(api_key)
    print("Agent ready. Type quit to exit.")
    while True:
        user = input("You: ")
        if user.lower() == "quit":
            break
        reply = agent.chat(user)
        print(f"Agent: {reply}")
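
Models do not always return bare JSON, even when instructed to. A common failure is a sentence of explanation before the object, or a Markdown code fence around it. The sketch below is a more tolerant parser, under the assumption that the tool call is the first JSON object in the reply that carries a "tool" key. extract_tool_call is a hypothetical helper, not part of the agent above.

```python
import json


def extract_tool_call(text):
    """Return the first JSON object in text that has a string "tool" key, else None."""
    if not text:
        return None
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start`
            # and ignores whatever text follows it.
            candidate, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            candidate = None
        if isinstance(candidate, dict) and isinstance(candidate.get("tool"), str):
            candidate.setdefault("args", {})
            return candidate
        # Not a tool call at this brace; try the next one
        start = text.find("{", start + 1)
    return None


reply = 'I will compute that. {"tool": "calculator", "args": {"expression": "2 + 2"}}'
print(extract_tool_call(reply))
print(extract_tool_call("Hello, how are you?"))  # None
```

The trade-off is that a normal answer which happens to quote such an object would be treated as a tool call, so the strict parser remains the safer default when the model follows the format reliably.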

Step 6: Run It

Set your API key and start chatting:

export OPENAI_API_KEY="your-key-here"
python agent.py

Try asking:

  • "What is 42 * 17 + 8?" → uses the calculator tool
  • "Read the contents of README.md" → uses the file reader tool
  • "Hello, how are you?" → normal conversation, no tool needed
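
As written, chat() returns the raw tool output, so the first prompt prints the calculator's result string rather than a sentence from the model. A common refinement is a second model call that turns the tool result into a natural-language answer. The sketch below shows that control flow with the model call injected as a plain function. answer_with_tools, complete, and run_tool are hypothetical names, and the stand-in functions at the bottom replace the real API and tools so the snippet runs offline.

```python
import json
from typing import Callable


def answer_with_tools(
    messages: list,
    complete: Callable[[list], str],
    run_tool: Callable[[str, dict], str],
    max_steps: int = 3,
) -> str:
    """Call the model, run any requested tool, and loop until it answers in prose."""
    messages = list(messages)  # work on a copy so the caller's history is untouched
    for _ in range(max_steps):
        reply = complete(messages)
        try:
            call = json.loads(reply)
        except (json.JSONDecodeError, TypeError):
            call = None
        if not (isinstance(call, dict) and "tool" in call):
            return reply  # plain answer, no tool requested
        result = run_tool(call["tool"], call.get("args", {}))
        # Show the model its own request and the tool output, then ask again.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "system", "content": f"Tool result: {result}"})
    return "Stopped: too many consecutive tool calls."


# Stand-ins for the real model and tools, for illustration only.
def fake_complete(messages: list) -> str:
    last = messages[-1]["content"]
    prefix = "Tool result: "
    if last.startswith(prefix):
        return f"The answer is {last[len(prefix):]}."
    return '{"tool": "calculator", "args": {"expression": "42 * 17 + 8"}}'


def fake_run_tool(name: str, args: dict) -> str:
    return "722" if name == "calculator" else f"Unknown tool: {name}"


question = [{"role": "user", "content": "What is 42 * 17 + 8?"}]
print(answer_with_tools(question, fake_complete, fake_run_tool))  # The answer is 722.
```

The max_steps cap matters: without it, a model that keeps requesting tools would loop indefinitely and spend API calls on every iteration.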

Comparison: Manual Tools vs Native Function Calling

Aspect | JSON-in-Text (This Tutorial) | Native Function Calling
Setup complexity | Simple | Requires API support
Reliability | Good with clear prompts | Excellent
Multi-provider | Works with any LLM | Provider-specific
Learning value | High (you see the mechanics) | Lower (abstracted away)
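
For comparison, here is a hedged sketch of the native path with the OpenAI Chat Completions API. The schemas from to_openai_schema() are passed through the tools parameter, and the model returns structured tool_calls instead of text you have to parse. The API request appears only as a comment. dispatch_tool_call and the registry dict are hypothetical, and a SimpleNamespace stands in for the response object so the snippet runs without a network call.

```python
import json
from types import SimpleNamespace

# With the real client, the request would look roughly like:
#   response = client.chat.completions.create(
#       model="gpt-4",
#       messages=messages,
#       tools=[t.to_openai_schema() for t in tools],
#   )
#   tool_calls = response.choices[0].message.tool_calls  # None when no tool is requested


def dispatch_tool_call(tool_call, registry: dict) -> dict:
    """Run one native tool call and build the "tool" message to send back."""
    name = tool_call.function.name
    # Arguments arrive as a JSON-encoded string, not a dict.
    args = json.loads(tool_call.function.arguments or "{}")
    handler = registry.get(name)
    result = handler(**args) if handler else f"Unknown tool: {name}"
    # tool_call_id ties this result to the request the model made
    return {"role": "tool", "tool_call_id": tool_call.id, "content": str(result)}


# Stand-in for one entry of message.tool_calls, for illustration only.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="add", arguments='{"a": 2, "b": 3}'),
)
registry = {"add": lambda a, b: a + b}
tool_message = dispatch_tool_call(fake_call, registry)
print(tool_message)
```

In a full loop, the assistant message that contains tool_calls is appended to the history first, followed by one "tool" message per call, and then the model is called again to produce the final answer.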

Extending the Agent

From here you can add more tools and capabilities:

  1. Web search tool: integrate a search API such as DuckDuckGo or Serper for real-time information
  2. Code execution: use the subprocess module to run commands, with safeguards such as a timeout, an allowlist of permitted programs, and argument lists instead of shell strings
  3. Skill modules: create a skills/ directory where each file defines a capability the agent loads at startup
  4. Multi-agent coordination: run multiple agents that communicate via a message bus
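
The skill-module idea in item 3 can be sketched with importlib. The convention assumed here, that each file in skills/ defines a module-level SKILL_NAME string and a run(text) function, is hypothetical, since the tutorial does not prescribe one. load_skills is likewise an illustrative name, and the demo writes a throwaway skill into a temporary directory.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_skills(skills_dir: str = "skills") -> dict:
    """Import every non-underscore .py file in skills_dir and index it by SKILL_NAME."""
    skills = {}
    for path in sorted(Path(skills_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helpers
        spec = importlib.util.spec_from_file_location(f"skill_{path.stem}", str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the file's top-level code
        # Only register modules that follow the assumed convention.
        if hasattr(module, "SKILL_NAME") and callable(getattr(module, "run", None)):
            skills[module.SKILL_NAME] = module.run
    return skills


# Demo: write one skill file to a temporary directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "__init__.py").write_text("")
    Path(tmp, "shout.py").write_text(
        'SKILL_NAME = "shout"\n\ndef run(text):\n    return text.upper()\n'
    )
    skills = load_skills(tmp)

print(skills["shout"]("hello"))  # HELLO
```

Loading a skill executes its code with the agent's full permissions, so the skills/ directory should only contain files you trust.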

Key Takeaways

  • Tool calling is fundamentally about giving the LLM structured choices and parsing its decisions
  • Persistent memory is just message history saved to disk so it survives restarts; keep it simple first
  • Every tool follows the same interface: name, description, parameters, and an execute method
  • Start with text-based JSON tool calling, then migrate to native function-calling APIs when ready

The pattern you learned here is the same basic idea that underlies frameworks like LangChain, AutoGen, and CrewAI. Understanding the mechanics at this level makes you a better agent developer regardless of which framework you eventually choose.