
Edge Deployment of Lightweight Agents

Running small models on Raspberry Pi and other edge devices for low-latency responses.

Edge AI Agents

An edge agent runs directly on the IoT device or a local gateway, such as a Raspberry Pi, Jetson Nano, or industrial PC, instead of in the cloud. Benefits: lower latency (no network round trip), continued operation when offline, and lower bandwidth cost. Trade-off: limited compute and RAM mean the agent must use smaller, more efficient models.
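The architecture above can be sketched as a sense, decide, act loop in which every step runs on the device itself. This is a minimal sketch, not a production agent. `Reading`, `local_decide`, and `run_edge_agent` are hypothetical names, and `local_decide` uses a simple threshold rule as a placeholder for the quantised model that a real edge agent would call. Nothing in the loop touches the network, so it keeps working offline. The measured latency is therefore only the local decision time.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Reading:
    sensor_id: str
    temperature_c: float


def local_decide(reading: Reading) -> str:
    """Hypothetical stand-in for an on-device model.

    A real edge agent would call a quantised model here; a threshold rule
    keeps this sketch self-contained and runnable.
    """
    return "open_vent" if reading.temperature_c > 30.0 else "idle"


def run_edge_agent(
    readings: List[Reading], decide: Callable[[Reading], str]
) -> List[Tuple[str, str, float]]:
    """Sense -> decide -> act loop that makes no network calls."""
    log = []
    for reading in readings:
        start = time.perf_counter()
        action = decide(reading)  # local inference: no cloud round trip
        latency_ms = (time.perf_counter() - start) * 1000
        # "Act" step: a real agent would drive an actuator; here we only record it.
        log.append((reading.sensor_id, action, latency_ms))
    return log


if __name__ == "__main__":
    sample = [Reading("greenhouse-1", 24.5), Reading("greenhouse-2", 33.1)]
    for sensor_id, action, latency_ms in run_edge_agent(sample, local_decide):
        print(f"{sensor_id}: {action} ({latency_ms:.3f} ms)")
```

Passing the decision function in as a parameter keeps the loop unchanged when the placeholder rule is later swapped for a real local model.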

Choosing a Model for Edge Deployment

Edge devices cannot run frontier models such as GPT-4o or Claude Opus, which are available only as cloud APIs. Small models that fit on a Raspberry Pi 5 (8 GB RAM) include Phi-3-mini (3.8B parameters), Gemma-2B, and TinyLlama-1.1B. Quantised to 4 bits, these models need roughly 0.8–2.5 GB of RAM and generate about 5–15 tokens/second on CPU.
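These figures can be checked with back-of-the-envelope arithmetic. Weight memory is roughly parameters × bits per weight ÷ 8, so Phi-3-mini at 4 bits needs about 3.8 × 10⁹ × 4 ÷ 8 ≈ 1.9 GB for its weights alone. Real 4-bit files are somewhat larger, because formats like Q4_K_M keep some tensors and scale factors at higher precision. The runtime and context (KV) cache also need memory on top of the weights, which is why the reference table budgets more. Reply time is simply tokens ÷ decode rate. The helper names here are hypothetical, and the tokens/second values are the ones quoted in this lesson.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Rough weight memory in GB (1 GB = 1e9 bytes): params * bits / 8.

    Excludes runtime overhead and the KV cache, so actual RAM use is higher.
    """
    return params_billion * bits_per_weight / 8


def reply_seconds(reply_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate a reply at a steady decode rate."""
    return reply_tokens / tokens_per_sec


# (name, nominal parameters in billions, CPU tokens/sec quoted in this lesson)
MODELS = [("tinyllama-1.1b", 1.1, 15), ("gemma-2b", 2.0, 10), ("phi-3-mini", 3.8, 8)]

if __name__ == "__main__":
    for name, params_b, tps in MODELS:
        print(f"{name}: ~{weights_gb(params_b):.2f} GB of 4-bit weights, "
              f"100-token reply in ~{reply_seconds(100, tps):.1f} s")
```

At the quoted rates, a 100-token reply takes about 12.5 s on Phi-3-mini and under 7 s on TinyLlama. This is why the smaller model suits quick classification tasks.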

```python
# Model size reference for edge selection.
# Approximate figures for 4-bit (Q4_K_M) builds on an edge-class CPU;
# real RAM use and speed vary with context length, runtime, and thread count.
EDGE_MODELS = {
    'tinyllama-1.1b-q4': {
        'params': '1.1B', 'quantization': 'Q4_K_M',
        'ram_gb': 0.8, 'tokens_per_sec_cpu': 15,
        'use_case': 'simple classification, keyword detection'
    },
    'phi-3-mini-q4': {
        'params': '3.8B', 'quantization': 'Q4_K_M',
        'ram_gb': 2.5, 'tokens_per_sec_cpu': 8,
        'use_case': 'reasoning, multi-step decisions'
    },
    'gemma-2b-q4': {
        'params': '2B', 'quantization': 'Q4_K_M',
        'ram_gb': 1.5, 'tokens_per_sec_cpu': 10,
        'use_case': 'general assistant tasks'
    }
}

# Print a one-line summary per model: RAM, speed, and intended use.
for name, info in EDGE_MODELS.items():
    print(f'{name}: {info["ram_gb"]} GB RAM, '
          f'{info["tokens_per_sec_cpu"]} tok/s ({info["use_case"]})')
```
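A table like this can drive model selection. Filter by the device's RAM budget and by the response speed the task needs. The following is a hypothetical helper, not part of any library. Its tie-breaking rule is an assumption: prefer the model with the largest RAM footprint that still fits, as a rough proxy for capability. The dictionary repeats the lesson's figures so the block runs on its own. When setting the RAM budget, leave headroom for the OS and the rest of the agent.

```python
from typing import Dict, Optional

# Figures copied from the EDGE_MODELS table in this lesson (approximate).
MODELS: Dict[str, Dict[str, float]] = {
    "tinyllama-1.1b-q4": {"ram_gb": 0.8, "tokens_per_sec_cpu": 15},
    "phi-3-mini-q4": {"ram_gb": 2.5, "tokens_per_sec_cpu": 8},
    "gemma-2b-q4": {"ram_gb": 1.5, "tokens_per_sec_cpu": 10},
}


def pick_model(
    models: Dict[str, Dict[str, float]],
    ram_budget_gb: float,
    min_tokens_per_sec: float,
) -> Optional[str]:
    """Hypothetical helper: choose a model that fits the RAM budget and speed floor.

    Among models that fit, prefer the one with the largest RAM footprint as a
    rough proxy for capability (an assumption). Returns None if nothing fits.
    """
    fitting = [
        (info["ram_gb"], name)
        for name, info in models.items()
        if info["ram_gb"] <= ram_budget_gb
        and info["tokens_per_sec_cpu"] >= min_tokens_per_sec
    ]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    print(pick_model(MODELS, ram_budget_gb=2.0, min_tokens_per_sec=5))
    print(pick_model(MODELS, ram_budget_gb=2.0, min_tokens_per_sec=12))
```

For a 2 GB budget with a 5 tok/s floor, this picks `gemma-2b-q4`. Raising the speed floor to 12 tok/s leaves only `tinyllama-1.1b-q4`.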
