Edge Deployment of Lightweight Agents
Running small models on Raspberry Pi and other edge devices for low-latency responses.
Edge AI Agents
An edge agent runs directly on the IoT device or a local gateway (a Raspberry Pi, Jetson Nano, or industrial PC) instead of in the cloud. Benefits: lower latency (no network round trip), continued operation when offline, and lower bandwidth cost. Trade-off: limited compute and RAM require smaller, more efficient models.
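To make the latency benefit and its trade-off concrete, here is a deliberately simplified model. It ignores prompt processing and queueing, and every speed and round-trip time in it is a hypothetical placeholder rather than a measurement. The function names `edge_latency_s` and `cloud_latency_s` are illustrative, not from any library.

```python
from typing import Optional

# Simplified latency model: prompt processing and queueing are ignored,
# and every number below is a hypothetical placeholder, not a measurement.


def edge_latency_s(output_tokens: int, edge_tok_per_s: float) -> float:
    """Local inference: no network hop, only on-device generation time."""
    return output_tokens / edge_tok_per_s


def cloud_latency_s(output_tokens: int, round_trip_s: float,
                    cloud_tok_per_s: float, online: bool = True) -> Optional[float]:
    """Cloud inference: one network round trip plus faster remote generation.

    Returns None when the device is offline, because the request cannot be sent.
    """
    if not online:
        return None
    return round_trip_s + output_tokens / cloud_tok_per_s


EDGE_TOK_S = 15.0   # hypothetical small-model CPU speed
CLOUD_TOK_S = 80.0  # hypothetical hosted-model speed
RTT_S = 0.5         # hypothetical round trip including API overhead

for tokens in (5, 200):  # a short classification label vs. a long answer
    edge = edge_latency_s(tokens, EDGE_TOK_S)
    cloud = cloud_latency_s(tokens, RTT_S, CLOUD_TOK_S)
    print(f'{tokens:>4} tokens: edge {edge:.2f}s, cloud {cloud:.2f}s')

print('offline cloud result:', cloud_latency_s(5, RTT_S, CLOUD_TOK_S, online=False))
```

Under these assumptions the edge model answers a 5-token label faster than the cloud, while a 200-token reply finishes sooner in the cloud despite the round trip. This suggests giving edge agents short, structured outputs, and the `None` case shows why only the edge path keeps working without a connection.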
Choosing a Model for Edge Deployment
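Edge devices cannot run GPT-4o or Claude Opus, which are proprietary models served only from the cloud. Small open-weight models that fit on a Raspberry Pi 5 (8 GB RAM) include Phi-3-mini (3.8B parameters), Gemma-2B, and TinyLlama-1.1B. Quantised to 4 bits, these models need roughly 1–2.5 GB of RAM and generate approximately 5–15 tokens per second on CPU; actual speed depends on the device, context length, and runtime.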
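The RAM figures follow from simple arithmetic: weight memory is roughly parameters × bits per weight ÷ 8 bytes. The sketch below assumes an effective ~4.5 bits per weight for a 4-bit K-quant (block scales push it a little above 4), a flat 0.3 GB for the context cache and runtime, and 2 GB of headroom reserved for the OS. All three values are assumptions, and the helper names are hypothetical.

```python
# Rule-of-thumb RAM estimate for a quantised model.
# Assumptions (not measured): ~4.5 effective bits per weight for a 4-bit
# K-quant, plus a flat 0.3 GB for the context (KV) cache and runtime buffers.


def estimate_ram_gb(params_billion: float, bits_per_weight: float = 4.5,
                    overhead_gb: float = 0.3) -> float:
    """Weights take params * bits / 8 bytes; add a fixed overhead on top."""
    weight_gb = params_billion * bits_per_weight / 8  # billions of bytes == GB
    return weight_gb + overhead_gb


def fits_on_device(required_gb: float, device_ram_gb: float = 8.0,
                   reserved_gb: float = 2.0) -> bool:
    """Leave headroom for the OS and the agent process (assumed 2 GB)."""
    return required_gb <= device_ram_gb - reserved_gb


# Nominal sizes taken from the model names in this lesson.
for name, params in [('TinyLlama-1.1B', 1.1), ('Gemma-2B', 2.0), ('Phi-3-mini', 3.8)]:
    q4 = estimate_ram_gb(params)
    f16 = estimate_ram_gb(params, bits_per_weight=16)
    print(f'{name}: ~{q4:.1f} GB at 4-bit (fits: {fits_on_device(q4)}), '
          f'~{f16:.1f} GB at 16-bit (fits: {fits_on_device(f16)})')
```

The 4-bit estimates (about 0.9, 1.4, and 2.4 GB) land close to the reference figures, and the 16-bit case shows why quantisation matters: Phi-3-mini would need about 7.9 GB, more than an 8 GB Pi can spare once the OS is accounted for.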
```python
# Model size reference for edge selection.
# RAM and throughput are approximate CPU-only figures for 4-bit (Q4_K_M) builds;
# measure on your own hardware before committing to a model.
EDGE_MODELS = {
    'tinyllama-1.1b-q4': {
        'params': '1.1B', 'quantization': 'Q4_K_M',
        'ram_gb': 0.8, 'tokens_per_sec_cpu': 15,
        'use_case': 'simple classification, keyword detection',
    },
    'phi-3-mini-q4': {
        'params': '3.8B', 'quantization': 'Q4_K_M',
        'ram_gb': 2.5, 'tokens_per_sec_cpu': 8,
        'use_case': 'reasoning, multi-step decisions',
    },
    'gemma-2b-q4': {
        'params': '2B', 'quantization': 'Q4_K_M',
        'ram_gb': 1.5, 'tokens_per_sec_cpu': 10,
        'use_case': 'general assistant tasks',
    },
}

# Print a one-line summary per model.
for name, info in EDGE_MODELS.items():
    print(f'{name}: {info["ram_gb"]} GB RAM, '
          f'{info["tokens_per_sec_cpu"]} tok/s: {info["use_case"]}')
```