AI Agents · Lesson

Running Local Models with Ollama and llama.cpp

Ollama makes it 'docker run' for LLMs; llama.cpp goes deeper with quantization and direct C++ control.

Two Easy Paths

For self-hosting open models on a workstation or server:

Ollama — simplest, Docker-like UX for LLMs
llama.cpp — closer to the metal, more control, smaller footprint

Ollama Quick Start

# Install (macOS):
brew install ollama

# Pull a model:
ollama pull llama3.1:8b

# Run interactively:
ollama run llama3.1:8b 'Hello'

# Serve via HTTP (OpenAI-compatible):
ollama serve

All lessons in this course

Llama, Mistral and Qwen Overview
Running Local Models with Ollama and llama.cpp
Function-Calling Open Models (Hermes, Functionary)
Trade-offs: Latency, Cost, Capability

← Back to AI Agents