Running Local Models with Ollama and llama.cpp
Ollama makes it 'docker run' for LLMs; llama.cpp goes deeper with quantization and direct C++ control.
Two Easy Paths
For self-hosting open models on a workstation or server:
- Ollama — simplest, Docker-like UX for LLMs
- llama.cpp — closer to the metal, more control, smaller footprint
Ollama Quick Start
# Install (macOS):
brew install ollama
# Pull a model:
ollama pull llama3.1:8b
# Run interactively:
ollama run llama3.1:8b 'Hello'
# Serve via HTTP (OpenAI-compatible):
ollama serveAll lessons in this course
- Llama, Mistral and Qwen Overview
- Running Local Models with Ollama and llama.cpp
- Function-Calling Open Models (Hermes, Functionary)
- Trade-offs: Latency, Cost, Capability