0PricingLogin
AI Agents · Lesson

Running Local Models with Ollama and llama.cpp

Ollama makes it 'docker run' for LLMs; llama.cpp goes deeper with quantization and direct C++ control.

Two Easy Paths

For self-hosting open models on a workstation or server:

  • Ollama — simplest, Docker-like UX for LLMs
  • llama.cpp — closer to the metal, more control, smaller footprint

Ollama Quick Start

# Install (macOS):
brew install ollama

# Pull a model:
ollama pull llama3.1:8b

# Run interactively:
ollama run llama3.1:8b 'Hello'

# Serve via HTTP (OpenAI-compatible):
ollama serve

All lessons in this course

  1. Llama, Mistral and Qwen Overview
  2. Running Local Models with Ollama and llama.cpp
  3. Function-Calling Open Models (Hermes, Functionary)
  4. Trade-offs: Latency, Cost, Capability
← Back to AI Agents