AI Engineering Academy · Lesson

Langfuse for Model-Agnostic Observability

Integrate Langfuse as an open-source alternative that works with any LLM provider, capture custom spans for retrieval and tool calls, and set up cost tracking dashboards.

Langfuse: Open-Source LLM Observability

Langfuse is an open-source observability platform for LLM applications that works with any model provider: OpenAI, Anthropic, Mistral, local models served through Ollama, or your own fine-tuned model. LangSmith is built by the LangChain team and integrates most tightly with LangChain. Langfuse is framework-agnostic and instruments any Python code through its SDK. You can self-host Langfuse for free or use the managed cloud at cloud.langfuse.com.

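# pip install "langfuse<3"   # the examples in this lesson use the v2 Python SDK API (langfuse.trace(), trace.span(), trace.generation())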
from langfuse import Langfuse

langfuse = Langfuse(
    public_key='pk-lf-...',
    secret_key='sk-lf-...',
    host='https://cloud.langfuse.com'  # or your self-hosted URL
)

print('Langfuse connected:', langfuse.auth_check())
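In practice, being model-agnostic mostly means mapping each provider's response metadata onto the same fields. The following minimal sketch assumes the documented usage field names of OpenAI (`prompt_tokens`/`completion_tokens`), Anthropic (`input_tokens`/`output_tokens`), and Ollama (`prompt_eval_count`/`eval_count`). It converts them into the `{'input': ..., 'output': ...}` shape that this lesson passes to `generation.end(usage=...)`. The function name `normalize_usage` is hypothetical, and plain dicts stand in for SDK response objects.

```python
from typing import Mapping

# Hypothetical mapping: provider name -> (input-token field, output-token field).
# Field names follow each provider's usage payload.
USAGE_FIELDS = {
    'openai': ('prompt_tokens', 'completion_tokens'),
    'anthropic': ('input_tokens', 'output_tokens'),
    'ollama': ('prompt_eval_count', 'eval_count'),
}


def normalize_usage(provider: str, usage: Mapping[str, int]) -> dict:
    """Convert a provider-specific usage dict into {'input': int, 'output': int}."""
    try:
        input_key, output_key = USAGE_FIELDS[provider]
    except KeyError:
        raise ValueError(f'unknown provider: {provider!r}') from None
    # Missing fields default to 0 so a partial payload does not break tracing.
    return {
        'input': int(usage.get(input_key, 0)),
        'output': int(usage.get(output_key, 0)),
    }


if __name__ == '__main__':
    print(normalize_usage('openai', {'prompt_tokens': 120, 'completion_tokens': 30}))
    print(normalize_usage('anthropic', {'input_tokens': 95, 'output_tokens': 40}))
    print(normalize_usage('ollama', {'prompt_eval_count': 80, 'eval_count': 25}))
```

Ollama reports these counts as top-level fields of its response rather than under a `usage` key. For Ollama, pass the response JSON itself. Keeping the provider differences in one table means the tracing code stays identical when you switch models.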

Traces, Spans, and Generations

Langfuse uses a hierarchical data model. A trace represents one end-to-end user request. Within a trace, observations record the individual steps. Spans cover generic processing steps such as retrieval, preprocessing, or tool calls, and they can be nested inside one another. Generations are a specialized observation type for LLM calls. They record the model, model parameters, input, output, and token usage in a structured form. For models it has price definitions for, Langfuse can derive cost from that usage, which is what powers its cost dashboards and quality metrics.
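The following few dataclasses mirror that structure. They show how per-generation token counts roll up into a per-trace cost, which is the kind of aggregate a cost dashboard displays. This is an illustrative model, not the Langfuse SDK. The class names are hypothetical, and the prices in `PRICE_PER_MILLION` are made-up placeholders, not real provider pricing.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Placeholder prices in USD per 1M tokens: (input, output). NOT real pricing.
PRICE_PER_MILLION = {
    'model-a': (3.00, 12.00),
    'model-b': (0.20, 0.80),
}


@dataclass
class Generation:
    """An LLM call: the observation type that carries tokens and cost."""
    name: str
    model: str
    input_tokens: int
    output_tokens: int

    def cost(self) -> float:
        input_price, output_price = PRICE_PER_MILLION[self.model]
        return (self.input_tokens * input_price
                + self.output_tokens * output_price) / 1_000_000


@dataclass
class Span:
    """A non-LLM step (retrieval, tool call) that may contain nested observations."""
    name: str
    children: List['Observation'] = field(default_factory=list)


Observation = Union[Span, Generation]


@dataclass
class Trace:
    """One end-to-end user request."""
    name: str
    observations: List[Observation] = field(default_factory=list)


def total_cost(observations: List[Observation]) -> float:
    # Sum generation costs recursively, descending into nested spans.
    total = 0.0
    for obs in observations:
        if isinstance(obs, Generation):
            total += obs.cost()
        else:
            total += total_cost(obs.children)
    return total


if __name__ == '__main__':
    trace = Trace('rag-query', [
        Span('vector-retrieval'),
        Span('tool-call', [Generation('tool-planner', 'model-b', 400, 50)]),
        Generation('answer-generation', 'model-a', 1_200, 300),
    ])
    print(f'{total_cost(trace.observations):.6f}')
```

Only generations contribute cost here, while spans contribute structure. Langfuse applies the same split, which is why LLM calls should be logged as generations rather than plain spans. The same request expressed with the Langfuse SDK follows.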

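from langfuse import Langfuse
from openai import OpenAI

langfuse = Langfuse()      # reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST from the environment
openai_client = OpenAI()   # reads OPENAI_API_KEY from the environment

question = 'What is RAG?'

# Create a trace for one user request
trace = langfuse.trace(
    name='rag-query',
    user_id='user_123',
    session_id='session_abc',
    tags=['production', 'rag'],
    input={'question': question}
)

# Add a retrieval span. vector_db is your own vector store client (not defined here),
# assumed to return a list of dicts with a 'text' key.
retrieval_span = trace.span(
    name='vector-retrieval',
    input={'query': question}
)
chunks = vector_db.search(question)
retrieval_span.end(output={'chunks': [c['text'][:100] for c in chunks]})

# Add an LLM generation. Log exactly the messages and parameters sent to the model.
context = '\n\n'.join(c['text'] for c in chunks)
messages = [{'role': 'user', 'content': f'Context: {context}\nQuestion: {question}'}]
generation = trace.generation(
    name='answer-generation',
    model='gpt-4o',
    model_parameters={'temperature': 0.0},
    input=messages
)
response = openai_client.chat.completions.create(model='gpt-4o', messages=messages, temperature=0.0)
answer = response.choices[0].message.content
generation.end(
    output=answer,
    usage={'input': response.usage.prompt_tokens, 'output': response.usage.completion_tokens}
)
trace.update(output={'answer': answer})

# The SDK sends events in background batches; flush before a short-lived script exits
langfuse.flush()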
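With manual `retrieval_span.end(...)` calls, an exception in `vector_db.search` would leave the span open. The following minimal sketch wraps a step so the observation is always ended. It assumes only that the object has an `end(**kwargs)` method, like the v2 SDK's span and generation objects. The `level='ERROR'` and `status_message` keyword arguments are assumed to match the v2 SDK's observation fields. `traced_step` and `FakeSpan` are hypothetical names, and `FakeSpan` lets the sketch run without a Langfuse server.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator


@contextmanager
def traced_step(observation: Any) -> Iterator[Dict[str, Any]]:
    """Yield a dict for the step's output and always call observation.end().

    Assumes `observation` has an end(**kwargs) method (e.g. a v2 Langfuse span).
    """
    result: Dict[str, Any] = {}
    try:
        yield result
    except Exception as exc:
        # Record the failure on the observation, then re-raise for the caller.
        observation.end(level='ERROR', status_message=str(exc))
        raise
    else:
        observation.end(output=result.get('output'))


class FakeSpan:
    """Stand-in for an SDK span that remembers the kwargs passed to end()."""

    def __init__(self) -> None:
        self.ended_with: Dict[str, Any] = {}

    def end(self, **kwargs: Any) -> None:
        self.ended_with = kwargs


if __name__ == '__main__':
    ok = FakeSpan()
    with traced_step(ok) as step:
        step['output'] = {'chunks': ['RAG combines retrieval with generation']}
    print(ok.ended_with)

    failed = FakeSpan()
    try:
        with traced_step(failed):
            raise TimeoutError('vector DB timed out')
    except TimeoutError:
        pass
    print(failed.ended_with)
```

With the v2 SDK, the retrieval step would become `with traced_step(trace.span(name='vector-retrieval')) as step:` followed by `step['output'] = ...`. Re-raising the exception keeps the application's own error handling unchanged while the trace still records which step failed.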

All lessons in this course

  1. Why LLM Apps Are Hard to Debug
  2. Tracing with LangSmith
  3. Langfuse for Model-Agnostic Observability
  4. Alerting on Latency, Cost, and Quality Degradation