Langfuse for Model-Agnostic Observability
Integrate Langfuse as an open-source alternative that works with any LLM provider, capture custom spans for retrieval and tool calls, and set up cost tracking dashboards.
Langfuse: Open-Source LLM Observability
Langfuse is an open-source observability platform for LLM applications that works with any model provider: OpenAI, Anthropic, Mistral, local models via Ollama, or your own fine-tuned model. Unlike LangSmith, which is most closely associated with the LangChain ecosystem, Langfuse integrates with any Python code through its SDK. You can self-host Langfuse for free or use the managed cloud at cloud.langfuse.com.
# pip install langfuse
from langfuse import Langfuse

langfuse = Langfuse(
    public_key='pk-lf-...',
    secret_key='sk-lf-...',
    host='https://cloud.langfuse.com'  # or your self-hosted URL
)
print('Langfuse connected:', langfuse.auth_check())

Traces, Spans, and Generations
Langfuse uses a hierarchical data model with three levels. A trace represents one end-to-end user request. Within a trace, spans represent individual processing steps (retrieval, preprocessing, tool calls). Generations are a special type of span specifically for LLM calls: they capture the model, prompt tokens, completion tokens, and cost in a structured way that enables cost dashboards and quality metrics.
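To make the hierarchy concrete, here is a minimal sketch in plain Python. It is not the Langfuse SDK: the class names, the `example-model` name, and the prices in `PRICES_PER_1K` are all hypothetical and exist only to show how token usage recorded on generations can roll up into a per-trace cost, which is the kind of figure a cost dashboard aggregates.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical prices in USD per 1,000 tokens, for illustration only.
PRICES_PER_1K = {"example-model": {"input": 0.01, "output": 0.03}}


@dataclass
class Span:
    """One processing step inside a trace (retrieval, tool call, ...)."""
    name: str
    children: List["Span"] = field(default_factory=list)

    def cost(self) -> float:
        # A plain span has no cost of its own; it only sums its children.
        return sum(child.cost() for child in self.children)


@dataclass
class Generation(Span):
    """A span for an LLM call, with structured model and token usage."""
    model: str = ""
    input_tokens: int = 0
    output_tokens: int = 0

    def cost(self) -> float:
        price = PRICES_PER_1K[self.model]
        own = (self.input_tokens * price["input"]
               + self.output_tokens * price["output"]) / 1000
        return own + super().cost()


@dataclass
class Trace:
    """One end-to-end user request."""
    name: str
    user_id: Optional[str] = None
    observations: List[Span] = field(default_factory=list)

    def total_cost(self) -> float:
        return sum(obs.cost() for obs in self.observations)


trace = Trace(name="rag-query", user_id="user_123")
trace.observations.append(Span(name="vector-retrieval"))
trace.observations.append(
    Generation(name="answer-generation", model="example-model",
               input_tokens=1000, output_tokens=500)
)
print(f"{trace.total_cost():.4f}")  # prints 0.0250
```

The retrieval span contributes nothing to the total, while the generation contributes a cost derived from its token counts. The Langfuse SDK calls that follow record the same trace, span, and generation structure.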
from langfuse import Langfuse

# Reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST from the environment.
# The trace()/span()/generation() calls below follow the v2 Python SDK;
# newer SDK versions changed this interface.
langfuse = Langfuse()

# vector_db and openai_client are assumed to be initialized elsewhere.

# Create a trace for one user request
trace = langfuse.trace(
    name='rag-query',
    user_id='user_123',
    session_id='session_abc',
    tags=['production', 'rag']
)

# Add a retrieval span
retrieval_span = trace.span(
    name='vector-retrieval',
    input={'query': 'What is RAG?'}
)
chunks = vector_db.search('What is RAG?')
# Store only the first 100 characters of each chunk to keep the trace small
retrieval_span.end(output={'chunks': [c['text'][:100] for c in chunks]})

# Add an LLM generation
generation = trace.generation(
    name='answer-generation',
    model='gpt-4o',
    model_parameters={'temperature': 0.0},
    input=[{'role': 'user', 'content': 'Context: ...\nQuestion: What is RAG?'}]
)
response = openai_client.chat.completions.create(model='gpt-4o', messages=[...])
# Record the output and token usage so Langfuse can compute cost
generation.end(
    output=response.choices[0].message.content,
    usage={'input': response.usage.prompt_tokens, 'output': response.usage.completion_tokens}
)