Trajectory-Based Self-Improvement
Learning from successful and failed action sequences to refine future behavior.
What Is a Trajectory?
A trajectory is the complete sequence of states and actions an agent took from start to finish for a given task. It records not just the final answer but how the agent got there: which tools were called, in what order, with what parameters, and what intermediate results were observed.
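To make the distinction concrete, here is a hypothetical trajectory for a toy arithmetic task, written as plain Python data. The tools `add` and `multiply`, the `run_plan` helper, the `"$prev"` placeholder, and the dictionary layout are illustrative assumptions. They are not part of any agent framework.

```python
# Hypothetical toy task: compute (3 + 4) * 2 using two calculator "tools".
def add(a: int, b: int) -> int:
    return a + b


def multiply(a: int, b: int) -> int:
    return a * b


TOOLS = {"add": add, "multiply": multiply}


def run_plan(plan: list[tuple[str, dict]]) -> list[dict]:
    """Execute tool calls in order, recording each call with its result.

    A parameter value of "$prev" is replaced by the previous step's result,
    standing in for how an agent feeds intermediate observations forward.
    """
    trajectory = []
    prev = None
    for name, params in plan:
        resolved = {k: (prev if v == "$prev" else v) for k, v in params.items()}
        prev = TOOLS[name](**resolved)
        trajectory.append({"tool": name, "params": resolved, "result": prev})
    return trajectory


trajectory = run_plan([
    ("add", {"a": 3, "b": 4}),
    ("multiply", {"a": "$prev", "b": 2}),
])
final_answer = trajectory[-1]["result"]  # 14
```

The final answer alone is `14`. The trajectory also preserves the order of the calls, the resolved parameters, and the intermediate result `7`. That is the information you need when you later compare successful and failed runs.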
Recording a Trajectory
Wrap each agent action in a recorder that captures the state before and after the action. A state includes the current goal, memory contents, and recent observations. An action includes the tool name, its parameters, and the result.
```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class TrajectoryStep:
    step_index: int
    state_summary: str   # short description of world state
    action_name: str     # tool or reasoning step name
    action_params: dict
    result: Any
    timestamp: str = ''

    def __post_init__(self):
        # Default to the current UTC time when no timestamp is supplied.
        if not self.timestamp:
            self.timestamp = datetime.now(timezone.utc).isoformat()


@dataclass
class Trajectory:
    trajectory_id: str
    task: str
    steps: list[TrajectoryStep] = field(default_factory=list)
    outcome: str = 'unknown'  # 'success', 'failure', 'partial'
    final_score: float = 0.0
    failure_reason: str = ''

    def add_step(self, step: TrajectoryStep):
        self.steps.append(step)

    def mark_success(self, score: float = 1.0):
        self.outcome = 'success'
        self.final_score = score

    def mark_failure(self, reason: str = ''):
        self.outcome = 'failure'
        self.final_score = 0.0
        self.failure_reason = reason  # keep the reason for later analysis
```
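The dataclasses above store a single `state_summary` per step, while the recorder described earlier captures state both before and after each action. Here is a minimal sketch of such a recorder. It assumes an agent whose state is the goal, memory, and recent observations listed above. `AgentState`, `RecordedStep`, `TrajectoryRecorder`, and the `word_count` tool are hypothetical names. Appending each tool result to the observations is an assumption about how the agent updates its state.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class AgentState:
    """Hypothetical agent state: goal, memory, and recent observations."""
    goal: str
    memory: list = field(default_factory=list)
    observations: list = field(default_factory=list)

    def snapshot(self) -> dict:
        # Copy the lists so later mutations do not rewrite recorded history.
        return {
            "goal": self.goal,
            "memory": list(self.memory),
            "observations": list(self.observations),
        }


@dataclass
class RecordedStep:
    """Hypothetical step record holding state snapshots around one action."""
    step_index: int
    state_before: dict
    action_name: str
    action_params: dict
    result: Any
    state_after: dict


class TrajectoryRecorder:
    """Wraps tool calls so every action is recorded with before/after state."""

    def __init__(self, state: AgentState):
        self.state = state
        self.steps: list[RecordedStep] = []

    def call(self, tool: Callable[..., Any], **params: Any) -> Any:
        before = self.state.snapshot()
        result = tool(**params)
        # Assumption: the agent treats each tool result as a new observation.
        self.state.observations.append(result)
        self.steps.append(RecordedStep(
            step_index=len(self.steps),
            state_before=before,
            action_name=tool.__name__,
            action_params=dict(params),
            result=result,
            state_after=self.state.snapshot(),
        ))
        return result


# Usage with a hypothetical tool.
def word_count(text: str) -> int:
    return len(text.split())


recorder = TrajectoryRecorder(AgentState(goal="count words"))
recorder.call(word_count, text="record every step")
step = recorder.steps[0]
```

Snapshotting on both sides of an action shows exactly which observation each action contributed. The cost is a copy of the state on every step. A real agent might store a short `state_summary` string instead, as `TrajectoryStep` does.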