Learn AI with Python · Lesson

Actor-Critic Methods (A2C)

Advantage function, actor (policy) + critic (value), synchronous A2C implementation.

Combining Policy and Value

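Actor-critic methods combine two components: an actor (a policy that chooses actions) and a critic (a value function that estimates how good the current state is). Instead of weighting each action by its raw return, as REINFORCE does, the actor is updated with the advantage: the return minus the critic's value estimate. Using the critic as a baseline lowers the variance of the gradient estimate, which stabilizes learning.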
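To make the advantage concrete, here is a minimal sketch in plain Python. The rewards, critic values, and discount factor are made-up illustration values, and the function names `discounted_returns` and `advantages` are hypothetical helpers, not part of the lesson's code.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, gamma=0.99):
    """Advantage A_t = G_t - V(s_t): how much better the outcome was than predicted."""
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]


# Hypothetical three-step episode with the critic's value estimate per state.
rewards = [1.0, 1.0, 1.0]
values = [2.5, 2.0, 1.0]
adv = advantages(rewards, values)
```

A positive advantage means the action turned out better than the critic expected, so its probability is pushed up; a negative one pushes it down.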

The Actor

The actor is the policy network. Given the current state, it outputs action logits (or probabilities), exactly as in REINFORCE, and the action to take is sampled from that distribution.

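import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        # Shared feature extractor: observation -> 128 hidden features
        self.shared = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        # Actor head: one logit per discrete action
        self.actor = nn.Linear(128, n_actions)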
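The snippet above shows only the actor head. As a rough sketch of how the two parts might fit together, the version below adds a critic head and a combined loss. The class name `ActorCriticSketch`, the `critic` layer, the `a2c_loss` helper, and the `value_coef` weight are all assumptions made for illustration, not the lesson's own implementation; an entropy bonus, which is common in A2C, is left out for brevity.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class ActorCriticSketch(nn.Module):
    """Shared trunk with an actor head and an (assumed) critic head."""

    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.actor = nn.Linear(128, n_actions)  # one logit per action
        self.critic = nn.Linear(128, 1)         # assumption: scalar V(s)

    def forward(self, obs):
        features = self.shared(obs)
        logits = self.actor(features)
        value = self.critic(features).squeeze(-1)  # shape: (batch,)
        return logits, value


def a2c_loss(model, obs, actions, returns, value_coef=0.5):
    """Advantage-weighted policy loss plus a value regression loss."""
    logits, values = model(obs)
    dist = Categorical(logits=logits)
    # Detach so the policy loss does not push gradients into the critic.
    advantages = returns - values.detach()
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    return policy_loss + value_coef * value_loss


# Hypothetical batch: 4 observations of size 8, 3 discrete actions.
model = ActorCriticSketch(obs_dim=8, n_actions=3)
obs = torch.randn(4, 8)
logits, value = model(obs)
actions = Categorical(logits=logits).sample()  # the actor "decides what to do"
loss = a2c_loss(model, obs, actions, returns=torch.ones(4))
loss.backward()
```

Sharing the trunk lets both heads reuse the same features; separate networks for actor and critic are an equally valid design choice.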
