Temporal-difference learning on a gridworld (interactive)

Andrej Karpathy’s ReinforceJS demo runs SARSA and Q-learning on a gridworld MDP directly in your browser. Use it as a companion to the Monte Carlo and temporal-difference (TD) prediction pages. You can watch value estimates update online from sampled transitions, compare on-policy (SARSA) with off-policy (Q-learning) control, and see a policy emerge without the agent being given a model of the environment.
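The demo compares two control methods, and the essential difference between them is the bootstrap target in the update. The block below is a minimal tabular sketch in Python, not the demo's own JavaScript. The 4x4 layout, the -1 step reward, the hyperparameters, and every function name (`step`, `sarsa_update`, `q_learning_update`, `train`, `greedy_rollout`) are illustrative assumptions. SARSA bootstraps from Q(s', a') for the action a' that its ε-greedy behavior policy actually takes next (on-policy). Q-learning bootstraps from the maximum of Q(s', ·) regardless of what the agent does next (off-policy). Neither update reads the transition function. `step` only stands in for the environment that produces samples.

```python
import random

# Hypothetical 4x4 gridworld (an assumption, not the ReinforceJS layout):
# start at (0, 0), terminal goal at (3, 3), reward -1 per step,
# and moves into a wall leave the agent where it is.
SIZE = 4
START, GOAL = (0, 0), (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic environment transition: returns (next_state, reward, done)."""
    r, c = state
    dr, dc = ACTIONS[action]
    nxt = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    return nxt, -1.0, nxt == GOAL


def epsilon_greedy(Q, state, epsilon, rng):
    """Random action with probability epsilon, else a greedy one (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(Q[state])
    return rng.choice([a for a, q in enumerate(Q[state]) if q == best])


def sarsa_update(Q, s, a, r, s2, a2, done, alpha, gamma):
    # On-policy target: bootstrap from the action a2 the behavior policy actually chose.
    target = r if done else r + gamma * Q[s2][a2]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s2, done, alpha, gamma):
    # Off-policy target: bootstrap from the greedy action in s2, whatever is done next.
    target = r if done else r + gamma * max(Q[s2])
    Q[s][a] += alpha * (target - Q[s][a])


def train(method, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Learn a Q-table from sampled transitions only; method is "sarsa" or "q_learning"."""
    rng = random.Random(seed)
    # Zero initialization is optimistic here because every true action value is negative.
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        s = START
        a = epsilon_greedy(Q, s, epsilon, rng)
        for _ in range(max_steps):
            s2, reward, done = step(s, a)
            a2 = epsilon_greedy(Q, s2, epsilon, rng)  # next action from the behavior policy
            if method == "sarsa":
                sarsa_update(Q, s, a, reward, s2, a2, done, alpha, gamma)
            else:
                q_learning_update(Q, s, a, reward, s2, done, alpha, gamma)
            if done:
                break
            s, a = s2, a2
    return Q


def greedy_rollout(Q, max_steps=50):
    """Follow the learned greedy policy from START; returns the list of visited states."""
    path, s = [START], START
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
        s, _, done = step(s, a)
        path.append(s)
        if done:
            break
    return path
```

For example, `greedy_rollout(train("q_learning"))` lists the cells the learned greedy policy visits from the start. In an open grid like this one both methods should find short paths. The on-policy/off-policy difference shows up in layouts with a hazard beside the shortest route, where SARSA tends to learn a safer path because its targets include the cost of its own exploratory moves.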