Andrej Karpathy’s ReinforceJS demo runs SARSA and Q-learning on a gridworld MDP directly in your browser. Use it as a companion to the Monte Carlo / temporal-difference prediction pages: watch value estimates update online from sampled transitions, compare on-policy (SARSA) and off-policy (Q-learning) control, and see the policy emerge without any model of the environment.
Source: cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html. If the embedded frame above is blocked by your browser or network, open the link in a new tab.
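
For readers who want to see the two update rules side by side outside the browser, here is a minimal Python sketch of tabular SARSA and Q-learning. It is not the ReinforceJS code: the 4x4 grid, the rewards (+1 at the goal, -0.01 per step), the hyperparameters, and the names `step`, `epsilon_greedy`, `td_control`, and `greedy_arrows` are all assumptions chosen for illustration. The only difference between the two methods is the bootstrap term in the target.

```python
import random

# Hypothetical 4x4 gridworld (not the demo's layout): start top-left, goal bottom-right.
SIZE = 4
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ARROWS = "^v<>"  # same order as ACTIONS


def step(state, action):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    next_state = (min(max(row + d_row, 0), SIZE - 1),
                  min(max(col + d_col, 0), SIZE - 1))
    done = next_state == GOAL
    reward = 1.0 if done else -0.01  # assumed reward scheme
    return next_state, reward, done


def epsilon_greedy(Q, state, epsilon, rng):
    """Random action with probability epsilon, else greedy with random tie-breaks."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(Q[state])
    return rng.choice([a for a, v in enumerate(Q[state]) if v == best])


def td_control(method, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1,
               max_steps=1000, seed=0):
    """Tabular TD control; method is 'sarsa' or 'q_learning'."""
    if method not in ("sarsa", "q_learning"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state = START
        action = epsilon_greedy(Q, state, epsilon, rng)
        for _ in range(max_steps):  # cap episode length as a safety net
            next_state, reward, done = step(state, action)
            # Simplification: both methods pick the next action before the update.
            # Textbook Q-learning picks it after, from the updated table.
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            if done:
                target = reward  # no bootstrapping from a terminal state
            elif method == "sarsa":
                # On-policy: bootstrap from the action actually taken next.
                target = reward + gamma * Q[next_state][next_action]
            else:
                # Off-policy: bootstrap from the greedy action, whatever is taken.
                target = reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            if done:
                break
            state, action = next_state, next_action
    return Q


def greedy_arrows(Q):
    """Render the greedy policy as one string of arrows per grid row."""
    rows = []
    for r in range(SIZE):
        cells = []
        for c in range(SIZE):
            if (r, c) == GOAL:
                cells.append("G")
            else:
                values = Q[(r, c)]
                cells.append(ARROWS[values.index(max(values))])
        rows.append("".join(cells))
    return rows


if __name__ == "__main__":
    for method in ("sarsa", "q_learning"):
        print(method)
        print("\n".join(greedy_arrows(td_control(method))))
```

Running the script prints the greedy policy each method has learned as a grid of arrows. On a grid this simple the two policies tend to look alike; the on-policy versus off-policy difference shows up more clearly in environments where exploratory moves are costly, which is something to look for when experimenting with the demo.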