Activation steering is a technique for controlling LLM outputs without fine-tuning. Instead of updating weights, you extract a steering vector from the model's residual stream — a direction in activation space that corresponds to a target concept — and add it at inference time. The model's behavior shifts predictably along that direction while its other capabilities stay largely intact.

How it works

  1. Collect contrastive pairs — gather prompts that do and don’t express the target concept (e.g. “Paris” / neutral)
  2. Extract activations — run both sets through the model and record the residual stream at a chosen layer
  3. Compute the steering vector — take the mean difference between the two activation sets
  4. Apply at inference — add α × steering_vector to the residual stream during the forward pass; scale α to control intensity
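The four steps above can be sketched end to end. This is an illustrative toy using synthetic activations in place of a real model's residual stream (in practice, steps 1–2 would capture activations with a forward hook at a chosen layer); `d_model`, `alpha`, and the random concept direction are all stand-in assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64  # hidden size of the hypothetical model

# Steps 1-2: synthetic stand-ins for residual-stream activations recorded
# at one layer for concept-expressing vs. neutral prompts.
concept_dir = rng.normal(size=d_model)
concept_acts = rng.normal(size=(32, d_model)) + concept_dir
neutral_acts = rng.normal(size=(32, d_model))

# Step 3: steering vector = mean difference between the two activation sets.
steering_vector = concept_acts.mean(axis=0) - neutral_acts.mean(axis=0)

# Step 4: at inference, add alpha * steering_vector to the residual stream;
# alpha scales how strongly the output is pushed toward the concept.
def steer(residual, alpha=4.0):
    return residual + alpha * steering_vector

h = rng.normal(size=d_model)   # one token's residual-stream state
h_steered = steer(h)

# The steered state has a larger component along the concept direction.
print(np.dot(h_steered, concept_dir) > np.dot(h, concept_dir))
```

With a real model, `steer` would run inside a hook on the chosen layer's output rather than on a free-standing vector, but the arithmetic is the same.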

Demo

The Eiffel Tower Llama space demonstrates this interactively: a steering vector derived from Eiffel Tower–related activations is injected into Llama, progressively shifting its completions toward Paris-related content.

Lab

Lab notebook under development. Track progress in AURA-655.
The lab will walk you through:
  • Extracting a concept steering vector from a small open model (Llama 3.2 1B)
  • Applying it at varying strengths (α) and observing output drift
  • Visualising the activation geometry using PCA
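The last lab exercise, visualising activation geometry with PCA, can be previewed with a minimal sketch. Again the activations are synthetic stand-ins (a random concept direction added to noise), and PCA is done directly via SVD so the example needs only NumPy; with real recorded activations the same projection would reveal whether the two prompt sets separate in activation space:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model = 64

# Synthetic stand-ins for activations recorded at a single layer.
concept_dir = rng.normal(size=d_model)
concept_acts = rng.normal(size=(32, d_model)) + concept_dir
neutral_acts = rng.normal(size=(32, d_model))

# PCA via SVD on the mean-centred, pooled activations.
X = np.vstack([concept_acts, neutral_acts])
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ vt[:2].T  # project onto the top two principal components

# Because the concept offset dominates the variance, the two prompt sets
# should form clusters separated along the first principal component.
concept_2d, neutral_2d = coords[:32], coords[32:]
separation = abs(concept_2d[:, 0].mean() - neutral_2d[:, 0].mean())
print(separation > 1.0)
```

In the lab, `coords` would be scatter-plotted (e.g. with matplotlib), colouring points by prompt set, and the mean-difference steering vector would appear as the axis connecting the two clusters.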

Further reading