Course W&B Workspace
View all executed course notebooks — metrics, plots, and artifacts in one place.
Why experiment tracking matters
Running a notebook once is easy. Running it ten times with different hyperparameters, across different machines, and remembering which combination produced the best result is not. An experiment manager solves this by automatically capturing:
- Hyperparameters — learning rate, batch size, optimizer, architecture choices
- Metrics — loss curves, accuracy, custom metrics logged at each step or epoch
- Artifacts — model checkpoints, generated plots, evaluation outputs
- Environment — Python version, package versions, GPU type, runtime duration
- Code state — git commit hash and diff at the time of execution
How it works in this course
All course notebooks are registered in a central notebook-database.yml and execute through a Docker-based pipeline:
Notebook executes in Docker
Each notebook runs inside a containerized environment with pinned dependencies, ensuring identical results regardless of your local setup.
W&B logs metrics automatically
The execution pipeline logs run metadata — notebook name, duration, environment, and any metrics or plots the notebook produces — to the W&B project.
Results appear in the dashboard
Every run is visible in the eng-ai-agents workspace where you can filter, compare, and inspect individual executions.
Getting started
1. Create a W&B account
Sign up at wandb.ai using your university email. The free tier is sufficient for all course work.
2. Set your API key
Authentication is handled through the .env file in the eng-ai-agents repository. Copy the example file and add your key under WANDB_API_KEY.
docker-compose.yml loads this file automatically via env_file, so the key is available inside every container. The execution scripts check for WANDB_API_KEY and gracefully skip logging if it is not set — nothing breaks, you just don’t get tracking.
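A minimal sketch of that check, assuming a hypothetical helper named init_tracking (the real logic lives in the course's execution scripts and may differ, and the project name is an assumption):

```python
import os

import wandb


def init_tracking(notebook_name: str):
    """Start a W&B run only if an API key is configured; otherwise skip tracking."""
    api_key = os.environ.get("WANDB_API_KEY")
    if not api_key:
        print("WANDB_API_KEY not set - skipping experiment tracking.")
        return None

    wandb.login(key=api_key)
    # "eng-ai-agents" as the project name is assumed for illustration.
    return wandb.init(project="eng-ai-agents", name=notebook_name)
```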
3. Log from your notebook
Course notebooks include W&B integration with a graceful fallback pattern:
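The sketch below illustrates that pattern; the project name, run name, and hyperparameters are placeholders, and the code shipped in the notebooks may differ in its details.

```python
import os

# Tracking is enabled only when both the wandb package and an API key are available.
try:
    import wandb
    WANDB_ENABLED = bool(os.environ.get("WANDB_API_KEY"))
except ImportError:
    wandb = None
    WANDB_ENABLED = False

run = None
if WANDB_ENABLED:
    run = wandb.init(
        project="eng-ai-agents",          # assumed project name
        name="lstm-lr0.001-hidden256",    # descriptive run name
        config={"lr": 1e-3, "hidden": 256, "optimizer": "adam"},
    )

for epoch in range(10):
    train_loss = 1.0 / (epoch + 1)        # stand-in for a real training step
    if run is not None:
        run.log({"train_loss": train_loss, "epoch": epoch})

if run is not None:
    run.finish()
```

When W&B is not configured, the notebook still runs end to end; only the logging calls are skipped.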
Using the dashboard
The eng-ai-agents workspace provides several views:
| View | Purpose |
|---|---|
| Runs table | List all executions with sortable columns for metrics, duration, and status |
| Charts | Visualize loss curves, accuracy, or any logged metric across runs |
| Artifacts | Browse saved models, datasets, and output files |
| System metrics | GPU utilization, memory usage, and runtime stats |
What gets logged
The execution pipeline (wandb_utils.py) automatically logs the following for each notebook run (an illustrative sketch follows the list):
- Run metadata — notebook path, environment, execution duration, date
- Images — all PNG plots extracted from cell outputs are uploaded as wandb.Image
- Plotly charts — interactive HTML visualizations are uploaded as artifacts
- Run grouping — runs are grouped by notebook category (e.g., optimization, transfer-learning) for easy filtering
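The snippet below only illustrates how such a pipeline can log these items; it is not the actual wandb_utils.py, and the project and artifact names are assumptions.

```python
import glob

import wandb


def log_notebook_outputs(notebook_path: str, category: str, output_dir: str) -> None:
    run = wandb.init(
        project="eng-ai-agents",            # assumed project name
        group=category,                     # e.g. "optimization" or "transfer-learning"
        config={"notebook": notebook_path},
    )

    # PNG plots extracted from cell outputs go up as wandb.Image entries.
    for png_path in glob.glob(f"{output_dir}/*.png"):
        run.log({"plots": wandb.Image(png_path)})

    # Interactive Plotly HTML files go up as a single artifact.
    html_paths = glob.glob(f"{output_dir}/*.html")
    if html_paths:
        charts = wandb.Artifact("plotly-charts", type="charts")
        for html_path in html_paths:
            charts.add_file(html_path)
        run.log_artifact(charts)

    run.finish()
```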
Comparing runs
Select multiple runs in the table to overlay their metric curves. This is how you answer questions like the following (see the sketch after this list for setting up comparable runs):
- Does Adam converge faster than SGD on this dataset?
- How does doubling the learning rate affect final loss?
- Which regularization strength gives the best validation accuracy?
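One way to set up such a comparison is to run the same training loop once per configuration, varying only the hyperparameter you care about. The project name and placeholder loss below are illustrative.

```python
import wandb


def train(optimizer_name: str, lr: float) -> None:
    run = wandb.init(
        project="eng-ai-agents",                    # assumed project name
        name=f"{optimizer_name}-lr{lr}",            # e.g. "adam-lr0.001"
        config={"optimizer": optimizer_name, "lr": lr},
    )
    for epoch in range(20):
        loss = 1.0 / (epoch + 1)                    # stand-in for the real training loop
        run.log({"loss": loss, "epoch": epoch})
    run.finish()


# Two runs that differ only in the optimizer; overlay their loss curves in the dashboard.
for optimizer in ["sgd", "adam"]:
    train(optimizer, lr=0.001)
```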
Best practices
Name your runs descriptively
Use wandb.init(name="lstm-lr0.001-hidden256") instead of relying on auto-generated names. This makes the runs table immediately readable.
Log hyperparameters as config
Pass a config dictionary to wandb.init(config={...}) so hyperparameters appear as filterable columns in the dashboard.
Log at the right granularity
Log per-epoch for training metrics, per-step only if you need fine-grained debugging. Over-logging slows down training and clutters the dashboard.
Use tags for organization
Pass tags to wandb.init(tags=[...]) so related experiments can be filtered and grouped together in the runs table.
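Putting these practices together in one sketch (the project name is assumed, and the hyperparameter values, tags, and metrics are illustrative):

```python
import wandb

config = {"lr": 0.001, "hidden": 256, "optimizer": "adam"}

run = wandb.init(
    project="eng-ai-agents",            # assumed project name
    name="lstm-lr0.001-hidden256",      # descriptive, human-readable run name
    config=config,                      # appears as filterable columns in the runs table
    tags=["lstm", "baseline"],          # tags for grouping related experiments
)

for epoch in range(10):
    train_loss = 1.0 / (epoch + 1)      # stand-ins for real metrics
    val_acc = 0.5 + 0.04 * epoch
    run.log({"train_loss": train_loss, "val_acc": val_acc, "epoch": epoch})  # per-epoch logging

run.finish()
```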
W&B in assignments
When submitting assignments that involve training, include a link to your W&B run or workspace view (see the snippet after this list for printing the run URL). This lets the TA verify:
- The training actually ran (not just copied outputs)
- The reported metrics match the logged values
- The hyperparameters match your description
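A convenient way to get that link is to print the run URL right after initializing the run; the project name below is an assumption.

```python
import wandb

run = wandb.init(project="eng-ai-agents")   # assumed project name
print("W&B run link for submission:", run.url)

# ... training and logging ...

run.finish()
```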

