Every notebook in this course executes inside a Docker container and logs metrics, hyperparameters, and artifacts to Weights & Biases (W&B). This gives you a single dashboard to track training runs, compare experiments, and reproduce results.

Course W&B Workspace

View all executed course notebooks — metrics, plots, and artifacts in one place.

Why experiment tracking matters

Running a notebook once is easy. Running it ten times with different hyperparameters, across different machines, and remembering which combination produced the best result is not. An experiment manager solves this by automatically capturing:
  • Hyperparameters — learning rate, batch size, optimizer, architecture choices
  • Metrics — loss curves, accuracy, custom metrics logged at each step or epoch
  • Artifacts — model checkpoints, generated plots, evaluation outputs
  • Environment — Python version, package versions, GPU type, runtime duration
  • Code state — git commit hash and diff at the time of execution
Without this, you end up with spreadsheets, renamed notebook copies, and comments like “this was the good run.” With it, every run is searchable, comparable, and reproducible.
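As a rough illustration of how these categories map onto W&B calls (a sketch, not the course pipeline): hyperparameters go into config, metrics into log, and files such as checkpoints into artifacts, while environment and git state are captured automatically when the run starts. The project name matches the course workspace; file paths and values below are placeholders.
import wandb
from pathlib import Path

# Hyperparameters become a searchable config; environment and git state
# are recorded automatically at init time.
run = wandb.init(
    project="eng-ai-agents",
    config={"lr": 1e-3, "batch_size": 32, "optimizer": "adam"},
)

for epoch in range(10):
    loss = 1.0 / (epoch + 1)                 # stand-in for a real training step
    run.log({"epoch": epoch, "loss": loss})  # metrics, logged per epoch

# Files such as model checkpoints are stored as artifacts.
Path("checkpoint.pt").write_bytes(b"")       # placeholder checkpoint file
artifact = wandb.Artifact("model-checkpoint", type="model")
artifact.add_file("checkpoint.pt")
run.log_artifact(artifact)

run.finish()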

How it works in this course

All course notebooks are registered in a central notebook-database.yml and execute through a Docker-based pipeline:
1. Notebook executes in Docker

Each notebook runs inside a containerized environment with pinned dependencies, ensuring identical results regardless of your local setup.
2. W&B logs metrics automatically

The execution pipeline logs run metadata — notebook name, duration, environment, and any metrics or plots the notebook produces — to the W&B project.
3. Results appear in the dashboard

Every run is visible in the eng-ai-agents workspace where you can filter, compare, and inspect individual executions.
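The pipeline's internals live in the course repository and are not reproduced here; purely as a sketch of the idea behind steps 1 and 2, the snippet below runs a notebook with papermill (an assumption; the actual runner may differ) and records basic run metadata. The notebook path and output location are illustrative.
import time
import papermill as pm   # assumed notebook runner, not necessarily what the pipeline uses
import wandb

notebook = "notebooks/optimization/example.ipynb"   # illustrative path

run = wandb.init(project="eng-ai-agents", job_type="notebook-execution", name=notebook)
start = time.time()
pm.execute_notebook(notebook, "/tmp/executed.ipynb")  # execute the notebook end to end
run.summary["notebook"] = notebook
run.summary["duration_sec"] = round(time.time() - start, 1)
run.finish()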

Getting started

1. Create a W&B account

Sign up at wandb.ai using your university email. The free tier is sufficient for all course work.

2. Set your API key

Authentication is handled through the .env file in the eng-ai-agents repository. Copy the example and add your key:
cp .env.example .env
Then add your W&B API key (found at wandb.ai/authorize):
# In .env
WANDB_API_KEY=your_api_key_here
The docker-compose.yml loads this file automatically via env_file, so the key is available inside every container. The execution scripts check for WANDB_API_KEY and gracefully skip logging if it is not set — nothing breaks, you just don’t get tracking.
Never commit your .env file to git. The repository’s .gitignore already excludes it.
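To confirm the key is visible inside the container, one quick sanity check (a suggestion, not part of the course scripts) is to call wandb.login(), which reads WANDB_API_KEY from the environment when no key is passed explicitly:
import os
import wandb

# wandb.login() falls back to the WANDB_API_KEY environment variable
# and returns True once authentication succeeds.
if os.environ.get("WANDB_API_KEY"):
    print("Logged in:", wandb.login())
else:
    print("WANDB_API_KEY not set; tracking will be skipped.")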

3. Log from your notebook

Course notebooks include W&B integration with a graceful fallback pattern:
import os

# W&B is optional: enable logging only when the package is installed
# and an API key is present in the environment.
try:
    import wandb
    _wandb_ok = bool(os.environ.get("WANDB_API_KEY"))
except ImportError:
    wandb = None
    _wandb_ok = False

# Later, in the training loop:
if _wandb_ok and wandb is not None:
    _wb_run = wandb.init(
        project="eng-ai-agents",
        name="sgd-polynomial-regression",
        settings=wandb.Settings(init_timeout=120),
    )

for epoch in range(num_epochs):
    loss = train_step()
    if _wandb_ok:
        wandb.log({"epoch": epoch, "loss": loss})

if _wandb_ok:
    wandb.finish()
This pattern ensures notebooks run correctly whether or not W&B is configured.

Using the dashboard

The eng-ai-agents workspace provides several views:
View             Purpose
Runs table       List all executions with sortable columns for metrics, duration, and status
Charts           Visualize loss curves, accuracy, or any logged metric across runs
Artifacts        Browse saved models, datasets, and output files
System metrics   GPU utilization, memory usage, and runtime stats

What gets logged

The execution pipeline (wandb_utils.py) automatically logs for each notebook run:
  • Run metadata — notebook path, environment, execution duration, date
  • Images — all PNG plots extracted from cell outputs are uploaded as wandb.Image
  • Plotly charts — interactive HTML visualizations are uploaded as artifacts
  • Run grouping — runs are grouped by notebook category (e.g., optimization, transfer-learning) for easy filtering
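As a rough sketch of what these uploads look like at the API level (the actual implementation lives in wandb_utils.py and may differ), PNG outputs can be logged as wandb.Image, interactive HTML can be stored as an artifact, and group= provides the category-based grouping; the file paths and group name below are placeholders.
import wandb

run = wandb.init(project="eng-ai-agents", group="optimization")

# PNG plots extracted from cell outputs are uploaded as images.
run.log({"cell_output_0": wandb.Image("outputs/loss_curve.png")})

# Interactive Plotly HTML is uploaded as an artifact.
artifact = wandb.Artifact("plotly-charts", type="html")
artifact.add_file("outputs/loss_curve.html")
run.log_artifact(artifact)

run.finish()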

Comparing runs

Select multiple runs in the table to overlay their metric curves. This is how you answer questions like:
  • Does Adam converge faster than SGD on this dataset?
  • How does doubling the learning rate affect final loss?
  • Which regularization strength gives the best validation accuracy?
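Most comparisons happen in the UI, but the public API can answer the same questions in code. A minimal sketch, assuming the runs stored an optimizer in their config and a final loss in their summary, and using a placeholder entity name (replace your-entity with your W&B username or team):
import wandb

api = wandb.Api()
runs = api.runs("your-entity/eng-ai-agents")   # entity is a placeholder

for run in runs:
    optimizer = run.config.get("optimizer", "unknown")
    final_loss = run.summary.get("loss", "n/a")
    print(f"{run.name}: optimizer={optimizer}, final loss={final_loss}")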

Best practices

  • Name runs descriptively: use wandb.init(name="lstm-lr0.001-hidden256") instead of relying on auto-generated names, so the runs table is immediately readable.
  • Record hyperparameters as config: pass wandb.init(config={...}) so they appear as filterable columns in the dashboard.
  • Log training metrics per epoch; log per step only if you need fine-grained debugging. Over-logging slows down training and clutters the dashboard.
  • Tag related runs: add tags like assignment-1, final-project, or baseline to group them, e.g. wandb.init(tags=["assignment-1", "sgd"]).
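Putting these together, a hypothetical init call for an assignment run might look like the sketch below; the names, values, and training helper are examples, not requirements.
import wandb

run = wandb.init(
    project="eng-ai-agents",
    name="lstm-lr0.001-hidden256",                            # readable, descriptive name
    config={"lr": 1e-3, "hidden": 256, "optimizer": "adam"},  # filterable columns
    tags=["assignment-1", "baseline"],                        # group related runs
)

num_epochs = 20
for epoch in range(num_epochs):
    train_loss, val_acc = train_one_epoch()   # placeholder for your training step
    run.log({"epoch": epoch, "train_loss": train_loss, "val_acc": val_acc})

run.finish()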

W&B in assignments

When submitting assignments that involve training, include a link to your W&B run or workspace view. This lets the TA verify:
  1. The training actually ran (not just copied outputs)
  2. The reported metrics match the logged values
  3. The hyperparameters match your description
Make your W&B project public or share it with the TA’s account so runs are accessible for grading.