Every notebook in this course executes inside a Docker container and logs metrics, hyperparameters, and artifacts to Weights & Biases (W&B). This gives you a single dashboard to track training runs, compare experiments, and reproduce results.

Course W&B Workspace

View all executed course notebooks — metrics, plots, and artifacts in one place.

Why experiment tracking matters

Running a notebook once is easy. Running it ten times with different hyperparameters, across different machines, and remembering which combination produced the best result is not. An experiment manager solves this by automatically capturing:
  • Hyperparameters — learning rate, batch size, optimizer, architecture choices
  • Metrics — loss curves, accuracy, custom metrics logged at each step or epoch
  • Artifacts — model checkpoints, generated plots, evaluation outputs
  • Environment — Python version, package versions, GPU type, runtime duration
  • Code state — git commit hash and diff at the time of execution
Without this, you end up with spreadsheets, renamed notebook copies, and comments like “this was the good run.” With it, every run is searchable, comparable, and reproducible.
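As a rough illustration of how these categories map onto W&B calls (a sketch, not the course pipeline): hyperparameters go into config, metrics into log, and files such as checkpoints into artifacts, while environment and git state are captured automatically when the run starts. The project name matches the course workspace; file paths and values below are placeholders.
import wandb
from pathlib import Path

# Hyperparameters become a searchable config; environment and git state
# are recorded automatically at init time.
run = wandb.init(
    project="eng-ai-agents",
    config={"lr": 1e-3, "batch_size": 32, "optimizer": "adam"},
)

for epoch in range(10):
    loss = 1.0 / (epoch + 1)                 # stand-in for a real training step
    run.log({"epoch": epoch, "loss": loss})  # metrics, logged per epoch

# Files such as model checkpoints are stored as artifacts.
Path("checkpoint.pt").write_bytes(b"")       # placeholder checkpoint file
artifact = wandb.Artifact("model-checkpoint", type="model")
artifact.add_file("checkpoint.pt")
run.log_artifact(artifact)

run.finish()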

How it works in this course

All course notebooks are registered in a central notebook-database.yml and execute through a Docker-based pipeline:
1. Notebook executes in Docker

Each notebook runs inside a containerized environment with pinned dependencies, ensuring identical results regardless of your local setup.
2. W&B logs metrics automatically

The execution pipeline logs run metadata — notebook name, duration, environment, and any metrics or plots the notebook produces — to the W&B project.
3. Results appear in the dashboard

Every run is visible in the eng-ai-agents workspace where you can filter, compare, and inspect individual executions.
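The pipeline's internals live in the course repository and are not reproduced here; purely as a sketch of the idea behind steps 1 and 2, the snippet below runs a notebook with papermill (an assumption; the actual runner may differ) and records basic run metadata. The notebook path and output location are illustrative.
import time
import papermill as pm   # assumed notebook runner, not necessarily what the pipeline uses
import wandb

notebook = "notebooks/optimization/example.ipynb"   # illustrative path

run = wandb.init(project="eng-ai-agents", job_type="notebook-execution", name=notebook)
start = time.time()
pm.execute_notebook(notebook, "/tmp/executed.ipynb")  # execute the notebook end to end
run.summary["notebook"] = notebook
run.summary["duration_sec"] = round(time.time() - start, 1)
run.finish()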

Getting started

1. Create a W&B account

Sign up at wandb.ai using your university email. The free tier is sufficient for all course work.

2. Set your API key

Authentication is handled through the .env file in the eng-ai-agents repository. Copy the example and add your key:
cp .env.example .env
Then add your W&B API key (found at wandb.ai/authorize):
# In .env
WANDB_API_KEY=your_api_key_here
The docker-compose.yml loads this file automatically via env_file, so the key is available inside every container. The execution scripts check for WANDB_API_KEY and gracefully skip logging if it is not set — nothing breaks, you just don’t get tracking.
Never commit your .env file to git. The repository’s .gitignore already excludes it.
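To confirm the key is visible inside the container, one quick sanity check (a suggestion, not part of the course scripts) is to call wandb.login(), which reads WANDB_API_KEY from the environment when no key is passed explicitly:
import os
import wandb

# wandb.login() falls back to the WANDB_API_KEY environment variable
# and returns True once authentication succeeds.
if os.environ.get("WANDB_API_KEY"):
    print("Logged in:", wandb.login())
else:
    print("WANDB_API_KEY not set; tracking will be skipped.")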

3. Log from your notebook

Course notebooks include W&B integration with a graceful fallback pattern:
import os

# W&B is optional: enable logging only when the package is installed
# and an API key is present in the environment.
try:
    import wandb
    _wandb_ok = bool(os.environ.get("WANDB_API_KEY"))
except ImportError:
    wandb = None
    _wandb_ok = False

# Later, in the training loop:
if _wandb_ok and wandb is not None:
    _wb_run = wandb.init(
        project="eng-ai-agents",
        name="sgd-polynomial-regression",
        settings=wandb.Settings(init_timeout=120),
    )

for epoch in range(num_epochs):
    loss = train_step()
    if _wandb_ok:
        wandb.log({"epoch": epoch, "loss": loss})

if _wandb_ok:
    wandb.finish()
This pattern ensures notebooks run correctly whether or not W&B is configured.

Using the dashboard

The eng-ai-agents workspace provides several views:
View             Purpose
Runs table       List all executions with sortable columns for metrics, duration, and status
Charts           Visualize loss curves, accuracy, or any logged metric across runs
Artifacts        Browse saved models, datasets, and output files
System metrics   GPU utilization, memory usage, and runtime stats

What gets logged

The execution pipeline (wandb_utils.py) automatically logs for each notebook run:
  • Run metadata — notebook path, environment, execution duration, date
  • Images — all PNG plots extracted from cell outputs are uploaded as wandb.Image
  • Plotly charts — interactive HTML visualizations are uploaded as artifacts
  • Run grouping — runs are grouped by notebook category (e.g., optimization, transfer-learning) for easy filtering
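As a rough sketch of what these uploads look like at the API level (the actual implementation lives in wandb_utils.py and may differ), PNG outputs can be logged as wandb.Image, interactive HTML can be stored as an artifact, and group= provides the category-based grouping; the file paths and group name below are placeholders.
import wandb

run = wandb.init(project="eng-ai-agents", group="optimization")

# PNG plots extracted from cell outputs are uploaded as images.
run.log({"cell_output_0": wandb.Image("outputs/loss_curve.png")})

# Interactive Plotly HTML is uploaded as an artifact.
artifact = wandb.Artifact("plotly-charts", type="html")
artifact.add_file("outputs/loss_curve.html")
run.log_artifact(artifact)

run.finish()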

Comparing runs

Select multiple runs in the table to overlay their metric curves. This is how you answer questions like:
  • Does Adam converge faster than SGD on this dataset?
  • How does doubling the learning rate affect final loss?
  • Which regularization strength gives the best validation accuracy?
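Most comparisons happen in the UI, but the public API can answer the same questions in code. A minimal sketch, assuming the runs stored an optimizer in their config and a final loss in their summary, and using a placeholder entity name (replace your-entity with your W&B username or team):
import wandb

api = wandb.Api()
runs = api.runs("your-entity/eng-ai-agents")   # entity is a placeholder

for run in runs:
    optimizer = run.config.get("optimizer", "unknown")
    final_loss = run.summary.get("loss", "n/a")
    print(f"{run.name}: optimizer={optimizer}, final loss={final_loss}")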

Best practices

  • Name runs descriptively: use wandb.init(name="lstm-lr0.001-hidden256") instead of relying on auto-generated names, so the runs table is immediately readable.
  • Record hyperparameters as config: pass wandb.init(config={...}) so they appear as filterable columns in the dashboard.
  • Log training metrics per epoch; log per step only if you need fine-grained debugging. Over-logging slows down training and clutters the dashboard.
  • Tag related runs: add tags like assignment-1, final-project, or baseline to group them, e.g. wandb.init(tags=["assignment-1", "sgd"]).
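Putting these together, a hypothetical init call for an assignment run might look like the sketch below; the names, values, and training helper are examples, not requirements.
import wandb

run = wandb.init(
    project="eng-ai-agents",
    name="lstm-lr0.001-hidden256",                            # readable, descriptive name
    config={"lr": 1e-3, "hidden": 256, "optimizer": "adam"},  # filterable columns
    tags=["assignment-1", "baseline"],                        # group related runs
)

num_epochs = 20
for epoch in range(num_epochs):
    train_loss, val_acc = train_one_epoch()   # placeholder for your training step
    run.log({"epoch": epoch, "train_loss": train_loss, "val_acc": val_acc})

run.finish()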

W&B in assignments

When submitting assignments that involve training, include a link to your W&B run or workspace view. This lets the TA verify:
  1. The training actually ran (not just copied outputs)
  2. The reported metrics match the logged values
  3. The hyperparameters match your description
Make your W&B project public or share it with the TA’s account so runs are accessible for grading.