
Self-supervised video representations with TCN → TCC using aegean-ai/tcc

This notebook is designed to read like a tutorial and a course assignment at the same time. It focuses on one question:
Can a self-supervised video representation learn the latent phase of a task purely from temporal structure, without robot control, action labels, or transcription?
The answer developed across the two Google Research papers is:
  1. TCN learns by enforcing time-based contrastive alignment under strong synchronization assumptions.
  2. TCC generalizes this idea by enforcing temporal cycle-consistency, which is more robust to variations in execution speed and alignment.
In this notebook you will:
  • study the conceptual evolution from TCN to TCC
  • train the PyTorch rewrite in aegean-ai/tcc
  • extract frame embeddings
  • visualize trajectories with PCA and UMAP
  • segment action sequences using representation geometry

Papers

  • TCN: Sermanet et al., “Time-Contrastive Networks: Self-Supervised Learning from Video” (ICRA 2018)
  • TCC: Dwibedi et al., “Temporal Cycle-Consistency Learning” (CVPR 2019)

Repository used in this notebook

  • https://github.com/aegean-ai/tcc
    This notebook assumes the main branch and the current PyTorch package layout under src/tcc/.

What you should learn

By the end, you should be able to explain why TCN and TCC are related but not identical:
  • TCN: metric alignment with synchronized positives
  • TCC: structural temporal alignment via cycles
A useful mental model is:
  • TCN says: “frames at the same time index should be close.”
  • TCC says: “if I map from one sequence to another and back, I should return to the same temporal phase.”

1. Theory recap: from TCN to TCC

1.1 TCN: contrastive temporal alignment

TCN learns an embedding $z_t = f_\theta(I_t)$ so that synchronized frames from different views become neighbors in feature space. A canonical triplet-style loss is:

$$\mathcal{L}_{\mathrm{TCN}} = \max\left( 0,\; \|f(I_t^a)-f(I_t^b)\|_2^2 - \|f(I_t^a)-f(I_{t'}^b)\|_2^2 + \alpha \right)$$

Interpretation:
  • anchor: frame $I_t^a$
  • positive: synchronized frame $I_t^b$ from the other view
  • negative: temporally mismatched frame $I_{t'}^b$
This already encodes an important idea: time is supervision. But TCN assumes that corresponding frames are available at matching time indices, which is a strong assumption.
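
To make the margin structure concrete, here is a minimal PyTorch sketch of the triplet computation above. The encoder producing the z vectors, the embedding dimension, and the helper name tcn_triplet_loss are notebook-side assumptions for illustration, not the repo's TCN implementation.

import torch
import torch.nn.functional as F

def tcn_triplet_loss(z_anchor, z_pos, z_neg, margin=1.0):
    # Squared distances matching the formula above: anchor-positive vs anchor-negative.
    d_pos = (z_anchor - z_pos).pow(2).sum(dim=-1)   # ||f(I_t^a) - f(I_t^b)||^2
    d_neg = (z_anchor - z_neg).pow(2).sum(dim=-1)   # ||f(I_t^a) - f(I_{t'}^b)||^2
    # Hinge: penalize only when the negative is not at least `margin` farther away.
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random embeddings standing in for f(I): batch of 4 frames, dim 128.
z_a, z_p, z_n = torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128)
print(tcn_triplet_loss(z_a, z_p, z_n))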

1.2 Why TCN is not enough

Suppose different people perform the same pouring task:
  • one moves slowly
  • one moves quickly
  • one pauses before tilting
  • one starts tilting earlier
Then the frame with semantic phase “tilt begins” is not guaranteed to occur at the same time index in both videos. So absolute time matching becomes fragile.

1.3 TCC: align temporal structure, not raw clock time

TCC keeps the idea that embeddings should reflect task progression, but replaces hard synchronized matching with cycle consistency.

Conceptual intuition. Given frame $i$ in sequence $A$, map it to the most corresponding frame in sequence $B$:

$$j = \arg\min_k \|f(I_i^A)-f(I_k^B)\|$$

Then map back from sequence $B$ to sequence $A$:

$$i' = \arg\min_l \|f(I_j^B)-f(I_l^A)\|$$

TCC encourages $i' \approx i$.

Differentiable training loss. The hard argmin above is not differentiable, so the actual TCC loss replaces it with a soft nearest-neighbor formulation. For frame $i$ in sequence $A$, define a soft correspondence distribution over frames in sequence $B$:

$$\beta_k^{(i)} = \frac{\exp(-\|f(I_i^A) - f(I_k^B)\|^2 / \tau)}{\sum_{k'} \exp(-\|f(I_i^A) - f(I_{k'}^B)\|^2 / \tau)}$$

where $\tau$ is a temperature parameter. The cycle-back distribution is computed analogously, and the loss is the cross-entropy between the back-mapped distribution and a target concentrated at the original index $i$. This makes the entire cycle differentiable and trainable with standard gradient descent.

Conceptually:
  • TCN aligns absolute timestamps
  • TCC aligns latent phase structure
This is why TCC is more appropriate when demonstrations are semantically similar but temporally warped.
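
The sketch below implements this soft cycle for one pair of already-embedded sequences, using the cross-entropy (classification) form of cycle-back. It illustrates the mechanism only — the repo's losses.py also supports other variants (the default config uses regression_mse_var), and the function name tcc_cycle_loss is a notebook-side assumption.

import torch
import torch.nn.functional as F

def tcc_cycle_loss(za, zb, temperature=0.1):
    # za: (Ta, D) embeddings of sequence A, zb: (Tb, D) embeddings of sequence B.
    # Soft match A -> B: softmax over negative squared distances (the beta above).
    dist_ab = torch.cdist(za, zb).pow(2)                  # (Ta, Tb)
    beta = F.softmax(-dist_ab / temperature, dim=1)
    nn_b = beta @ zb                                      # soft nearest neighbor in B, (Ta, D)
    # Cycle back B -> A and classify the original index i for each frame.
    dist_ba = torch.cdist(nn_b, za).pow(2)                # (Ta, Ta)
    logits = -dist_ba / temperature
    target = torch.arange(za.shape[0])                    # frame i should map back to index i
    return F.cross_entropy(logits, target)

# Toy usage: two sequences of different lengths in a 32-dim embedding space.
za, zb = torch.randn(20, 32), torch.randn(27, 32)
print(tcc_cycle_loss(za, zb))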

1.4 What the embedding should look like on pouring

If TCC works, then the learned trajectory in embedding space should behave like a latent phase variable:
  • early reach frames cluster near other early reach frames
  • grasp transitions appear near one another
  • tilt and pour form coherent regions
  • embeddings from different videos should trace similar temporal paths
That is the premise you will test below.
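
As a purely synthetic illustration of what “similar temporal paths at different speeds” means (no pouring data involved), the sketch below traces the same hand-crafted 2D phase curve at two execution speeds; a well-trained embedding should show qualitatively similar overlap for real videos.

import numpy as np
import matplotlib.pyplot as plt

def phase_to_embedding(p):
    # A made-up 1D phase manifold embedded in 2D, standing in for a learned embedding space.
    return np.stack([np.cos(np.pi * p), np.sin(np.pi * p) + 0.3 * p], axis=1)

phase_fast = np.linspace(0, 1, 40)           # finishes the task quickly
phase_slow = np.linspace(0, 1, 90) ** 1.5    # starts slowly, then accelerates

plt.figure(figsize=(5, 4))
for p, label in [(phase_fast, "fast execution"), (phase_slow, "slow execution")]:
    Z = phase_to_embedding(p)
    plt.plot(Z[:, 0], Z[:, 1], marker=".", alpha=0.7, label=label)
plt.legend()
plt.title("Synthetic illustration: same phase path, different speeds")
plt.tight_layout()
plt.show()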

2. How this notebook and the aegean-ai/tcc repo work together

This notebook is not a standalone script. It is a guided analysis layer that drives the aegean-ai/tcc PyTorch package. The repo provides the training loop, model definitions, dataset utilities, and evaluation code. The notebook provides the experimental protocol: configuring runs, extracting embeddings, and visualizing results.

Two supported environments

                  Dev container (recommended)        Google Colab
GPU               Local NVIDIA GPU via Docker        Colab T4/A100 runtime
Package manager   uv (pre-installed in container)    pip (Colab default)
Setup effort      make start — one command           Clone + pip install in notebook cells
Persistence       Full local disk                    Session-scoped (data lost on disconnect)
Best for          Full sweep, large runs             Quick experiments, no local GPU
Choose one environment and follow the corresponding setup path in Section 3.

Workflow overview

┌──────────────────────────────────────────────────────────────────┐
│  This notebook (analysis layer)                                  │
│                                                                  │
│  1. Set up environment (dev container OR Colab)                  │
│  2. Prepare the pouring dataset                                  │
│  3. Configure training via tcc.config.get_default_config()       │
│  4. Launch training via tcc.train.train(cfg)                     │
│  5. Load checkpoints via tcc.train.load_checkpoint()             │
│  6. Extract embeddings via tcc.evaluate.get_embeddings_dataset() │
│  7. Visualize and segment (PCA, UMAP, KMeans — notebook code)   │
└──────────────────────────────────────────────────────────────────┘
         │                        ▲
         │  function calls        │  returns tensors,
         ▼                        │  checkpoints, configs
┌──────────────────────────────────────────────────────────────────┐
│  aegean-ai/tcc  (installed as editable package)                  │
│                                                                  │
│  src/tcc/                                                        │
│  ├── config.py          TCCConfig dataclass + get_default_config │
│  ├── train.py           Training loop, checkpoint save/load      │
│  ├── evaluate.py        Embedding extraction, eval metrics       │
│  ├── datasets.py        DataConfig, create_dataset()             │
│  ├── models.py          ResNet backbone + embedding head         │
│  ├── alignment.py       TCC alignment algorithm                  │
│  ├── losses.py          Cycle-consistency loss                   │
│  └── algos/             Algorithm registry (tcc, tcn, sal, …)    │
│                                                                  │
│  configs/                                                        │
│  └── default.yaml       Default hyperparameters                  │
│                                                                  │
│  scripts/                                                        │
│  └── download_pouring_data.sh                                    │
│                                                                  │
│  src/tcc/dataset_preparation/                                    │
│  ├── videos_to_dataset.py    Raw videos → image folders          │
│  ├── images_to_dataset.py    Images → dataset structure          │
│  └── visualize_dataset.py    Inspect prepared data               │
└──────────────────────────────────────────────────────────────────┘

What you modify vs. what you use as-is

Layer: Notebook
  You modify:     embedding dimension, iteration count, analysis parameters (k, projection method)
  You use as-is:  visualization and segmentation code
Layer: Repo config
  You modify:     model.conv_embedder.embedding_size, train.max_iters, logdir
  You use as-is:  everything else in configs/default.yaml
Layer: Repo code
  You modify:     nothing — treat the repo as a library
  You use as-is:  train.py, evaluate.py, datasets.py, models.py
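
To make the “you modify” column concrete: every repo-side change in this notebook is a plain attribute assignment on the config object. The field names below match the TCCConfig dataclass printed in Section 5, and the helper in Section 5.3 wraps exactly this pattern.

from tcc.config import get_default_config

cfg = get_default_config()
cfg.model.conv_embedder.embedding_size = 64      # bottleneck dimension, swept in Section 10
cfg.train.max_iters = 5000                       # shorter tutorial run
cfg.logdir = "runs_tutorial/pouring_tcc_d64"     # where checkpoints and logs are written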

3. Environment setup

Choose one of the two paths below. Both result in a working import tcc with GPU access.

Path A: Dev container (recommended)

The repo ships a complete Docker-based development environment with GPU support, uv, and VS Code integration. Prerequisites: Docker with the NVIDIA Container Toolkit, and VS Code with the Dev Containers extension. Steps:
  1. Clone the repo locally:
    git clone https://github.com/aegean-ai/tcc && cd tcc
    
  2. Copy the environment file:
    cp .env.example .env
    # Edit .env to add WANDB_API_KEY and/or HF_TOKEN if needed
    
  3. Open in VS Code → “Reopen in Container” (or run docker compose up -d manually).
  4. Inside the container, run:
    make start
    
    This creates a .venv with uv, installs the package in editable mode, and registers a Jupyter kernel.
  5. Open this notebook in VS Code or JupyterLab (port 8888) and select the “Python 3 (tcc)” kernel.
Key details:
  • Base image: pytorch/pytorch:2.7.1-cuda12.8-cudnn9-runtime
  • Package manager: uv (not pip) — the Makefile handles all uv calls
  • Python: whatever 3.11+ is in the container (typically from conda)
  • Workspace: /workspaces/tcc
  • TensorBoard: port 6006
Installing extra notebook dependencies (matplotlib, umap-learn, etc.):
make install-notebooks

Path B: Google Colab (quick start, no local GPU needed)

Use this path if you do not have a local GPU or want a fast start. Colab sessions are ephemeral — save checkpoints to Google Drive to avoid losing training results. Steps:
  1. In a Colab notebook, enable GPU: Runtime → Change runtime type → T4 GPU.
  2. Run the clone and install cells below (Section 3.1–3.2).
  3. Colab uses pip — the %pip install commands handle everything.
Limitations:
  • Session timeout erases all local files. Mount Google Drive for persistence:
    from google.colab import drive
    drive.mount('/content/drive')
    # Point EXPERIMENT_ROOT and DATA_ROOT to /content/drive/MyDrive/tcc/
    
  • Colab’s default Python may differ from 3.11 — the package should still install but is only tested on 3.11–3.12.

Python version requirement

The repo requires Python ≥3.11, <3.13 (pyproject.toml). The dev container satisfies this automatically. On Colab, check with !python --version.
import sys, platform, os, pathlib

print("Python:", sys.version)
print("Platform:", platform.platform())
print("Working directory:", os.getcwd())
Python: 3.11.13 | packaged by conda-forge | (main, Jun  4 2025, 14:48:23) [GCC 13.3.0]
Platform: Linux-6.8.0-101-generic-x86_64-with-glibc2.35
Working directory: /workspaces/tcc

3.1 Clone the repository (Colab / Path B only)

If you are using the dev container (Path A), skip this — the repo is already your workspace at /workspaces/tcc.
import subprocess, pathlib

REPO_URL = "https://github.com/aegean-ai/tcc"
REPO_DIR = pathlib.Path("tcc")

if not REPO_DIR.exists():
    subprocess.run(["git", "clone", REPO_URL, str(REPO_DIR)], check=False)
else:
    print("Repository already exists:", REPO_DIR)

print("Repo dir exists:", REPO_DIR.exists())
Repository already exists: tcc
Repo dir exists: True

3.2 Install the package (Colab / Path B only)

If you are using the dev container (Path A), skip this — make start already installed the package. Run make install-notebooks if you need matplotlib/umap-learn.
# Colab / Path B only — uncomment and run these lines:
# %pip install -e ./tcc
# %pip install matplotlib scikit-learn umap-learn tqdm pyyaml

# Dev container / Path A — these are already installed.
# If you need notebook extras, run in a terminal: make install-notebooks

import importlib
try:
    importlib.import_module("tcc")
    print("tcc package is available.")
except ModuleNotFoundError:
    print("tcc not found. Follow the install instructions for your environment (Path A or B).")
tcc package is available.

3.3 Quick repository inspection

Verify the repo structure. In the dev container the repo root is /workspaces/tcc; on Colab it is the cloned tcc/ directory.
import os

# Detect environment: dev container vs Colab
REPO_ROOT = pathlib.Path("/workspaces/tcc") if pathlib.Path("/workspaces/tcc/src/tcc").exists() else REPO_DIR

def walk_top(path, max_depth=2):
    base = os.path.abspath(path)
    label = os.path.basename(base)
    for root, dirs, files in os.walk(base):
        depth = root[len(base):].count(os.sep)
        if depth <= max_depth:
            print(root.replace(base, label))
            for f in files[:10]:
                print("   ", f)

walk_top(str(REPO_ROOT), max_depth=2)
tcc
    uv.lock
    AGENTS.md
    docker-compose.yml
    CLAUDE.md
    .env.example
    Makefile
    .env
    .notebook-target.yml
    .gitignore
    pyproject.toml
tcc/.git
    index
    packed-refs
    FETCH_HEAD
    description
    HEAD
    ORIG_HEAD
    config
    COMMIT_EDITMSG
tcc/.git/info
    exclude
tcc/.git/logs
    HEAD
tcc/.git/objects
tcc/.git/hooks
    post-update.sample
    pre-commit.sample
    update.sample
    fsmonitor-watchman.sample
    applypatch-msg.sample
    push-to-checkout.sample
    pre-push.sample
    sendemail-validate.sample
    prepare-commit-msg.sample
    commit-msg.sample
tcc/.git/refs
    stash
tcc/.git/branches
tcc/docs
    architecture.md
tcc/.claude
    settings.local.json
tcc/tests
    __init__.py
    test_config.py
    test_train.py
    test_models.py
    test_evaluation.py
    test_datasets.py
    test_losses.py
    test_dataset_preparation.py
    test_algos.py
tcc/tests/__pycache__
    test_datasets.cpython-312-pytest-9.0.2.pyc
    test_losses.cpython-311-pytest-9.0.2.pyc
    test_models.cpython-311-pytest-9.0.2.pyc
    test_config.cpython-311-pytest-9.0.2.pyc
    test_algos.cpython-312-pytest-9.0.2.pyc
    test_datasets.cpython-311-pytest-9.0.2.pyc
    __init__.cpython-312.pyc
    test_losses.cpython-312-pytest-9.0.2.pyc
    test_models.cpython-312-pytest-9.0.2.pyc
    __init__.cpython-311.pyc
tcc/.mypy_cache
    .gitignore
    CACHEDIR.TAG
tcc/.mypy_cache/3.11
    enum.data.json
    subprocess.meta.json
    ast.meta.json
    sre_constants.meta.json
    abc.data.json
    posixpath.meta.json
    _sitebuiltins.meta.json
    genericpath.meta.json
    _collections_abc.meta.json
    _frozen_importlib_external.data.json
tcc/.mypy_cache/3.10
    enum.data.json
    socket.meta.json
    subprocess.meta.json
    ast.meta.json
    _thread.meta.json
    sre_constants.meta.json
    warnings.data.json
    __future__.meta.json
    signal.data.json
    abc.data.json
tcc/notebooks
    notebook-database.yml
tcc/notebooks/self-supervised
    tcc_pouring_tutorial_aegean_main-executed.ipynb
    tcc_pouring_tutorial_aegean_main.ipynb
tcc/scripts
    download_pouring_data.sh
    execute_notebook.py
tcc/runs_tutorial
tcc/data
tcc/data/pouring
    .gitattributes
    README.md
tcc/data/pouring_processed
tcc/configs
    demo.yaml
    default.yaml
tcc/src
tcc/src/tcc
    __init__.py
    losses.py
    alignment.py
    config.py
    stochastic_alignment.py
    models.py
    deterministic_alignment.py
    train.py
    datasets.py
    evaluate.py
tcc/.venv
    pyvenv.cfg
    .lock
    .gitignore
    CACHEDIR.TAG
tcc/.venv/share
tcc/.venv/bin
    tensorboard
    hf
    activate_this.py
    pyftsubset
    activate.ps1
    f2py
    ipython3
    normalizer
    papermill
    tqdm
tcc/.venv/lib
tcc/tcc
    uv.lock
    AGENTS.md
    docker-compose.yml
    CLAUDE.md
    .env.example
    Makefile
    .gitignore
    pyproject.toml
    README.md
tcc/tcc/.git
    index
    packed-refs
    description
    HEAD
    config
tcc/tcc/docs
    architecture.md
tcc/tcc/tests
    __init__.py
    test_config.py
    test_train.py
    test_models.py
    test_evaluation.py
    test_datasets.py
    test_losses.py
    test_dataset_preparation.py
    test_algos.py
tcc/tcc/scripts
    download_pouring_data.sh
tcc/tcc/configs
    demo.yaml
    default.yaml
tcc/tcc/src
tcc/tcc/.beads
    config.yaml
    dolt-monitor.pid
    metadata.json
    dolt-server.activity
    .gitignore
    interactions.jsonl
    README.md
    dolt-server.port
tcc/tcc/docker
    Dockerfile.torch.dev.gpu
tcc/tcc/.devcontainer
    devcontainer.json
tcc/.beads
    config.yaml
    dolt-monitor.pid
    .local_version
    metadata.json
    last-touched
    dolt-config.log
    dolt-server.lock
    dolt-server.log
    .gitignore
    interactions.jsonl
tcc/.beads/dolt
    config.yaml
tcc/.beads/hooks
    pre-commit
    post-checkout
    prepare-commit-msg
    pre-push
    post-merge
tcc/.beads/backup
    config.jsonl
    events.jsonl
    backup_state.json
    comments.jsonl
    dependencies.jsonl
    issues.jsonl
    labels.jsonl
tcc/docker
    Dockerfile.torch.dev.gpu
tcc/.pytest_cache
    .gitignore
    CACHEDIR.TAG
    README.md
tcc/.pytest_cache/v
tcc/.devcontainer
    devcontainer.json

4. Data: the pouring dataset

The multiview pouring dataset is hosted on HuggingFace at sermanet/multiview-pouring. It contains TFRecord files with multi-view video sequences of pouring tasks.

Download from HuggingFace

Use huggingface_hub to download the dataset files. The code cell below downloads the dataset repository into data/pouring/ via snapshot_download. This is the recommended approach — it fetches all TFRecord files and the recombination script needed for one split file.

Expected directory layout

After download and conversion, the dataset root must have this structure:
data/pouring_processed/pouring/
├── train/
│   ├── video_001/
│   │   ├── frame_0000.png
│   │   ├── frame_0001.png
│   │   └── ...
│   ├── video_002/
│   │   └── ...
│   └── ...
└── val/
    ├── video_050/
    │   └── ...
    └── ...
Each video is a directory of sequentially numbered frames. The PyTorch create_dataset function expects this layout — it discovers videos by listing subdirectories under train/ or val/, then loads frames in filename-sorted order.
DATA_ROOT = pathlib.Path("data")
RAW_POURING_ROOT = DATA_ROOT / "pouring"
PROCESSED_POURING_ROOT = DATA_ROOT / "pouring_processed"

RAW_POURING_ROOT.mkdir(parents=True, exist_ok=True)
PROCESSED_POURING_ROOT.mkdir(parents=True, exist_ok=True)

print("Raw data dir:", RAW_POURING_ROOT.resolve())
print("Processed data dir:", PROCESSED_POURING_ROOT.resolve())
Raw data dir: /workspaces/tcc/data/pouring
Processed data dir: /workspaces/tcc/data/pouring_processed

4.1 Download from HuggingFace

The dataset is hosted at sermanet/multiview-pouring; its TFRecord files live under a tfrecords/ directory with train/, val/, and test/ splits, alongside labels/ and videos/ directories. Use huggingface_hub.snapshot_download to download the full dataset. Passing local_dir materializes all files (TFRecords, recombination scripts, README) directly into our expected data/pouring/ directory.
Note: One test file (whiteorange_to_clear1_real) was split into two parts due to upload size limits. After downloading, run the provided shell script to recombine it. This only affects the test split — training and validation are ready to use immediately.
from huggingface_hub import snapshot_download

# Download the full dataset from HuggingFace into data/pouring/
hf_cache_path = snapshot_download(
    repo_id="sermanet/multiview-pouring",
    repo_type="dataset",
    local_dir=str(RAW_POURING_ROOT),
)

print("Dataset downloaded to:", hf_cache_path)

# List what was downloaded
for split_dir in sorted(RAW_POURING_ROOT.iterdir()):
    if split_dir.is_dir() and not split_dir.name.startswith("."):
        tfrecords = list(split_dir.glob("*.tfrecord*"))
        print(f"  {split_dir.name}/: {len(tfrecords)} TFRecord file(s)")
/workspaces/tcc/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

Fetching 816 files: 100%|██████████| 816/816 [00:00<00:00, 1017.65it/s]
Dataset downloaded to: /workspaces/tcc/data/pouring
  labels/: 0 TFRecord file(s)
  tfrecords/: 0 TFRecord file(s)
  videos/: 0 TFRecord file(s)
# Step 1: Download is handled above via huggingface_hub (cell 4.1)

# Step 2 (optional): Recombine the split test file
# Only needed if you plan to use the test split
# !bash {RAW_POURING_ROOT}/tfrecords/test/whiteorange_to_clear1_real_combining.sh

# Step 3: Convert TFRecords to image-folder layout
# In the dev container terminal:
#   python -m tcc.dataset_preparation.videos_to_dataset \
#       --input-dir data/pouring \
#       --output-dir data/pouring_processed/pouring \
#       --name pouring --fps 15 --width 224 --height 224
#
# On Colab, prefix with ! instead:
#   !python -m tcc.dataset_preparation.videos_to_dataset ...

print("After downloading from HuggingFace, convert the TFRecords to image folders.")
After downloading from HuggingFace, convert the TFRecords to image folders.
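
After the conversion step, a quick sanity check of the processed layout catches path mistakes before training. A minimal sketch, assuming the train/<video>/frame_*.png structure described above:

import pathlib

processed = pathlib.Path("data/pouring_processed/pouring")

for split in ("train", "val"):
    split_dir = processed / split
    if not split_dir.exists():
        print(f"{split}/ not found yet — run the conversion step first.")
        continue
    videos = sorted(p for p in split_dir.iterdir() if p.is_dir())
    print(f"{split}: {len(videos)} video folder(s)")
    for v in videos[:3]:
        frames = sorted(v.glob("*.png"))
        print(f"  {v.name}: {len(frames)} frames")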

4.2 Expected semantic phases

We will reason about pouring in terms of latent phases such as:
  1. reach
  2. grasp
  3. lift / position
  4. tilt
  5. pour
  6. retract / return
You do not need action labels for TCC training.
These phase names are used only for qualitative interpretation of the learned representation.

5. Configuration and training

The current aegean-ai/tcc package provides:
  • a typed configuration object
  • an alignment algorithm corresponding to TCC
  • a PyTorch training loop
The default configuration is useful to inspect first, because it tells us:
  • training algorithm
  • dataset name
  • image size
  • batch size
  • embedding size
  • checkpoint/logging schedule
from pprint import pprint

try:
    from tcc.config import get_default_config
    cfg = get_default_config()
    print(cfg)
except Exception as e:
    print("Could not import tcc yet:", repr(e))
    cfg = None
TCCConfig(logdir='/tmp/alignment_logs/', datasets=['pouring'], path_to_tfrecords='/tmp/%s_tfrecords/', training_algo='alignment', train=TrainConfig(max_iters=150000, batch_size=2, num_frames=20, visualize_interval=200), eval=EvalConfig(batch_size=2, num_frames=20, val_iters=20, tasks=['algo_loss', 'classification', 'kendalls_tau', 'event_completion', 'few_shot_classification'], frames_per_batch=25, kendalls_tau_stride=5, kendalls_tau_distance='sqeuclidean', classification_fractions=[0.1, 0.5, 1.0], few_shot_num_labeled=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], few_shot_num_episodes=50), model=ModelConfig(embedder_type='conv', base_model=BaseModelConfig(network='resnet50', layer='conv4_block3_out', train_base='only_bn'), conv_embedder=ConvEmbedderConfig(embedding_size=128, num_context_steps=2, conv_layers=[(256, 3, True), (256, 3, True)], fc_layers=[(256, True), (256, True)], capacity_scalar=2, flatten_method='max_pool', base_dropout_rate=0.0, base_dropout_spatial=False, fc_dropout_rate=0.1, dropout_rate=0.1, pooling='max', l2_normalize=False, use_bn=True), convgru_embedder=ConvGRUEmbedderConfig(conv_layers=[(512, 3, True), (512, 3, True)], gru_layers=[128], dropout_rate=0.0, use_bn=True), vggm=VGGMConfig(use_bn=True), train_embedding=True, l2_reg_weight=1e-05, resnet_pretrained_weights='/tmp/resnet50v2_weights_tf_dim_ordering_tf_kernels_notop.h5'), alignment=AlignmentConfig(loss_type='regression_mse_var', similarity_type='l2', temperature=0.1, label_smoothing=0.1, cycle_length=2, stochastic_matching=False, variance_lambda=0.001, huber_delta=0.1, normalize_indices=True, fraction=1.0), sal=SALConfig(dropout_rate=0.0, fc_layers=[(128, True), (64, True), (2, False)], shuffle_fraction=0.75, num_samples=8, label_smoothing=0.0), alignment_sal_tcn=AlignmentSalTcnConfig(alignment_loss_weight=0.33, sal_loss_weight=0.33), classification=ClassificationConfig(label_smoothing=0.0, dropout_rate=0.0), tcn=TCNConfig(positive_window=5, reg_lambda=0.002), optimizer=OptimizerConfig(type='adam', lr=LRConfig(initial_lr=0.0001, decay_type='fixed', exp_decay_rate=0.97, exp_decay_steps=1000, manual_lr_step_boundaries=[5000, 10000], manual_lr_decay_rate=0.1, num_warmup_steps=0), weight_decay=0.0), data=DataConfig(sampling_strategy='offset_uniform', stride=16, num_steps=2, frame_stride=15, image_size=224, augment=True, shuffle_queue_size=0, num_prefetch_batches=1, random_offset=1, frame_labels=True, per_dataset_fraction=1.0, per_class=False, sample_all_stride=1), augmentation=AugmentationConfig(random_flip=True, random_crop=False, brightness=True, brightness_max_delta=0.12549019607843137, contrast=True, contrast_lower=0.5, contrast_upper=1.5, hue=False, hue_max_delta=0.2, saturation=False, saturation_lower=0.5, saturation_upper=1.5), logging=LoggingConfig(report_interval=100), checkpoint=CheckpointConfig(save_interval=1000))

5.1 Utility: robust config editing

Research repositories evolve. Rather than assuming one exact config layout, we use helper functions that can set values safely if the corresponding fields exist. This makes the notebook more resilient to small refactors of the dataclass hierarchy.
def set_if_exists(obj, path, value):
    parts = path.split(".")
    cur = obj
    for p in parts[:-1]:
        if not hasattr(cur, p):
            return False
        cur = getattr(cur, p)
    if hasattr(cur, parts[-1]):
        setattr(cur, parts[-1], value)
        return True
    return False

def get_if_exists(obj, path, default=None):
    parts = path.split(".")
    cur = obj
    for p in parts:
        if not hasattr(cur, p):
            return default
        cur = getattr(cur, p)
    return cur

def summarize_config(cfg):
    keys = [
        "training_algo",
        "datasets",
        "path_to_tfrecords",
        "logdir",
        "train.batch_size",
        "train.max_iters",
        "train.num_frames",
        "eval.batch_size",
        "model.embedder_type",
        "model.conv_embedder.embedding_size",
        "model.base_model.train_base",
        "optimizer.type",
        "optimizer.lr.initial_lr",
        "data.image_size",
        "data.frame_stride",
        "data.num_steps",
    ]
    rows = []
    for k in keys:
        rows.append((k, get_if_exists(cfg, k)))
    return rows

if cfg is not None:
    for k, v in summarize_config(cfg):
        print(f"{k:40s} {v}")
training_algo                            alignment
datasets                                 ['pouring']
path_to_tfrecords                        /tmp/%s_tfrecords/
logdir                                   /tmp/alignment_logs/
train.batch_size                         2
train.max_iters                          150000
train.num_frames                         20
eval.batch_size                          2
model.embedder_type                      conv
model.conv_embedder.embedding_size       128
model.base_model.train_base              only_bn
optimizer.type                           adam
optimizer.lr.initial_lr                  0.0001
data.image_size                          224
data.frame_stride                        15
data.num_steps                           2

5.2 Choose experiment settings

The assignment requires an embedding-dimension sweep:
  • 32
  • 64
  • 128
We keep everything else as close as possible to the repo defaults so that the experiment isolates the representation bottleneck dimension.
EMBED_DIMS = [32, 64, 128]
EXPERIMENT_ROOT = pathlib.Path("runs_tutorial")
EXPERIMENT_ROOT.mkdir(exist_ok=True)

print("Experiments will be stored in:", EXPERIMENT_ROOT.resolve())
print("Embedding dims:", EMBED_DIMS)
Experiments will be stored in: /workspaces/tcc/runs_tutorial
Embedding dims: [32, 64, 128]

5.3 Build a training config for one run

The training code in src/tcc/train.py expects a TCCConfig, and the default config already uses:
  • datasets: [pouring]
  • training_algo: alignment
We modify:
  • embedding size
  • log directory
  • dataset root
  • optionally train.max_iters for a shorter tutorial run
def make_run_config(embed_dim=128, max_iters=2000, logdir=None):
    from tcc.config import get_default_config

    cfg = get_default_config()

    set_if_exists(cfg, "training_algo", "alignment")
    set_if_exists(cfg, "datasets", ["pouring"])
    set_if_exists(cfg, "train.max_iters", max_iters)
    set_if_exists(cfg, "model.conv_embedder.embedding_size", embed_dim)

    ds_fmt = str((PROCESSED_POURING_ROOT / "%s").resolve())
    set_if_exists(cfg, "path_to_tfrecords", ds_fmt)

    if logdir is None:
        logdir = str((EXPERIMENT_ROOT / f"pouring_tcc_d{embed_dim}").resolve())
    set_if_exists(cfg, "logdir", logdir)

    return cfg

try:
    demo_cfg = make_run_config(embed_dim=64, max_iters=500)
    for k, v in summarize_config(demo_cfg):
        print(f"{k:40s} {v}")
except Exception as e:
    print("Config construction failed:", repr(e))
training_algo                            alignment
datasets                                 ['pouring']
path_to_tfrecords                        /workspaces/tcc/data/pouring_processed/%s
logdir                                   /workspaces/tcc/runs_tutorial/pouring_tcc_d64
train.batch_size                         2
train.max_iters                          500
train.num_frames                         20
eval.batch_size                          2
model.embedder_type                      conv
model.conv_embedder.embedding_size       64
model.base_model.train_base              only_bn
optimizer.type                           adam
optimizer.lr.initial_lr                  0.0001
data.image_size                          224
data.frame_stride                        15
data.num_steps                           2

6. Training

The training loop in the repo is exposed through tcc.train.train(cfg). The logic (illustrated with a toy sketch after the list below) is:
  1. instantiate the algorithm corresponding to cfg.training_algo
  2. build the dataset loader
  3. optimize the alignment loss
  4. save checkpoints in cfg.logdir
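
As a mental model only, the sketch below reproduces the shape of that loop with toy stand-ins (a linear layer and random “frame features”), so it runs end to end. The real loop in tcc.train.train uses the TCC algorithm, the pouring data loader, and the intervals from cfg.

import torch

model = torch.nn.Linear(128, 32)                     # stand-in for the backbone + embedding head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
max_iters, save_interval = 20, 10                    # stand-ins for cfg.train.max_iters / cfg.checkpoint.save_interval

for step in range(max_iters):
    batch = torch.randn(2, 20, 128)                  # stand-in for a batch of sampled frames
    loss = model(batch).pow(2).mean()                # stand-in for the cycle-consistency loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (step + 1) % save_interval == 0:
        torch.save({"step": step + 1, "model": model.state_dict()}, f"/tmp/toy_ckpt_{step + 1}.pt")
        print(f"step {step + 1}: loss = {loss.item():.4f} (checkpoint saved)")
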
def run_training(cfg):
    from tcc.train import train
    print("Starting training with logdir:", cfg.logdir)
    train(cfg)

# Example debug run:
# cfg_debug = make_run_config(embed_dim=32, max_iters=50)
# run_training(cfg_debug)

print("Uncomment the debug run once the dataset path is ready.")
Uncomment the debug run once the dataset path is ready.

6.1 Full assignment runs

Run three experiments:
  • $D=32$
  • $D=64$
  • $D=128$
# for d in EMBED_DIMS:
#     cfg_run = make_run_config(embed_dim=d, max_iters=5000)
#     run_training(cfg_run)

print("Run the sweep above after verifying the debug run.")
Run the sweep above after verifying the debug run.

7. Loading checkpoints and extracting embeddings

The repo provides the pieces we need:
  • get_algo(...) to instantiate the TCC algorithm
  • checkpoint loading utilities from tcc.train
  • embedding extraction utilities from tcc.evaluate
import torch
from pathlib import Path

def latest_checkpoint(logdir):
    # Sort by iteration number so checkpoint_10000.pt ranks after checkpoint_9000.pt
    # (plain lexicographic sorting would not).
    def _iter_num(p):
        try:
            return int(p.stem.split("_")[-1])
        except ValueError:
            return -1
    candidates = sorted(Path(logdir).glob("checkpoint_*.pt"), key=_iter_num)
    return str(candidates[-1]) if candidates else None

def load_trained_algo(cfg, checkpoint_path=None):
    from tcc.algos.registry import get_algo
    from tcc.train import load_checkpoint

    algo = get_algo(cfg.training_algo, cfg=cfg)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    algo = algo.to(device)

    if checkpoint_path is None:
        checkpoint_path = latest_checkpoint(cfg.logdir)

    if checkpoint_path is None:
        raise FileNotFoundError(f"No checkpoint found in {cfg.logdir}")

    _ = load_checkpoint(checkpoint_path, algo, optimizer=None)
    algo.eval()
    return algo, device, checkpoint_path

7.1 Build the evaluation dataloader

The repo training code internally converts the top-level config into a DataConfig.
We reuse the repo's internal helper for this; if it is missing or renamed, the call below fails loudly so you know the notebook and the repo are out of sync.
def make_eval_dataloader(cfg, split="val", mode="eval"):
    """Build an evaluation dataloader from the top-level config.

    Uses the repo's internal _build_data_config helper.  If that helper
    is missing or its signature has changed, the call fails loudly so
    you know the notebook and repo are out of sync.
    """
    from tcc.datasets import create_dataset

    try:
        from tcc.train import _build_data_config
    except ImportError:
        raise ImportError(
            "Cannot import _build_data_config from tcc.train. "
            "The aegean-ai/tcc repo API may have changed. "
            "Check the repo README for the current evaluation interface."
        )

    data_cfg = _build_data_config(cfg)
    loader = create_dataset(split=split, mode=mode, config=data_cfg)
    return loader
def extract_embeddings_for_run(cfg, split="val", max_embs=0):
    """Extract embeddings from a trained checkpoint.

    Args:
        cfg: TCCConfig for the run.
        split: dataset split to evaluate ("val" or "train").
        max_embs: maximum number of video embeddings to extract.
                  0 means extract all available videos (no limit).
    """
    from tcc.evaluate import get_embeddings_dataset

    algo, device, checkpoint_path = load_trained_algo(cfg)
    loader = make_eval_dataloader(cfg, split=split, mode="eval")
    bundle = get_embeddings_dataset(algo, loader, device=device, max_embs=max_embs)

    print("Loaded checkpoint:", checkpoint_path)
    print("Videos:", len(bundle["embeddings_list"]))
    print("Flat embeddings shape:", bundle["embeddings"].shape)
    return bundle

# Example:
# cfg64 = make_run_config(embed_dim=64, max_iters=5000)
# emb_bundle = extract_embeddings_for_run(cfg64, split="val")

8. Representation diagnostics

Now we test the main scientific claim:
Do embeddings organize frames by task phase?
We use two projection methods:
  1. PCA — linear projection preserving global variance; fast and deterministic
  2. UMAP — nonlinear projection revealing manifold structure; better for fine-grained phase separation
For each, we produce:
  • single-video trajectory plots colored by time
  • cross-video overlays in a shared projection space (joint fit, so coordinates are comparable)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

try:
    import umap
    HAS_UMAP = True
except ImportError:
    HAS_UMAP = False

def project_pca(Z):
    return PCA(n_components=2).fit_transform(Z)

def project_umap(Z, seed=0):
    if not HAS_UMAP:
        raise ImportError("Install umap-learn: pip install umap-learn")
    return umap.UMAP(n_components=2, random_state=seed).fit_transform(Z)

def project(Z, method="umap", seed=0):
    if method == "pca":
        return project_pca(Z)
    if method == "umap":
        return project_umap(Z, seed=seed)
    raise ValueError(f"Unknown method: {method}. Use 'pca' or 'umap'.")
def plot_single_trajectory(Z, method="umap", title=None):
    Y = project(Z, method=method)
    t = np.arange(len(Y))

    plt.figure(figsize=(6, 5))
    sc = plt.scatter(Y[:, 0], Y[:, 1], c=t, s=12)
    plt.colorbar(sc, label="time")
    plt.xlabel("component 1")
    plt.ylabel("component 2")
    plt.title(title or f"{method.upper()} trajectory")
    plt.tight_layout()
    plt.show()

def plot_multiple_trajectories(embeddings_list, names=None, method="umap", max_videos=6):
    """Plot cross-video trajectories in a shared projection space.

    All video embeddings are concatenated, projected once, then split
    back so that the 2D coordinates are comparable across videos.
    """
    n = min(max_videos, len(embeddings_list))
    selected = embeddings_list[:n]
    lengths = [len(Z) for Z in selected]
    Z_all = np.concatenate(selected, axis=0)

    Y_all = project(Z_all, method=method, seed=0)

    splits = np.cumsum(lengths[:-1])
    Y_per_video = np.split(Y_all, splits)

    plt.figure(figsize=(7, 6))
    for i, Y in enumerate(Y_per_video):
        label = names[i] if names else f"video_{i}"
        plt.plot(Y[:, 0], Y[:, 1], alpha=0.8, label=label)
    plt.xlabel("component 1")
    plt.ylabel("component 2")
    plt.title(f"{method.upper()} cross-video trajectories (joint projection)")
    plt.legend(loc="best", fontsize=8)
    plt.tight_layout()
    plt.show()
# Example usage:
#
# emb_bundle = extract_embeddings_for_run(cfg64, split="val")
# Z0 = emb_bundle["embeddings_list"][0]
# plot_single_trajectory(Z0, method="pca", title="PCA trajectory")
# plot_single_trajectory(Z0, method="umap", title="UMAP trajectory")
# plot_multiple_trajectories(emb_bundle["embeddings_list"], emb_bundle["names"], method="umap")

9. Temporal segmentation from embedding geometry

This section operationalizes the claim that the embedding has learned latent phase. We use two complementary segmentation strategies:

9.1 Change-point detection

If the representation changes rapidly at phase transitions, then $d_t = \|z_t - z_{t-1}\|$ should spike near boundaries. This is a boundary-detection approach — it finds where phase transitions occur without assigning cluster labels.

9.2 KMeans clustering in the native embedding space

If the embedding clusters by phase, KMeans should recover coarse phase labels. We use $k=6$ to match the six expected pouring phases (reach, grasp, lift, tilt, pour, retract). Experiment with different $k$ values to test sensitivity.
from sklearn.cluster import KMeans

def change_point_scores(Z):
    d = np.linalg.norm(Z[1:] - Z[:-1], axis=1)
    d = np.concatenate([[0.0], d])
    return d

def detect_boundaries(d, threshold_quantile=0.98, min_gap=10):
    thr = float(np.quantile(d, threshold_quantile))
    idx = np.where(d >= thr)[0].tolist()

    kept = []
    last = -10**9
    for i in idx:
        if i - last >= min_gap:
            kept.append(i)
            last = i
    return kept, thr

def cluster_kmeans(Z, k=6, seed=0):
    """KMeans with k=6 matching the six expected pouring phases."""
    return KMeans(n_clusters=k, random_state=seed, n_init="auto").fit_predict(Z)
def plot_segmentation(Z, labels=None, boundaries=None, title="Segmentation"):
    d = change_point_scores(Z)
    T = len(Z)

    plt.figure(figsize=(10, 3))
    plt.plot(np.arange(T), d)
    if boundaries is not None:
        for b in boundaries:
            plt.axvline(b, linestyle="--")
    plt.title(title + " — change-point score")
    plt.xlabel("frame index")
    plt.ylabel(r"$\|z_t-z_{t-1}\|$")
    plt.tight_layout()
    plt.show()

    if labels is not None:
        plt.figure(figsize=(10, 2))
        plt.plot(np.arange(T), labels, drawstyle="steps-mid")
        if boundaries is not None:
            for b in boundaries:
                plt.axvline(b, linestyle="--")
        plt.title(title + " — cluster labels over time")
        plt.xlabel("frame index")
        plt.ylabel("cluster")
        plt.tight_layout()
        plt.show()
# Example usage:
#
# Z = emb_bundle["embeddings_list"][0]
# d = change_point_scores(Z)
# boundaries, thr = detect_boundaries(d, threshold_quantile=0.98, min_gap=8)
# labels_km = cluster_kmeans(Z, k=6)
#
# plot_segmentation(Z, labels=labels_km, boundaries=boundaries, title="KMeans on native embeddings")

10. Embedding dimension sweep

The assignment asks you to compare:
  • $D=32$
  • $D=64$
  • $D=128$
This matters because the embedding dimension controls the trade-off between:
  • compression
  • expressiveness
  • ease of clustering (a quantitative check is sketched after this list)
  • risk of overfitting appearance rather than phase
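
One way to make “ease of clustering” quantitative is to compare silhouette scores of KMeans labels across the three runs. This is a notebook-side check rather than part of the repo's evaluation suite, and it assumes the three checkpoints from Section 6.1 exist.

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def clustering_quality(Z, k=6, seed=0):
    # Silhouette score of KMeans labels: higher means better-separated clusters.
    labels = KMeans(n_clusters=k, random_state=seed, n_init="auto").fit_predict(Z)
    return silhouette_score(Z, labels)

# Example (uncomment after the sweep in Section 6.1 has produced checkpoints):
# for d in EMBED_DIMS:
#     cfg_d = make_run_config(embed_dim=d, max_iters=5000)
#     bundle_d = extract_embeddings_for_run(cfg_d, split="val")
#     print(f"D={d}: silhouette = {clustering_quality(bundle_d['embeddings_list'][0], k=6):.3f}")
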
def run_full_analysis_for_dimension(embed_dim, max_iters=5000, split="val",
                                    max_embs=0, num_videos_to_analyze=3):
    """Run the complete analysis pipeline for one embedding dimension.

    Analyzes up to num_videos_to_analyze videos (not just the first one)
    to ensure results are not artifacts of a single video.
    """
    cfg = make_run_config(embed_dim=embed_dim, max_iters=max_iters)
    bundle = extract_embeddings_for_run(cfg, split=split, max_embs=max_embs)

    print(f"\n===== Dimension {embed_dim} =====")
    print("Number of videos:", len(bundle["embeddings_list"]))
    print("Flat embedding matrix:", bundle["embeddings"].shape)

    if len(bundle["embeddings_list"]) == 0:
        print("No embeddings found.")
        return cfg, bundle

    n_analyze = min(num_videos_to_analyze, len(bundle["embeddings_list"]))

    for vid_idx in range(n_analyze):
        Z = bundle["embeddings_list"][vid_idx]
        vid_name = bundle["names"][vid_idx] if "names" in bundle else f"video_{vid_idx}"
        tag = f"D={embed_dim}, {vid_name}"

        plot_single_trajectory(Z, method="pca", title=f"PCA trajectory ({tag})")
        if HAS_UMAP:
            plot_single_trajectory(Z, method="umap", title=f"UMAP trajectory ({tag})")

        d = change_point_scores(Z)
        boundaries, thr = detect_boundaries(d, threshold_quantile=0.98, min_gap=8)
        labels_km = cluster_kmeans(Z, k=6)
        plot_segmentation(Z, labels=labels_km, boundaries=boundaries,
                          title=f"KMeans k=6 ({tag})")

    # Cross-video overlay (joint projection)
    if HAS_UMAP and len(bundle["embeddings_list"]) > 1:
        plot_multiple_trajectories(bundle["embeddings_list"], bundle.get("names"),
                                   method="umap")

    return cfg, bundle

# Example:
# cfg64, bundle64 = run_full_analysis_for_dimension(64, max_iters=5000)

11. Write-up questions

Q1. TCN vs TCC

Explain, in your own words, the evolution from TCN to TCC. Include the role of the soft nearest-neighbor formulation in making cycle consistency differentiable.

Q2. Does the learned representation encode phase?

Use your PCA and UMAP plots to justify a claim. Compare single-video trajectories with cross-video overlays.

Q3. How well does segmentation recover phase structure?

Compare change-point detection and KMeans clustering. Do the detected boundaries align with qualitative phase transitions? Does varying $k$ change the story?

Q4. What failure modes remain?

Examples:
  • appearance variation dominating phase
  • pauses causing over-segmentation
  • self-similar frames across non-adjacent stages
  • collapse of distinct phases into one cluster

12. Final checklist

Before submitting, verify that you have:
  • trained at least one real TCC run on pouring
  • extracted embeddings from a saved checkpoint
  • produced PCA and UMAP trajectory plots for multiple videos
  • produced a cross-video overlay using joint projection
  • run both change-point detection and KMeans segmentation
  • compared $D=32, 64, 128$
  • written answers to all four questions (Q1–Q4) in inline markdown cells, with all supporting figures embedded in the notebook
That completes the assignment.