
Camera Calibration

Introduction

Camera calibration is the process of estimating the intrinsic and extrinsic parameters of a camera — focal length, optical center, and lens distortion coefficients — so that we can accurately map 3D points in the world to 2D image coordinates. Calibration is a prerequisite for any vision-based robotics pipeline: visual odometry, SLAM, object pose estimation, and 3D reconstruction all depend on an accurate camera model. In simulation the camera is ideal (no distortion, known focal length), but real cameras always have manufacturing imperfections. This assignment gives you hands-on experience with the full calibration workflow using a real camera, and then applies the calibration to pose estimation in ROS 2.

Constraints

You may use libraries such as NumPy, SciPy, and PyTorch for matrix operations and optimization, but you may not use high-level computer vision libraries that provide built-in camera calibration functions (such as OpenCV's calibrateCamera): you will implement Zhang's method from scratch. You may consult the Kornia library and its source code for DLT and other camera-calibration routines to guide your implementation. Do not call high-level Kornia APIs directly, but helper functions such as kornia.geometry.conversions.convert_points_to_homogeneous are allowed.

PyTorch bonus: students who implement their solution in PyTorch will receive 10 extra-credit points.

You can structure your code as a package using this template, or work within a single Jupyter notebook.

Background Reading

Study Sections 13.1 and 13.2 of Peter Corke's textbook Robotics, Vision and Control (RVC3). The book's GitHub repo and the Chapter 13 notebook contain reference code.

Task 1: Implement Zhang’s Camera Calibration Method

You will implement Zhang’s method from scratch, including:
  1. DLT (Direct Linear Transform) initialization — compute an initial estimate of the homography between the calibration pattern and each image.
  2. Closed-form intrinsic parameter extraction — recover the camera matrix K from the homographies.
  3. Non-linear refinement — refine all parameters (intrinsics + extrinsics for each view) by minimizing reprojection error. You can implement this with stochastic gradient descent (SGD) or Levenberg-Marquardt.
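The DLT initialization in step 1 can be sketched as follows. This is a minimal NumPy illustration only (the function and variable names are my own, and Hartley point normalization is omitted for brevity; a robust implementation should include it):

```python
import numpy as np

def estimate_homography_dlt(world_pts, image_pts):
    """Estimate the 3x3 homography H (image ~ H @ world) for a planar
    target via the Direct Linear Transform. Sketch only: production code
    should also normalize the points (Hartley) for numerical stability."""
    assert world_pts.shape == image_pts.shape and world_pts.shape[1] == 2
    A = []
    for (X, Y), (u, v) in zip(world_pts, image_pts):
        # Each correspondence contributes two rows to the design matrix
        A.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        A.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    A = np.asarray(A)
    # The homography is the right singular vector of A with the
    # smallest singular value (the last row of Vt)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Sanity check on synthetic data: points related by a known homography
H_true = np.array([[800.0, 5.0, 320.0], [2.0, 810.0, 240.0], [0.0, 0.0, 1.0]])
grid = np.array([[x, y] for x in range(8) for y in range(6)], dtype=float)
proj = (H_true @ np.c_[grid, np.ones(len(grid))].T).T
img = proj[:, :2] / proj[:, 2:3]
H_est = estimate_homography_dlt(grid, img)
print(np.allclose(H_est, H_true / H_true[2, 2], atol=1e-6))
```

In Zhang's method you would estimate one such homography per calibration image, using the planar checkerboard corners as the world points.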

Test Dataset

Use the pantelism/wide-camera-calibration HuggingFace dataset as test images. The dataset is available in Parquet format:
from datasets import load_dataset
dataset = load_dataset("pantelism/wide-camera-calibration")
# Access images: dataset["train"][0]["image"]
An OpenCV reference solution is provided below. Validate your from-scratch implementation by comparing your results against the OpenCV solution.
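One way to build the validation comparison is a small helper that reports the relative difference of the intrinsics. The compare_intrinsics helper and the K_mine values below are hypothetical; K_ref copies the reference solution's output further down:

```python
import numpy as np

def compare_intrinsics(K_ours, K_ref, tol_pct=1.0):
    """Print the relative difference (%) of fx, fy, cx, cy between two
    camera matrices; hypothetical helper for the validation table."""
    labels = ["fx", "fy", "cx", "cy"]
    ours = [K_ours[0, 0], K_ours[1, 1], K_ours[0, 2], K_ours[1, 2]]
    ref = [K_ref[0, 0], K_ref[1, 1], K_ref[0, 2], K_ref[1, 2]]
    diffs = []
    for name, a, b in zip(labels, ours, ref):
        rel = 100.0 * abs(a - b) / abs(b)
        diffs.append(rel)
        print(f"{name}: ours={a:8.2f}  opencv={b:8.2f}  diff={rel:5.2f}%")
    return all(d < tol_pct for d in diffs)

# K_ref from the OpenCV reference output; K_mine is a made-up example
K_ref = np.array([[560.42, 0, 650.84], [0, 561.66, 498.55], [0, 0, 1]])
K_mine = np.array([[558.90, 0, 651.30], [0, 560.20, 497.80], [0, 0, 1]])
print(compare_intrinsics(K_mine, K_ref))
```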

Your Own Captures

After validating on the test dataset, repeat the calibration with your own images:
  1. Print out the 9x7 checkerboard pattern (9x7 squares, i.e. 8x6 inner corners, matching the reference code below) and attach it to a rigid flat surface (cardboard or clipboard).
  2. Using your smartphone, laptop webcam, or a USB camera set to a fixed zoom level, capture at least 15 images of the checkerboard from different angles, distances, and orientations — as demonstrated in Fig 13.11 of RVC3.
Capture tips:
  • Ensure good, diffuse lighting; keep the pattern flat and in focus for every shot
  • Disable autofocus and set a fixed focus for the duration of the calibration
  • Capture from many different positions, tilting the pattern up to about 45 degrees relative to the image plane
  • Ensure the pattern fills a significant portion of the frame, and across the full set of captures cover the entire field of view
  • Avoid images where the pattern is perfectly flat/frontal — angled views provide more information
  • Capture both wide shots and close-ups where the pattern is larger in the frame
  3. Report the following:
    • Camera matrix K (focal lengths f_x, f_y and principal point c_x, c_y)
    • Distortion coefficients (k_1, k_2, p_1, p_2, k_3)
    • Reprojection error (RMS, in pixels)
    • Include your own captured images, not the book's examples or the test dataset
  4. Visualize the undistorted vs. distorted images side by side for at least 3 of your captures. Plot the radial distortion profile (distortion factor vs. radial distance from the image center).
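As an illustration of the non-linear refinement objective from Task 1, step 3, the sketch below recovers toy intrinsics by minimizing reprojection error with scipy.optimize.least_squares. All names and numbers are made up, and a full Zhang refinement would also optimize the distortion coefficients and per-view extrinsics:

```python
import numpy as np
from scipy.optimize import least_squares

# Toy setup: recover a focal length and principal point by minimizing
# reprojection error, the same objective used in Zhang's refinement step.
rng = np.random.default_rng(0)
pts3d = rng.uniform([-1, -1, 4], [1, 1, 6], size=(40, 3))  # points in camera frame

def project(params, pts):
    """Pinhole projection with a single focal length f and center (cx, cy)."""
    f, cx, cy = params
    x = f * pts[:, 0] / pts[:, 2] + cx
    y = f * pts[:, 1] / pts[:, 2] + cy
    return np.c_[x, y]

# Noiseless observations generated with the "true" intrinsics
obs = project([600.0, 320.0, 240.0], pts3d)

def residuals(params):
    # Flattened reprojection residuals, one (du, dv) pair per point
    return (project(params, pts3d) - obs).ravel()

# Start from a deliberately wrong guess and refine
result = least_squares(residuals, x0=[500.0, 300.0, 200.0])
f_hat, cx_hat, cy_hat = result.x
print(f"f={f_hat:.2f}, cx={cx_hat:.2f}, cy={cy_hat:.2f}")
```

The same pattern extends to the full problem: stack the distortion coefficients and per-view rotation/translation parameters into the parameter vector and keep the residual function as the flattened reprojection errors over all views.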

Task 2: Pose Estimation with Calibrated Camera (ROS)

Now that you have calibrated your camera, use the calibration to estimate object poses.
  1. In the Gazebo maze environment, place objects on tables (you can reuse objects from previous assignments).
  2. Using OpenCV’s solvePnP function (wrapped by RVC3’s Section 13.2.1 functions), estimate the relative pose of the objects from the camera. You need the 3D geometry of your objects (e.g., known cube dimensions) and corresponding 2D image detections. Alternatively, attach ArUco fiducial markers to the objects and use the cv2.aruco module to estimate poses directly.
  3. Establish a correspondence between a world coordinate system and the fixed objects in the scene. Use the relative pose estimates to compute the camera pose (rotation matrix and translation vector) in the world frame.
  4. In ROS 2, inspect the calibration parameters published on the /camera_info topic (for example, with ros2 topic echo /camera_info). For the simulated camera, confirm they match the ideal model:
    camera_matrix:
      rows: 3
      cols: 3
      data: [fx, 0, cx, 0, fy, cy, 0, 0, 1]
    distortion_model: "plumb_bob"
    distortion_coefficients:
      data: [0, 0, 0, 0, 0]  # no distortion in sim
    

Deliverables

  1. This notebook (or package) with all cells executed, showing your captured images and calibration results.
  2. Your from-scratch Zhang’s method implementation with DLT initialization and non-linear refinement.
  3. Validation: comparison table of your results vs. the OpenCV reference solution on the test dataset.
  4. Your own camera’s calibration parameters and a comparison with the simulated camera’s ideal parameters.
  5. Visualizations: detected checkerboard corners, undistorted images, radial distortion profile, and estimated object poses.

OpenCV Reference Solution

The following cells provide the OpenCV reference solution for camera calibration. Use this to validate your from-scratch implementation.
from datasets import load_dataset

dataset = load_dataset("pantelism/wide-camera-calibration")
print(dataset)
# Access images: dataset["train"][0]["image"]
DatasetDict({
    train: Dataset({
        features: ['image', 'filename'],
        num_rows: 44
    })
})
import numpy as np
import cv2
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings("ignore")

print("OpenCV version:", cv2.__version__)
print("NumPy version:", np.__version__)

# Prepare object points for 8x6 chessboard (inner corners)
objp = np.zeros((6 * 8, 3), np.float32)
objp[:, :2] = np.mgrid[0:8, 0:6].T.reshape(-1, 2)

objpoints = []  # 3d points in real world space
imgpoints = []  # 2d points in image plane

images = dataset["train"]["image"]  # List of PIL Images
print(f"Found {len(images)} calibration images")
OpenCV version: 4.13.0
NumPy version: 2.3.1
Found 44 calibration images
# Corner detection
fig, axes = plt.subplots(2, 5, figsize=(20, 8))
axes = axes.flatten()

successful_detections = 0

for idx, img in enumerate(images):  # process all images; plot only the first 10
    try:
        img = np.array(img)
        if img is None:
            continue

        # Images loaded via PIL/HF datasets are RGB, not BGR
        gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)

        ret, corners = cv2.findChessboardCorners(
            gray,
            (8, 6),
            flags=cv2.CALIB_CB_ADAPTIVE_THRESH
            + cv2.CALIB_CB_FAST_CHECK
            + cv2.CALIB_CB_NORMALIZE_IMAGE,
        )

        if ret:
            objpoints.append(objp)
            corners2 = cv2.cornerSubPix(
                gray,
                corners,
                (11, 11),
                (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001),
            )
            imgpoints.append(corners2)
            successful_detections += 1

            img_with_corners = img.copy()
            cv2.drawChessboardCorners(img_with_corners, (8, 6), corners2, ret)
            img_rgb = img_with_corners  # already RGB (loaded via PIL)

            if idx < 10:
                axes[idx].imshow(img_rgb)
                axes[idx].set_title(f"Image {idx + 1}: Found", fontsize=10, color="green")
                axes[idx].axis("off")
        else:
            if idx < 10:
                img_rgb = img  # already RGB (loaded via PIL)
                axes[idx].imshow(img_rgb)
                axes[idx].set_title(f"Image {idx + 1}: Failed", fontsize=10, color="red")
                axes[idx].axis("off")

    except Exception as e:
        print(f"Error processing image {idx}: {e}")
        continue

for idx in range(len(images[:10]), 10):
    axes[idx].axis("off")

plt.tight_layout()
plt.suptitle(
    f"Chessboard Corner Detection Results\n"
    f"Successfully detected corners in {successful_detections}/{len(images)} images",
    fontsize=14,
    y=1.02,
)
plt.show()

print(f"\nCalibration data summary:")
print(f"- Total images processed: {len(images)}")
print(f"- Successful corner detections: {successful_detections}")
print(f"- Success rate: {successful_detections / len(images) * 100:.1f}%")
[Figure: chessboard corner detection results for the first 10 images]

Calibration data summary:
- Total images processed: 44
- Successful corner detections: 10
- Success rate: 22.7%
# Camera calibration
if len(objpoints) == 0:
    print("Error: No successful corner detections found.")
else:
    print(f"Starting calibration with {len(objpoints)} successful detections...")

    img = np.array(images[0])
    img_size = (img.shape[1], img.shape[0])
    print(f"Image size: {img_size}")

    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
        objpoints, imgpoints, img_size, None, None
    )

    print(f"Calibration successful! RMS error: {ret:.4f}")

    print("\n=== Camera Calibration Results ===")
    print("Camera Matrix (K):")
    print(mtx)
    print(f"\nFocal lengths: fx={mtx[0, 0]:.2f}, fy={mtx[1, 1]:.2f}")
    print(f"Principal point: cx={mtx[0, 2]:.2f}, cy={mtx[1, 2]:.2f}")
    print(f"Aspect ratio: {mtx[0, 0] / mtx[1, 1]:.4f}")

    print(f"\nDistortion Coefficients:")
    print(f"k1={dist[0, 0]:.6f}, k2={dist[0, 1]:.6f}, p1={dist[0, 2]:.6f}")
    print(f"p2={dist[0, 3]:.6f}, k3={dist[0, 4]:.6f}")

    # Undistortion
    dst = cv2.undistort(img, mtx, dist, None, mtx)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))
    img_rgb = img  # already RGB (loaded via PIL)
    dst_rgb = dst

    ax1.imshow(img_rgb)
    ax1.set_title("Original Image", fontsize=16)
    ax1.axis("off")

    ax2.imshow(dst_rgb)
    ax2.set_title("Undistorted Image", fontsize=16)
    ax2.axis("off")

    plt.suptitle(f"Camera Calibration Results (RMS Error: {ret:.4f})", fontsize=18)
    plt.tight_layout()
    plt.show()

    # Reprojection error analysis
    total_error = 0
    max_error = 0

    for i in range(len(objpoints)):
        projected_points, _ = cv2.projectPoints(
            objpoints[i], rvecs[i], tvecs[i], mtx, dist
        )
        # Per-image error as in the OpenCV tutorial: the L2 norm over all
        # corners divided by the corner count. Note this is not a per-point
        # RMS, which is why it is much smaller than the RMS value above.
        error = cv2.norm(imgpoints[i], projected_points, cv2.NORM_L2) / len(
            projected_points
        )
        total_error += error
        max_error = max(max_error, error)

    mean_error = total_error / len(objpoints)

    print(f"\n=== Reprojection Error Analysis ===")
    print(f"Mean reprojection error: {mean_error:.4f} pixels")
    print(f"Maximum reprojection error: {max_error:.4f} pixels")
    print(f"RMS reprojection error: {ret:.4f} pixels")
Starting calibration with 10 successful detections...
Image size: (1280, 960)
Calibration successful! RMS error: 0.4605

=== Camera Calibration Results ===
Camera Matrix (K):
[[560.41573383   0.         650.84218312]
 [  0.         561.65699487 498.55268096]
 [  0.           0.           1.        ]]

Focal lengths: fx=560.42, fy=561.66
Principal point: cx=650.84, cy=498.55
Aspect ratio: 0.9978

Distortion Coefficients:
k1=-0.244031, k2=0.075706, p1=0.000043
p2=0.000314, k3=-0.012066
[Figure: original vs. undistorted image]

=== Reprojection Error Analysis ===
Mean reprojection error: 0.0643 pixels
Maximum reprojection error: 0.0805 pixels
RMS reprojection error: 0.4605 pixels
# Undistortion on multiple images and distortion profile
if "mtx" in locals() and "dist" in locals():
    test_images = images[:4]

    fig, axes = plt.subplots(2, 4, figsize=(16, 8))

    for idx, img in enumerate(test_images):
        img = np.array(img)
        if img is not None:
            undist_img = cv2.undistort(img, mtx, dist, None, mtx)
            img_rgb = img  # already RGB (loaded via PIL)
            undist_rgb = undist_img

            axes[0, idx].imshow(img_rgb)
            axes[0, idx].set_title(f"Original {idx + 1}", fontsize=10)
            axes[0, idx].axis("off")

            axes[1, idx].imshow(undist_rgb)
            axes[1, idx].set_title(f"Undistorted {idx + 1}", fontsize=10)
            axes[1, idx].axis("off")

    plt.suptitle("Calibration Results on Multiple Images", fontsize=14)
    plt.tight_layout()
    plt.show()

    # Field of view
    fx, fy = mtx[0, 0], mtx[1, 1]
    width, height = img_size

    fov_x = 2 * np.arctan(width / (2 * fx)) * 180 / np.pi
    fov_y = 2 * np.arctan(height / (2 * fy)) * 180 / np.pi

    print(f"\n=== Camera Parameters Summary ===")
    print(f"Image resolution: {width} x {height}")
    print(f"Focal length: fx={fx:.1f}, fy={fy:.1f} pixels")
    print(f"Field of view: {fov_x:.1f} x {fov_y:.1f} degrees")
    print(f"Principal point: ({mtx[0, 2]:.1f}, {mtx[1, 2]:.1f})")

    # Radial distortion profile
    fig, ax = plt.subplots(1, 1, figsize=(10, 6))

    r_max = np.sqrt((width / 2) ** 2 + (height / 2) ** 2)
    r = np.linspace(0, r_max, 1000)

    k1, k2, p1, p2, k3 = dist[0]
    # The Brown radial model acts on normalized image coordinates, so
    # convert the pixel radius to a normalized radius via the focal length
    r_norm = r / fx
    distortion_factor = 1 + k1 * r_norm**2 + k2 * r_norm**4 + k3 * r_norm**6

    ax.plot(r, distortion_factor, "b-", linewidth=2, label="Radial distortion")
    ax.axhline(y=1, color="r", linestyle="--", alpha=0.7, label="No distortion")
    ax.set_xlabel("Radial distance from center (pixels)")
    ax.set_ylabel("Distortion factor")
    ax.set_title("Radial Distortion Profile")
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.show()

    print("Camera calibration analysis complete!")
[Figure: original vs. undistorted for 4 test images]

=== Camera Parameters Summary ===
Image resolution: 1280 x 960
Focal length: fx=560.4, fy=561.7 pixels
Field of view: 97.6 x 81.0 degrees
Principal point: (650.8, 498.6)
[Figure: radial distortion profile]
Camera calibration analysis complete!