
Introduction to Camera Models

Camera models are mathematical representations of how a camera captures the 3D world and projects it onto a 2D image plane. There are several camera models, each with its own assumptions and characteristics. Some of the most common include:
  1. Pinhole Camera Model: This is the simplest camera model, which assumes that light rays pass through a single point (the pinhole) and project onto the image plane. It is characterized by its focal length and the position of the pinhole.
  2. Perspective Camera Model: This model extends the pinhole camera model by incorporating lens distortion and other optical effects. It is commonly used in computer vision and graphics to simulate realistic camera behavior.
  3. Orthographic Camera Model: In this model, parallel lines in the 3D world remain parallel in the 2D image. This is useful for technical drawings and architectural visualizations, where accurate measurements are important.
  4. Spherical Camera Model: This model captures a 360-degree view of the scene by projecting it onto a sphere rather than a plane. It is commonly used in virtual reality and panoramic photography.
  5. Omnidirectional Camera Model: Similar to the spherical camera model, this model captures a wide field of view (FOV) by using multiple lenses or a fisheye lens. It is often used in robotics and surveillance applications.
Each of these camera models has its own set of equations and parameters that describe how 3D points are projected onto the 2D image plane. Understanding these models is essential for tasks such as image rectification, 3D reconstruction, and camera calibration. Here we focus on the pinhole camera model.

Figure: Pinhole camera model. The image is formed on the image plane by light rays passing through a small aperture (the pinhole) at the center of projection. The image is inverted and smaller than the object.

Camera Model Fundamentals

Pinhole Camera Model

The functions in this section use a so-called pinhole camera model. The view of a scene is obtained by projecting a scene's 3D point $P_w$ into the image plane using a perspective transformation which forms the corresponding pixel $p$. Both $P_w$ and $p$ are represented in homogeneous coordinates, i.e. as 3D and 2D homogeneous vectors respectively. The distortion-free projective transformation given by a pinhole camera model is:

$$\lambda \, p = K \begin{bmatrix} R|t \end{bmatrix} P_w$$

where:
  • $P_w$ is a 3D point expressed with respect to the world coordinate system
  • $p$ is a 2D pixel in the image plane
  • $K$ is the camera intrinsic matrix
  • $R$ and $t$ are the rotation and translation that describe the change of coordinates from world to camera coordinate systems
  • $\lambda$ is the projective transformation's arbitrary scaling
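To make this concrete, here is a minimal NumPy sketch of the projection equation. The intrinsics, rotation, translation, and 3D point below are illustrative placeholders, not values from any real camera:

```python
import numpy as np

# Intrinsics (placeholder values): focal lengths and principal point in pixels
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsics (placeholders): identity rotation, camera offset 2 m along Z
R = np.eye(3)
t = np.array([[0.0], [0.0], [2.0]])

P_w = np.array([[0.5], [0.3], [4.0]])  # 3D point in world coordinates

# lambda * p = K [R|t] P_w  (homogeneous projection)
P_c = R @ P_w + t                      # world -> camera coordinates
p_h = K @ P_c                          # homogeneous pixel: lambda * [u, v, 1]
u, v = (p_h[:2] / p_h[2]).ravel()      # divide by lambda = Z_c
print(f"pixel: ({u:.1f}, {v:.1f})")
```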

Camera Intrinsic Matrix

The camera intrinsic matrix $K$ projects 3D points given in the camera coordinate system to 2D pixel coordinates:

$$p = K P_c$$

The camera intrinsic matrix $K$ is composed of the focal lengths $f_x$ and $f_y$, which are expressed in pixel units, and the principal point $(c_x, c_y)$, which is usually close to the image center:

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

and thus:

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$$
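Because $K$ is invertible, the same matrix also maps pixels back to normalized coordinates $(x', y') = (X_c/Z_c, Y_c/Z_c)$. A small sketch, again with placeholder intrinsics:

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])  # placeholder intrinsics

# Forward: camera-frame point to pixel, p = K P_c, then divide by Z_c
P_c = np.array([1.0, 0.5, 5.0])
u, v, w = K @ P_c
u, v = u / w, v / w                    # (480.0, 320.0)

# Inverse: pixel back to normalized coordinates (x', y') = (X_c/Z_c, Y_c/Z_c)
x_n, y_n, _ = np.linalg.inv(K) @ np.array([u, v, 1.0])
assert np.allclose([x_n, y_n], [P_c[0] / P_c[2], P_c[1] / P_c[2]])
```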

Coordinate Transformations

The joint rotation-translation matrix $[R|t]$ is the matrix product of a projective transformation and a homogeneous transformation. The 3-by-4 projective transformation maps 3D points represented in camera coordinates to 2D points in the image plane, represented in normalized camera coordinates $x' = X_c / Z_c$ and $y' = Y_c / Z_c$. The homogeneous transformation is encoded by the extrinsic parameters $R$ and $t$ and represents the change of basis from the world coordinate system $w$ to the camera coordinate system $c$:

$$P_c = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} P_w$$

This gives us the complete transformation:

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

If $Z_c \neq 0$, this is equivalent to:

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} f_x X_c/Z_c + c_x \\ f_y Y_c/Z_c + c_y \end{bmatrix}$$
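OpenCV exposes this full world-to-pixel chain as cv2.projectPoints, which also accepts distortion coefficients (covered in the next section; passed as zeros here). A usage sketch with the same placeholder values as above:

```python
import numpy as np
import cv2

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])  # placeholder intrinsics

rvec = np.zeros(3)                     # rotation as a Rodrigues vector (identity here)
tvec = np.array([0.0, 0.0, 2.0])       # placeholder translation
dist = np.zeros(5)                     # (k1, k2, p1, p2, k3) all zero: no distortion

obj_pts = np.array([[0.5, 0.3, 4.0]])  # 3D points in world coordinates, one per row
img_pts, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, dist)
print(img_pts.reshape(-1, 2))          # same (u, v) as the hand-rolled projection
```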

Lens Distortion Model

Real lenses introduce distortions (radial and tangential).

Figure: Distortion examples: barrel, pincushion, and tangential distortions.

The extended camera model accounts for this:

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} f_x x'' + c_x \\ f_y y'' + c_y \end{bmatrix}$$

where:

$$\begin{bmatrix} x'' \\ y'' \end{bmatrix} = \begin{bmatrix} x' \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + 2 p_1 x' y' + p_2(r^2 + 2 x'^2) + s_1 r^2 + s_2 r^4 \\ y' \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + p_1 (r^2 + 2 y'^2) + 2 p_2 x' y' + s_3 r^2 + s_4 r^4 \end{bmatrix}$$

with $r^2 = x'^2 + y'^2$ and

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} X_c/Z_c \\ Y_c/Z_c \end{bmatrix}$$

if $Z_c \neq 0$.

Distortion Parameters:
  • Radial coefficients: $k_1$, $k_2$, $k_3$, $k_4$, $k_5$, $k_6$
  • Tangential coefficients: $p_1$, $p_2$
  • Thin prism coefficients: $s_1$, $s_2$, $s_3$, $s_4$
The distortion coefficients are passed as:

$$(k_1, k_2, p_1, p_2[, k_3[, k_4, k_5, k_6[, s_1, s_2, s_3, s_4[, \tau_x, \tau_y]]]])$$

Types of Distortion (a code sketch of this model follows the list below):
  • Barrel distortion: the radial factor $(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$ is monotonically decreasing
  • Pincushion distortion: the radial factor $(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$ is monotonically increasing
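The distortion equations above translate directly into code. Below is a minimal NumPy sketch of the radial, tangential, and thin-prism terms for a single normalized point; the coefficient values are arbitrary placeholders, and the tilt parameters $\tau_x, \tau_y$ are omitted:

```python
import numpy as np

def distort(x, y, k, p, s):
    """Apply the radial (k1..k6), tangential (p1, p2), and thin-prism
    (s1..s4) distortion model to normalized coordinates (x', y')."""
    k1, k2, k3, k4, k5, k6 = k
    p1, p2 = p
    s1, s2, s3, s4 = s
    r2 = x * x + y * y
    r4, r6 = r2 * r2, r2 * r2 * r2
    radial = (1 + k1 * r2 + k2 * r4 + k3 * r6) / (1 + k4 * r2 + k5 * r4 + k6 * r6)
    x2 = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x) + s1 * r2 + s2 * r4
    y2 = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y + s3 * r2 + s4 * r4
    return x2, y2

# Placeholder coefficients: mild barrel distortion plus a slight tangential term
x_d, y_d = distort(0.2, 0.1,
                   k=(-0.1, 0.01, 0.0, 0.0, 0.0, 0.0),
                   p=(0.001, 0.0),
                   s=(0.0,) * 4)
```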

Coordinate Systems

Right-handed vs Left-handed

Right-handed and left-handed coordinate systems are two conventions for defining the orientation of axes in 3D space: in a right-handed system, the cross product of the X and Y axes points along the Z axis, while in a left-handed system it points opposite to it.
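A quick numerical test for handedness is the scalar triple product of the basis vectors: positive means right-handed, negative means left-handed. A small sketch:

```python
import numpy as np

def is_right_handed(x, y, z):
    """Right-handed iff (x cross y) . z > 0 for the frame's basis vectors."""
    return float(np.dot(np.cross(x, y), z)) > 0

# Standard basis (e.g. ROS body frame: X forward, Y left, Z up) -> right-handed
print(is_right_handed([1, 0, 0], [0, 1, 0], [0, 0, 1]))  # True
```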

ROS2 Coordinate System

The table below shows the convention that ROS2's RViz2 displays: a right-handed coordinate system. The right-hand rule determines the direction of the axes.
| Axis | Direction | Color |
| ---- | --------- | ----- |
| X    | Forward   | Red   |
| Y    | Left      | Green |
| Z    | Up        | Blue  |

Sensor Coordinate Systems

Each sensor has its own coordinate system, documented by the vendor, which may be right-handed or left-handed. Take for example RealSense cameras, whose frames are right-handed.
RealSense Camera Coordinate Conventions:
  • Point of View: Imagine standing behind the camera, looking forward
  • ROS2 Coordinate System: (X: Forward, Y: Left, Z: Up)
  • Camera Optical Coordinate System: (X: Right, Y: Down, Z: Forward)
  • References: REP-0103, REP-0105
All data published on RealSense wrapper topics is optical data taken directly from the camera sensors. Static and dynamic TF topics publish both the optical and ROS coordinate systems, allowing the user to transform between them (a conversion sketch follows below).
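Under the conventions above (optical: X right, Y down, Z forward; ROS: X forward, Y left, Z up), moving a point between the two frames is a fixed axis permutation. The sketch below illustrates the mapping and is not the RealSense wrapper's actual TF code:

```python
import numpy as np

# Rows map optical axes onto ROS axes:
#   X_ros =  Z_optical (forward), Y_ros = -X_optical (left), Z_ros = -Y_optical (up)
OPTICAL_TO_ROS = np.array([[ 0.0,  0.0, 1.0],
                           [-1.0,  0.0, 0.0],
                           [ 0.0, -1.0, 0.0]])

p_optical = np.array([0.1, -0.2, 3.0])  # point in the camera optical frame
p_ros = OPTICAL_TO_ROS @ p_optical      # same point in the ROS frame

# The matrix is a pure rotation, so its transpose converts back
assert np.allclose(OPTICAL_TO_ROS.T @ p_ros, p_optical)
```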
When exporting vertices to PLY format (a common 3D file format), the RealSense SDK, since version 2.19.0, converts the points to a left-handed coordinate system. This conversion was implemented to ensure compatibility with MeshLab, where the default viewpoint is configured for a left-handed coordinate system.
