The domain gap
A policy trained in Gazebo sees flat textures, perfect lighting, and noiseless depth. A real kitchen has specular reflections, clutter, and a depth camera that hallucinates on glass surfaces. The gap shows up as:
- Visual mismatch — synthetic renders look nothing like real camera images
- Dynamics mismatch — simulated friction, mass, and contact differ from the real robot
- Sensor mismatch — perfect simulated sensors vs noisy real IMUs, cameras, and lidars
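The sensor mismatch can be made concrete. The sketch below is an illustrative noise model (not any particular camera's specification, and the parameter values are assumptions): it corrupts a clean simulated depth map with multiplicative Gaussian noise and random pixel dropout, mimicking the missing returns a real depth camera produces on glass.

```python
import numpy as np

def corrupt_depth(depth, dropout_p=0.02, noise_std=0.01, seed=0):
    """Apply a simple noise model to a clean simulated depth map:
    multiplicative Gaussian noise plus random pixel dropout, returned
    as 0.0, a common 'no reading' value on glass or specular surfaces."""
    rng = np.random.default_rng(seed)
    noisy = depth * (1.0 + rng.normal(0.0, noise_std, size=depth.shape))
    mask = rng.random(depth.shape) < dropout_p
    noisy[mask] = 0.0  # dropped returns, as on glass
    return noisy

clean = np.full((4, 4), 2.0)  # 4x4 depth map, 2 m everywhere
noisy = corrupt_depth(clean)
```

Training on depth corrupted this way is itself a small dose of domain randomization, which the next section generalizes.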
Domain randomization
The brute-force approach: randomize simulation parameters (textures, lighting, object positions, camera noise, friction coefficients) so broadly that the real world falls within the training distribution. The policy learns to be invariant to visual and physical variation rather than memorizing one simulated environment. Domain randomization is simple to implement but expensive — you need enough variation to cover reality, and you cannot know in advance whether you have enough. It also tends to produce conservative policies that handle variation by being cautious rather than precise.
Domain adaptation
Instead of randomizing the source domain, align the feature distributions of simulated and real data so they become indistinguishable to the policy. Techniques include:
- Adversarial adaptation — train a discriminator to distinguish sim from real features, and a feature extractor that fools it
- Style transfer — render simulated images through a neural style transfer network trained on real images
- Feature matching — minimize distributional distance (MMD, CORAL) between sim and real feature spaces
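As a concrete instance of feature matching, CORAL aligns the second-order statistics (feature covariances) of the two domains. A minimal numpy sketch, where random vectors stand in for features from a network's penultimate layer:

```python
import numpy as np

def coral_loss(source_feats, target_feats):
    """CORAL distance: squared Frobenius norm between the feature
    covariance matrices of the two domains, normalized by 4*d^2.
    Minimized as an auxiliary loss on a shared feature extractor,
    it pulls sim and real feature statistics together."""
    d = source_feats.shape[1]
    cs = np.cov(source_feats, rowvar=False)  # (d, d) source covariance
    ct = np.cov(target_feats, rowvar=False)  # (d, d) target covariance
    return float(np.sum((cs - ct) ** 2)) / (4.0 * d * d)

rng = np.random.default_rng(0)
sim_feats = rng.normal(0.0, 1.0, size=(256, 8))   # stand-in sim features
real_feats = rng.normal(0.0, 1.5, size=(256, 8))  # real features, different scale
same = coral_loss(sim_feats, sim_feats)   # identical covariances: zero
gap = coral_loss(sim_feats, real_feats)   # mismatched scales: positive
```

In a full adaptation setup this loss would be added to the task loss and backpropagated through the feature extractor; the numpy version only shows the statistic being matched.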
System identification
Calibrate the simulator’s physics parameters to match the real robot. Measure real-world friction, damping, motor response curves, and sensor noise profiles, then set the simulator to match. This closes the dynamics gap directly rather than learning around it. System identification is most effective for dynamics-dominated tasks (locomotion, contact-rich manipulation) where visual appearance matters less than physical accuracy.
3D Gaussian splatting for photorealistic world generation
The techniques above all accept a hand-authored simulation as the starting point and try to compensate for its visual poverty. A different approach: start from a 3D scan of the real environment and train in a photorealistic reconstruction.

3D Gaussian Splatting (3DGS) reconstructs a dense, renderable 3D scene from a set of posed images. Each Gaussian carries position, shape, color, and opacity. Rendering is done by splatting onto an image plane — no ray marching — enabling real-time (100+ FPS) novel view synthesis. This changes the sim-to-real pipeline in three specific ways:
- Photorealistic world generation — instead of hand-authoring simulator worlds with approximate textures and lighting, you scan the target environment (warehouse, kitchen, hospital corridor) with a camera, train a Gaussian splat, and render training views from it. The policy trains on images that look like reality because they are derived from reality, re-rendered from novel viewpoints. This drastically reduces the visual domain gap.
- Infinite viewpoint augmentation — a single scan produces a continuous 3D field renderable from any pose. The robot can practice navigating the real space from viewpoints it has never physically visited. Unlike image augmentation (crop, color jitter), this produces geometrically consistent novel views with correct occlusion and parallax.
- Semantic simulation environments — with language-embedded splats (LEGaussians, LangSplat), the reconstructed world becomes queryable: “where is the couch?” returns a 3D location. A VLA agent can train navigation and manipulation in a photorealistic, semantically labeled environment derived from a real scan, without manual annotation.

The progression from classical mapping to 3DGS-based simulation:

| Representation | What it captures | Sim-to-real utility |
|---|---|---|
| Occupancy grid (SLAM) | Free/occupied cells | Collision avoidance only |
| Sparse point cloud (ORB-SLAM) | 3D keypoints | Re-localization landmarks |
| 3D Gaussian Splat | Dense geometry + appearance | Photorealistic training views |
| Language-embedded splat | Geometry + appearance + semantics | Queryable training environment |
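The splatting step itself can be illustrated with a toy example. The sketch below composites isotropic 2D Gaussians front to back with alpha blending, the core of 3DGS rendering, leaving out the 3D-to-2D projection, anisotropic covariances, spherical-harmonic color, and depth-sorted tiling of the real pipeline.

```python
import numpy as np

def splat(gaussians, width=32, height=32):
    """Toy front-to-back alpha compositing of isotropic 2D Gaussians.
    Each Gaussian: (cx, cy, sigma, color(3,), opacity), assumed already
    sorted near-to-far. Nearer splats occlude farther ones via the
    accumulated transmittance."""
    ys, xs = np.mgrid[0:height, 0:width]
    img = np.zeros((height, width, 3))
    transmittance = np.ones((height, width))  # light not yet absorbed
    for cx, cy, sigma, color, opacity in gaussians:
        falloff = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        alpha = opacity * falloff
        img += (transmittance * alpha)[..., None] * np.asarray(color)
        transmittance *= 1.0 - alpha
    return img

scene = [(10, 10, 3.0, (1, 0, 0), 0.8),  # near red blob
         (16, 16, 5.0, (0, 0, 1), 0.9)]  # far blue blob
image = splat(scene)
```

Because compositing is a pure rasterization pass (no ray marching), the real implementation reaches the 100+ FPS figures quoted above on a GPU.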
Combining approaches
In practice, these techniques are complementary:
- Use 3DGS to close the visual gap at the source
- Apply domain randomization on top for factors the scan does not capture (lighting changes, object rearrangement, sensor noise)
- Use system identification for dynamics-critical tasks
- Apply domain adaptation as a final fine-tuning step with a small amount of real-world data
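One way to see how these pieces compose is as a single training configuration. The dataclass below is a hypothetical sketch: every field name, path, and value is illustrative rather than taken from any framework or real calibration.

```python
from dataclasses import dataclass, field

@dataclass
class SimToRealConfig:
    """Illustrative composition of the four techniques in one pipeline."""
    scan_path: str = "scenes/kitchen.splat"  # 3DGS scene for photoreal rendering
    randomize: dict = field(default_factory=lambda: {
        # randomization on top of the scan, for factors it does not capture
        "light_intensity": (0.5, 2.0),
        "object_jitter_m": (0.0, 0.10),
        "depth_noise_std": (0.0, 0.02),
    })
    sysid_params: dict = field(default_factory=lambda: {
        # dynamics values as measured on the real robot (illustrative numbers)
        "joint_friction": 0.12,
        "motor_delay_ms": 8.0,
    })
    adapt_real_frames: int = 500  # small real dataset for final adaptation

cfg = SimToRealConfig()
```

The ordering mirrors the list above: the scan closes the visual gap, randomization covers what the scan misses, system identification fixes dynamics, and a small real dataset drives the final adaptation step.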

