Why use LLMs for simulation authoring?
A typical Gazebo model requires coordinating several files: `model.config`, `model.sdf`, mesh references, material scripts, and placement poses in the world file. Writing these by hand demands:
- Knowledge of the SDF schema and element nesting rules
- Correct inertia tensor values for realistic physics
- Consistent URI conventions (`model://name/meshes/...`)
- Collision mesh simplification decisions
Literature review
Several research threads are directly relevant to NLP-to-simulation authoring:
Procedural and generative world creation
Early work on procedural content generation (PCG) for simulation used template engines and grammar-based methods. LLMs represent a qualitative shift: they can interpret underspecified natural language and infer reasonable defaults.
WorldGen / SceneX-type approaches — papers such as SceneX (CVPR 2024) and Holodeck (CVPR 2024) demonstrate end-to-end pipelines that convert a text description of a room (“a modern living room with a grey sofa facing a TV”) into a 3D scene with placed furniture. These systems query LLMs to select object categories and spatial relationships, then look up 3D assets from a database (Objaverse, ShapeNet) and assign poses. The core LLM task is structured output generation — producing a JSON or XML scene graph from free text.
Code-as-interface — ProgPrompt (ICLR 2023) and Code as Policies (ICRA 2023) show that LLMs are effective at generating robot task programs when given a library of available primitives. The same principle applies to SDF authoring: if the LLM is told what XML elements are available and what each does, it generates structurally correct documents. The SDF schema serves as the “primitive library.”
Language-to-simulation bridges — ChatSim (CVPR 2024) specifically targets autonomous driving simulation, converting natural language commands into scene edits in a CARLA/nuScenes-style environment. The authors use a multi-agent LLM pipeline where one agent interprets the command, another places objects, and a third checks physical plausibility.
Prompt engineering for structured output
Getting an LLM to produce valid XML consistently requires careful prompt design:
- Schema injection — include the relevant portion of the SDF specification in the system prompt. Modern long-context models (Claude, GPT-4o) can hold the full SDF 1.6 schema and still produce coherent output.
- Few-shot examples — provide one or two complete model.sdf examples before the request. LLMs are highly sensitive to format exemplars.
- Validation feedback loop — pipe the generated XML through `gz sdf --check` and feed errors back as user messages for iterative correction.
- Decomposition — generate the collision mesh description and visual mesh description separately, then combine. Monolithic generation of complex models degrades quality.
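The validation feedback loop can be sketched as a small shell snippet. This is illustrative only: it assumes the `gz` tool from Gazebo is installed and that the generated XML was saved as `model.sdf` (file name is a placeholder).

```shell
# Validate the LLM-generated SDF; capture any schema errors it reports.
if ! errors=$(gz sdf --check model.sdf 2>&1); then
    # Send the error text back to the LLM as a follow-up user message,
    # e.g. "Your SDF failed validation with: <errors>. Please fix it."
    echo "Feed this back to the LLM for correction:"
    echo "$errors"
fi
```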
Mesh generation
SDF model files reference `.DAE` (COLLADA) or `.OBJ` meshes for geometry. LLMs generate the description of the model, but mesh files must come from one of:
- 3D asset libraries — Sketchfab, AWS RoboMaker models, and archives of the now-discontinued Google Poly
- Procedural geometry — for simple shapes (boxes, cylinders, L-shaped objects), SDF primitive geometry suffices and requires no mesh file
- Text-to-3D models — Shap-E (OpenAI, 2023) and One-2-3-45 generate meshes from text or images; quality is improving rapidly but remains lower than hand-authored assets for precise collision models
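To illustrate the primitive-geometry option above, a simple object can be described entirely in SDF with no mesh file. A minimal sketch (link name and dimensions are placeholders):

```xml
<link name="body">
  <collision name="collision">
    <geometry>
      <box><size>0.5 0.5 0.5</size></box>
    </geometry>
  </collision>
  <visual name="visual">
    <geometry>
      <box><size>0.5 0.5 0.5</size></box>
    </geometry>
  </visual>
</link>
```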
Prompt patterns
Pattern 1: Generate a model from a description
Use this pattern to create a new `model.sdf` for a simple object using primitive geometry.
System prompt:
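An illustrative system prompt for this pattern (the exact wording is a sketch, not a canonical recipe):

```
You are an expert in Gazebo SDF 1.6. Generate a complete, valid model.sdf
for the object the user describes. Use only primitive geometry (<box>,
<cylinder>, <sphere>) — no external mesh files. Include <inertial>,
<collision>, and <visual> elements for every link, with realistic masses.
Output only the XML, with no commentary.
```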
Pattern 2: Place models in a world file
Use this pattern to generate the `<model>` placement blocks for a room layout.
User prompt:
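An illustrative user prompt (the room layout and model list are placeholders):

```
Available models: Sofa_01, TV_01, CoffeeTable_01, FloorLamp_01.
Generate the <include> blocks to furnish a 4 m x 5 m living room:
the sofa against the north wall facing the TV on the south wall, the
coffee table between them, and the floor lamp in the northeast corner.
Use model:// URIs and give each placement a sensible
<pose> (x y z roll pitch yaw).
```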
Pattern 3: Iterative correction with validation feedback
After generating SDF, validate it with the Gazebo command-line tool and feed errors back to the LLM.
Worked example: generating a new appliance model
The AWS Small House world includes air conditioners (AirconditionerA_01) but no ceiling fan. Here is how to generate one using the LLM workflow.
Step 1: Describe the object
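An illustrative description to send to the LLM (all dimensions here are assumptions, not measured values):

```
A ceiling fan with a central cylindrical hub (about 0.2 m diameter) and
four flat rectangular blades (each about 0.5 m long), mounted 2.4 m above
the floor on a short downrod. The fan is static — it does not need to
spin. Use only primitive geometry.
```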
Step 2: Generate
Send the prompt to your preferred LLM (Claude, GPT-4o). The response will be a complete `model.sdf` using `<cylinder>` and `<box>` primitives.
Step 3: Create model directory
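A sketch of the directory setup, assuming your models live in a `models/` directory that is on `GAZEBO_MODEL_PATH` (the `ceiling_fan` name is illustrative):

```shell
# Create a directory for the new model; model.config and model.sdf go here.
mkdir -p models/ceiling_fan
```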
model.config:
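An illustrative `model.config` (name, author, and description fields are placeholders):

```xml
<?xml version="1.0"?>
<model>
  <name>ceiling_fan</name>
  <version>1.0</version>
  <sdf version="1.6">model.sdf</sdf>
  <author>
    <name>Your Name</name>
    <email>you@example.com</email>
  </author>
  <description>A static ceiling fan built from primitive geometry.</description>
</model>
```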
Save the generated `model.sdf` in the same directory.
Step 4: Validate
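Run the SDF checker on the new file (assumes `gz` is on your PATH; the path reflects the illustrative directory layout above):

```shell
gz sdf --check models/ceiling_fan/model.sdf
```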
Step 5: Place in the world
Add a placement block to `house_world.sdf.xacro`:
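An illustrative placement block — the pose values are assumptions (hub roughly 2.4 m up, placed near the middle of a room):

```xml
<include>
  <uri>model://ceiling_fan</uri>
  <name>ceiling_fan_1</name>
  <pose>2.0 3.0 2.4 0 0 0</pose>
</include>
```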
Step 6: Launch and inspect
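Launch the world and look for the new model in the Gazebo GUI. The package and launch file names below are placeholders for whatever launch setup your demo uses:

```shell
ros2 launch house_demo house_world.launch.py
```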
If the model appears in the wrong place, adjust the `<pose>` values and restart.
Common failure modes and fixes
| Failure | Cause | Fix |
|---|---|---|
| Model spawns underground | Z pose too low for model origin | Increase z in <pose> to half the model height |
| Model floats above floor | Z pose too high | Decrease z |
| Physics instability (model explodes) | Unrealistic inertia tensor (off-diagonal terms) | Set all off-diagonal terms to 0; use <static>true</static> for furniture |
| Collision mesh blocks navigation | Collision geometry too large | Scale collision down, or use <static>true</static> with simplified collision |
| `gz sdf --check` URI error | Model name in URI doesn’t match directory name | Ensure `model://name` matches the directory name exactly |
| Model invisible in Gazebo | Missing <visual> element | LLM sometimes omits visual for collision-only models; add it |
Integration with the TurtleBot demo
Adding new models to the house world affects robot navigation in two ways:
1. Obstacle map — If a model has a `<collision>` element, Nav2 will detect it as an obstacle via the LiDAR scan. Static furniture that the robot should navigate around must have collision geometry.
2. Camera perception — New objects may be detected by the YOLOv8 or HSV vision pipeline. If you add a bright red object to the scene, the HSV detector (targeting `TARGET_COLOR=red`) will report a detection when the robot looks at it.
After modifying the world, re-run SLAM to generate an updated map:
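For example, with slam_toolbox and the Nav2 map saver (launch and output file names are assumptions based on the standard packages):

```shell
ros2 launch slam_toolbox online_async_launch.py
# Drive the robot through the updated world, then save the new map:
ros2 run nav2_map_server map_saver_cli -f maps/house_world
```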
Further reading
- Gazebo SDF specification — complete schema reference
- Holodeck: Language Guided Generation of 3D Embodied AI Environments (CVPR 2024)
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models (ICLR 2023)
- Code as Policies: Language Model Programs for Embodied Control (ICRA 2023)
- ChatSim: Editable Scene Simulation for Autonomous Driving (CVPR 2024)
- AWS RoboMaker Small House World — open-source residential models

