Why use LLMs for simulation authoring?
A typical Gazebo model requires coordinating several files: `model.config`, `model.sdf`, mesh references, material scripts, and placement poses in the world file. Writing these by hand demands:
- Knowledge of the SDF schema and element nesting rules
- Correct inertia tensor values for realistic physics
- Consistent URI conventions (`model://name/meshes/...`)
- Collision mesh simplification decisions
Literature review
Several research threads are directly relevant to NLP-to-simulation authoring:
Procedural and generative world creation
Early work on procedural content generation (PCG) for simulation used template engines and grammar-based methods. LLMs represent a qualitative shift: they can interpret underspecified natural language and infer reasonable defaults.
WorldGen / SceneX-type approaches — papers such as SceneX (CVPR 2024) and Holodeck (CVPR 2024) demonstrate end-to-end pipelines that convert a text description of a room (“a modern living room with a grey sofa facing a TV”) into a 3D scene with placed furniture. These systems query LLMs to select object categories and spatial relationships, then look up 3D assets from a database (Objaverse, ShapeNet) and assign poses. The core LLM task is structured output generation — producing a JSON or XML scene graph from free text.
Code-as-interface — ProgPrompt (ICLR 2023) and Code as Policies (ICRA 2023) show that LLMs are effective at generating robot task programs when given a library of available primitives. The same principle applies to SDF authoring: if the LLM is told what XML elements are available and what each does, it generates structurally correct documents. The SDF schema serves as the “primitive library.”
Language-to-simulation bridges — ChatSim (CVPR 2024) specifically targets autonomous driving simulation, converting natural language commands into scene edits in a CARLA/nuScenes-style environment. The authors use a multi-agent LLM pipeline where one agent interprets the command, another places objects, and a third checks physical plausibility.
Prompt engineering for structured output
Getting an LLM to produce valid XML consistently requires careful prompt design:
- Schema injection — include the relevant portion of the SDF specification in the system prompt. Modern long-context models (Claude, GPT-4o) can hold the full SDF 1.6 schema and still produce coherent output.
- Few-shot examples — provide one or two complete model.sdf examples before the request. LLMs are highly sensitive to format exemplars.
- Validation feedback loop — pipe the generated XML through `gz sdf --check` and feed errors back as user messages for iterative correction.
- Decomposition — generate the collision mesh description and visual mesh description separately, then combine. Monolithic generation of complex models degrades quality.
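The validation feedback loop can be sketched as a small shell snippet. This is illustrative only: it assumes the `gz` tool from Gazebo is installed and that the generated XML was saved as `model.sdf` (file name is a placeholder).

```shell
# Validate the LLM-generated SDF; capture any schema errors it reports.
if ! errors=$(gz sdf --check model.sdf 2>&1); then
    # Send the error text back to the LLM as a follow-up user message,
    # e.g. "Your SDF failed validation with: <errors>. Please fix it."
    echo "Feed this back to the LLM for correction:"
    echo "$errors"
fi
```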
Mesh generation
SDF model files reference `.DAE` (COLLADA) or `.OBJ` meshes for geometry. LLMs generate the description of the model, but mesh files must come from one of:
- 3D asset libraries — Sketchfab, AWS RoboMaker models, and archives of the now-discontinued Google Poly
- Procedural geometry — for simple shapes (boxes, cylinders, L-shaped objects), SDF primitive geometry suffices and requires no mesh file
- Text-to-3D models — Shap-E (OpenAI, 2023) and One-2-3-45 generate meshes from text or images; quality is improving rapidly but remains lower than hand-authored assets for precise collision models
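To illustrate the primitive-geometry option above, a simple object can be described entirely in SDF with no mesh file. A minimal sketch (link name and dimensions are placeholders):

```xml
<link name="body">
  <collision name="collision">
    <geometry>
      <box><size>0.5 0.5 0.5</size></box>
    </geometry>
  </collision>
  <visual name="visual">
    <geometry>
      <box><size>0.5 0.5 0.5</size></box>
    </geometry>
  </visual>
</link>
```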
Prompt patterns
Pattern 1: Generate a model from a description
Use this pattern to create a new `model.sdf` for a simple object using primitive geometry.
System prompt:
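An illustrative system prompt for this pattern (the exact wording is a sketch, not a canonical recipe):

```
You are an expert in Gazebo SDF 1.6. Generate a complete, valid model.sdf
for the object the user describes. Use only primitive geometry (<box>,
<cylinder>, <sphere>) — no external mesh files. Include <inertial>,
<collision>, and <visual> elements for every link, with realistic masses.
Output only the XML, with no commentary.
```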
Pattern 2: Place models in a world file
Use this pattern to generate the `<model>` placement blocks for a room layout.
User prompt:
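An illustrative user prompt (the room layout and model list are placeholders):

```
Available models: Sofa_01, TV_01, CoffeeTable_01, FloorLamp_01.
Generate the <include> blocks to furnish a 4 m x 5 m living room:
the sofa against the north wall facing the TV on the south wall, the
coffee table between them, and the floor lamp in the northeast corner.
Use model:// URIs and give each placement a sensible
<pose> (x y z roll pitch yaw).
```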
Pattern 3: Iterative correction with validation feedback
After generating SDF, validate it with the Gazebo command-line tool and feed errors back to the LLM.
Worked example: generating a new appliance model
The AWS Small House world includes air conditioners (AirconditionerA_01) but no ceiling fan. Here is how to generate one using the LLM workflow.
Step 1: Describe the object
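An illustrative description to send to the LLM (all dimensions here are assumptions, not measured values):

```
A ceiling fan with a central cylindrical hub (about 0.2 m diameter) and
four flat rectangular blades (each about 0.5 m long), mounted 2.4 m above
the floor on a short downrod. The fan is static — it does not need to
spin. Use only primitive geometry.
```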
Step 2: Generate
Send the prompt to your preferred LLM (Claude, GPT-4o). The response will be a complete `model.sdf` using `<cylinder>` and `<box>` primitives.
Step 3: Create model directory
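A sketch of the directory setup, assuming your models live in a `models/` directory that is on `GAZEBO_MODEL_PATH` (the `ceiling_fan` name is illustrative):

```shell
# Create a directory for the new model; model.config and model.sdf go here.
mkdir -p models/ceiling_fan
```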
model.config:
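An illustrative `model.config` (name, author, and description fields are placeholders):

```xml
<?xml version="1.0"?>
<model>
  <name>ceiling_fan</name>
  <version>1.0</version>
  <sdf version="1.6">model.sdf</sdf>
  <author>
    <name>Your Name</name>
    <email>you@example.com</email>
  </author>
  <description>A static ceiling fan built from primitive geometry.</description>
</model>
```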
Save the generated `model.sdf` in the same directory.
Step 4: Validate
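Run the SDF checker on the new file (assumes `gz` is on your PATH; the path reflects the illustrative directory layout above):

```shell
gz sdf --check models/ceiling_fan/model.sdf
```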
Step 5: Place in the world
Add a placement block to `house_world.sdf.xacro`:
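An illustrative placement block — the pose values are assumptions (hub roughly 2.4 m up, placed near the middle of a room):

```xml
<include>
  <uri>model://ceiling_fan</uri>
  <name>ceiling_fan_1</name>
  <pose>2.0 3.0 2.4 0 0 0</pose>
</include>
```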
Step 6: Launch and inspect
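Launch the world and look for the new model in the Gazebo GUI. The package and launch file names below are placeholders for whatever launch setup your demo uses:

```shell
ros2 launch house_demo house_world.launch.py
```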
If the model appears in the wrong place, adjust the `<pose>` values and restart.
Common failure modes and fixes
| Failure | Cause | Fix |
|---|---|---|
| Model spawns underground | Z pose too low for model origin | Increase z in <pose> to half the model height |
| Model floats above floor | Z pose too high | Decrease z |
| Physics instability (model explodes) | Unrealistic inertia tensor (off-diagonal terms) | Set all off-diagonal terms to 0; use <static>true</static> for furniture |
| Collision mesh blocks navigation | Collision geometry too large | Scale collision down, or use <static>true</static> with simplified collision |
| `gz sdf --check` URI error | Model name in URI doesn’t match directory name | Ensure `model://name` matches the directory name exactly |
| Model invisible in Gazebo | Missing <visual> element | LLM sometimes omits visual for collision-only models; add it |
Integration with the TurtleBot demo
Adding new models to the house world affects robot navigation in two ways:
1. Obstacle map — If a model has a `<collision>` element, Nav2 will detect it as an obstacle via the LiDAR scan. Static furniture that the robot should navigate around must have collision geometry.
2. Camera perception — New objects may be detected by the YOLOv8 or HSV vision pipeline. If you add a bright red object to the scene, the HSV detector (targeting `TARGET_COLOR=red`) will report a detection when the robot looks at it.
After modifying the world, re-run SLAM to generate an updated map:
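For example, with slam_toolbox and the Nav2 map saver (launch and output file names are assumptions based on the standard packages):

```shell
ros2 launch slam_toolbox online_async_launch.py
# Drive the robot through the updated world, then save the new map:
ros2 run nav2_map_server map_saver_cli -f maps/house_world
```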
Further reading
- Gazebo SDF specification — complete schema reference
- Holodeck: Language Guided Generation of 3D Embodied AI Environments (CVPR 2024)
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models (ICLR 2023)
- Code as Policies: Language Model Programs for Embodied Control (ICRA 2023)
- ChatSim: Editable Scene Simulation for Autonomous Driving (CVPR 2024)
- AWS RoboMaker Small House World — open-source residential models

