Modern robotics requires sophisticated AI systems capable of reasoning about complex environments, planning actions, and interacting with robot platforms in real time. ROSA (Robot Operating System Agents) demonstrates an advanced AI agent architecture built on ReAct principles, enabling natural language interaction with ROS-based robots for simulation, testing, and field deployment.

[Figure: ROSA Architecture]
ROSA Demo: Robots can follow natural language instructions. Video developed by Oscar Poudel, Ph.D. candidate at NJIT.

Key Capabilities

Natural Language Control

Issue commands to robots using natural language, eliminating the need for manual CLI operations or complex programming.

Real-Time Interaction

Interact with robots during operation for algorithmic design, field verification, testing, and monitoring.

Semantic Navigation

Command robots using semantic goals like “go to the table” or “navigate to the charging station” rather than absolute coordinates.

ROS Integration

Seamlessly integrate with Robot Operating System (ROS) for comprehensive robot control and monitoring.

System Architecture

ROSA employs a decoupled architecture that separates AI reasoning from robot control, enabling flexibility, scalability, and maintainability.

[Figure: AI Agent Interaction with ROS]

Core Components

AI Agent Layer

Built on ReAct (Reasoning + Acting) principles, the agent combines large language model reasoning with tool use for robot control. The system uses Pydantic AI for structured agent definitions, providing type-safe, validated interactions with LLM APIs.

Message Passing Infrastructure

A swappable message passing mechanism decouples AI agents from ROS simulators and robot platforms. Based on NATS.io and JetStream, this architecture enables:
  • Asynchronous communication between AI and robot systems
  • Scalability to multiple robots and agents
  • Resilience through persistent message queues
  • Cloud-native deployment patterns
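As a minimal sketch of the message-passing idea, a robot command can be serialized to a language-agnostic payload before publication on a NATS subject. The message schema, subject name, and field names below are hypothetical illustrations, not ROSA's actual wire format:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RobotCommand:
    """Hypothetical command envelope carried over NATS."""
    robot_id: str
    action: str   # e.g. "navigate", "list_nodes"
    params: dict

    def to_bytes(self) -> bytes:
        # NATS payloads are raw bytes; JSON keeps them language-agnostic.
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, payload: bytes) -> "RobotCommand":
        return cls(**json.loads(payload.decode("utf-8")))

# With the nats-py client, publishing would look roughly like:
#   nc = await nats.connect("nats://localhost:4222")
#   js = nc.jetstream()
#   await js.publish("robot.turtlebot1.cmd", cmd.to_bytes())

cmd = RobotCommand("turtlebot1", "navigate", {"target": "charging station"})
restored = RobotCommand.from_bytes(cmd.to_bytes())
```

Because JetStream persists published messages, a command serialized this way survives a consumer restart, which is what gives the architecture its buffering and resilience properties.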
ROS Integration Layer

Direct integration with the Robot Operating System enables comprehensive robot control, including:
  • Node management and introspection
  • Topic subscription and publication
  • Service calls for robot actions
  • Parameter server configuration
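To make the introspection idea concrete, the sketch below maps high-level requests onto their ROS 1 command-line equivalents. The dispatch table is purely illustrative; a real integration layer would call the rospy/rosnode Python APIs directly rather than shelling out:

```python
def ros_introspection_command(request: str, target: str = "") -> str:
    """Translate a high-level introspection request into the
    equivalent ROS 1 CLI invocation (illustrative only)."""
    table = {
        "list_nodes": "rosnode list",
        "list_topics": "rostopic list",
        "echo_topic": f"rostopic echo {target}".strip(),
        "node_info": f"rosnode info {target}".strip(),
    }
    if request not in table:
        raise ValueError(f"unsupported request: {request}")
    return table[request]
```

An agent tool can expose each table entry as a callable, letting the LLM pick the right introspection action from a natural language query.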

LLM Reasoning

ROSA supports multiple LLM backends for reasoning and planning:
  • OpenAI API: GPT-4 and GPT-3.5 for production deployments
  • Anthropic Claude: Claude 3.5 for advanced reasoning tasks
  • Open Source Models: Llama, Mistral, and other models through HuggingFace or local inference
The system intelligently manages API costs through model selection, prompt engineering, and response caching.
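One of the cost-management techniques mentioned above, response caching, can be sketched in a few lines: identical (model, prompt) pairs hit the cache instead of the paid API. The class and its interface are hypothetical, not ROSA's actual implementation:

```python
import hashlib

class PromptCache:
    """Cache LLM responses keyed by a hash of (model, prompt),
    so repeated identical queries cost nothing. Illustrative sketch."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get_or_call(self, model, prompt, call_llm):
        k = self._key(model, prompt)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        result = call_llm(model, prompt)  # the expensive API call
        self._store[k] = result
        return result

cache = PromptCache()
fake_llm = lambda model, prompt: f"response to: {prompt}"
a = cache.get_or_call("gpt-4", "list nodes", fake_llm)
b = cache.get_or_call("gpt-4", "list nodes", fake_llm)  # served from cache
```

Production systems often add a time-to-live or semantic-similarity matching on top of exact-key caching, since robot state changes over time.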

Demonstrated Capabilities

ROS Command and Control

ROSA enables natural language interaction with ROS systems, eliminating the need for manual command-line operations:
Example Query: “List all active ROS nodes”
Agent Action:
  • Interprets the request
  • Executes rosnode list through the ROS interface
  • Formats and returns structured output
Example Query: “Subscribe to the /camera/rgb/image topic and describe what the robot sees”
Agent Action:
  • Subscribes to the specified topic
  • Receives image messages
  • Processes visual data through vision models
  • Provides natural language description of the scene

Semantic Navigation

Beyond basic ROS commands, ROSA demonstrates advanced capabilities for semantic robot navigation:
Example Query: “Navigate to the bench in the laboratory”
Agent Reasoning:
  1. Interprets semantic location reference (“bench”)
  2. Queries environment map or semantic labels
  3. Plans path to target location
  4. Issues navigation commands to robot
  5. Monitors progress and reports status
  6. Confirms arrival at destination
The agent handles both absolute coordinates and semantic references, reasoning about the environment to translate high-level goals into executable robot commands.
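The semantic-to-coordinate translation described above can be sketched as a lookup against a labeled map. The labels, coordinates, and function below are hypothetical examples, not ROSA's actual map representation:

```python
import math

# Hypothetical semantic map: label -> (x, y) in the map frame.
SEMANTIC_MAP = {
    "bench": (4.2, 1.5),
    "charging station": (-1.0, 3.0),
    "table": (2.0, 0.5),
}

def resolve_goal(reference, robot_pose=(0.0, 0.0)):
    """Resolve a semantic reference (or raw coordinates) to a
    navigation goal, reporting the straight-line distance."""
    if isinstance(reference, str):
        if reference not in SEMANTIC_MAP:
            raise KeyError(f"unknown semantic label: {reference}")
        goal = SEMANTIC_MAP[reference]
    else:
        goal = tuple(reference)  # already absolute coordinates
    dist = math.hypot(goal[0] - robot_pose[0], goal[1] - robot_pose[1])
    return goal, dist

goal, dist = resolve_goal("bench")
```

The resolved goal would then be handed to the navigation stack (e.g. as a move_base goal in ROS 1), while the agent monitors feedback topics to report progress.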

Environment Simulation

ROSA integrates with Gazebo simulation environments for algorithm development and testing:
  • TurtleBot Maze Navigation: Autonomous navigation in complex indoor environments
  • Manipulation Tasks: Object grasping and placement using semantic descriptions
  • Multi-Robot Coordination: Coordinating multiple agents in shared workspaces
  • Sensor Fusion: Integrating camera, LIDAR, and IMU data for perception

Technical Architecture Details

Agent Framework

The system employs Pydantic AI for agent definitions, providing:
  • Structured Outputs: Type-safe agent responses with automatic validation
  • Tool Use: Declarative tool definitions for robot control functions
  • Prompt Engineering: Templated prompts with variable injection
  • Python-Centric Design: Native Python integration without DSL overhead
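The structured-output idea can be illustrated with the standard library alone: the agent's raw JSON reply is validated against a declared schema before any robot command is issued. Pydantic AI would use a pydantic.BaseModel with richer type coercion; the schema and field names here are hypothetical:

```python
import json
from dataclasses import dataclass, fields

@dataclass
class NavigationPlan:
    """Hypothetical schema an agent response must satisfy."""
    target: str
    waypoints: list
    estimated_seconds: float

def parse_agent_output(raw: str) -> NavigationPlan:
    """Validate raw LLM JSON output against the schema,
    rejecting responses with missing or unexpected fields."""
    data = json.loads(raw)
    expected = {f.name for f in fields(NavigationPlan)}
    if set(data) != expected:
        raise ValueError(f"fields {set(data) ^ expected} do not match schema")
    return NavigationPlan(**data)

raw = '{"target": "bench", "waypoints": [[1, 0], [2, 1]], "estimated_seconds": 12.5}'
plan = parse_agent_output(raw)
```

Failing fast on malformed output is what makes structured agents safe to wire into actuators: a hallucinated or truncated reply raises an error instead of moving the robot.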
This architecture improves upon traditional agent frameworks (e.g., LangChain) by reducing dependencies, improving performance, and enabling more maintainable code.

Decoupling Strategy

ROSA’s decoupled architecture provides significant operational advantages:
Development Flexibility: AI agents can be developed, tested, and updated independently of robot systems
Scalability: Multiple agents can control multiple robots through the message passing layer
Deployment Options:
  • Local deployment for low-latency control
  • Cloud deployment for computationally intensive reasoning
  • Hybrid architectures combining local and cloud components
Robustness: Message queues provide buffering and reliability during network interruptions

Real-World Applications

ROSA’s architecture demonstrates practical applications across robotics domains:
  • Warehouse Automation: Natural language control of autonomous mobile robots for inventory management
  • Field Robotics: Real-time operator interaction with robots for exploration and inspection tasks
  • Manufacturing: Semantic task specification for collaborative robots in assembly operations
  • Research and Development: Rapid prototyping of robot behaviors through conversational interfaces
  • Education and Training: Intuitive robot programming for students and non-experts

Technical Implementation

The system combines multiple technologies into a cohesive platform:
  • Agent Framework: Pydantic AI for structured agent definitions
  • LLM APIs: OpenAI, Anthropic, or open-source models for reasoning
  • Message Broker: NATS.io with JetStream for reliable messaging
  • Robot Framework: ROS (Robot Operating System) for robot control
  • Simulation: Gazebo for virtual robot environments
  • Visualization: RViz for real-time robot state visualization

Explore the Technology

ROSA demonstrates how AI agents can transform robot interaction through natural language interfaces, semantic reasoning, and decoupled architectures. The system showcases modern approaches to robot intelligence that prioritize maintainability, scalability, and operator accessibility (Royce et al., 2024). Contact us to discuss how ROSA’s architecture can be adapted for your robotics applications.