
ROSA Demo: Robots can follow natural language instructions - Video developed by Oscar Poudel, Ph.D. candidate at NJIT.
Key Capabilities
Natural Language Control
Issue commands to robots using natural language, eliminating the need for manual CLI operations or complex programming.
Real-Time Interaction
Interact with robots during operation for algorithmic design, field verification, testing, and monitoring.
Semantic Navigation
Command robots using semantic goals like “go to the table” or “navigate to the charging station” rather than absolute coordinates.
ROS Integration
Seamlessly integrate with Robot Operating System (ROS) for comprehensive robot control and monitoring.
System Architecture
ROSA employs a decoupled architecture that separates AI reasoning from robot control, enabling flexibility, scalability, and maintainability.
Core Components
AI Agent Layer
Built on ReAct (Reasoning + Acting) principles, the agent combines large language model reasoning with tool use for robot control. The system uses Pydantic AI for structured agent definitions, providing type-safe, validated interactions with LLM APIs.
Message Passing Infrastructure
A swappable message passing mechanism decouples AI agents from ROS simulators and robot platforms. Based on NATS.io and JetStream, this architecture enables:
- Asynchronous communication between AI and robot systems
- Scalability to multiple robots and agents
- Resilience through persistent message queues
- Cloud-native deployment patterns
ROS Integration Layer
The ROS-facing layer exposes the core control and monitoring operations:
- Node management and introspection
- Topic subscription and publication
- Service calls for robot actions
- Parameter server configuration
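The decoupling idea above can be sketched in a few lines. ROSA uses NATS.io with JetStream for this role; the sketch below substitutes in-process asyncio queues so it runs without a broker, and the subject names ("robot.cmd", "robot.result") are illustrative assumptions, not the real system's subjects.

```python
import asyncio

class MessageBus:
    """Minimal in-process pub/sub keyed by subject string (stand-in for NATS)."""
    def __init__(self):
        self._subjects = {}

    def queue(self, subject):
        return self._subjects.setdefault(subject, asyncio.Queue())

    async def publish(self, subject, msg):
        await self.queue(subject).put(msg)

    async def next(self, subject):
        return await self.queue(subject).get()

async def agent(bus):
    # The agent never touches ROS directly; it only emits messages.
    await bus.publish("robot.cmd", {"action": "list_nodes"})

async def robot_bridge(bus):
    # The robot side consumes commands and replies on a separate subject.
    cmd = await bus.next("robot.cmd")
    await bus.publish("robot.result", {"ok": True, "echo": cmd["action"]})

async def main():
    bus = MessageBus()
    await asyncio.gather(agent(bus), robot_bridge(bus))
    return await bus.next("robot.result")

print(asyncio.run(main()))
```

Because both sides talk only to the bus, either one can be restarted, scaled out, or moved to the cloud independently; JetStream additionally persists the queues so messages survive restarts.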
LLM Reasoning
ROSA supports multiple LLM backends for reasoning and planning:
- OpenAI API: GPT-4 and GPT-3.5 for production deployments
- Anthropic Claude: Claude 3.5 for advanced reasoning tasks
- Open Source Models: Llama, Mistral, and other models through HuggingFace or local inference
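Backend-swapping works because the planning code depends only on a narrow completion interface. The sketch below shows that pattern with a stdlib Protocol and an offline stub; the class and method names are illustrative assumptions, not ROSA's actual API.

```python
from typing import Protocol

class LLMBackend(Protocol):
    """Anything with a complete() method can serve as the reasoning backend."""
    def complete(self, prompt: str) -> str: ...

class StubBackend:
    """Offline stand-in for GPT-4, Claude, or a local Llama/Mistral endpoint."""
    def complete(self, prompt: str) -> str:
        return f"[stub plan for: {prompt}]"

def plan(backend: LLMBackend, goal: str) -> str:
    # The reasoning layer is written once against the protocol,
    # so swapping OpenAI for Anthropic or a local model changes only the backend object.
    return backend.complete(f"Plan robot actions for goal: {goal}")

print(plan(StubBackend(), "navigate to the charging station"))
```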
Demonstrated Capabilities
ROS Command and Control
ROSA enables natural language interaction with ROS systems, eliminating the need for manual command-line operations.
Example Query: “List all active ROS nodes”
Agent Action:
- Interprets the request
- Executes rosnode list through the ROS interface
- Formats and returns structured output
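The formatting step of this tool can be sketched as follows. In a live system the agent would invoke rosnode list (e.g. via subprocess or rosnode's Python API); here we parse a captured sample of its output so the example runs without ROS, and the sample node names are illustrative.

```python
def format_nodes(raw: str) -> dict:
    """Turn raw `rosnode list` output (one node name per line)
    into a structured reply the LLM can present to the user."""
    nodes = [line.strip() for line in raw.splitlines() if line.strip()]
    return {"count": len(nodes), "nodes": nodes}

# In a live system: raw = subprocess.run(["rosnode", "list"],
#                                        capture_output=True, text=True).stdout
sample = """
/rosout
/gazebo
/move_base
"""
print(format_nodes(sample))
```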
Example Query: “Subscribe to the /camera/rgb/image topic and describe what the robot sees”
Agent Action:
- Subscribes to the specified topic
- Receives image messages
- Processes visual data through vision models
- Provides natural language description of the scene
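The acting half of the ReAct loop behind both examples reduces to dispatching the model's chosen tool. A minimal sketch, with stub tools standing in for the real ROS and vision calls (in ROSA, tools are defined declaratively through Pydantic AI, and the model, not a hard-coded call, selects the tool and arguments):

```python
def list_nodes() -> str:
    return "/rosout, /gazebo"              # stub for the rosnode call

def describe_camera(topic: str) -> str:
    # Stub for: subscribe to the topic, receive an image, run a vision model.
    return f"stub description of {topic}"

# Tool registry the agent can choose from; names are illustrative.
TOOLS = {"list_nodes": list_nodes, "describe_camera": describe_camera}

def act(tool: str, **kwargs) -> str:
    # In the real loop the LLM emits (tool, arguments); we hard-code them here.
    return TOOLS[tool](**kwargs)

print(act("describe_camera", topic="/camera/rgb/image"))
```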
Semantic Navigation
Beyond basic ROS commands, ROSA demonstrates advanced capabilities for semantic robot navigation.
Example Query: “Navigate to the bench in the laboratory”
Agent Reasoning:
Example Query: “Navigate to the bench in the laboratory”
Agent Reasoning:
- Interprets semantic location reference (“bench”)
- Queries environment map or semantic labels
- Plans path to target location
- Issues navigation commands to robot
- Monitors progress and reports status
- Confirms arrival at destination
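The first two reasoning steps, resolving a semantic label into a metric goal, can be sketched as below. The map entries and goal format are illustrative assumptions; a real system would publish the resulting goal to a navigation stack such as move_base or Nav2 and monitor its feedback.

```python
import math

# Toy semantic map: label -> (x, y) in the map frame (assumed values).
SEMANTIC_MAP = {
    "bench": (3.2, 1.5),
    "charging station": (0.0, -2.0),
}

def resolve_goal(label: str, pose=(0.0, 0.0)) -> dict:
    """Look up a semantic label and build a metric navigation goal."""
    if label not in SEMANTIC_MAP:
        raise KeyError(f"no semantic label: {label}")
    x, y = SEMANTIC_MAP[label]
    distance = math.hypot(x - pose[0], y - pose[1])
    return {"target": label, "x": x, "y": y, "distance_m": round(distance, 2)}

print(resolve_goal("bench"))
```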
Environment Simulation
ROSA integrates with Gazebo simulation environments for algorithm development and testing:
- TurtleBot Maze Navigation: Autonomous navigation in complex indoor environments
- Manipulation Tasks: Object grasping and placement using semantic descriptions
- Multi-Robot Coordination: Coordinating multiple agents in shared workspaces
- Sensor Fusion: Integrating camera, LIDAR, and IMU data for perception
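To make the sensor-fusion item concrete, here is the core of the idea on a single scalar state: two independent position estimates (say, LIDAR-derived and IMU dead-reckoned) are combined by inverse-variance weighting, which is the heart of a Kalman update. The variances and readings below are illustrative assumptions.

```python
def fuse(x1: float, var1: float, x2: float, var2: float):
    """Inverse-variance (Kalman-style) fusion of two scalar estimates."""
    w1 = 1.0 / var1
    w2 = 1.0 / var2
    x = (w1 * x1 + w2 * x2) / (w1 + w2)   # fused estimate, pulled toward
                                          # the more certain sensor
    var = 1.0 / (w1 + w2)                 # fused variance: always smaller
                                          # than either input's
    return x, var

# LIDAR says 2.0 m (var 0.04); IMU dead-reckoning says 2.2 m (var 0.16).
x, var = fuse(2.0, 0.04, 2.2, 0.16)
print(round(x, 3), round(var, 3))  # fused estimate sits closer to the LIDAR reading
```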
Technical Architecture Details
Agent Framework
The system employs Pydantic AI for agent definitions, providing:
- Structured Outputs: Type-safe agent responses with automatic validation
- Tool Use: Declarative tool definitions for robot control functions
- Prompt Engineering: Templated prompts with variable injection
- Python-Centric Design: Native Python integration without DSL overhead
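The structured-outputs point is the key safety property: an agent reply must parse into a typed schema or be rejected before it reaches the robot. Pydantic AI does this with Pydantic models and automatic validation; the sketch below uses a stdlib dataclass with a manual check as a stand-in, and the field names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NavCommand:
    """Typed schema an LLM reply must satisfy before the robot acts on it."""
    action: str
    target: str
    max_speed: float

    def __post_init__(self):
        # Pydantic would generate equivalent checks from type annotations.
        if self.action not in {"navigate", "stop"}:
            raise ValueError(f"unknown action: {self.action}")
        if self.max_speed <= 0:
            raise ValueError("max_speed must be positive")

# A well-formed reply validates into a typed object...
cmd = NavCommand(action="navigate", target="charging station", max_speed=0.5)
print(cmd.target)

# ...while a malformed one fails loudly instead of reaching the robot.
try:
    NavCommand(action="teleport", target="bench", max_speed=0.5)
except ValueError as e:
    print("rejected:", e)
```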
Decoupling Strategy
ROSA’s decoupled architecture provides significant operational advantages:
- Development Flexibility: AI agents can be developed, tested, and updated independently of robot systems
- Scalability: Multiple agents can control multiple robots through the message passing layer
- Deployment Options: local deployment for low-latency control, cloud deployment for computationally intensive reasoning, or hybrid architectures combining local and cloud components
Real-World Applications
ROSA’s architecture demonstrates practical applications across robotics domains:
- Warehouse Automation: Natural language control of autonomous mobile robots for inventory management
- Field Robotics: Real-time operator interaction with robots for exploration and inspection tasks
- Manufacturing: Semantic task specification for collaborative robots in assembly operations
- Research and Development: Rapid prototyping of robot behaviors through conversational interfaces
- Education and Training: Intuitive robot programming for students and non-experts
Technical Implementation
The system combines multiple technologies into a cohesive platform:
- Agent Framework: Pydantic AI for structured agent definitions
- LLM APIs: OpenAI, Anthropic, or open-source models for reasoning
- Message Broker: NATS.io with JetStream for reliable messaging
- Robot Framework: ROS (Robot Operating System) for robot control
- Simulation: Gazebo for virtual robot environments
- Visualization: RViz for real-time robot state visualization

