
ROSA Demo: Robots can follow natural language instructions - Video developed by Oscar Poudel, Ph.D. candidate at NJIT.
Key Capabilities
Natural Language Control
Issue commands to robots using natural language, eliminating the need for manual CLI operations or complex programming.
Real-Time Interaction
Interact with robots during operation for algorithmic design, field verification, testing, and monitoring.
Semantic Navigation
Command robots using semantic goals like “go to the table” or “navigate to the charging station” rather than absolute coordinates.
ROS Integration
Seamlessly integrate with Robot Operating System (ROS) for comprehensive robot control and monitoring.
System Architecture
ROSA employs a decoupled architecture that separates AI reasoning from robot control, enabling flexibility, scalability, and maintainability.
Core Components
AI Agent Layer
Built on ReAct (Reasoning + Acting) principles, the agent combines large language model reasoning with tool use for robot control. The system uses Pydantic AI for structured agent definitions, providing type-safe, validated interactions with LLM APIs.
Message Passing Infrastructure
A swappable message passing mechanism decouples AI agents from ROS simulators and robot platforms. Based on NATS.io and JetStream, this architecture enables:
- Asynchronous communication between AI and robot systems
- Scalability to multiple robots and agents
- Resilience through persistent message queues
- Cloud-native deployment patterns
ROS Integration Layer
The ROS-facing layer exposes the core control and monitoring operations:
- Node management and introspection
- Topic subscription and publication
- Service calls for robot actions
- Parameter server configuration
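The decoupling idea above can be sketched in a few lines. ROSA uses NATS.io with JetStream for this role; the sketch below substitutes in-process asyncio queues so it runs without a broker, and the subject names ("robot.cmd", "robot.result") are illustrative assumptions, not the real system's subjects.

```python
import asyncio

class MessageBus:
    """Minimal in-process pub/sub keyed by subject string (stand-in for NATS)."""
    def __init__(self):
        self._subjects = {}

    def queue(self, subject):
        return self._subjects.setdefault(subject, asyncio.Queue())

    async def publish(self, subject, msg):
        await self.queue(subject).put(msg)

    async def next(self, subject):
        return await self.queue(subject).get()

async def agent(bus):
    # The agent never touches ROS directly; it only emits messages.
    await bus.publish("robot.cmd", {"action": "list_nodes"})

async def robot_bridge(bus):
    # The robot side consumes commands and replies on a separate subject.
    cmd = await bus.next("robot.cmd")
    await bus.publish("robot.result", {"ok": True, "echo": cmd["action"]})

async def main():
    bus = MessageBus()
    await asyncio.gather(agent(bus), robot_bridge(bus))
    return await bus.next("robot.result")

print(asyncio.run(main()))
```

Because both sides talk only to the bus, either one can be restarted, scaled out, or moved to the cloud independently; JetStream additionally persists the queues so messages survive restarts.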
LLM Reasoning
ROSA supports multiple LLM backends for reasoning and planning:
- OpenAI API: GPT-4 and GPT-3.5 for production deployments
- Anthropic Claude: Claude 3.5 for advanced reasoning tasks
- Open Source Models: Llama, Mistral, and other models through HuggingFace or local inference
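Backend-swapping works because the planning code depends only on a narrow completion interface. The sketch below shows that pattern with a stdlib Protocol and an offline stub; the class and method names are illustrative assumptions, not ROSA's actual API.

```python
from typing import Protocol

class LLMBackend(Protocol):
    """Anything with a complete() method can serve as the reasoning backend."""
    def complete(self, prompt: str) -> str: ...

class StubBackend:
    """Offline stand-in for GPT-4, Claude, or a local Llama/Mistral endpoint."""
    def complete(self, prompt: str) -> str:
        return f"[stub plan for: {prompt}]"

def plan(backend: LLMBackend, goal: str) -> str:
    # The reasoning layer is written once against the protocol,
    # so swapping OpenAI for Anthropic or a local model changes only the backend object.
    return backend.complete(f"Plan robot actions for goal: {goal}")

print(plan(StubBackend(), "navigate to the charging station"))
```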
Demonstrated Capabilities
ROS Command and Control
ROSA enables natural language interaction with ROS systems, eliminating the need for manual command-line operations.
Example Query: “List all active ROS nodes”
Agent Action:
- Interprets the request
- Executes rosnode list through the ROS interface
- Formats and returns structured output
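The formatting step of this tool can be sketched as follows. In a live system the agent would invoke rosnode list (e.g. via subprocess or rosnode's Python API); here we parse a captured sample of its output so the example runs without ROS, and the sample node names are illustrative.

```python
def format_nodes(raw: str) -> dict:
    """Turn raw `rosnode list` output (one node name per line)
    into a structured reply the LLM can present to the user."""
    nodes = [line.strip() for line in raw.splitlines() if line.strip()]
    return {"count": len(nodes), "nodes": nodes}

# In a live system: raw = subprocess.run(["rosnode", "list"],
#                                        capture_output=True, text=True).stdout
sample = """
/rosout
/gazebo
/move_base
"""
print(format_nodes(sample))
```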
Example Query: “Subscribe to the /camera/rgb/image topic and describe what the robot sees”
Agent Action:
- Subscribes to the specified topic
- Receives image messages
- Processes visual data through vision models
- Provides natural language description of the scene
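The acting half of the ReAct loop behind both examples reduces to dispatching the model's chosen tool. A minimal sketch, with stub tools standing in for the real ROS and vision calls (in ROSA, tools are defined declaratively through Pydantic AI, and the model, not a hard-coded call, selects the tool and arguments):

```python
def list_nodes() -> str:
    return "/rosout, /gazebo"              # stub for the rosnode call

def describe_camera(topic: str) -> str:
    # Stub for: subscribe to the topic, receive an image, run a vision model.
    return f"stub description of {topic}"

# Tool registry the agent can choose from; names are illustrative.
TOOLS = {"list_nodes": list_nodes, "describe_camera": describe_camera}

def act(tool: str, **kwargs) -> str:
    # In the real loop the LLM emits (tool, arguments); we hard-code them here.
    return TOOLS[tool](**kwargs)

print(act("describe_camera", topic="/camera/rgb/image"))
```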
Semantic Navigation
Beyond basic ROS commands, ROSA demonstrates advanced capabilities for semantic robot navigation.
Example Query: “Navigate to the bench in the laboratory”
Agent Reasoning:
Example Query: “Navigate to the bench in the laboratory”
Agent Reasoning:
- Interprets semantic location reference (“bench”)
- Queries environment map or semantic labels
- Plans path to target location
- Issues navigation commands to robot
- Monitors progress and reports status
- Confirms arrival at destination
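The first two reasoning steps, resolving a semantic label into a metric goal, can be sketched as below. The map entries and goal format are illustrative assumptions; a real system would publish the resulting goal to a navigation stack such as move_base or Nav2 and monitor its feedback.

```python
import math

# Toy semantic map: label -> (x, y) in the map frame (assumed values).
SEMANTIC_MAP = {
    "bench": (3.2, 1.5),
    "charging station": (0.0, -2.0),
}

def resolve_goal(label: str, pose=(0.0, 0.0)) -> dict:
    """Look up a semantic label and build a metric navigation goal."""
    if label not in SEMANTIC_MAP:
        raise KeyError(f"no semantic label: {label}")
    x, y = SEMANTIC_MAP[label]
    distance = math.hypot(x - pose[0], y - pose[1])
    return {"target": label, "x": x, "y": y, "distance_m": round(distance, 2)}

print(resolve_goal("bench"))
```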
Environment Simulation
ROSA integrates with Gazebo simulation environments for algorithm development and testing:
- TurtleBot Maze Navigation: Autonomous navigation in complex indoor environments
- Manipulation Tasks: Object grasping and placement using semantic descriptions
- Multi-Robot Coordination: Coordinating multiple agents in shared workspaces
- Sensor Fusion: Integrating camera, LIDAR, and IMU data for perception
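To make the sensor-fusion item concrete, here is the core of the idea on a single scalar state: two independent position estimates (say, LIDAR-derived and IMU dead-reckoned) are combined by inverse-variance weighting, which is the heart of a Kalman update. The variances and readings below are illustrative assumptions.

```python
def fuse(x1: float, var1: float, x2: float, var2: float):
    """Inverse-variance (Kalman-style) fusion of two scalar estimates."""
    w1 = 1.0 / var1
    w2 = 1.0 / var2
    x = (w1 * x1 + w2 * x2) / (w1 + w2)   # fused estimate, pulled toward
                                          # the more certain sensor
    var = 1.0 / (w1 + w2)                 # fused variance: always smaller
                                          # than either input's
    return x, var

# LIDAR says 2.0 m (var 0.04); IMU dead-reckoning says 2.2 m (var 0.16).
x, var = fuse(2.0, 0.04, 2.2, 0.16)
print(round(x, 3), round(var, 3))  # fused estimate sits closer to the LIDAR reading
```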
Technical Architecture Details
Agent Framework
The system employs Pydantic AI for agent definitions, providing:
- Structured Outputs: Type-safe agent responses with automatic validation
- Tool Use: Declarative tool definitions for robot control functions
- Prompt Engineering: Templated prompts with variable injection
- Python-Centric Design: Native Python integration without DSL overhead
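The structured-outputs point is the key safety property: an agent reply must parse into a typed schema or be rejected before it reaches the robot. Pydantic AI does this with Pydantic models and automatic validation; the sketch below uses a stdlib dataclass with a manual check as a stand-in, and the field names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NavCommand:
    """Typed schema an LLM reply must satisfy before the robot acts on it."""
    action: str
    target: str
    max_speed: float

    def __post_init__(self):
        # Pydantic would generate equivalent checks from type annotations.
        if self.action not in {"navigate", "stop"}:
            raise ValueError(f"unknown action: {self.action}")
        if self.max_speed <= 0:
            raise ValueError("max_speed must be positive")

# A well-formed reply validates into a typed object...
cmd = NavCommand(action="navigate", target="charging station", max_speed=0.5)
print(cmd.target)

# ...while a malformed one fails loudly instead of reaching the robot.
try:
    NavCommand(action="teleport", target="bench", max_speed=0.5)
except ValueError as e:
    print("rejected:", e)
```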
Decoupling Strategy
ROSA’s decoupled architecture provides significant operational advantages:
- Development Flexibility: AI agents can be developed, tested, and updated independently of robot systems
- Scalability: Multiple agents can control multiple robots through the message passing layer
- Deployment Options: local deployment for low-latency control, cloud deployment for computationally intensive reasoning, or hybrid architectures combining local and cloud components
Real-World Applications
ROSA’s architecture demonstrates practical applications across robotics domains:
- Warehouse Automation: Natural language control of autonomous mobile robots for inventory management
- Field Robotics: Real-time operator interaction with robots for exploration and inspection tasks
- Manufacturing: Semantic task specification for collaborative robots in assembly operations
- Research and Development: Rapid prototyping of robot behaviors through conversational interfaces
- Education and Training: Intuitive robot programming for students and non-experts
Technical Implementation
The system combines multiple technologies into a cohesive platform:
- Agent Framework: Pydantic AI for structured agent definitions
- LLM APIs: OpenAI, Anthropic, or open-source models for reasoning
- Message Broker: NATS.io with JetStream for reliable messaging
- Robot Framework: ROS (Robot Operating System) for robot control
- Simulation: Gazebo for virtual robot environments
- Visualization: RViz for real-time robot state visualization

