A Survey of Language-Based Communication in Robotics

An Unpublished Survey Paper
(Posted on ArXiv)

William Hunt1, Sarvapali D. Ramchurn1, Mohammad D. Soorati1
1University of Southampton

Embodied robots that can interact with their environment and neighbours are increasingly being used as a test bed for developing Artificial Intelligence. This creates a need for multimodal robot controllers that can operate across different types of information, including text. Large Language Models are able to process and generate textual as well as audiovisual data and, more recently, robot actions. As language models are increasingly applied to robotic systems, these language-based robots leverage their power in a variety of ways. Additionally, the use of language opens up multiple forms of information exchange between members of a human-robot team. This survey motivates the use of language models in robotics and then categorises works according to where in the overall control flow language is incorporated: language can be used by a human to task a robot, by a robot to inform a human, between robots as a human-like communication medium, and internally for a robot's planning and control. Applications of language-based robots are explored, and numerous limitations and challenges are discussed to summarise the development needed for the future of language-based robotics.

Graphical Abstract: Interaction involving robots can be broken into four categories: Human-to-Robot (a human instructing the robot with language), Robot-to-Human (a robot explaining or validating its actions to the human), Robot-to-Robot (robots communicating with each other), and Internal (a robot using language internally). We also discuss the advantages, applications, and limitations of LLMs in general, in robotics, and from an ethical perspective.


Human-To-Robot Communication

The most natural use case for language models in robotics is the direct commanding of robots. Classical approaches define a library of skills and commands with textual labels that can be used to control a robot. The introduction of LLMs redefines this paradigm: what was once a discrete input space defined by a programmer can now be learned at a higher level. By treating a textual (or, by extension, visual or gestural) command as encoding some meaning that exists in an embedding space, a model can in principle interpret an instruction it has never seen before and internally relate it to previous inputs. This is the fundamental concept of machine learning --- generalisation --- but moving from generalising to classify unseen images to interpreting and acting on the semantic meaning of a sentence is an important and nontrivial step. The ``Human-to-Robot Communication'' category describes work that takes human-style commands which might otherwise be issued to another human, such as ``pick up the red ball'', and creates models that map them to actions.
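As a concrete illustration of grounding an unseen instruction in a fixed skill library via embedding similarity --- a minimal sketch of the general idea rather than the method of any one surveyed paper --- the snippet below compares sentence embeddings. The skill list, the sentence-transformers model choice, and the nearest-neighbour rule are all illustrative assumptions:

```python
# Minimal sketch: grounding a free-form command in a fixed skill library
# via sentence-embedding similarity. The skill list and model choice are
# illustrative assumptions, not taken from any single surveyed paper.
from sentence_transformers import SentenceTransformer, util

SKILLS = [
    "pick up the red ball",
    "open the drawer",
    "navigate to the kitchen",
    "place the held object on the table",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
skill_embeddings = model.encode(SKILLS, convert_to_tensor=True)

def ground_command(command: str) -> str:
    """Return the library skill whose embedding is closest to the command."""
    query = model.encode(command, convert_to_tensor=True)
    similarities = util.cos_sim(query, skill_embeddings)[0]
    return SKILLS[int(similarities.argmax())]

# An unseen phrasing still resolves to a sensible skill:
print(ground_command("grab the crimson sphere"))  # -> "pick up the red ball"
```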
Modality | Paper | Code (if available) | LLM | Robot/sim | Application
Task Breakdown | Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Guided Exploration for Zero-Shot Object Navigation | Code | GPT-3 | RoboTHOR | Generate language-based plans
| Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | Code | GPT-3.5 | Franka Robot | Generate plans from commands
| Embodied Task Planning with Large Language Models | Code | TaPA | AI2THOR | Generate plans from commands
| CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots | -- | GPT-3 | Clearpath Jackal UGV | Generate plans from commands
| Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents | Code | GPT-3 | VirtualHome | Generate plans from commands
| SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning | -- | GPT-4 | Robot arm on wheels | Plan large tasks
| DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics | -- | DALL-E | Robot arm | Generate target state
| GenSim: Generating Robotic Simulation Tasks via Large Language Models | Code | GPT-4 | Ravens | Generate sim data
| Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning | Code | GPT-3 | VirtualHome | Determine human intentions
Code and Rewards | ProgPrompt: Generating Situated Robot Task Plans using Large Language Models | Code | GPT-3 | Virtual Home & Franka Panda | Write code from a library
| Deploying and Evaluating LLMs to Program Service Mobile Robots | Code | Various | RoboEval | Write code from a library
| Gesture-Informed Robot Assistance via Foundation Models | -- | GPT-3.5 | Franka Panda | Write code from a library
| Visual Language Maps for Robot Navigation | Code | GPT-3 | AI2THOR | Write code from a library
| Code as Policies: Language Model Programs for Embodied Control | Code | GPT-3 | UR5e robot arm | Write code from a library
| Text2Motion: From Natural Language Instructions to Feasible Plans | -- | GPT-3.5 | Franka Panda | Write code from list of skills
| Language to Rewards for Robotic Skill Synthesis | Code | GPT-4 | MuJoCo | Generate reward function
| Language Instructed Reinforcement Learning for Human-AI Coordination | Code | GPT-3.5 | N/A | Generate reward function
| Planning with Large Language Models for Code Generation | Code | GPT-2 | N/A | Write code from descriptions
| Eureka: Human-Level Reward Design via Coding Large Language Models | Code | GPT-4 | IsaacGym | Writing and updating code
| Evolutionary Reward Design and Optimization with Multimodal Large Language Models | -- | GPT-4V | IsaacGym | Writing and updating code
| Scaling Robot Learning with Semantically Imagined Experience | -- | GPT-3 | Robot arm | Generate training examples
| TidyBot: Personalized Robot Assistance with Large Language Models | Code | PaLM 540B | Kinova Gen3 7-DoF | Write plans in code
Integrated Input | Object-Centric Instruction Augmentation for Robotic Manipulation | -- | GPT-3.5 | Franka Robot | Augmented into the model
| LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action | Code | GPT-3 | Clearpath Jackal UGV | Augmented into input space
| Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition | Code | GPT-3 & LLAMA2 | MuJoCo | Write success labelling function
| InCoRo: In-Context Learning for Robotics Control with Feedback Loops | -- | GPT-3.5 | N/A | Integrated into the model
| Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models | -- | GPT-3.5 | ARMAR-6 | Integrated as memory and to adapt
| LLM-Based Human-Robot Collaboration Framework for Manipulation Tasks | -- | GPT-2 | Overcooked | Integrated into the model
| Meta-Reinforcement Learning via Language Instructions | Code | GloVe | MetaWorld and robot arm | Integrated into MDP
| GRID: Scene-Graph-based Instruction-driven Robotic Task Planning | Code | INSTRUCTOR | PUDUbot2 | Integrated into the model
| ExTraCT -- Explainable Trajectory Corrections from language inputs using Textual description of features | -- | S-BERT | xArm-6 manipulator | Integrated into the model
| LATTE: LAnguage Trajectory TransformEr | Code | BERT | Panda robot arm | Embedding input into architecture
| Interactive Language: Talking to Robots in Real Time | Code | Language Conditioned Behavioural Cloning | Language-Table and xArm6 | Integrated into the model
| "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy | Code | GPT-3 | Franka Emika Panda | Modify policy with embedding
| VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | Code | GPT-4 | Franka Emika Panda | Modify policy with embedding
| LILA: Language-Informed Latent Actions | -- | Distil-RoBERTa | Franka Panda | Disambiguate manual input
| RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches | -- | RT-Trajectory | Everyday Robots arm | Generate trajectories
| Yell At Your Robot: Improving On-the-Fly from Language Corrections | Code | GPT-4V | ALOHA | Modify robot behaviour

Robot-To-Human Communication

Agents are often difficult for human operators and bystanders to understand, and a sensible application of LLMs is to address this problem. Robots can use language to describe their actions, beliefs, and intentions to a human; this removes the need for expert users who can debug the robot's internal state, replacing that expertise with a medium almost everyone understands --- language. The generative aspect of LLMs is naturally suited to this task, as is the question-answering mode in which they are often deployed. This is especially relevant when paired with visual models and/or in safety-critical scenarios: explaining why an agent took a certain action could help operators catch mistakes when human lives are at risk. The ``Robot-to-Human Communication'' category describes work that uses LLMs to feed information back to humans, for purposes ranging from explainability to concisely reporting observations of the environment.
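As a minimal sketch of the ``ask when certainty is low'' pattern that recurs in the table below --- the general idea, not the method of any specific paper --- the snippet samples several candidate actions from a stochastic planner and defers to the human when the samples disagree. `sample_next_action` is a hypothetical stub standing in for an LLM call, and the 0.6 agreement threshold is an illustrative choice:

```python
# Minimal sketch of "ask when certainty is low": sample several candidate
# actions from a stochastic planner and defer to the human when they
# disagree. `sample_next_action` is a hypothetical stub standing in for an
# LLM call; the agreement threshold is an illustrative choice.
import random
from collections import Counter

def sample_next_action(instruction: str, scene: str) -> str:
    """Stub for a temperature > 0 LLM planner call."""
    return random.choice(["pick up the blue mug", "pick up the green mug"])

def act_or_ask(instruction: str, scene: str, k: int = 10, threshold: float = 0.6):
    samples = [sample_next_action(instruction, scene) for _ in range(k)]
    action, count = Counter(samples).most_common(1)[0]
    if count / k >= threshold:
        return ("execute", action)  # confident enough to act autonomously
    # Low agreement across samples -> treat the instruction as ambiguous.
    options = ", ".join(sorted(set(samples)))
    return ("ask", f"I am not sure which you meant: {options}. Which one?")

print(act_or_ask("pick up the mug", "a blue mug and a green mug on a table"))
```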
Modality | Paper | Code (if available) | LLM | Application
Explainability | LINGO-1: Exploring Natural Language for Autonomous Driving | -- | LINGO-1 | Describe AV actions & Q-A
| Explaining Agent Behavior with Large Language Models | -- | GPT-4 | Describe agent actions & Q-A
| Using Large Language Models for Interpreting Autonomous Robots Behaviors | -- | GPT-3.5 & Alpaca | Describe behavior through log files
| Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving | Code | GPT-3.5 | Describe AV actions
| REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction | Code | GPT-4 | Describe task from video
| A Closer Look at Reward Decomposition for High-Level Robotic Explanations | -- | GPT-3.5 | Q-A in terms of reward
| Sorry Dave, I'm Afraid I Can't Do That: Explaining Unachievable Robot Tasks Using Natural Language | -- | Syntax trees | Describe feasibility
| Behavior Explanation as Intention Signaling in Human-Robot Teaming | -- | N/A (templates) | Signal intentions
| Explain Yourself: A Natural Language Interface for Scrutable Autonomous Robots | -- | N/A (templates) | Q-A
| Explainable AI for Robot Failures: Generating Explanations that Improve User Assistance in Fault Recovery | Code | Syntax trees | Diagnose errors and suggest resolution
Asking for Help | Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners | Code | PaLM-2L & GPT-3.5 | Ask when certainty is low
| Towards Robots That Know When They Need Help: Affordance-Based Uncertainty for Large Language Model Planners | -- | GPT-4 | Ask when certainty is low
| Interactively Robot Action Planning with Uncertainty Analysis and Active Questioning by Large Language Model | -- | GPT-3.5 | Ask to resolve ambiguity
| TEACh: Task-driven Embodied Agents that Chat | Code | Episodic Transformer | Ask to resolve ambiguity
| Asking Follow-Up Clarifications to Resolve Ambiguities in Human-Robot Conversation | -- | N/A | Ask to resolve ambiguity
| The RobotSlang Benchmark: Dialog-guided Robot Localization and Navigation | Code | LSTM | Describe env. & ask for guidance
| Towards quantitative modeling of task confirmations in human-robot dialog | -- | N/A | Ask when certainty is low
| LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | Code | GPT-3 | Ask to resolve ambiguity
| Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog | -- | N/A | Ask to resolve ambiguity
| SGP-TOD: Building Task Bots Effortlessly via Schema-Guided LLM Prompting | -- | GPT-3.5 | Ask to guide through decision tree
| Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction | -- | GPT-3.5 | Ask to resolve ambiguity
| JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents | -- | BART | Ask to resolve ambiguity
| Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity | Code | GPT-4 | Ask to resolve ambiguity

Robot-To-Robot Communication

Multi-robot systems often have a communication component in which robots exchange bids, plans, observations, or other information. Classically this has been a carefully structured process built on predefined protocols. LLMs, however, provide the opportunity to remove these constraints and allow generative models to produce messages in a dialogical manner. From this dialogue a collective intelligence can emerge that supports coordination, planning, and knowledge transfer. This approach is particularly appealing because it models a core strength of human groups using the very same tool: language allows teams of people to organise, and robots may be able to do the same. We term this type of system ``Robot-to-Robot Communication''.
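As a minimal sketch of such dialogical coordination --- the general pattern, not the protocol of any specific paper --- the snippet below has two LLM-backed robots take turns in a shared transcript to divide up a task list. It assumes the `openai` Python package (v1+) and an OpenAI-compatible endpoint; the model name, prompts, task list, and turn limit are all illustrative assumptions:

```python
# Minimal sketch of dialogical robot-to-robot coordination: two LLM-backed
# robots take turns in a shared transcript to divide up a task list.
# Assumes the `openai` Python package (v1+) and an OpenAI-compatible API;
# the model name, prompts, and task list are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TASKS = "sweep the hallway; restock shelf A; deliver the parcel to room 3"

def robot_turn(name: str, transcript: list[str]) -> str:
    """One robot reads the dialogue so far and replies in natural language."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": f"You are robot {name}. Shared tasks: {TASKS}. "
                        "Negotiate with the other robot to split the tasks. "
                        "Reply in one short sentence; say AGREED once settled."},
            {"role": "user", "content": "\n".join(transcript) or "Begin."},
        ],
    )
    return f"{name}: {response.choices[0].message.content}"

transcript: list[str] = []
agreed = False
for _ in range(6):  # bound the dialogue so negotiation cannot run forever
    for name in ("R1", "R2"):
        line = robot_turn(name, transcript)
        transcript.append(line)
        if "AGREED" in line:
            agreed = True
            break
    if agreed:
        break

print("\n".join(transcript))
```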
Modality | Paper | Code (if available) | LLM | Application | # Agents
Role Playing | CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society | Code | LLaMa-7B | Various inc. code, maths, science | 2
| ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | Code | GPT-4 | Debate to give advice | 3+
| MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | Code | GPT-4 | Software development team | 2+
| Adapting LLM Agents with Universal Feedback in Communication | -- | GPT-4 | Finding objects in ALFWorld | 2
| Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback | Code | GPT-4 & Claude | Bartering (buyer/seller) | 2
| Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems | -- | GPT-4 | Bartering (buyer/seller) | 2+
| Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View | Code | GPT-3.5 | Various inc. quiz, maths, chess | 2+
| Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | Code | Various | Various captioning tasks | 3
| Shall We Talk: Exploring Spontaneous Collaborations of Competing LLM Agents | Code | GPT-4 | Various (mostly game theory) | 2+
Inter-Agent Coordination | Generative Agents: Interactive Simulacra of Human Behavior | Code | GPT-3.5 | Model a community | 2+
| Embodied Agents for Efficient Exploration and Smart Scene Description | -- | CLIP | Navigation and captioning | 2+
| Building Cooperative Embodied Agents Modularly with Large Language Models | Code | GPT-4 | Fetching items in a home | 2
| AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation | Code | GPT-4 | Various inc. maths, code, chess | 2+
| AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors | Code | GPT-4 | Complex plans as advice, coding, etc. | 1+
| Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning | Code | N/A | Dialogues | 2
| Improving Factuality and Reasoning in Language Models through Multiagent Debate | Code | N/A | Reasoning | 2
Inter-Robot | SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models | Code | GPT-4 | Planning and task allocation | 2+
| RoCo: Dialectic Multi-Robot Collaboration with Large Language Models | Code | GPT-4 | Planning and coordination | 2+
| Conversational Language Models for Human-in-the-Loop Multi-Robot Coordination | -- | GPT-4 | Planning and coordination | 2+

Robot Control and Reasoning

Language and language-based models can also be used internally within a single robot. This can take a variety of forms, but in general these approaches seek to utilise the intelligence of LLMs within the robot's own control and reasoning processes, as the sketch below illustrates for one recurring pattern.
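One such pattern, recurring in the ``Dynamic Adaptation'' rows of the table below, is closed-loop re-prompting from error messages. The snippet is a minimal sketch of the general idea, not any specific paper's method: `llm_plan` and `execute_in_sim` are hypothetical stubs with toy behaviour standing in for an LLM call and a simulator/verifier.

```python
# Minimal sketch of closed-loop re-prompting from error messages: an LLM
# proposes a plan, a simulator/verifier reports failures, and the error text
# is fed back into the next prompt. Both `llm_plan` and `execute_in_sim`
# are hypothetical stubs with toy behaviour, not calls to a real system.

def llm_plan(task: str, feedback: str | None = None) -> list[str]:
    """Stub for an LLM planner; a real system would prompt a model with the
    task plus any accumulated error feedback."""
    if feedback is None:
        return ["grasp mug", "pour water"]
    return ["open cupboard", "grasp mug", "pour water"]  # repaired plan

def execute_in_sim(plan: list[str]) -> str | None:
    """Stub verifier: the mug is inside a closed cupboard, so any plan that
    does not open it first fails with an explanatory error message."""
    if plan[0] != "open cupboard":
        return "grasp mug failed: mug is inside the closed cupboard"
    return None

def plan_with_feedback(task: str, max_attempts: int = 3) -> list[str] | None:
    feedback = None
    for attempt in range(max_attempts):
        plan = llm_plan(task, feedback)
        error = execute_in_sim(plan)
        if error is None:
            return plan  # the plan executed successfully
        # Feed the error back so the next prompt can repair the plan.
        feedback = f"Attempt {attempt + 1} failed: {error}"
    return None  # give up after max_attempts

print(plan_with_feedback("fetch a mug of water"))
```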
Modality | Paper | Code (if available) | LLM | Robot/Sim | Application
Transformer Robotics | RT-1: Robotics Transformer for Real-World Control at Scale | Code | RT-1 | 7-DoF Robot Arm | Describe task
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Code (unofficial open source) | RT-2 | 7-DoF Robot Arm | Describe task
| PaLM-E: An Embodied Multimodal Language Model | Code (unofficial open source) | PaLM-E | Various | Describe and plan
| Open X-Embodiment: Robotic Learning Datasets and RT-X Models | Code | RT-X | 7-DoF Robot Arm | Describe task
| AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents | Code (unofficial open source) | AutoRT | 7-DoF Robot Arm | Describe task & alignment
Language Architecture | Prompt a Robot to Walk with Large Language Models | Code | GPT-4 | Various | Acts on joints
| Inner Monologue: Embodied Reasoning through Planning with Language Models | -- | InstructGPT & CLIPort | UR5e Arm | Plan and act
| Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk | -- | OpenLlama | N/A | Guide through behaviours
| LLM-MARS: Large Language Model for Behavior Tree Generation and NLP-enhanced Dialogue in Multi-Agent Robot Systems | -- | Alpaca 7B | N/A | Build behaviour trees
| Robot Behavior-Tree-Based Task Generation with Large Language Models | -- | GPT-3.5 | N/A | Build behaviour trees
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | -- | GPT-4 & PaLM | N/A | Planning and reasoning
| Tree of Thoughts: Deliberate Problem Solving with Large Language Models | Code | GPT-4 | N/A | Planning and reasoning
| Video Language Planning | Code | PaLM-E | Various | Planning and acting
| Graph of Thoughts: Solving Elaborate Problems with Large Language Models | Code | GPT-3.5 | N/A | Planning and reasoning
| Reasoning about the Unseen for Efficient Outdoor Object Navigation | Code | GPT-4 | Unitree Go1 | Planning and reasoning
| ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning | Code | GPT-4 | Clearpath Jackal | Planning and reasoning
Dynamic Adaptation | Vision-Language Interpreter for Robot Task Planning | Code | GPT-4 | Robot Arm | Re-prompting from error messages
| LANCAR: Leveraging Language for Context-Aware Robot Locomotion in Unstructured Environments | -- | GPT-4 | spot-mini-mini | Modify terrain type
| Errors are Useful Prompts: Instruction Guided Task Programming with Verifier-Assisted Iterative Prompting | Code | GPT-4 | Franka Panda | Re-prompting from error messages
| AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation | -- | GPT-4 | Franka Research 3 | Environment feedback to LLM
| Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback | -- | GPT-4 & PaLM-2 | Franka Research 3 | High- and low-level planning
| Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization | Code | GPT-3 | N/A | Q-A

BibTeX

@misc{hunt2024survey,
      title={A Survey of Language-Based Communication in Robotics},
      author={William Hunt and Sarvapali D. Ramchurn and Mohammad D. Soorati},
      year={2024},
      eprint={2406.04086},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}