A Survey of Language-Based Communication in Robotics

An Unpublished Survey Paper

(Posted on arXiv)

William Hunt¹, Sarvapali D. Ramchurn¹, Mohammad D. Soorati¹

¹University of Southampton

Embodied robots that can interact with their environment and neighbours are increasingly being used as a test case to develop Artificial Intelligence. This creates a need for multimodal robot controllers that can operate across different types of information, including text. Large Language Models are able to process and generate textual as well as audiovisual data and, more recently, robot actions. Language models are increasingly being applied to robotic systems; these language-based robots leverage the power of language models in a variety of ways. Additionally, the use of language opens up multiple forms of information exchange between members of a human-robot team. This survey motivates the use of language models in robotics, and then delineates works based on the part of the overall control flow in which language is incorporated. Language can be used by a human to task a robot, by a robot to inform a human, between robots as a human-like communication medium, and internally for a robot's planning and control. Applications of language-based robots are explored, and numerous limitations and challenges are discussed to provide a summary of the development needed for the future of language-based robotics.

Graphical Abstract: Language-based interaction in robotics can be broken into four categories: Human-to-Robot (a human instructing the robot with language), Robot-to-Human (a robot explaining its actions to a human, or validating them with the human), Robot-to-Robot (robots communicating with each other), and Internal (a robot using language internally). We also discuss the advantages, some applications, and the limitations of LLMs in general, in robotics, and with respect to ethics.

Human-To-Robot Communication

Modality	Paper	Code (if available)	LLM	Robot/sim	Application
Task Breakdown	Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Guided Exploration for Zero-Shot Object Navigation	Code	GPT-3	RoboTHOR	Generate language-based plans
	Do As I Can, Not As I Say: Grounding Language in Robotic Affordances	Code	GPT-3.5	Franka Robot	Generate plans from commands
	Embodied Task Planning with Large Language Models	Code	TaPA	AI2THOR	Generate plans from commands
	CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots		GPT-3	Clearpath Jackal UGV	Generate plans from commands
	Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents	Code	GPT-3	VirtualHome	Generate plans from commands
	SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning		GPT-4	Robot arm on wheels	Plan large tasks
	DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics		DALL-E	Robot arm	Generate target state
	GenSim: Generating Robotic Simulation Tasks via Large Language Models	Code	GPT-4	Ravens	Generate sim data
	Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning	Code	GPT-3	VirtualHome	Determine human intentions
Code and Rewards	ProgPrompt: Generating Situated Robot Task Plans using Large Language Models	Code	GPT-3	VirtualHome & Franka Panda	Write code from a library
	Deploying and Evaluating LLMs to Program Service Mobile Robots	Code	Various	RoboEval	Write code from a library
	Gesture-Informed Robot Assistance via Foundation Models		GPT-3.5	Franka Panda	Write code from a library
	Visual Language Maps for Robot Navigation	Code	GPT-3	AI2THOR	Write code from a library
	Code as Policies: Language Model Programs for Embodied Control	Code	GPT-3	UR5e robot arm	Write code from a library
	Text2Motion: From Natural Language Instructions to Feasible Plans		GPT-3.5	Franka Panda	Write code from list of skills
	Language to Rewards for Robotic Skill Synthesis	Code	GPT-4	MuJoCo	Generate reward function
	Language Instructed Reinforcement Learning for Human-AI Coordination	Code	GPT-3.5	N/A	Generate reward function
	Planning with Large Language Models for Code Generation	Code	GPT-2	N/A	Write code from descriptions
	Eureka: Human-Level Reward Design via Coding Large Language Models	Code	GPT-4	IsaacGym	Writing and updating code
	Evolutionary Reward Design and Optimization with Multimodal Large Language Models		GPT-4V	IsaacGym	Writing and updating code
	Scaling Robot Learning with Semantically Imagined Experience		GPT-3	Robot arm	Generate training examples
	TidyBot: Personalized Robot Assistance with Large Language Models	Code	PaLM 540B	Kinova Gen3 7-DoF	Write plans in code
Integrated Input	Object-Centric Instruction Augmentation for Robotic Manipulation		GPT-3.5	Franka Robot	Augmented into the model
	LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action	Code	GPT-3	Clearpath Jackal UGV	Augmented into input space
	Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition	Code	GPT-3 & LLAMA2	MuJoCo	Write success labelling function
	InCoRo: In-Context Learning for Robotics Control with Feedback Loops		GPT-3.5	N/A	Integrated into the model
	Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models		GPT-3.5	ARMAR-6	Integrated as memory and to adapt
	LLM-Based Human-Robot Collaboration Framework for Manipulation Tasks		GPT-2	Overcooked	Integrated into the model
	Meta-Reinforcement Learning via Language Instructions	Code	GloVe	MetaWorld and robot arm	Integrated into MDP
	GRID: Scene-Graph-based Instruction-driven Robotic Task Planning	Code	INSTRUCTOR	PUDUbot2	Integrated into the model
	ExTraCT -- Explainable Trajectory Corrections from language inputs using Textual description of features		S-BERT	xArm-6 manipulator	Integrated into the model
	LATTE: LAnguage Trajectory TransformEr	Code	BERT	Panda robot arm	Embedding input into architecture
	Interactive Language: Talking to Robots in Real Time	Code	Language Conditioned Behavioural Cloning	Language-Table and xArm6	Integrated into the model
	"No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy	Code	GPT-3	Franka Emika Panda	Modify policy with embedding
	VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models	Code	GPT-4	Franka Emika Panda	Modify policy with embedding
	LILA: Language-Informed Latent Actions	Code	Distil-RoBERTa	Franka Panda	Disambiguate manual input
	RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches		RT-Trajectory	Everyday Robots arm	Generate trajectories
	Yell At Your Robot: Improving On-the-Fly from Language Corrections	Code	GPT-4V	ALOHA	Modify robot behaviour

Robot-To-Human Communication

Modality	Paper	Code (if available)	LLM	Application
Explainability	LINGO-1: Exploring Natural Language for Autonomous Driving		LINGO-1	Describe AV actions & Q-A
	Explaining Agent Behavior with Large Language Models		GPT-4	Describe agent actions & Q-A
	Using Large Language Models for Interpreting Autonomous Robots Behaviors		GPT-3.5 & Alpaca	Describe behaviour through log files
	Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving	Code	GPT-3.5	Describe AV actions
	REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction	Code	GPT-4	Describe task from video
	A Closer Look at Reward Decomposition for High-Level Robotic Explanations		GPT-3.5	Q-A in terms of reward
	Sorry Dave, I'm Afraid I Can't Do That: Explaining Unachievable Robot Tasks Using Natural Language		Syntax trees	Describe feasibility
	Behavior Explanation as Intention Signaling in Human-Robot Teaming		N/A (templates)	Signal intentions
	Explain Yourself: A Natural Language Interface for Scrutable Autonomous Robots		N/A (templates)	Q-A
	Explainable AI for Robot Failures: Generating Explanations that Improve User Assistance in Fault Recovery	Code	Syntax trees	Diagnose errors and suggest resolution
Asking for Help	Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners	Code	PaLM-2L & GPT-3.5	Ask when certainty is low
	Towards Robots That Know When They Need Help: Affordance-Based Uncertainty for Large Language Model Planners		GPT-4	Ask when certainty is low
	Interactively Robot Action Planning with Uncertainty Analysis and Active Questioning by Large Language Model		GPT-3.5	Ask to resolve ambiguity
	TEACh: Task-driven Embodied Agents that Chat	Code	Episodic Transformer	Ask to resolve ambiguity
	Asking Follow-Up Clarifications to Resolve Ambiguities in Human-Robot Conversation		N/A	Ask to resolve ambiguity
	The RobotSlang Benchmark: Dialog-guided Robot Localization and Navigation	Code	LSTM	Describe env. & ask for guidance
	Towards quantitative modeling of task confirmations in human-robot dialog		N/A	Ask when certainty is low
	LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models	Code	GPT-3	Ask to resolve ambiguity
	Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog		N/A	Ask to resolve ambiguity
	SGP-TOD: Building Task Bots Effortlessly via Schema-Guided LLM Prompting		GPT-3.5	Ask to guide through decision tree
	Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction		GPT-3.5	Ask to resolve ambiguity
	JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents		BART	Ask to resolve ambiguity
	Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity	Code	GPT-4	Ask to resolve ambiguity

Robot-To-Robot Communication

Modality	Paper	Code (if available)	LLM	Application	# Agents	Human in Loop
Role Playing	CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society	Code	LLaMA-7B	Various inc. code, maths, science	2	✓
	ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate	Code	GPT-4	Debate to give advice	3+
	MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework	Code	GPT-4	Software development team	2+
	Adapting LLM Agents with Universal Feedback in Communication		GPT-4	Finding objects in ALFWorld	2
	Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback	Code	GPT-4 & Claude	Bartering (buyer/seller)	2
	Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems		GPT-4	Bartering (buyer/seller)	2+
	Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View	Code	GPT-3.5	Various inc. quiz, maths, chess	2+
	Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language	Code	Various	Various captioning tasks	3	✓
	Shall We Talk: Exploring Spontaneous Collaborations of Competing LLM Agents	Code	GPT-4	Various (mostly game theory)	2+
Inter-Agent Coordination	Generative Agents: Interactive Simulacra of Human Behavior	Code	GPT-3.5	Model a community	2+
	Embodied Agents for Efficient Exploration and Smart Scene Description		CLIP	Navigation and captioning	2+
	Building Cooperative Embodied Agents Modularly with Large Language Models	Code	GPT-4	Fetching items in a home	2	✓
	AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation	Code	GPT-4	Various inc. maths, code, chess	2+	✓
	AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors	Code	GPT-4	Complex plans as advice, coding, etc.	1+
	Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning	Code	N/A	Dialogues	2
	Improving Factuality and Reasoning in Language Models through Multiagent Debate	Code	N/A	Reasoning	2
Inter-Robot	SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models	Code	GPT-4	Planning and task allocation	2+
	RoCo: Dialectic Multi-Robot Collaboration with Large Language Models	Code	GPT-4	Planning and coordination	2+
	Conversational Language Models for Human-in-the-Loop Multi-Robot Coordination		GPT-4	Planning and coordination	2+	✓

Robot Control and Reasoning

Modality	Paper	Code (if available)	LLM	Robot/Sim	Application
Transformer Robotics	RT-1: Robotics Transformer for Real-World Control at Scale	Code	RT-1	7-DoF Robot Arm	Describe task
	RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control	(Unofficial open source) Code	RT-2	7-DoF Robot Arm	Describe task
	PaLM-E: An Embodied Multimodal Language Model	(Unofficial open source) Code	PaLM-E	Various	Describe and plan
	Open X-Embodiment: Robotic Learning Datasets and RT-X Models	Code	RT-X	7-DoF Robot Arm	Describe task
	AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents	(Unofficial open source) Code	AutoRT	7-DoF Robot Arm	Describe task & alignment
Language Architecture	Prompt a Robot to Walk with Large Language Models	Code	GPT-4	Various	Acts on joints
	Inner Monologue: Embodied Reasoning through Planning with Language Models		InstructGPT & CLIPort	UR5e Arm	Plan and act
	Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk		OpenLlama	N/A	Guide through behaviours
	LLM-MARS: Large Language Model for Behavior Tree Generation and NLP-enhanced Dialogue in Multi-Agent Robot Systems		Alpaca 7B	N/A	Build behaviour trees
	Robot Behavior-Tree-Based Task Generation with Large Language Models		GPT-3.5	N/A	Build behaviour trees
	Chain-of-Thought Prompting Elicits Reasoning in Large Language Models		GPT-4 & PaLM	N/A	Planning and reasoning
	Tree of Thoughts: Deliberate Problem Solving with Large Language Models	Code	GPT-4	N/A	Planning and reasoning
	Video Language Planning	Code	PaLM-E	Various	Planning and acting
	Graph of Thoughts: Solving Elaborate Problems with Large Language Models	Code	GPT-3.5	N/A	Planning and reasoning
	Reasoning about the Unseen for Efficient Outdoor Object Navigation	Code	GPT-4	Unitree Go1	Planning and reasoning
	ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning	Code	GPT-4	Clearpath Jackal	Planning and reasoning
Dynamic Adaptation	Vision-Language Interpreter for Robot Task Planning	Code	GPT-4	Robot Arm	Re-prompting from error messages
	LANCAR: Leveraging Language for Context-Aware Robot Locomotion in Unstructured Environments		GPT-4	spot-mini-mini	Modify terrain type
	Errors are Useful Prompts: Instruction Guided Task Programming with Verifier-Assisted Iterative Prompting	Code	GPT-4	Franka Panda	Re-prompting from error messages
	AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation		GPT-4	Franka Research 3	Environment feedback to LLM
	Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback		GPT-4 & PaLM-2	Franka Research 3	High- and low-level planning
	Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization	Code	GPT-3	N/A	Q-A
