Multi-agent reinforcement learning (MARL) is a key method for training multi-robot systems over a series of episodes in which robots are rewarded or punished according to their performance; only once the system is trained to a suitable standard is it deployed in the real world. An insufficiently trained system is unlikely to complete the task and may pose a risk to its surrounding environment. We introduce Multi-Agent Reinforcement Learning guided by Language-based Inter-Robot Negotiation (MARLIN), which reaches peak performance in fewer training episodes than standard MARL. The robots are equipped with large language models (LLMs) that negotiate and debate the task, producing plans that guide the policy during training. Throughout training, the approach dynamically switches between reinforcement learning and LLM-based action negotiation. This reduces the number of training episodes required, compared to standard MARL, and hence allows the system to be deployed to physical hardware earlier. We compare the performance of this approach against standard multi-agent reinforcement learning, showing that our hybrid method requires less training at little cost to performance.
Graphical Abstract: When training robots using Multi-Agent Reinforcement Learning (MARL), poorly performing actions are often chosen until the robots begin to learn how to make progress on the task. We utilise the reasoning skills of dialogical language models to generate higher-performing plans that guide training. The goal is to reach peak performance in fewer training episodes, enabling quicker deployment of MARL policies onto real hardware.
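To make the dynamic switching described above concrete, here is a minimal Python sketch of a hybrid training loop. It is illustrative only: the names `HybridTrainer`, `negotiate_plan`, the environment interface, and the return-based switching rule are our assumptions for exposition, not MARLIN's actual implementation.

```python
from collections import deque

class HybridTrainer:
    """Illustrative hybrid loop: each episode is driven either by the
    MARL policy or by an LLM-negotiated plan, chosen by a switching rule.
    All interfaces here are hypothetical placeholders."""

    def __init__(self, policy, negotiate_plan, window=20, threshold=0.0):
        self.policy = policy                  # callable: obs -> joint actions
        self.negotiate_plan = negotiate_plan  # callable: task -> list of joint actions
        self.returns = deque(maxlen=window)   # recent episode returns
        self.threshold = threshold

    def use_llm_guidance(self):
        # Hypothetical rule: lean on LLM negotiation while the policy's
        # recent average return remains below a threshold.
        if len(self.returns) < self.returns.maxlen:
            return True
        return sum(self.returns) / len(self.returns) < self.threshold

    def run_episode(self, env, task):
        obs, total = env.reset(), 0.0
        plan = self.negotiate_plan(task) if self.use_llm_guidance() else None
        for step in range(env.max_steps):
            if plan is not None and step < len(plan):
                actions = plan[step]        # follow the negotiated plan
            else:
                actions = self.policy(obs)  # fall back to the learned policy
            obs, reward, done = env.step(actions)
            total += reward
            # (store the transition and update the policy here)
            if done:
                break
        self.returns.append(total)
        return total
```

The intended effect is that early, low-return episodes are guided by negotiated LLM plans, while later episodes hand control back to the learned policy as it improves.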
@misc{godfrey2024marlin,
  title={MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation},
  author={Toby Godfrey and William Hunt and Mohammad D. Soorati},
  year={2024},
  eprint={2410.14383},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2410.14383}
}