Multi-agent reinforcement learning is a key method for training multi-robot systems over a series of episodes in which robots are rewarded or punished according to their performance; only once the system is trained to a suitable standard is it deployed in the real world. If the system is not trained enough, it will likely fail to complete the task and could pose a risk to the surrounding environment. Therefore, reaching high performance in a shorter training period can lead to significant reductions in time and resource consumption. We introduce Multi-Agent Reinforcement Learning guided by Language-based Inter-Robot Negotiation (MARLIN), which makes the training process both faster and more transparent. We equip robots with large language models that negotiate and debate the task, producing a plan that is used to guide the policy during training. We dynamically switch between using reinforcement learning and the negotiation-based approach throughout training. This offers an increase in training speed when compared to standard multi-agent reinforcement learning and allows the system to be deployed to physical hardware earlier. As robots negotiate in natural language, we can better understand the behaviour of the robots individually and as a collective. We compare the performance of our approach to multi-agent reinforcement learning and a large language model to show that our hybrid method trains faster at little cost to performance.
Graphical Abstract: When training robots using Multi-Agent Reinforcement Learning (MARL), poorly performing actions are often chosen until the robots begin to learn how to make progress in the task. We utilise the reasoning skills of dialogical language models to generate higher-performing plans to guide training. The goal is to reach peak performance faster, and hence to deploy MARL policies onto real hardware sooner.
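The dynamic switching between reinforcement learning and negotiated plans can be illustrated with a minimal Python sketch. The abstract does not state how the switch is decided, so the rule used here (follow the plan whenever its estimated return exceeds the policy's most recent return) is an assumption made purely for illustration. The names `select_source`, `train`, `policy_return_fn` and `plan_return_fn` are hypothetical and do not come from the MARLIN implementation.

```python
from typing import Callable, List


def select_source(recent_policy_return: float, expected_plan_return: float) -> str:
    """Assumed switching rule: follow the negotiated plan only while it is
    expected to outperform the policy's most recent return."""
    return "plan" if expected_plan_return > recent_policy_return else "policy"


def train(
    num_episodes: int,
    policy_return_fn: Callable[[int], float],  # hypothetical: policy return after episode `ep`
    plan_return_fn: Callable[[int], float],    # hypothetical: estimated return of the LLM plan
) -> List[str]:
    """Toy loop that records which action source guides each episode."""
    sources: List[str] = []
    # No policy data exists before the first episode, so the plan wins by default.
    policy_estimate = float("-inf")
    for ep in range(num_episodes):
        plan_estimate = plan_return_fn(ep)
        sources.append(select_source(policy_estimate, plan_estimate))
        # A real system would roll out the episode with the chosen source and
        # update the MARL policy on the collected transitions; here the
        # improved policy is stood in for by a stub return function.
        policy_estimate = policy_return_fn(ep)
    return sources


if __name__ == "__main__":
    # Policy improves linearly; the plan's estimated return stays constant.
    print(train(6, lambda ep: float(ep), lambda ep: 3.0))
```

In this toy setting the plan guides the early episodes, and the loop hands control to the policy once the policy's return catches up, which mirrors the intent of using negotiated plans to avoid poorly performing actions early in training.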
@misc{godfrey2024marlin,
title={MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation},
author={Toby Godfrey and William Hunt and Mohammad D. Soorati},
year={2024},
eprint={2410.14383},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2410.14383}
}