

Multi-agent navigation: communication

Jul 30, 2025

This is the last of three parts that describe developments related to multi-agent navigation carried out by SUPSI in the context of REXASI-PRO.

Proxemics, a form of implicit communication, governs how humans perceive and use space. For instance, standing in front of a door implicitly communicates our intention to enter, and advancing towards someone signals our intention to interact. This form of communication is crucial for coordination among individuals, particularly during navigation. It should therefore be considered when developing navigation algorithms for smart wheelchairs that promote local coordination, such as the formation of lanes of flow.

In situations with limited free space and a higher probability of conflicts, people resort to more explicit rules and forms of communication. For instance, they may nod or wave to signal precedence, much as drivers use turn signals. In REXASI-PRO, we addressed this challenge in two ways. First, we designed model-based communication strategies that make the smart wheelchair’s intent more explicit, such as projecting its desired direction or trajectory on the floor, which we demonstrated in Virtual Reality. Second, we trained machine-learning models that learn simultaneously to navigate and to communicate with neighbors. In this article, we present one example of this second approach, which is explored extensively in a Navground Learning tutorial.

We consider the following very simple navigation scenario. A red pad lies in the middle of a corridor, and two agents advance in opposite directions along it. The agents must avoid occupying the pad at the same time. The next video illustrates what may happen when they disregard each other: when both agents occupy the pad, they turn red, marking a failure to perform the task.
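To make the setup concrete, here is a minimal, self-contained sketch of the scenario in Python. All names and numeric values (pad size, speeds, time step) are our own illustrative choices, not those of the Navground Learning tutorial.

```python
from dataclasses import dataclass

PAD_CENTER = 0.0       # the pad lies in the middle of the corridor
PAD_HALF_LENGTH = 0.5  # hypothetical pad extent along the corridor
DT = 0.1               # simulation time step [s]

@dataclass
class Agent:
    x: float      # position along the corridor axis
    speed: float  # signed speed: the two agents move in opposite directions

    def on_pad(self) -> bool:
        return abs(self.x - PAD_CENTER) < PAD_HALF_LENGTH

def step(agents: list[Agent]) -> bool:
    """Advance one time step; return True if both agents occupy the pad."""
    for a in agents:
        a.x += a.speed * DT
    return all(a.on_pad() for a in agents)

# Two agents advancing towards each other while disregarding each other:
# with symmetric starts and speeds, they meet on the pad and fail the task.
agents = [Agent(x=-2.0, speed=1.0), Agent(x=2.0, speed=-1.0)]
failed = any(step(agents) for _ in range(100))
print("failure" if failed else "success")
```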

It is not difficult to design an algorithm that solves this problem optimally, provided that the agents have complete perception of each other. The next video illustrates an optimal policy where one agent slows down to let the other pass.
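A hand-designed rule in this spirit could look as follows. This is only a sketch under our own assumptions (the yielding criterion, conflict zone, and slowdown factor are arbitrary), not the controller shown in the video.

```python
def safe_speed(own_x: float, other_x: float, nominal: float,
               pad_center: float = 0.0, pad_half: float = 0.5) -> float:
    """Yielding rule, assuming each agent fully perceives the other:
    the agent farther from the pad slows down while both are near it;
    ties are broken by position sign."""
    own_d = abs(own_x - pad_center)
    other_d = abs(other_x - pad_center)
    conflict = own_d < 3 * pad_half and other_d < 3 * pad_half
    i_yield = own_d > other_d or (own_d == other_d and own_x > other_x)
    return 0.2 * nominal if conflict and i_yield else nominal
```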

Despite its simplicity, this problem also presents an intriguing opportunity to apply machine-learning techniques. We are particularly interested in the case where agents perceive the location of the pad but not that of their neighbor, yet can communicate with each other (e.g., via radio). In such a case, they could still exchange their relative positions with respect to the pad, thereby providing all the information needed to learn an effective policy. But instead of relying on predefined messages, can the agents themselves learn to exchange and interpret useful messages? In other words, can they learn a policy that takes the pad location along with the message received from the neighbor and outputs an action (e.g., an acceleration) together with the message to be sent to the neighbor?
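The interface of such a policy can be sketched as a small network with two heads, one for the action and one for the outgoing message. The architecture and names below are our own illustration in PyTorch, not the model from the tutorial.

```python
import torch
import torch.nn as nn

class CommPolicy(nn.Module):
    """Two-head policy: observation -> (action, outgoing message)."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(2, hidden), nn.Tanh())
        self.action_head = nn.Linear(hidden, 1)   # e.g., an acceleration
        self.message_head = nn.Linear(hidden, 1)  # next message to transmit

    def forward(self, pad_position: torch.Tensor, received: torch.Tensor):
        # Observation: pad location (relative to the agent) + received message.
        h = self.body(torch.cat([pad_position, received], dim=-1))
        action = self.action_head(h)
        message = torch.tanh(self.message_head(h))  # squashed to [-1, 1]
        return action, message

policy = CommPolicy()
pad = torch.tensor([[0.7]])      # pad position relative to the agent
msg_in = torch.tensor([[-0.2]])  # message received from the neighbor
accel, msg_out = policy(pad, msg_in)
```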

This is indeed possible, even if we restrict the message to a single floating-point number or even a single bit. In the next video, agents use the learned policy to move and communicate. Messages encode a single number between -1 and 1, displayed as a colored LED on a scale ranging from red (-1) to green (1).
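For reference, one straightforward way to render such a message as an LED color is a linear interpolation between red and green; the exact rendering used in the video may differ.

```python
def led_color(message: float) -> tuple[int, int, int]:
    """Map a message in [-1, 1] to an RGB color from red (-1) to green (1)."""
    t = (message + 1.0) / 2.0  # normalize to [0, 1]
    return (round(255 * (1 - t)), round(255 * t), 0)

print(led_color(-1.0))  # (255, 0, 0): red
print(led_color(1.0))   # (0, 255, 0): green
```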

Learning such policies requires strategies to address instabilities that are common in multi-agent Reinforcement Learning. These instabilities arise from agents learning in a dynamic environment where the actions of one agent influence the rewards and observations of the other agents. Learning to communicate introduces further instability. In the tutorial, we explore several strategies. For instance, centralized training provides a stable method to optimize communication in Reinforcement Learning. Additionally, we can partition the action and communication components of the policy, training them with distinct learning rates or even alternating their training, which further enhances stability.
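As a sketch of the last two ideas, assuming the two-head policy from above, the action and communication components can be given distinct learning rates via optimizer parameter groups, or trained in alternation by freezing one head at a time. The learning rates and schedule below are illustrative, not the tutorial's settings.

```python
import torch

# Reuses the CommPolicy sketch from above; any module with separate
# action and message heads works the same way.
policy = CommPolicy()

# Distinct learning rates for the shared body and the two heads.
optimizer = torch.optim.Adam([
    {"params": policy.body.parameters(), "lr": 1e-3},
    {"params": policy.action_head.parameters(), "lr": 1e-3},
    {"params": policy.message_head.parameters(), "lr": 1e-4},
])

def set_trainable(module: torch.nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad_(flag)

# Alternating training: freeze one component while the other learns.
for phase in range(10):
    train_comm = phase % 2 == 0
    set_trainable(policy.message_head, train_comm)
    set_trainable(policy.action_head, not train_comm)
    # ... run the usual RL update (rollouts, loss, optimizer.step()) here ...
```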