
Conformal Off-Policy Prediction for Multi-Agent Systems (MA-COPP)

Oct 16, 2024

In the world of autonomous systems, ensuring safety when new and untested control strategies are introduced is a key challenge. Our work on Conformal Off-Policy Prediction for Multi-Agent Systems (MA-COPP) presents a novel approach for predicting how multiple autonomous agents will behave when one or more of them changes their control policy. This method is particularly important in environments where testing new policies directly could pose serious safety risks.

To explain this in a practical context, consider a busy train station where autonomous wheelchairs help their passengers move around. These wheelchairs need to navigate the space safely, avoiding pedestrians, infrastructure, and each other. If one wheelchair suddenly changes its navigation strategy, it could cause unexpected reactions from the others. The challenge is to predict how the entire system of wheelchairs and pedestrians will behave when one wheelchair changes its decision-making and course. Directly testing these new control policies in a crowded station would be highly risky, but the MA-COPP method allows developers to predict outcomes with statistical guarantees based only on past data. Safety can therefore be assessed before any real-world deployment.

Previous methods were limited to single-agent scenarios over a single timestep. They did not account for how other agents might react to one agent's change of policy, especially over time. Building on conformal prediction, a widely used uncertainty quantification method, MA-COPP introduces joint prediction regions (JPRs) to predict the collective behaviour of multiple agents over multiple timesteps with probabilistic guarantees. This makes it more comprehensive and more applicable to real-world multi-agent systems, such as autonomous wheelchairs in crowded spaces.
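To make the idea of a joint prediction region concrete, here is one common way such a region can be written down. It is a sketch under our own assumptions, not necessarily the exact construction used in MA-COPP: the symbols (calibration trajectories Y^(i), predictions Ŷ^(i), horizon T, miscoverage level α) and the choice of a single radius C shared by all timesteps are illustrative.

```latex
% Nonconformity score of calibration trajectory i: its worst error over the horizon
R^{(i)} = \max_{t = 1, \dots, T} \bigl\lVert Y^{(i)}_t - \hat{Y}^{(i)}_t \bigr\rVert_2,
\qquad i = 1, \dots, n

% Radius: the \lceil (n+1)(1-\alpha) \rceil-th smallest calibration score
C = R_{(k)}, \qquad k = \lceil (n+1)(1-\alpha) \rceil

% Joint guarantee for a new trajectory that is exchangeable with the calibration data
\Pr\Bigl( \bigl\lVert Y_t - \hat{Y}_t \bigr\rVert_2 \le C
          \ \text{for all } t = 1, \dots, T \Bigr) \ge 1 - \alpha
```

The guarantee relies on the new trajectory being exchangeable with the calibration trajectories. A change of control policy breaks that assumption, which is why the regions have to be recalibrated.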

What makes our method particularly powerful is its ability to manage prediction complexity without needing to simulate every possible future trajectory. Instead, it constructs an over-approximation of the possible trajectories, which saves time and computational resources whilst maintaining the statistical guarantees. In turn, this makes it suitable for predicting safe outcomes in complex, dynamic environments.

Abstracting from the complex autonomous wheelchair setting, we show how the MA-COPP method works in practice with a simple multi-agent scenario. In the video below, there are three agents (purple and blue circles) and three landmarks. The goal of this environment is for the agents to cover all three landmarks (that is, minimise their distance to them) whilst avoiding each other. The steps are as follows:

1. The environment plays out for three seconds, and then we predict (based on previously collected data) what the blue agent’s future trajectory will be.

2. Joint prediction regions are constructed (using standard conformal prediction methods) around the predicted trajectory. The red circles mark the regions in which the agent's true trajectory will lie with 95% probability.

3. Next, we show that the blue (ego) agent has changed its control policy and taken the green path instead, and that our original JPRs are no longer sufficient to capture this new behaviour. In other words, we lose our statistical guarantees.

4. Given the new control policy, we can perform simulations of the environment by plugging in data-driven models of the dynamics of the environment and control policies of the other agents. We can then see that many of these trajectories are not covered by the original JPRs.

5. Using the new control policy, we apply the MA-COPP method to re-calibrate the JPRs. This step regains the statistical guarantees whilst adjusting the original JPRs just enough to capture the new behaviour.
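
The loss and recovery of coverage in steps 2 to 5 can be illustrated with a small numerical sketch. Note that this is not the MA-COPP algorithm: it uses a naive recalibration that simply recomputes the region from trajectories simulated under the new policy, whereas MA-COPP avoids simulating every possible trajectory. The function names, the synthetic Gaussian data, and the single-radius region are all assumptions made for illustration.

```python
import numpy as np

def trajectory_scores(pred, true):
    """Largest Euclidean error over the horizon, one score per trajectory."""
    return np.linalg.norm(pred - true, axis=-1).max(axis=-1)

def conformal_radius(scores, alpha=0.05):
    """Split-conformal quantile with the finite-sample (n + 1) correction."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration trajectories no finite radius gives the guarantee.
    return float("inf") if k > n else float(scores[k - 1])

def coverage(scores, radius):
    """Fraction of trajectories that stay inside the region at every timestep."""
    return float(np.mean(np.asarray(scores) <= radius))

# Hypothetical data: 2-D positions with shape (trajectories, horizon, 2).
rng = np.random.default_rng(0)
pred = np.zeros((200, 10, 2))
old_true = pred + rng.normal(scale=0.1, size=pred.shape)  # original policy
new_true = pred + rng.normal(scale=0.3, size=pred.shape)  # changed ego policy

# Step 2: region calibrated on data from the original policy.
old_radius = conformal_radius(trajectory_scores(pred, old_true))

# Steps 3 and 4: the original region under-covers the new behaviour.
new_scores = trajectory_scores(pred, new_true)
print("coverage of original region:", coverage(new_scores, old_radius))

# Step 5 (naive stand-in): recalibrate on half of the new trajectories,
# then check coverage on the held-out half.
new_radius = conformal_radius(new_scores[:100])
print("coverage after recalibration:", coverage(new_scores[100:], new_radius))
```

In this toy setting the recalibrated radius is larger than the original one, mirroring how the red circles in the video grow just enough to capture the green path.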

In summary, by allowing reliable prediction of how an entire environment of autonomous agents will react to policy changes without the need for risky real-world tests, MA-COPP contributes significantly to the safety and reliability of autonomous systems.