At NeurIPS 2024, we introduced Verifiably Robust Conformal Prediction (VRCP) — a new framework that brings statistical reliability and provable robustness to conformal prediction under adversarial attacks.
Conformal Prediction (CP) is a widely used technique for uncertainty quantification that provides prediction sets guaranteed to cover the true output with high probability, as long as the test and calibration data are exchangeable. However, in real-world applications, especially in safety-critical settings, inputs can be corrupted by adversarial noise, breaking CP's foundational exchangeability assumption and invalidating its guarantees.
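To make the baseline concrete, here is a minimal sketch of standard split conformal prediction for classification. All names are illustrative; one common nonconformity score is 1 minus the softmax probability of a label.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration
    nonconformity scores (e.g. 1 - softmax probability of the true label)."""
    n = len(cal_scores)
    # Clamp in case (n + 1) * (1 - alpha) / n exceeds 1 for small n.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(cal_scores, q_level, method="higher")

def prediction_set(test_scores, q_hat):
    """Keep every candidate label whose nonconformity score is at most the
    calibrated threshold; coverage holds under exchangeability."""
    return [y for y, s in enumerate(test_scores) if s <= q_hat]
```

Exchangeability is precisely what an adversarial perturbation at test time breaks: the perturbed input is no longer a draw from the same distribution as the calibration data, so the calibrated threshold no longer bounds its score.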
To address this challenge, VRCP integrates neural network verification techniques with conformal prediction, enabling provable coverage guarantees under bounded adversarial perturbations.
Unlike existing robust CP methods based on randomized smoothing, which are limited to classification tasks and ℓ₂-norm attacks, VRCP generalizes to arbitrary ℓₚ-norm perturbations (including ℓ₁ and ℓ∞) and supports both classification and regression.
We introduced two variants (sketched in code after this list):
- VRCP-I applies verification at inference time to compute conservative prediction sets.
- VRCP-C shifts the verification burden to calibration, enabling faster predictions at test time without compromising robustness.
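The sketch below shows where each variant invokes the verifier. It is illustrative only: `score_lower_bound` and `score_upper_bound` stand in for a neural network verifier returning certified bounds on the nonconformity score over an ℓₚ ball of radius ε; their names and signatures are assumptions of this sketch, not the paper's API.

```python
import numpy as np

def score_lower_bound(model, x, y, eps):
    """Certified lower bound on the nonconformity score s(z, y) over all
    z with ||z - x||_p <= eps. Placeholder for a neural network verifier;
    the name and signature are assumptions of this sketch."""
    raise NotImplementedError

def score_upper_bound(model, x, y, eps):
    """Certified upper bound on s(z, y) over the same l_p ball."""
    raise NotImplementedError

def vrcp_i_set(model, x_test, labels, q_hat, eps):
    """VRCP-I: verify at inference time. The observed input may be a
    perturbed copy of the clean one, so a label is kept whenever the
    score somewhere in the ball around the observed input could still
    pass the standard (clean-calibration) threshold q_hat."""
    return [y for y in labels
            if score_lower_bound(model, x_test, y, eps) <= q_hat]

def vrcp_c_threshold(model, cal_inputs, cal_labels, eps, alpha=0.1):
    """VRCP-C: verify at calibration time. Calibrating on certified
    worst-case scores inflates the threshold once, offline, so test-time
    prediction needs only an ordinary forward pass."""
    scores = np.array([score_upper_bound(model, x, y, eps)
                       for x, y in zip(cal_inputs, cal_labels)])
    n = len(scores)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, q_level, method="higher")
```

The trade-off is where the verification cost lands: VRCP-I calls the verifier once per test input and candidate label, while VRCP-C pays the whole cost up front over the calibration set.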
Our experiments confirm VRCP's theoretical guarantees and show strong practical performance. On image classification benchmarks (CIFAR10, CIFAR100, TinyImageNet) and deep reinforcement learning regression tasks, VRCP recovers or exceeds the nominal coverage level while producing smaller, more informative prediction sets than state-of-the-art alternatives such as RSCP+.
With its robustness grounded in formal methods, VRCP marks a significant step toward trustworthy AI under adversarial uncertainty, paving the way for more reliable deployment of AI models in high-stakes environments.