QuantV2X: A Fully Quantized Multi-Agent System for Cooperative Perception

UCLA · UW-Madison · NCSU · Purdue University · UC Berkeley · UT Austin

† Project lead contact: sethzhao506@g.ucla.edu.

‡ Corresponding author: jiaqima@ucla.edu.

* Equal contribution.

QuantV2X Teaser: 3.2× system speedup, 99.8% accuracy preserved.

QuantV2X enables efficient cooperative perception via full-stack quantization, achieving faster end-to-end performance while maintaining accuracy.

Abstract

Cooperative perception through Vehicle-to-Everything (V2X) communication offers significant potential for enhancing vehicle perception by mitigating occlusions and expanding the field of view. However, past research has predominantly focused on improving accuracy metrics without addressing the crucial system-level considerations of efficiency, latency, and real-world deployability. Notably, most existing systems rely on full-precision models, which incur high computational and transmission costs, making them impractical for real-time operation in resource-constrained environments.

In this paper, we introduce QuantV2X, the first fully quantized multi-agent system designed specifically for efficient and scalable deployment of multi-modal, multi-agent V2X cooperative perception. QuantV2X introduces a unified end-to-end quantization strategy across both neural network models and transmitted message representations that simultaneously reduces computational load and transmission bandwidth.

Remarkably, despite operating under low-bit constraints, QuantV2X achieves accuracy comparable to full-precision systems. More importantly, when evaluated under deployment-oriented metrics, QuantV2X reduces system-level latency by 3.2× and achieves a +9.5 improvement in mAP30 over full-precision baselines. Furthermore, QuantV2X scales more effectively, enabling larger and more capable models to fit within strict memory budgets.

These results highlight the viability of a fully quantized multi-agent intermediate fusion system for real-world deployment. The system will be publicly released to promote research in this field.

Method

QuantV2X Three-Stage Pipeline

Figure: QuantV2X three-stage pipeline (Pretraining → Codebook Learning → Post-Training Quantization).

QuantV2X follows a three-stage approach to achieve efficient and robust cooperative perception under realistic constraints. First, the perception backbone is pretrained in full precision. Next, a codebook is learned to compress BEV features into compact indices. Finally, post-training quantization is applied across the pipeline to reduce compute and memory overhead.
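For concreteness, here is a minimal sketch of the codebook-learning stage, assuming the codebook is fit to frozen BEV features with plain k-means. The function name learn_codebook, the codebook size, the feature dimension, and the k-means objective are illustrative assumptions, not the released implementation.

```python
import torch

# Minimal sketch: fit code vectors to frozen BEV features with plain k-means.
# Codebook size, feature dimension, and the objective are illustrative assumptions.

def learn_codebook(features, num_codes=256, iters=20):
    """features: (N, C) BEV cell features from the pretrained backbone."""
    perm = torch.randperm(features.shape[0])[:num_codes]
    codebook = features[perm].clone()                 # initialize codes from data
    for _ in range(iters):
        assign = torch.cdist(features, codebook).argmin(dim=1)  # nearest code per cell
        for k in range(num_codes):
            members = features[assign == k]
            if len(members) > 0:
                codebook[k] = members.mean(dim=0)     # recenter each code vector
    return codebook

features = torch.randn(8192, 128)                     # stand-in for backbone BEV features
codebook = learn_codebook(features)
print(codebook.shape)                                 # torch.Size([256, 128])
```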

Quantized Codebook Communication

Instead of transmitting full-precision BEV features, QuantV2X transmits compact codebook indices. This reduces communication bandwidth while preserving semantic information. At the receiver, indices are decoded back into approximate features for fusion, enabling bandwidth savings without sacrificing perception quality.
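The sketch below illustrates this index-based message exchange: the sender maps each BEV cell to the index of its nearest code vector, and the receiver looks the feature back up before fusion. The names encode_bev and decode_bev, the grid size, and the 256-entry codebook are assumptions for illustration only.

```python
import torch

# Minimal sketch of the index-based message: transmit one codebook index per
# BEV cell instead of a full-precision feature vector. Sizes are illustrative.

def encode_bev(features, codebook):
    """Sender: map each BEV cell feature to the index of its nearest code vector."""
    return torch.cdist(features, codebook).argmin(dim=1).to(torch.uint8)

def decode_bev(indices, codebook):
    """Receiver: look up approximate features for fusion."""
    return codebook[indices.long()]

H, W, C, K = 64, 64, 128, 256            # illustrative BEV grid and codebook size
codebook = torch.randn(K, C)             # assumed shared between agents in advance
bev = torch.randn(H * W, C)              # sender's BEV features

idx = encode_bev(bev, codebook)          # this is the transmitted message
recon = decode_bev(idx, codebook)        # receiver-side approximation used for fusion

raw_bytes = bev.numel() * 4              # fp32 features
msg_bytes = idx.numel()                  # one byte per cell when K <= 256
print(f"message is ~{raw_bytes / msg_bytes:.0f}x smaller than raw features")
```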

Alignment Module

Real-world deployment introduces heterogeneity (different sensors/backbones) and spatial misalignment (pose noise, latency). QuantV2X includes an Alignment Module that corrects geometric inconsistencies before fusion, improving robustness to pose errors, reducing false positives, and maintaining cross-agent consistency.
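As a rough illustration of the spatial-alignment idea, the sketch below warps a neighbor agent's BEV feature map into the ego frame given an estimated relative pose. The function warp_to_ego, the grid resolution, the cell size, and the element-wise max fusion at the end are hypothetical stand-ins, not the module's actual design.

```python
import math
import torch
import torch.nn.functional as F

# Minimal sketch: warp a neighbor's BEV feature map into the ego frame given an
# estimated relative pose (dx, dy, dyaw). All sizes here are illustrative.

def warp_to_ego(bev, dx, dy, dyaw, meters_per_cell=0.5):
    """bev: (1, C, H, W) neighbor features; returns the same map in the ego frame."""
    _, _, H, W = bev.shape
    cos, sin = math.cos(dyaw), math.sin(dyaw)
    # affine_grid works in normalized coordinates spanning [-1, 1].
    tx = dx / (meters_per_cell * W / 2)
    ty = dy / (meters_per_cell * H / 2)
    theta = torch.tensor([[[cos, -sin, tx],
                           [sin,  cos, ty]]], dtype=bev.dtype)
    grid = F.affine_grid(theta, list(bev.shape), align_corners=False)
    return F.grid_sample(bev, grid, align_corners=False)

neighbor_bev = torch.randn(1, 128, 64, 64)
aligned = warp_to_ego(neighbor_bev, dx=1.2, dy=-0.4, dyaw=math.radians(2.0))
ego_bev = torch.randn(1, 128, 64, 64)
fused = torch.maximum(ego_bev, aligned)   # placeholder element-wise max fusion
```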

Pose Error and Alignment Illustration

Figure: Alignment module mitigating pose error effects.

Unified End-to-End Quantization

Unlike approaches that quantize only model weights, QuantV2X unifies quantization across both neural components and the communication channel. This yields consistent acceleration and memory savings end to end, enabling real-time deployment on resource-constrained platforms without sacrificing accuracy.
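To make the model-side quantization concrete, here is a minimal sketch of symmetric, per-tensor fake quantization at W4/A8, the bit setting reported in the results table. The quantize helper and its max-based calibration are simplified assumptions; a practical PTQ recipe typically calibrates scales per layer, and often per channel, on real data.

```python
import torch

# Minimal sketch: symmetric, per-tensor fake quantization for weights (4-bit)
# and activations (8-bit). Max-based calibration is a simplification.

def quantize(x, num_bits):
    """Round x onto a symmetric integer grid, then map back to float."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale

weight = torch.randn(256, 256)
activation = torch.randn(32, 256)

w_q = quantize(weight, num_bits=4)        # W4
a_q = quantize(activation, num_bits=8)    # A8

full = activation @ weight.t()
approx = a_q @ w_q.t()
print("relative error:", ((full - approx).norm() / full.norm()).item())
```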

Results

Model-Level (PTQ)

QuantV2X preserves accuracy under low-precision while keeping calibration cost low. The table summarizes AP metrics against prior PTQ methods on DAIR-V2X.

Method            Bits (W/A)   AP30   AP50   Calibration Cost (GPU·hr)
Full Precision    32/32        75.1   68.2   N/A
PD-Quant          4/8          65.5   56.1   0.37
LiDAR-PTQ         4/8          73.8   65.7   0.93
QuantV2X (Ours)   4/8          74.2   66.7   0.38

Qualitative Results (SEC)

Qualitative results (LP + CR) — ground-truth in green, predictions in red.

Qualitative Results (LSS)

Qualitative results (LP + LS) — reduced false positives vs naive quantization.

System-Level (End-to-End)

Under realistic deployment (ROS + TensorRT), QuantV2X reduces latency across all stages: local inference, communication, and fusion; the overall end-to-end speedup is approximately 3.2×.

System-Level Latency Breakdown

Latency breakdown (ms) and component-wise speedups: 1.6× local, 5.3× communication, 2.5× fusion; total 3.2×.