† Project lead contact: sethzhao506@g.ucla.edu.
‡ Corresponding author: jiaqima@ucla.edu.
* Equal contribution.
Cooperative perception through Vehicle-to-Everything (V2X) communication offers significant potential for enhancing vehicle perception by mitigating occlusions and expanding the field of view. However, past research has predominantly focused on improving accuracy metrics without addressing the crucial system-level considerations of efficiency, latency, and real-world deployability. Notably, most existing systems rely on full-precision models, which incur high computational and transmission costs, making them impractical for real-time operation in resource-constrained environments.
In this paper, we introduce QuantV2X, the first fully quantized multi-agent system designed specifically for efficient and scalable deployment of multi-modal, multi-agent V2X cooperative perception. QuantV2X introduces a unified end-to-end quantization strategy across both neural network models and transmitted message representations that simultaneously reduces computational load and transmission bandwidth.
Remarkably, despite operating under low-bit constraints, QuantV2X achieves accuracy comparable to full-precision systems. More importantly, when evaluated under deployment-oriented metrics, QuantV2X reduces system-level latency by 3.2× and achieves a +9.5 improvement in mAP30 over full-precision baselines. Furthermore, QuantV2X scales more effectively, enabling larger and more capable models to fit within strict memory budgets.
These results highlight the viability of a fully quantized multi-agent intermediate fusion system for real-world deployment. The system will be publicly released to promote research in this field.
Figure: QuantV2X three-stage pipeline (Pretraining → Codebook Learning → Post-Training Quantization).
QuantV2X follows a three-stage approach to achieve efficient and robust cooperative perception under realistic constraints. First, a backbone is pretrained with full-precision features. Next, a codebook is learned to compress BEV features into compact indices. Finally, post-training quantization is applied across the pipeline to reduce compute and memory overhead.
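As a rough illustration of the codebook-learning stage, the sketch below runs a plain k-means over flattened BEV feature vectors to produce the codewords. The feature shape, codebook size, and the use of vanilla k-means are assumptions for illustration only, not the exact procedure used by QuantV2X.

```python
import torch

def learn_codebook(bev_features: torch.Tensor, num_codes: int = 256, iters: int = 20) -> torch.Tensor:
    """Toy k-means codebook learning over BEV feature vectors.

    bev_features: (N, C, H, W) batch of full-precision BEV maps.
    Returns a (num_codes, C) codebook; each spatial cell is later
    represented by the index of its nearest codeword.
    """
    n, c, h, w = bev_features.shape
    # Treat every spatial cell as one C-dimensional vector.
    vectors = bev_features.permute(0, 2, 3, 1).reshape(-1, c)

    # Initialize codewords from randomly chosen cells.
    idx = torch.randperm(vectors.shape[0])[:num_codes]
    codebook = vectors[idx].clone()

    for _ in range(iters):
        # Assign each vector to its nearest codeword (L2 distance).
        assign = torch.cdist(vectors, codebook).argmin(dim=1)
        # Update each codeword as the mean of its assigned vectors.
        for k in range(num_codes):
            members = vectors[assign == k]
            if members.numel() > 0:
                codebook[k] = members.mean(dim=0)
    return codebook
```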
Instead of transmitting full-precision BEV features, QuantV2X transmits compact codebook indices. This reduces communication bandwidth while preserving semantic information. At the receiver, indices are decoded back into approximate features for fusion, enabling bandwidth savings without sacrificing perception quality.
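A minimal sketch of the transmit/receive path under this scheme is shown below: the sender replaces each BEV cell with the index of its nearest codeword, and the receiver reconstructs an approximate feature map by codebook lookup. The 256-entry codebook (8-bit indices) and 64-channel features are assumed configurations for illustration.

```python
import torch

def encode_bev(bev: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Sender side: map each (C,)-dim BEV cell to the index of its nearest codeword."""
    c, h, w = bev.shape
    vectors = bev.permute(1, 2, 0).reshape(-1, c)              # (H*W, C)
    indices = torch.cdist(vectors, codebook).argmin(dim=1)     # (H*W,)
    return indices.to(torch.uint8).reshape(h, w)               # fits a 256-entry codebook

def decode_bev(indices: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Receiver side: reconstruct an approximate BEV map by codebook lookup."""
    h, w = indices.shape
    approx = codebook[indices.long().reshape(-1)]              # (H*W, C)
    return approx.reshape(h, w, -1).permute(2, 0, 1)           # (C, H, W)

# Under these assumed settings (C=64 float32 channels, 256 codewords), each cell
# shrinks from 64 * 4 bytes to a single 1-byte index (~256x smaller payload),
# plus the one-time cost of sharing the codebook itself.
```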
Real-world deployment introduces heterogeneity (different sensors/backbones) and spatial misalignment (pose noise, latency). QuantV2X includes an Alignment Module that corrects geometric inconsistencies before fusion, improving robustness to pose errors, reducing false positives, and maintaining cross-agent consistency.
Figure: Alignment module mitigating pose error effects.
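One way such a correction can be applied is to resample the incoming agent's BEV feature map with the estimated relative-pose correction, as sketched below. The use of `affine_grid`/`grid_sample` and the BEV-extent normalization are illustrative assumptions, not necessarily how the released Alignment Module is implemented.

```python
import math
import torch
import torch.nn.functional as F

def warp_bev(bev: torch.Tensor, dx: float, dy: float, yaw: float, bev_range: float) -> torch.Tensor:
    """Resample a (1, C, H, W) BEV map by a corrective SE(2) transform.

    dx, dy    -- translation correction in meters
    yaw       -- rotation correction in radians
    bev_range -- half-extent of the BEV grid in meters (normalizes the translation)
    """
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    # 2x3 affine matrix in the normalized [-1, 1] coordinates expected by affine_grid.
    theta = torch.tensor([[cos_y, -sin_y, dx / bev_range],
                          [sin_y,  cos_y, dy / bev_range]],
                         dtype=bev.dtype, device=bev.device).unsqueeze(0)
    grid = F.affine_grid(theta, bev.shape, align_corners=False)
    # Bilinear resampling; cells that fall outside the map are zero-padded.
    return F.grid_sample(bev, grid, mode="bilinear", padding_mode="zeros", align_corners=False)
```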
Unlike approaches that quantize only model weights, QuantV2X unifies quantization across both neural components and the communication channel. This yields consistent acceleration and memory savings end to end, enabling real-time deployment on resource-constrained platforms without sacrificing accuracy.
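On the model side, the basic building block of such post-training quantization is a uniform quantize-dequantize step like the one sketched below (per-tensor, min-max calibration, shown for simplicity; QuantV2X's actual calibration scheme and bit allocation may differ).

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Uniform affine quantize-dequantize of a tensor (per-tensor, min-max calibration)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(qmin - x_min / scale)
    # Quantize onto the integer grid, then map back to floats so the rest of the
    # network runs unchanged while "seeing" the low-precision values.
    q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale

# e.g. 4-bit weights and 8-bit activations, matching the W4/A8 setting reported below:
# w_q = fake_quantize(layer.weight.data, num_bits=4)
# a_q = fake_quantize(activations, num_bits=8)
```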
QuantV2X preserves accuracy at low precision while keeping calibration cost low. The table below compares AP metrics against prior PTQ methods on DAIR-V2X.
| Method | Bits (W/A) | AP30 ↑ | AP50 ↑ | Calibration Cost (GPU·hr) |
|---|---|---|---|---|
| Full Precision | 32/32 | 75.1 | 68.2 | — |
| PD-Quant | 4/8 | 65.5 | 56.1 | 0.37 |
| LiDAR-PTQ | 4/8 | 73.8 | 65.7 | 0.93 |
| QuantV2X (Ours) | 4/8 | 74.2 | 66.7 | 0.38 |
Qualitative results (LP + CR) — ground-truth in green, predictions in red.
Qualitative results (LP + LS) — reduced false positives vs naive quantization.
Under realistic deployment (ROS + TensorRT), QuantV2X reduces latency across all stages: local inference, communication, and fusion; the overall end-to-end speedup is approximately 3.2×.
Latency breakdown (ms) and component-wise speedups: 1.6× local inference, 5.3× communication, 2.5× fusion; 3.2× end-to-end.