¹University of Oxford  ²Microsoft Research
We propose DiffRefine, a diffusion-based module that densifies the sparse object points inside box proposals to improve second-stage refinement in 3D object detection under domain shift. Motivated by the observation that proposals are often well localized yet receive low objectness scores, DiffRefine iteratively generates points on object surfaces to recover the missing geometric evidence that would otherwise cause false negatives. Our approach performs differentiable 3D generation on voxel grids and conditions on spatial context to suppress hallucinated geometry. Experiments on KITTI, nuScenes, and Waymo demonstrate competitive improvements, with the largest gains on distant objects where point sparsity is most severe.
Implementation sketch: given the voxels inside a proposal, the diffusion model predicts per-point offsets toward the nearest occupied voxels and warps the points progressively, providing a differentiable path for learning to densify object surfaces.
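The loop below is a minimal PyTorch sketch of this idea, assuming a hypothetical `OffsetDenoiser` network, conditioning features, and step schedule; it illustrates the progressive, differentiable warping rather than the authors' actual implementation.

```python
# Minimal sketch of iterative, differentiable point densification.
# OffsetDenoiser, feat_dim, num_steps, and step_size are illustrative
# assumptions, not details from the paper.
import torch
import torch.nn as nn

class OffsetDenoiser(nn.Module):
    """Predicts a per-point 3D offset toward occupied voxels (hypothetical)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 3),  # one 3D offset per point
        )

    def forward(self, points, cond):
        # points: (N, 3) noisy points inside a proposal box
        # cond:   (N, feat_dim) per-point conditioning features
        return self.mlp(torch.cat([points, cond], dim=-1))

def densify(points, cond, denoiser, num_steps=8, step_size=0.5):
    """Warps points toward object surfaces over several denoising steps.

    Because the offsets come from a differentiable network, gradients
    flow through the whole warping trajectory.
    """
    for _ in range(num_steps):
        offsets = denoiser(points, cond)       # offset toward nearest occupied voxel
        points = points + step_size * offsets  # progressive warp
    return points

# Example usage with random stand-in tensors:
denoiser = OffsetDenoiser(feat_dim=64)
pts = torch.randn(256, 3)    # sparse points in one proposal
cond = torch.randn(256, 64)  # e.g. pooled voxel features (assumed)
dense_pts = densify(pts, cond, denoiser)
```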
We fuse spatial-context features from a BEV encoder with the generated object points via a cross-attention-style correlation, which steers densification toward plausible geometry and helps second-stage refinement reject false positives.
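One natural way to realize such a correlation is standard multi-head cross-attention, with generated point features as queries and flattened BEV cells as keys and values. The sketch below assumes this design; the module name and tensor shapes are illustrative, not from the paper.

```python
# Hedged sketch of cross-attention fusion between generated point
# features and BEV spatial-context features (shapes are assumptions).
import torch
import torch.nn as nn

class ContextFusion(nn.Module):
    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, point_feats, bev_feats):
        # point_feats: (B, N, dim) features of generated object points (queries)
        # bev_feats:   (B, M, dim) flattened BEV context cells (keys/values)
        fused, _ = self.attn(point_feats, bev_feats, bev_feats)
        return self.norm(point_feats + fused)  # residual fusion
```

Letting each generated point attend over the surrounding BEV cells pulls scene-level context into the densification, which is consistent with the goal of keeping the generated geometry plausible.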
@inproceedings{shin2025diffrefine,
  title     = {DiffRefine: Diffusion-based Proposal Specific Point Cloud Densification for Cross-Domain Object Detection},
  author    = {Shin, Sangyun and He, Yuhang and Hou, Xinyu and Hodgson, Samuel and Markham, Andrew and Trigoni, Niki},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}