Preprint

WildDepth: A Multimodal Dataset for
3D Wildlife Perception and Depth Estimation

Muhammad Aamir1*, Naoya Muramatsu2*, Sangyun Shin1*, Matthew Wijers1
Jiaxing Jhong1, Xinyu Hou1, Amir Patel3, Andrew Markham1
1University of Oxford   2University of Cape Town   3University College London
* Equal contribution
{muhammad.aamir, sangyun.shin, jiaxing.jhong, xinyu.hou, andrew.markham}@cs.ox.ac.uk
matthew.wijers@biology.ox.ac.uk, mrmnao001@myuct.ac.za, amir.patel@ucl.ac.uk


Abstract

Depth estimation and 3D reconstruction are core, extensively studied topics in computer vision. Research began with rigid objects of relatively simple geometry, such as vehicles, and has since expanded to general objects, including challenging deformable ones such as humans and animals. For animals in particular, however, most existing models are trained on datasets that lack metric scale, even though metric ground truth is what makes it possible to validate image-only models. To address this limitation, we present WildDepth, a multimodal dataset and benchmark suite for depth estimation, behavior detection, and 3D reconstruction, covering diverse categories of animals in settings ranging from domestic to wild, with synchronized RGB and LiDAR. Experimental results show that using multimodal data improves depth reliability by up to 10% in RMSE, while RGB–LiDAR fusion improves 3D reconstruction fidelity by 12% in Chamfer distance. By releasing WildDepth and its benchmarks, we aim to foster robust multimodal perception systems that generalize across domains.
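
The two metrics quoted in the abstract can be made concrete with a small, dependency-free sketch. This is not the WildDepth benchmark code. It assumes depth maps are flattened lists of metric depths, that pixels with no LiDAR return are stored as 0, and that Chamfer distance is the sum of the two directed mean nearest-neighbour Euclidean distances (other definitions average the two terms or use squared distances). The function names `masked_rmse` and `chamfer_distance` are hypothetical.

```python
import math


def masked_rmse(pred, lidar, invalid=0.0):
    """RMSE between predicted and LiDAR depth, over valid LiDAR pixels only.

    Assumption: `invalid` marks pixels with no LiDAR return.
    """
    pairs = [(p, g) for p, g in zip(pred, lidar) if g != invalid]
    if not pairs:
        raise ValueError("no valid LiDAR pixels to evaluate")
    return math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))


def _mean_nearest(src, dst):
    """Mean distance from each point in `src` to its nearest point in `dst`."""
    return sum(min(math.dist(s, d) for d in dst) for s in src) / len(src)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 3D point sets (brute force)."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return _mean_nearest(a, b) + _mean_nearest(b, a)


# Toy inputs, not dataset values.
pred_depth = [10.0, 12.0, 7.0, 5.0]
lidar_depth = [10.0, 10.0, 0.0, 4.0]   # third pixel has no LiDAR return
rmse = masked_rmse(pred_depth, lidar_depth)

recon = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
scan = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(recon, scan)
```

The brute-force nearest-neighbour search is quadratic in the number of points, so a real evaluation on LiDAR scans would normally use a spatial index such as a k-d tree.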

Dataset at a Glance

Kgalagadi

South Africa
  • Apex predators & large herbivores
  • Lions, cheetahs, leopards, springbok, gemsbok
  • 13-day field survey
  • RGB + LiDAR (zoom-lens setup)

Bubye Valley

Zimbabwe
  • 6 species incl. giraffes, zebras, warthogs
  • 84,700 synchronized frames
  • 4K RGB + Livox Mid-70 LiDAR
  • Private wildlife conservancy

Longleat Safari Park

United Kingdom
  • 10+ species in controlled environment
  • 42,500 synchronized frames
  • RGB + Livox Avia LiDAR
  • Diverse body morphologies

Visual Showcase

Synchronized RGB imagery with LiDAR colored point cloud overlay.
Monocular depth estimation: RGB input, predicted depth map, and LiDAR ground truth.
3D reconstruction results from multimodal inputs.
Sample detection frame from field data.
Sample behavior detection frame.
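
An RGB and LiDAR overlay like the one shown in the showcase is commonly produced by projecting LiDAR points into the image plane. The sketch below illustrates that general idea and is not the authors' pipeline. It assumes the points are already expressed in the camera frame (x right, y down, z forward), a pinhole camera with no lens distortion, and hypothetical intrinsics `fx`, `fy`, `cx`, `cy`. The function name `project_to_sparse_depth` is also hypothetical.

```python
def project_to_sparse_depth(points_cam, fx, fy, cx, cy, width, height):
    """Project camera-frame 3D points to pixels, keeping the nearest depth.

    Returns a dict mapping (u, v) pixel coordinates to z-depth in the
    same unit as the input points.
    """
    depth = {}
    for x, y, z in points_cam:
        if z <= 0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue  # point falls outside the image
        # If several points land on one pixel, keep the closest surface.
        if (u, v) not in depth or z < depth[(u, v)]:
            depth[(u, v)] = z
    return depth


# Toy points and intrinsics, not calibration values from the dataset.
points = [
    (0.0, 0.0, 5.0),    # projects to the principal point
    (1.0, 0.0, 10.0),   # projects to pixel (60, 40)
    (2.0, 0.0, 20.0),   # same pixel, but farther away, so it is discarded
    (0.0, 0.0, -1.0),   # behind the camera
    (10.0, 0.0, 1.0),   # outside the image
]
sparse = project_to_sparse_depth(points, fx=100.0, fy=100.0, cx=50.0, cy=40.0,
                                 width=100, height=80)
```

The resulting sparse map can be used in two ways: as per-pixel ground truth for a predicted depth map, or, by sampling the image color at each `(u, v)`, to color the point cloud.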

Explore the Data

Browse synchronized RGB and LiDAR recordings, 3D point clouds, and depth maps interactively in our live data viewer.


Citation

@inproceedings{wilddepth2025,
  title     = {WildDepth: A Multimodal Dataset for 3D Wildlife Perception and Depth Estimation},
  author    = {Aamir, Muhammad and Muramatsu, Naoya and Shin, Sangyun and Wijers, Matthew and Jhong, Jiaxing and Hou, Xinyu and Patel, Amir and Markham, Andrew},
  year      = {2025},
  note      = {Preprint}
}