Preprint

WildDepth: A Multimodal Dataset for
3D Wildlife Perception and Depth Estimation

Muhammad Aamir1*, Naoya Muramatsu2*, Sangyun Shin1*, Matthew Wijers1,
Jia-Xing Zhong1, Xinyu Hou1, Amir Patel3, Andrew Loveridge1, Andrew Markham1
1University of Oxford   2University of Cape Town   3University College London
* Equal contribution
{muhammad.aamir, sangyun.shin, jiaxing.zhong, xinyu.hou, andrew.markham}@cs.ox.ac.uk
matthew.wijers@biology.ox.ac.uk, mrmnao001@myuct.ac.za, amir.patel@ucl.ac.uk, andrew.loveridge@lmh.ox.ac.uk


Abstract

Depth estimation and 3D reconstruction have been extensively studied as core topics in computer vision. Starting from rigid objects with relatively simple geometric shapes, such as vehicles, the research has expanded to general objects, including challenging deformable ones such as humans and animals. For animals in particular, however, most existing models are trained on datasets that lack metric scale, which is needed to validate image-only models. To address this limitation, we present WildDepth, a multimodal dataset and benchmark suite for depth estimation, behavior detection, and 3D reconstruction, covering diverse animal categories from domestic settings to the wild, captured with synchronized RGB and LiDAR. Experimental results show that multimodal data reduces depth RMSE by up to 10%, while RGB–LiDAR fusion improves 3D reconstruction fidelity by 12% in Chamfer distance. By releasing WildDepth and its benchmarks, we aim to foster robust multimodal perception systems that generalize across domains.
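The two metrics quoted above are standard: RMSE over pixels with valid LiDAR returns for depth, and symmetric Chamfer distance for reconstruction. A minimal sketch of both, assuming sparse LiDAR depth maps with zeros marking missing returns (the paper's exact evaluation protocol may differ):

```python
import numpy as np

def depth_rmse(pred, gt):
    """RMSE over pixels with valid (nonzero) LiDAR ground truth."""
    mask = gt > 0  # projected LiDAR depth is sparse; zeros are missing returns
    return np.sqrt(np.mean((pred[mask] - gt[mask]) ** 2))

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between (N,3) and (M,3) point clouds."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

The brute-force pairwise distance matrix is fine for snippet-sized clouds; a KD-tree (e.g. `scipy.spatial.cKDTree`) is the usual choice at full LiDAR scale.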

Dataset at a Glance

Kgalagadi

South Africa
  • Apex predators & large herbivores
  • Lions, cheetahs, leopards, springbok, gemsbok
  • 13-day field survey
  • RGB + LiDAR (zoom-lens setup)

Bubye Valley

Zimbabwe
  • 6 species incl. giraffes, zebras, warthogs
  • 84,700 synchronized frames
  • 4K RGB + Livox Mid-70 LiDAR
  • Private wildlife conservancy

Longleat Safari Park

United Kingdom
  • 10+ species in controlled environment
  • 42,500 synchronized frames
  • RGB + Livox Avia LiDAR
  • Diverse body morphologies

Visual Showcase

Synchronized RGB imagery with LiDAR colored point cloud overlay.
Monocular depth estimation: RGB input, predicted depth map, and LiDAR ground truth.
3D reconstruction results from multimodal inputs.
Sample detection frame from field data.
Sample behavior detection frame.

Download Dataset

Longleat v1, UK

640×480, 68 snippets
  • RGB + LiDAR + depth maps
  • Per-frame DA3 calibration
  • 3-panel videos
Download (42 GB)

Longleat v2, UK — Part 1

1080p rectified, 22 snippets
  • Rectified RGB (JPEG) + LiDAR depth maps
  • July 07 (all) + July 08 (Part A)
  • Rhino, Cheetah, Camel, Giraffe, Ankole Cow
Download (38 GB)

Longleat v2, UK — Part 2

1080p rectified, 12 snippets
  • Rectified RGB (JPEG) + LiDAR depth maps
  • July 08 (Part B)
  • Tiger, Lion, Lioness, Wolf, Giraffe
Download (21 GB)

Bubye Valley, Zimbabwe — Part 1

1080p rectified, 17 snippets
  • Rectified RGB (JPEG) + LiDAR depth maps
  • Bubye Valley Conservancy, Zimbabwe
  • Donkey, Giraffe, Zebra, Goat
Download (46 GB)

Bubye Valley, Zimbabwe — Part 2

1080p rectified, 20 snippets
  • Rectified RGB (JPEG) + LiDAR depth maps
  • Bubye Valley Conservancy, Zimbabwe
  • Goat, Warthog, Zebra, Plover Bird
Download (38 GB)
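Each snippet pairs rectified RGB frames (JPEG) with LiDAR depth maps. A minimal loading sketch; the file layout, depth-map encoding, and scale factor here are hypothetical placeholders, not the released archive's actual conventions — check the dataset documentation for the real ones:

```python
import numpy as np
from PIL import Image

# Hypothetical layout and encoding (16-bit PNG in millimetres); the actual
# dataset release defines its own paths and depth format.
DEPTH_SCALE = 1000.0  # millimetres -> metres, if 16-bit integer encoding

def load_pair(rgb_path, depth_path, depth_scale=DEPTH_SCALE):
    """Load a synchronized RGB frame and its projected LiDAR depth map."""
    rgb = np.asarray(Image.open(rgb_path).convert("RGB"))
    depth = np.asarray(Image.open(depth_path)).astype(np.float32) / depth_scale
    valid = depth > 0  # sparse LiDAR projection: zeros mark missing returns
    return rgb, depth, valid
```

Masking on `valid` before computing any metric avoids penalizing pixels the LiDAR never observed.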

Citation

@article{aamir2026wilddepth,
  title     = {WildDepth: A Multimodal Dataset for 3D Wildlife Perception and Depth Estimation},
  author    = {Aamir, Muhammad and Muramatsu, Naoya and Shin, Sangyun and Wijers, Matthew and Zhong, Jia-Xing and Hou, Xinyu and Patel, Amir and Loveridge, Andrew and Markham, Andrew},
  journal   = {arXiv preprint arXiv:2603.16816},
  year      = {2026},
  doi       = {10.48550/arXiv.2603.16816}
}