About Me
I am working towards AI that unifies all physical sensors into one body of perception — pushing machine understanding beyond the limits of human senses.
I am currently a postdoctoral researcher in the Cyber Physical Systems (CPS) group at the University of Oxford, under the supervision of Profs. Niki Trigoni and Andrew Markham. My research builds AI that unifies diverse physical sensors—RGB, LiDAR, radar, thermal, and event cameras—into robust perception systems that extend how we understand the physical world.
I obtained my D.Phil. (PhD) in Computer Science from the University of Oxford, co-supervised by Profs. Niki Trigoni and Andrew Markham in the CPS group. My studies were generously supported by the Global Korea Scholarship (GKS) Program and the ACE-OPS grant. Prior to my D.Phil., I worked as a research associate under the supervision of Prof. Yong-Guk Kim at Sejong University, South Korea, where I completed my master's and undergraduate degrees in Computer Science.
Research Interests
The physical world is rich with information invisible to human senses — heat, radar reflections, microsecond dynamics. My research advances multi-modal and multi-view sensing with diverse physical sensors, including RGB-D cameras, LiDAR, mmWave radar, thermal cameras, and event cameras. A key challenge in real-world settings is achieving reliable long-term localization of these sensors in a shared reference space, enabling accurate multi-view fusion while leveraging the unique capabilities of each modality.
Beyond localization, I am interested in how foundation models can develop reasoning abilities across modalities—understanding the complementary strengths of each sensor type and dynamically selecting and combining the most informative modalities for a given task. This supports robust, context-aware performance on downstream applications, including detection, tracking, re-identification, anomaly detection, and more.
Ultimately, my work seeks to bridge physical sensing with adaptive multi-modal learning, building perception systems that are accurate, resilient, and capable of operating reliably in complex real-world environments.
Research Platform
We have designed and built Frankenstein, a multi-modal sensing platform that integrates diverse physical sensors—including RGB-D cameras, LiDAR, mmWave radar, and event cameras—into a unified system. The goal is to enable robust, long-term perception in complex real-world environments by leveraging the complementary strengths of each modality.
By co-registering and fusing data from these heterogeneous sensors, the platform supports research in cross-domain generalization, multi-view localization, and adaptive modality fusion for key downstream machine-learning tasks.
Using four synchronised Frankenstein units, we have recorded long-term, large-scale, multi-view, multi-modal datasets across a wide range of domains, such as campus, industrial, and wildlife environments, capturing diverse real-world conditions that unimodal perception cannot address.
Ongoing Research
Multi-Modal Localization with Thermal, RGB, and mmWave Radar
Leveraging our multi-modal sensing platform (Frankenstein), this project explores robust localization by fusing thermal imaging, RGB cameras, and mmWave radar. By combining these complementary modalities, the system achieves reliable pose estimation across challenging conditions such as poor lighting, adverse weather, and visually degraded environments.
This work demonstrates that multi-modal fusion is essential for reliable localization — a core building block toward a foundation model that reasons across all sensor modalities.
Extended Perception via Multi-Modal Fusion
In the real world, there are many circumstances in which unimodal models fail. For example, rain and water droplets on camera lenses severely degrade RGB-based perception, causing even state-of-the-art monocular depth estimation algorithms to break down. In safety-critical applications, this creates dangerous blind spots. This project investigates how complementary modalities can compensate for vision degradation and overcome unimodal limitations, enabling reliable perception across a wide range of weather conditions.
This work shows why multi-modal perception is not optional but essential — a core motivation for building an engine where AI reasons about which sensors to trust and how to fuse them.
Unimodal Enhancement
While RGB-based perception has been extensively studied, many other modalities that can address the limitations of RGB remain under-explored. For example, mmWave radar and event cameras offer essential complementary information: radar provides robust sensing in adverse weather and lighting conditions, while event cameras capture high-temporal-resolution motion with minimal latency. Enhancing these under-explored modalities individually extends the perception capabilities of all downstream intelligent systems.
Advancing each modality individually is a prerequisite for effective fusion — the engine must understand what each sensor can uniquely contribute before it can reason about how to combine them.
News
- [Mar. 2026] Our paper WildDepth on a large-scale dataset with calibrated multi-modal sensors for 3D wildlife perception and depth estimation is now available on arXiv!
- [Nov. 2025] Our survey paper Revisiting U-Net, on U-Net as a foundational backbone for modern generative AI, got published in Artificial Intelligence Review (Springer)!
- [Nov. 2025] Our paper Thermal-to-RGB on enhancing low-resolution thermal imagery with diffusion models for wildlife monitoring got published at the ACM International Workshop on Thermal Sensing and Computing 2025!
- [Jun. 2025] Our paper DiffRefine on generative cross-domain detection in 3D got accepted into ICCV 2025 for spotlight presentation!
- [Mar. 2025] Our paper WildPose on a multi-modal sensing dataset for deformable animals got accepted into the Journal of Experimental Biology and selected for the cover of the issue!
- [Jan. 2025] Completed my thesis corrections and obtained my D.Phil. (PhD)!
- [Jan. 2025] Our paper SoundLoc3D on sound source localization got accepted into WACV 2025 for oral presentation!
- [Oct. 2024] Started my role as a postdoctoral researcher in the CPS group.
- [Oct. 2024] Successfully passed my D.Phil. viva! (Internal Examiner: Prof. Christian Rupprecht at VGG; External Examiner: Prof. Dimitrios Kanoulas at UCL).
- [Sep. 2024] Our paper GroupExp-DA on Domain-Adaptive 3D Detection got accepted into NeurIPS 2024!
- [May 2024] Our paper on stereo depth estimation with visual foundation models got accepted into ICRA 2024!
- [Mar. 2024] Our paper Spherical Mask on 3D instance segmentation got accepted into CVPR 2024!
- [Feb. 2024] Passed my Confirmation viva (Examiners: Profs. Ronald Clark and Alessandro Abate).
- [Jan. 2024] Our paper Sound3DVDet on sound source localization got accepted into WACV 2024!
- [Jul. 2023] Our paper on view synthesis with NeRF got accepted into NeurIPS 2023!
- [Jan. 2023] Our paper Sample, Crop, Track on self-supervised 3D object detection got accepted into ICRA 2023!
- [Aug. 2022] Our paper on monocular night-time depth estimation got accepted into CoRL 2022!
- [Mar. 2022] Our paper on monocular SLAM for UAVs got accepted into IROS 2022!
- [Jan. 2022] Passed my Transfer of Status viva (Examiners: Profs. Alex Rogers and Alessandro Abate).
- [Oct. 2020] Started my D.Phil. (PhD) at the University of Oxford.
- [Aug. 2020] Completed my master's research at Sejong University, South Korea, on vision-based autonomous drone navigation — our team won 1st place at the NeurIPS 2019 Autonomous Drone Racing Competition! See our paper and demo video.
Selected Publications
Contact
Email: sangyun.shin@cs.ox.ac.uk
I'm open to opportunities, collaborations and research discussions. Feel free to reach out!