About Me

I am currently a postdoctoral researcher in the Cyber Physical Systems (CPS) group at the University of Oxford, under the supervision of Profs. Niki Trigoni and Andrew Markham. My research centers on multi-modal sensing for localization and cross-domain generalization, with a focus on building robust perception in real-world settings.

I obtained my D.Phil. (PhD) in Computer Science at the University of Oxford, co-supervised by Profs. Niki Trigoni and Andrew Markham in the CPS group. My studies were generously supported by the Global Korea Scholarship (GKS) Program and the ACE-OPS grant. Prior to my D.Phil., I worked as a research associate supervised by Prof. Yong-Guk Kim at Sejong University, South Korea, where I completed my master's and undergraduate degrees in Computer Science.

Research Interests

My research aims to advance multi-modal and multi-view sensing with diverse physical sensors, including RGB-D cameras, LiDAR, mmWave radar, event cameras, etc. A key challenge in real-world settings is achieving reliable long-term localization of these sensors in a shared reference space, enabling accurate multi-view fusion while leveraging the unique capabilities of each modality.

Beyond localization, I am interested in how foundation models can develop reasoning abilities across modalities—understanding the complementary strengths of each sensor type and dynamically selecting and combining the most informative modalities for a given task. This supports robust, context-aware performance on downstream applications, including detection, tracking, re-identification, anomaly detection, etc.

Ultimately, my work seeks to bridge physical sensing with adaptive multi-modal learning, building perception systems that are accurate, resilient, and capable of operating reliably in complex real-world environments.

Research Platform

We have built a multi-modal sensing platform that integrates diverse physical sensors—including RGB-D cameras, LiDAR, mmWave radar, and event cameras—into a unified system. The goal is to enable robust, long-term perception in complex real-world environments by leveraging the complementary strengths of each modality.

Multi-modal sensor platform concept
Frankenstein: A multi-modal sensor platform integrating RGB-D, LiDAR, mmWave radar, and event cameras.
Example sensor data from the multi-modal platform
Example data captured simultaneously across multiple sensor modalities.

By co-registering and fusing data from these heterogeneous sensors, the platform supports research in cross-domain generalization, multi-view localization, and adaptive modality fusion for essential downstream tasks in machine learning.
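As a minimal illustration of what co-registration involves, the sketch below maps a point observed in one sensor's frame into a shared reference frame via a rigid-body transform. The frames, extrinsics, and values are hypothetical placeholders, not the platform's actual calibration.

```python
# Minimal sketch (hypothetical frames and extrinsics): co-registering a point
# from one sensor's coordinate frame into a shared reference frame using a
# rigid-body transform (R, t), the basic step behind multi-sensor fusion.

def transform_point(R, t, p):
    """Apply rotation R (3x3, row-major lists) and translation t to point p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

# Example: a radar detection expressed in the radar frame, mapped into an
# assumed shared (e.g. LiDAR) frame via made-up calibration extrinsics.
R_radar_to_lidar = [[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]]   # identity rotation, for illustration
t_radar_to_lidar = [0.1, 0.0, -0.05]   # hypothetical lever arm (metres)

p_radar = [5.0, 1.0, 0.5]
p_lidar = transform_point(R_radar_to_lidar, t_radar_to_lidar, p_radar)
print(p_lidar)
```

In practice the extrinsics come from a joint calibration procedure, and each modality's measurements are additionally time-synchronized before fusion.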

To evaluate models for multi-modal research, we have recorded long-term, large-scale multi-view, multi-modal datasets across a wide range of domains, such as campus, industrial, and wildlife environments, capturing diverse real-world conditions that unimodal perception cannot address.

Multi-view, multi-modal dataset captured in an industrial environment from four Frankenstein units. The BEV map shows one unified frame across modalities.
Multi-modal dataset captured in a wildlife (Rhino) environment.

Ongoing Research

Multi-Modal Localization with Thermal, RGB, and mmWave Radar

Leveraging our multi-modal sensing platform (Frankenstein), this project explores robust localization by fusing thermal imaging, RGB cameras, and mmWave radar. By combining these complementary modalities, the system achieves reliable pose estimation across challenging conditions such as poor lighting, adverse weather, and visually degraded environments.

Extended Perception via Multi-Modal Fusion

In the real world, there are many circumstances where models trained on a single modality can fail significantly. For example, rain and water droplets on camera lenses severely degrade RGB-based perception, causing even SOTA monocular depth estimation algorithms to fail. In safety-critical applications, such failures can have serious consequences. This project investigates how complementary modalities can compensate for vision degradation and overcome these unimodal limitations, enabling reliable perception in all weather conditions.

RGB image with water droplet artifacts from rain
RGB image corrupted by water droplets on the lens due to rain.
Failed depth estimation from corrupted RGB input
SOTA depth estimation fails under rain-induced lens corruption.

Unimodal Enhancement

While RGB-based perception has been extensively studied, many other modalities can address the limitations of RGB. For example, mmWave radar and event cameras remain underexplored despite offering essential complementary information: radar provides robust sensing in adverse weather and lighting conditions, while event cameras capture high-temporal-resolution motion with minimal latency. Enhancing the individual capabilities of these underexplored modalities can extend the perception capability of all downstream intelligent systems.
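To make the event-camera data model concrete, the toy sketch below shows the standard representation: an asynchronous stream of (x, y, timestamp, polarity) tuples rather than frames, often accumulated over a short time window into a 2D histogram. The function name and toy values are illustrative, not part of any specific pipeline.

```python
# Minimal sketch (illustrative only): an event camera emits an asynchronous
# stream of (x, y, timestamp, polarity) events. A common preprocessing step
# accumulates events over a short window into a signed per-pixel histogram.

def accumulate_events(events, width, height, t_start, t_end):
    """Sum signed polarities per pixel over the window [t_start, t_end)."""
    frame = [[0] * width for _ in range(height)]
    for x, y, t, polarity in events:
        if t_start <= t < t_end:
            frame[y][x] += 1 if polarity else -1
    return frame

# Toy stream: two brightness-increase events and one decrease at pixel (2, 1).
events = [(2, 1, 0.001, True), (2, 1, 0.002, True), (2, 1, 0.003, False)]
frame = accumulate_events(events, width=4, height=3, t_start=0.0, t_end=0.01)
print(frame[1][2])  # prints 1
```

This windowed accumulation trades some of the sensor's microsecond temporal resolution for a frame-like tensor that conventional vision models can consume.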

Radar-based perception for robust sensing.
Event camera data visualization
Event camera capturing high-temporal-resolution hand motion data.

News

  • [Mar. 2026] Our paper WildDepth on a large-scale dataset with calibrated multi-modal sensors for 3D wildlife perception and depth estimation is now available on arXiv!
  • [Nov. 2025] Our survey paper Revisiting U-Net on U-Net as a foundational backbone for modern generative AI got published in Artificial Intelligence Review (Springer)!
  • [Nov. 2025] Our paper Thermal-to-RGB on enhancing low-resolution thermal imagery with diffusion models for wildlife monitoring got published at ACM International Workshop on Thermal Sensing and Computing 2025!
  • [Jun. 2025] Our paper DiffRefine on generative cross-domain detection in 3D got accepted into ICCV 2025 for spotlight presentation!
  • [Mar. 2025] Our paper WildPose on multi-modal sensing dataset for deformable animals got accepted into Journal of Experimental Biology and selected as the cover of the issue!
  • [Jan. 2025] Completed my thesis corrections and obtained D.Phil. (PhD) status!
  • [Jan. 2025] Our paper SoundLoc3D on sound source localization got accepted into WACV 2025 for oral presentation!
  • [Oct. 2024] Started my role as a postdoctoral researcher in CPS group.
  • [Oct. 2024] Successfully defended my D.Phil. viva! (Internal Examiner: Prof. Christian Rupprecht at VGG; External Examiner: Prof. Dimitrios Kanoulas at UCL).
  • [Sep. 2024] Our paper GroupExp-DA on Domain-Adaptive 3D Detection got accepted into NeurIPS 2024!
  • [May. 2024] Our paper on stereo depth estimation with visual foundation models got accepted into ICRA 2024!
  • [Mar. 2024] Our paper Spherical Mask on 3D instance segmentation got accepted into CVPR 2024!
  • [Feb. 2024] Defended my Confirmation viva (Examiners: Profs. Ronald Clark and Alessandro Abate).
  • [Jan. 2024] Our paper Sound3DVDet on sound source localization got accepted into WACV 2024!
  • [Jul. 2023] Our paper on view synthesis with NeRF got accepted into NeurIPS 2023!
  • [Jan. 2023] Our paper Sample, Crop, Track on self-supervised 3D object detection got accepted into ICRA 2023!
  • [Aug. 2022] Our paper on monocular night-time depth estimation got accepted into CoRL 2022!
  • [Mar. 2022] Our paper on monocular SLAM on UAV got accepted into IROS 2022!
  • [Jan. 2022] Defended my Transfer of Status viva (Examiners: Profs. Alex Rogers and Alessandro Abate).
  • [Oct. 2020] Started my D.Phil. (PhD) at the University of Oxford.
  • [Aug. 2020] Concluded my research at Sejong University, South Korea. Please see my research at Sejong on reinforcement learning and sensing, with videos, here!

Selected Publications

Thumbnail for WildDepth paper
WildDepth: A Multimodal Dataset for 3D Wildlife Perception and Depth Estimation
Muhammad Aamir*, Naoya Muramatsu*, Sangyun Shin*, Matthew Wijers, Jiaxing Jhong, Xinyu Hou, Amir Patel, Andrew Markham (*=Equal Contribution)
arXiv Preprint, 2026
Thumbnail for DiffRefine paper
DiffRefine: Diffusion-based Proposal Specific Point Cloud Densification for Cross-Domain Object Detection
Sangyun Shin, Yuhang He, Xinyu Hou, Samuel Hodgson, Andrew Markham, Niki Trigoni
International Conference on Computer Vision (ICCV), 2025 (Spotlight)
Thumbnail for WildPose paper
WildPose: a long-range 3D wildlife motion capture system
Naoya Muramatsu, Sangyun Shin, Qianyi Deng, Andrew Markham, Amir Patel
Journal of Experimental Biology, Volume 228, Issue 5 (Cover of the Issue)
Thumbnail for SoundLoc3D paper
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
Yuhang He, Sangyun Shin, Anoop Cherian, Niki Trigoni, Andrew Markham
Winter Conference on Applications of Computer Vision (WACV), 2025 (Oral)
Thumbnail for GroupExp-DA paper
Towards Learning Group-Equivariant Features for Domain Adaptive 3D Detection
Sangyun Shin, Yuhang He, Madhu Vankadari, Ta-Ying Cheng, Qian Xie, Andrew Markham, Niki Trigoni
Neural Information Processing Systems (NeurIPS), 2024
Thumbnail for Spherical Mask paper
Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation
Sangyun Shin, Kaichen Zhou, Madhu Vankadari, Andrew Markham, Niki Trigoni
Computer Vision and Pattern Recognition (CVPR), 2024
Thumbnail for Sound3DVDet paper
Sound3DVDet: 3D Sound Source Detection using Multiview Microphone Array and RGB Images
Yuhang He*, Sangyun Shin*, Anoop Cherian, Niki Trigoni, Andrew Markham (*=Equal Contribution)
Winter Conference on Applications of Computer Vision (WACV), 2024
Thumbnail for Sample Crop Track paper
Sample, Crop, Track: Self-Supervised Mobile 3D Object Detection for Urban Driving LiDAR
Sangyun Shin, Stuart Golodetz, Madhu Vankadari, Kaichen Zhou, Andrew Markham, Niki Trigoni
International Conference on Robotics and Automation (ICRA), 2023

Contact

Email: sangyun.shin@cs.ox.ac.uk

I'm open to opportunities, collaborations and research discussions. Feel free to reach out!