As computer vision systems increasingly transition into real-world applications, reliable and scalable localization across heterogeneous devices becomes critical. The CroCoDL workshop brings together researchers from computer vision, robotics, and augmented reality to address the unique challenges of cross-device, multi-agent localization in complex, real-world environments. With a focus on 3D vision, visual localization, egocentric and embodied AI, and AR/VR/MR, this workshop aims to foster dialogue around bridging the gap between academic benchmarks and real-world deployment. This inaugural edition features leading experts from academia and industry and introduces CroCoDL, a new large-scale benchmark dataset capturing synchronized sensor data from smartphones, mixed-reality headsets, and legged robots across diverse environments. Through invited talks, a paper track, and an open competition, the workshop will highlight recent advances and open challenges in localization under domain shifts, sensor diversity, and dynamic scene conditions. By uniting communities working on structure-from-motion, neural rendering, and embodied AI, CroCoDL offers a platform to drive innovation toward robust, scalable localization systems capable of operating across devices, agents, and perspectives.
We invite 8-page full papers for inclusion in the proceedings, as well as 4-page extended abstracts. Extended abstracts may present either new or previously published work; however, they will not be included in the official proceedings.
Please note that 4-page extended abstracts generally do not conflict with the dual submission policies of other conferences. In contrast, 8-page full papers, if accepted, will appear in the proceedings and are therefore subject to the dual submission policy. This means they must not be under review or accepted at another conference at the same time.
All submissions must be anonymous and comply with the official ICCV 2025 guidelines.
The workshop challenge is centered around a newly accepted dataset at CVPR 2025 – CroCoDL: Cross-device Collaborative Dataset for Localization (pre-rebuttal version available at link). To advance research in visual co-localization, we introduce CroCoDL, a significantly larger and more diverse dataset and benchmark, as shown in Figure 1. CroCoDL is the first dataset to incorporate sensor recordings from both robots and mixed-reality headsets, and it covers a wider range of real-world environments than any existing cross-device visual localization dataset. It includes synchronized sensor streams from three primary devices: hand-held smartphones, the head-mounted HoloLens 2, and the legged robot Spot.
For this challenge, we have selected three large-scale locations (Hydrology, Succulent, and Design Museum) for which we will release mapping and query splits. These splits will be used to evaluate visual localization performance in a cross-device setup: the map is generated from data captured by one device, and the goal is to localize images taken by a different device within this map. The primary evaluation metric for submissions will be single-image localization recall at 50 cm and 10 degrees of pose error.
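To make the metric concrete, the following is a minimal sketch of how recall at a translation/rotation threshold is typically computed from estimated and ground-truth camera poses. This is illustrative only, not the official evaluation code; function names and the pose representation (a 3×3 rotation matrix plus a translation vector in meters) are our assumptions.

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def localization_recall(poses_est, poses_gt, max_t_m=0.5, max_r_deg=10.0):
    """Fraction of query images whose estimated pose lies within both
    thresholds (here 50 cm and 10 degrees, as in the challenge metric).
    Each pose is an (R, t) pair: 3x3 rotation, 3-vector translation."""
    hits = 0
    for (R_e, t_e), (R_g, t_g) in zip(poses_est, poses_gt):
        t_err = np.linalg.norm(np.asarray(t_e) - np.asarray(t_g))
        r_err = rotation_error_deg(R_e, R_g)
        if t_err <= max_t_m and r_err <= max_r_deg:
            hits += 1
    return hits / len(poses_gt)
```

A query counts as localized only if both the position error and the orientation error fall within their thresholds; recall is the fraction of localized queries.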
The competition will be split into two tracks:
To be eligible for a prize, participants must provide code that can reproduce their results and commit to releasing it as open-source. This code will be used to validate their submissions.