Cross-device Collaborative Dataset for Localization

Lamarr Institute / University of Bonn · ETH Zurich · Microsoft

CroCoDL: the first dataset to contain sensor recordings from real-world robots, phones, and mixed-reality headsets, covering a total of 10 challenging locations to benchmark cross-device and human-robot visual registration.

Abstract

Accurate localization plays a pivotal role in the autonomy of systems operating in unfamiliar environments, particularly when interaction with humans is expected. High-accuracy visual localization systems encompass various components, such as feature extractors, matchers, and pose estimation methods. This complexity makes robust evaluation settings and pipelines a necessity. However, existing datasets and benchmarks primarily focus on single-agent scenarios, overlooking the critical issue of cross-device localization. Different agents carry different sensors, each with its own strengths and weaknesses, and the data available to them varies substantially.

This work addresses this gap by enhancing an existing augmented reality visual localization benchmark with data from legged robots, and evaluating human-robot, cross-device mapping and localization. Our contributions extend beyond device diversity and include high environment variability, spanning ten distinct locations ranging from disaster sites to art exhibitions. Each scene in our dataset features recordings from robot agents, hand-held and head-mounted devices, and high-accuracy ground truth LiDAR scanners, resulting in a comprehensive multi-agent dataset and benchmark. This work represents a significant advancement in the field of visual localization benchmarking, with key insights into the performance of cross-device localization methods across diverse settings.
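To make the evaluation setting concrete: visual localization benchmarks of this kind typically score an estimated camera pose by its translation error and its rotation error against the ground truth, and report recall at thresholds such as 10cm/10°. The sketch below is not part of the dataset's tooling; it is a minimal NumPy illustration of this standard metric, with the function name `pose_errors` chosen by us.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Return (translation error, rotation error in degrees).

    R_* are 3x3 rotation matrices, t_* are 3-vectors; the translation
    error is in the same unit as the input translations.
    """
    t_err = np.linalg.norm(t_est - t_gt)
    # The angle of the relative rotation R_est^T R_gt is the geodesic
    # distance on SO(3); recover it from the trace, clipped for safety.
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    r_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return t_err, r_err

# Example: identity estimate vs. a pose rotated 90° about z and
# shifted by (3, 4, 0) -> translation error 5.0, rotation error 90°.
R_gt = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
print(pose_errors(np.eye(3), np.zeros(3), R_gt, np.array([3., 4., 0.])))
```

A pose is then counted as correctly localized if both errors fall below the chosen thresholds, and recall is the fraction of query images localized this way.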

Renderings vs. real CroCoDL data


Qualitative Results. Good alignment between renderings and real images validates the camera poses with respect to the NavVis-based ground-truth scan from which we render here.


Locations

New locations of the CroCoDL dataset. Each location has high-quality meshes, obtained from LiDAR, which are registered with numerous phone, AR headset, and robotic sequences.

Commonly used datasets for visual localization and SLAM


| Dataset | Motion | Env. | Locations | Changes | Sensors | GT pose accuracy | Seqs. |
|---|---|---|---|---|---|---|---|
| KITTI | πŸš— | ⬛ | 1 | πŸƒπŸŒ¦οΈ | RGB, LiDAR, IMU | <10cm (RTK GPS) | 22 |
| TUM RGBD | βœ‹πŸ›Ό | ⬜ | 2 | πŸƒ | RGB-D, IMU | 1mm (mocap) | 80 |
| Malaga | πŸš— | ⬛ | 1 | πŸƒπŸŒ¦οΈ | RGB, IMU | (GPS) | 15 |
| EuRoC | πŸ›Έ | ⬜ | 2 | – | RGB, IMU | 1mm (mocap) | 11 |
| NCLT | πŸ›Ό | β¬œβ¬› | 1 | πŸƒπŸͺ‘πŸŒ’πŸŒ¦οΈ | RGB, LiDAR, IMU, GNSS | <10cm (GPS + IMU + LiDAR) | 27 |
| PennCOSYVIO | βœ‹ | β¬œβ¬› | 1 | πŸƒπŸŒ¦οΈ | RGB, IMU | 15cm (visual tags) | 4 |
| TUM VIO | βœ‹ | β¬œβ¬› | 4 | – | RGB, IMU | 1mm (mocap at ends) | 28 |
| UZH-FPV | πŸ›Έ | β¬œβ¬› | 2 | – | RGB, event camera, IMU | ~1cm (total station + VI-BA) | 28 |
| ETH3D SLAM | βœ‹ | ⬜ | 1 | – | RGB, depth, IMU | 1mm (mocap) | 96 |
| Newer College | βœ‹ | β¬œβ¬› | 1 | – | RGB, LiDAR, IMU | 3cm (LiDAR ICP) | 3 |
| OpenLoris-Scene | πŸ›Ό | ⬜ | 5 | πŸƒπŸͺ‘ | RGB-D, IMU, wheel odom. | <10cm (2D LiDAR) | 22 |
| TartanAir | syn. | β¬œβ¬› | 30 | – | RGB | perfect (synthetic) | 30 |
| UMA VI | βœ‹πŸš— | β¬œβ¬› | 2 | – | RGB, IMU | (visual tags) | 32 |
| UrbanLoco | πŸš— | ⬛ | 12 | πŸƒπŸͺ‘ | RGB, LiDAR, IMU, GNSS | SPAN-CPT | 12 |
| Naver Labs | πŸ›Ό | ⬜ | 5 | πŸƒπŸͺ‘ | RGB, LiDAR, IMU | <10cm (LiDAR SLAM and SfM) | 10 |
| HILTI SLAM | βœ‹ | β¬œβ¬› | 8 | – | RGB, LiDAR, IMU | <5mm (total station) | 12 |
| Graco | πŸ›ΌπŸ›Έ | ⬛ | 1 | πŸƒπŸŒ¦οΈ | RGB, LiDAR, GPS, IMU | β‰ˆ1cm (GNSS) | 14 |
| FusionPortable* | βœ‹πŸ¦ΏπŸ›ΌπŸš™ | β¬œβ¬› | 9 | – | RGB, event cameras, LiDAR, IMU, GPS | β‰ˆ1cm (GNSS RTK) | 41 |
| LaMAR | βœ‹πŸ₯½ | β¬œβ¬› | 3 | πŸƒπŸͺ‘πŸŒ’πŸ—οΈπŸŒ¦οΈ | RGB, LiDAR, depth, IMU, WiFi/BT | <10cm (LiDAR + PGO + PGO-BA) | 500 |
| CroCoDL | βœ‹πŸ₯½πŸ¦Ώ[πŸ›Έ] | β¬œβ¬› | 10 | πŸƒπŸͺ‘πŸŒ’πŸ—οΈπŸŒ¦οΈ | RGB, LiDAR, depth, IMU, WiFi/BT | ~10cm (LiDAR + PGO + PGO-BA) | 500+800 |

Legend

Environment: ⬜ inside, ⬛ outside;

Changes: πŸƒ Structural changes due to moving people, πŸͺ‘ long-term changes due to displaced furniture, 🌦️ weather, πŸŒ’ day-night, πŸ—οΈ construction work;

Trajectory motion from sensors mounted on: πŸ›Ό ground vehicle, 🦿 legged robot, πŸ›Έ drone, πŸš™ car, βœ‹ hand-held, πŸ₯½ head-mounted, β€˜syn.’ synthetic.

(noted with *: at most two devices are recorded in the same location; [πŸ›Έ]: drone recordings are not aligned and, due to safety/permission restrictions, could only be captured in 8 of the 10 locations)


BibTeX

@inproceedings{blum2025crocodl,
  author    = {Blum, Hermann and Mercurio, Alessandro and O'Reilly, Josua and Engelbracht, Tim and Dusmanu, Mihai and Pollefeys, Marc and Bauer, Zuria},
  title     = {CroCoDL: Cross-device Collaborative Dataset for Localization},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
}

CroCoDL Team