The RoboSense Challenge 2025

Track #2: Social Navigation

Socially-Aware RGBD Navigation in Dynamic Human Environments

🚀 Submit on EvalAI

👋 Welcome to Track #2: Social Navigation of the 2025 RoboSense Challenge!

This track challenges participants to develop RGBD-based perception and navigation systems that enable autonomous agents to move safely, efficiently, and in a socially compliant manner through dynamic human environments. Participants will design algorithms that interpret human behaviors and contextual cues to generate navigation strategies that balance efficiency with social compliance. Submissions must address key challenges such as real-time adaptability, occlusion handling, and ethical decision-making in socially complex settings.

🏆 Prize Pool: $2,000 USD (1st: $1,000, 2nd: $600, 3rd: $400) + Innovation Awards



🎯 Objective

This track evaluates an agent's ability to perform socially compliant navigation in dynamic indoor environments populated with realistic human agents. Participants must design navigation policies based solely on RGBD observations and odometry, without access to global maps or privileged information.

  • Social Norm Compliance: Agents must maintain safe distances, avoid collisions, and demonstrate socially acceptable behaviors.
  • Realistic Benchmarking: Agents navigate large-scale, photo-realistic indoor scenes with dynamic, collision-aware humans from the Social-HM3D and Social-MP3D datasets.
  • Egocentric Perception: Agents operate from a first-person perspective, relying solely on their onboard sensors. This includes color sensing, depth information, and relative goal coordinates, simulating how a robot would perceive its surroundings.

Ultimately, the aim is to develop socially-aware agents that can navigate safely, efficiently, and naturally in environments shared with humans.
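
To make the egocentric observation setting concrete, here is a toy policy that consumes exactly the inputs listed above. The observation keys and action names follow common Habitat-style conventions and are assumptions for illustration only; the official interface is defined by the starter code in the GitHub repository, and this heuristic is not a competitive policy.

```python
import numpy as np

class GreedyPointGoalPolicy:
    """Toy egocentric policy: RGB + depth + relative goal in, discrete action out.
    Observation keys and action names are assumed Habitat-style conventions,
    not the official competition interface."""

    def act(self, observations: dict) -> str:
        rgb = observations["rgb"]      # (H, W, 3) uint8 color image (unused by this toy policy)
        depth = observations["depth"]  # (H, W, 1) float depth in meters (unused by this toy policy)
        rho, phi = observations["pointgoal_with_gps_compass"]  # distance (m) and heading (rad) to goal

        # Trivial heuristic: stop near the goal, otherwise turn toward it, then move.
        if rho < 0.2:
            return "STOP"
        if abs(phi) > 0.3:
            return "TURN_LEFT" if phi > 0 else "TURN_RIGHT"
        return "MOVE_FORWARD"

# Dummy observations with plausible shapes, just to exercise the interface.
obs = {
    "rgb": np.zeros((480, 640, 3), dtype=np.uint8),
    "depth": np.ones((480, 640, 1), dtype=np.float32),
    "pointgoal_with_gps_compass": np.array([3.2, 0.8], dtype=np.float32),
}
print(GreedyPointGoalPolicy().act(obs))  # -> TURN_LEFT
```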


🗂️ Phases & Requirements

Minival Phase: Sanity Check

Duration: 15 June 2025 – 15 September 2025

This phase serves as a sanity check to ensure that remote evaluation results match those obtained locally. Use it to:

  • Verify that your code produces consistent results both locally and on the competition server
  • Test environment setup, input/output formats, and compatibility with the remote evaluator
  • Debug and validate your inference pipeline

Each team is allowed a maximum of 3 submissions per day in this phase. Please use them judiciously.

Phase I: Public Evaluation

Duration: 15 June 2025 – 15 August 2025

This phase involves evaluation on a public test set of approximately 1,000 episodes derived from the Social-HM3D validation set. Participants can:

  • Download the official dataset and baseline model as a starting point
  • Develop and test their methods using the Social-HM3D training set
  • Submit reproducible code through the official competition Docker image
  • Receive metric scores from remote evaluation and appear on the public leaderboard

Each team is allowed 1 submission per day, with a total limit of 10 submissions in this phase.

Phase II: Final Evaluation

Duration: 15 August 2025 – 15 September 2025

This final phase evaluates submissions on a private test set containing approximately 1,000 episodes, equal in size to the Phase I set. Participants must:

  • Submit your final model and code for evaluation on the private test set
  • Include model weights and a complete, reproducible implementation
  • Provide a technical report detailing your methodology and innovations

Final rankings and awards will be determined based on Phase II results.

Each team is allowed 1 submission per day, with a total limit of 10 submissions in this phase.


🗄️ Dataset

The track uses the RoboSense Track 2 Social Navigation Dataset, based on the Social-HM3D and Social-MP3D benchmarks. Our benchmark datasets are designed to reflect realistic, diverse, and socially complex navigation environments:

  • Goal-driven Trajectories: Humans navigate with intent, avoiding random or repetitive paths.
  • Natural Behaviors: Movement includes walking, pausing, and realistic avoidance via ORCA (see the sketch after the table below).
  • Balanced Density: Human count is scaled to scene size, avoiding over- or under-crowding.
  • Diverse Environments: Includes 844 scenes for Social-HM3D and 72 scenes for Social-MP3D.

| Dataset     | Num. of Scenes | Scene Types                   | Human Num. | Natural Motion |
|-------------|----------------|-------------------------------|------------|----------------|
| Social-HM3D | 844            | Residence, Office, Shop, etc. | 0-6        | ✔️             |
| Social-MP3D | 72             | Residence, Office, Gym, etc.  | 0-6        | ✔️             |
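
As a concrete illustration of the natural, goal-driven avoidance mentioned above, the sketch below simulates two humans in a frontal approach with ORCA via the Python-RVO2 bindings. This is not the dataset-generation code; the library choice and all parameter values are assumptions for illustration.

```python
# Illustration of ORCA-based human motion (not the dataset-generation code).
import math
import rvo2  # Python-RVO2 bindings of the ORCA algorithm

# timeStep, neighborDist, maxNeighbors, timeHorizon, timeHorizonObst, radius, maxSpeed
sim = rvo2.PyRVOSimulator(0.25, 5.0, 10, 2.0, 2.0, 0.3, 1.0)

# Two humans start facing each other and walk to the other's start point (frontal approach).
starts = [(0.0, 0.0), (6.0, 0.0)]
goals = [(6.0, 0.0), (0.0, 0.0)]
agents = [sim.addAgent(p) for p in starts]

for _ in range(60):
    for a, goal in zip(agents, goals):
        px, py = sim.getAgentPosition(a)
        dx, dy = goal[0] - px, goal[1] - py
        dist = math.hypot(dx, dy)
        # Preferred velocity points straight at the goal;
        # ORCA perturbs it just enough to stay collision-free.
        pref = (dx / dist, dy / dist) if dist > 1e-3 else (0.0, 0.0)
        sim.setAgentPrefVelocity(a, pref)
    sim.doStep()

print([sim.getAgentPosition(a) for a in agents])  # both end up near their goals
```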

🎬 Dataset Example

We showcase one classic encounter type — Frontal Approach — where the robot and human approach each other head-on. This requires the robot to proactively avoid the human using socially-aware behaviors.

Social-HM3D

Social-MP3D

For more classic encounter types (e.g., Intersection, Blind Corner, Person Following), visit our project website to explore more demo videos.


🛠️ Baseline Model

The baseline model is built upon the Falcon framework, which integrates the following core components:

  • Egocentric Policy: Uses only camera and point-goal inputs, with no access to maps or human positions.
  • Auxiliary Supervision: Trains with privileged cues that are removed during evaluation.
  • Future Awareness: Learns from human future trajectories to avoid long-term collisions.
  • Robust Environment: Trained in realistic scenes with dynamic crowds for strong generalization.

Falcon serves as a strong and socially intelligent baseline for this challenge, effectively combining auxiliary learning and future-aware prediction to navigate in complex human environments.
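
The sketch below illustrates the auxiliary-supervision idea in isolation: an egocentric policy trunk with an extra head that predicts privileged human-future information during training, while the auxiliary head is simply ignored at evaluation time. Every module name, dimension, and the imitation-style main loss here is a hypothetical simplification, not Falcon's actual architecture or training objective; see the GitHub repository for the real implementation.

```python
import torch
import torch.nn as nn

class EgocentricPolicySketch(nn.Module):
    """Hypothetical policy trunk with an auxiliary human-future head (not Falcon's code)."""

    def __init__(self, obs_dim=512, num_actions=4, horizon=8, max_humans=6):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.actor = nn.Linear(256, num_actions)                  # action logits
        # Auxiliary head: predicts future (x, y) waypoints for up to `max_humans` humans.
        self.future_head = nn.Linear(256, max_humans * horizon * 2)

    def forward(self, obs_feat):
        h = self.encoder(obs_feat)
        return self.actor(h), self.future_head(h)

policy = EgocentricPolicySketch()
obs_feat = torch.randn(32, 512)               # stand-in for encoded RGBD + goal features

action_logits, future_pred = policy(obs_feat)

# Privileged targets (ground-truth human futures, expert actions) exist only at training time.
future_gt = torch.randn(32, 6 * 8 * 2)
expert_actions = torch.randint(0, 4, (32,))

loss = nn.functional.cross_entropy(action_logits, expert_actions) \
       + 0.5 * nn.functional.mse_loss(future_pred, future_gt)
loss.backward()

# At evaluation the agent uses only the actor output; no privileged information is needed.
```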


📏 Evaluation Metrics

Our benchmark focuses on two key aspects: task completion and social compliance.

| Metric | Description |
|--------|-------------|
| SR (Success Rate) | Fraction of episodes in which the robot successfully reaches the goal. |
| SPL (Success weighted by Path Length) | Penalizes inefficient navigation; rewards shorter, successful paths. |
| PSC (Personal Space Compliance) | Measures how well the robot avoids violating human personal space; higher PSC indicates better social behavior. The threshold is set to 1.0 m, considering a 0.3 m human radius and a 0.25 m robot radius. |
| H-Coll (Human Collision Rate) | Proportion of episodes involving any human collision; collisions imply task failure. |
| Total Score | Weighted combination of the core metrics: Total = 0.4 × SR + 0.3 × SPL + 0.3 × PSC. This score reflects overall navigation quality while implicitly penalizing human collisions. |
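
For clarity, the snippet below reproduces the leaderboard aggregation from the table above and adds a hypothetical per-step personal-space check. Only the 0.4/0.3/0.3 weights and the 1.0 m threshold come from the table; the function names and the per-step check itself are illustrative assumptions, not the official evaluator code.

```python
def total_score(sr: float, spl: float, psc: float) -> float:
    """Leaderboard aggregation from the table above: Total = 0.4*SR + 0.3*SPL + 0.3*PSC.
    All inputs and the result are percentages."""
    return 0.4 * sr + 0.3 * spl + 0.3 * psc

def violates_personal_space(robot_xy, human_xy, threshold_m=1.0):
    """Hypothetical per-step check behind PSC: True if the robot center is within
    the 1.0 m personal-space threshold of a human center."""
    dx = robot_xy[0] - human_xy[0]
    dy = robot_xy[1] - human_xy[1]
    return (dx * dx + dy * dy) ** 0.5 < threshold_m

# The baseline numbers from the results table below recover the reported total:
print(round(total_score(55.84, 51.30, 89.47), 2))  # 64.57
```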

📊 Baseline Results

The Falcon baseline achieves the following performance on the Phase I evaluation set using the Social-HM3D dataset (∼1,000 test episodes):

| Dataset     | SR ↑  | SPL ↑ | PSC ↑ | H-Coll ↓ | Total ↑ |
|-------------|-------|-------|-------|----------|---------|
| Social-HM3D | 55.84 | 51.30 | 89.47 | 41.58    | 64.57   |

Note: These results are obtained under the official competition environment and may slightly differ from those reported in the paper. All values are presented as percentages (i.e., multiplied by 100).


🔗 Resources

We provide the following resources to support the development of models in this track:

| Resource | Link | Description |
|----------|------|-------------|
| GitHub Repository | https://github.com/robosense2025/track2 | Official baseline code and setup instructions |
| Dataset | HuggingFace Dataset | Complete dataset with training and test splits |
| Baseline Model | Pre-trained Model | Falcon baseline model |
| Registration | Google Form | Team registration for the challenge |
| Evaluation Server | EvalAI Platform | Online evaluation platform |

📧 Contact

For questions, technical support, or clarifications about Track 2, please contact:

📖 References

@article{gong2024cognition,
  title={From Cognition to Precognition: A Future-Aware Framework for Social Navigation},
  author={Gong, Zeying and Hu, Tianshuai and Qiu, Ronghe and Liang, Junwei},
  journal={arXiv preprint arXiv:2409.13244},
  year={2024}
}

@inproceedings{robosense2025track2,
  title     = {RoboSense Challenge 2025: Track 2 - Social Navigation},
  author    = {RoboSense Challenge Organizers},
  booktitle = {IROS 2025},
  year      = {2025},
  url       = {https://robosense2025.github.io/track2}
}