👋 Welcome to Track #2: Social Navigation of the 2025 RoboSense Challenge!
This track challenges participants to develop advanced RGBD-based perception and navigation systems that enable autonomous agents to navigate safely, efficiently, and in a socially compliant manner within dynamic human environments.
Participants are expected to design algorithms that interpret human behaviors and contextual cues to generate navigation strategies that strike a balance between navigation efficiency and social compliance. Submissions must address key challenges such as real-time adaptability, occlusion handling, and ethical decision-making in socially complex settings.
🏆 Prize Pool: $2,000 USD (1st: $1,000, 2nd: $600, 3rd: $400) + Innovation Awards
This track evaluates an agent's ability to perform socially compliant navigation in dynamic indoor environments populated with realistic human agents. Participants must design navigation policies based solely on RGBD observations and odometry, without access to global maps or privileged information.
Ultimately, the aim is to develop socially-aware agents that can navigate safely, efficiently, and naturally in environments shared with humans.
All times mentioned are in the Anywhere on Earth (AoE) timezone (UTC-12).
🎯 Public train set validation (~10 episodes)
Duration: June 15th, 2025 - September 15th, 2025 (AoE)
Participants are expected to:
Each team is allowed a maximum of 3 submissions per day in this phase. Please use them judiciously.
🎯 Phase #1: Public test set evaluation (~1,000 episodes)
Duration: June 15th, 2025 - August 15th, 2025 (AoE)
Participants are expected to:
Each team is allowed 1 submission per day, with a total limit of 10 submissions in this phase.
🎯 Phase #2: Private test set evaluation (~500 episodes)
Duration: August 15th, 2025 - September 15th, 2025 (AoE)
Important Note:
Teams that were unable to submit evaluations during Phase #1 are welcome to continue participating in Phase #2. The final ranking will be determined solely by the results from Phase #2.
Participants are expected to:
Each team is allowed 2 submissions per day, with a total limit of 20 submissions in this phase.
The track uses the RoboSense Track 2 Social Navigation Dataset, based on the Social-HM3D benchmark.
Our benchmark datasets are designed to reflect realistic, diverse, and socially complex navigation environments:
| Dataset | Num. of Scenes | Scene Types | Human Num. | Natural Motion |
|---|---|---|---|---|
| Social-HM3D | 844 | Residence, Office, Shop, etc. | 0-6 | ✔️ |
We showcase several representative encounter episodes from the benchmark; these require the robot to proactively avoid humans using socially aware behaviors.
Our benchmark focuses on two key aspects: task completion and social compliance.
| Metric | Description |
|---|---|
| SR (Success Rate) | Fraction of episodes in which the robot successfully reaches the goal. |
| SPL (Success weighted by Path Length) | Penalizes inefficient navigation; rewards shorter, successful paths. |
| PSC (Personal Space Compliance) | Measures how well the robot avoids violating human personal space; higher PSC indicates better social behavior. The threshold is set to 1.0 m, accounting for a 0.3 m human radius and a 0.25 m robot radius. |
| H-Coll (Human Collision Rate) | Proportion of episodes involving any human collision; a collision implies task failure. |
| 🎯 Total Score | Weighted combination of the core metrics: Total = 0.4 × SR + 0.3 × SPL + 0.3 × PSC. This combined score reflects overall navigation quality while implicitly penalizing human collisions, since collisions count as failures in SR. |
The evaluation metrics and scoring formula remain the same for both Phase #1 and Phase #2.
The final rankings will be determined by the results from Phase #2 only. Rankings are based on the Total Score, with ties broken by the higher Success Rate (SR).
The baseline model is built upon the Falcon framework, which integrates auxiliary learning tasks and future-aware prediction as its core components.
Falcon serves as a strong and socially intelligent baseline for this challenge, effectively combining auxiliary learning and future-aware prediction to navigate in complex human environments.
The Falcon baseline achieves the following performance on Phase #1 using Social-HM3D (~1,000 test episodes):
| Dataset | SR ↑ | SPL ↑ | PSC ↑ | H-Coll ↓ |
|---|---|---|---|---|
| Social-HM3D | 55.15 | 55.15 | 89.56 | 42.96 |
Note: These results represent the baseline performance with the 24 GB GPU configuration used for Phase #1 evaluation. Participants are encouraged to develop novel approaches that surpass these results.
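As a worked example, applying the scoring formula to these baseline numbers gives Total = 0.4 × 55.15 + 0.3 × 55.15 + 0.3 × 89.56 ≈ 65.47.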
We provide the following resources to support the development of models in this track:
| Resource | Link | Description |
|---|---|---|
| GitHub Repository | https://github.com/robosense2025/track2 | Baseline code and setup instructions |
| Dataset | HuggingFace Dataset | Dataset with training and test splits |
| Baseline Model | Pre-Trained Model | Weights of the baseline model |
| Registration | Google Form (closed on August 15th) | Team registration for the challenge |
| Evaluation Server | EvalAI Platform | Online evaluation platform |
Below is a list of Frequently Asked Questions (FAQs). If you have additional questions about this competition, please reach out at robosense2025@gmail.com.
The most common reason is that your submission did not generate a valid `result.json` file in the expected format. Please ensure your submission:

- writes its results to `output/result.json`;
- includes all required metric keys: `SR`, `SPL`, `PSC`, and `H-Coll`.

Example of correct format:
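A minimal sketch of a valid `result.json`, using the Phase #1 baseline numbers purely as placeholder values (the exact field set and value scale should follow the baseline submission example):

```json
{
  "SR": 55.15,
  "SPL": 55.15,
  "PSC": 89.56,
  "H-Coll": 42.96
}
```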
If your submission requires additional dependencies, install them at the start of your `run.sh` script using conda/pip. Example `run.sh` with dependency installation:
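Below is a minimal sketch of such a `run.sh`; the package names and the entry-point path are placeholders, not part of the official pipeline:

```bash
#!/bin/bash
set -e  # abort on the first error

# Install additional dependencies before evaluation starts
# (package names are placeholders -- replace with your own).
pip install --quiet einops timm

# Launch your evaluation entry point (placeholder path).
python your_entry_point.py
```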
To simplify evaluation, we use a fixed dataset mount point inside the evaluation Docker container:
Important: Your code should always read data from this path, regardless of which phase is being evaluated. Our backend automatically mounts the appropriate dataset for each phase to this location.
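As an illustration, here is a minimal Python sketch of this pattern; `DATA_ROOT` is a hypothetical placeholder for the fixed mount point, which the backend repopulates with the phase-appropriate dataset:

```python
from pathlib import Path

# Hypothetical placeholder: substitute the fixed mount point from the
# submission instructions. The backend mounts the dataset for the
# current phase at this same location, so no phase-specific paths
# are needed in your code.
DATA_ROOT = Path("/path/to/mounted/dataset")

def list_episode_files() -> list[Path]:
    """Enumerate dataset files under the fixed mount point."""
    return sorted(DATA_ROOT.rglob("*.json.gz"))
```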
All submissions are evaluated inside Docker images with the following versions:

```bash
# Phase 1
docker pull zeyinggong/robosense_socialnav:v0.5

# Phase 2
docker pull zeyinggong/robosense_socialnav:v0.7
```
Key Changes in v0.7:
The two new actions (4-move_backward, 5-pause) are optional. Teams can continue using only actions 0-3 from Phase 1. Therefore, the action space is fully backward-compatible with Phase 1, ensuring smooth transition for existing methods.
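For clarity, here is a sketch of the resulting discrete action space. The indices for actions 4 and 5 are stated above; the mapping of actions 0-3 assumes the standard Habitat convention and should be verified against the baseline code:

```python
from enum import IntEnum

class NavAction(IntEnum):
    # Phase 1 actions (0-3): assumed to follow the standard Habitat
    # discrete-action convention -- verify against the baseline code.
    STOP = 0
    MOVE_FORWARD = 1
    TURN_LEFT = 2
    TURN_RIGHT = 3
    # Optional Phase 2 actions (indices stated by the organizers).
    MOVE_BACKWARD = 4
    PAUSE = 5
```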
For Phase 2, we recommend using Docker image v0.7 for local testing. You can manually execute your `run.sh` inside the container to verify correctness.
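A hedged example of such a local test; the mount path and working directory are assumptions to adapt to your own setup:

```bash
# Start an interactive container from the Phase 2 image,
# mounting your submission directory (path is an assumption).
docker run -it --rm --gpus all \
  -v "$(pwd)/submission:/submission" \
  zeyinggong/robosense_socialnav:v0.7 /bin/bash

# Inside the container, run your submission script manually:
cd /submission && bash run.sh
```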
Tip: See the provided Baseline ZIP Submission Example (Updated) for a working reference.
Note: Phase 2 does not support action submissions since the test dataset is not publicly available.
If your submission remains pending for over 48 hours, please:
Yes, we encourage flexible approaches! You have significant freedom to modify the evaluation pipeline and import your own policy code, including:
- `Falcon/habitat-baselines/habitat_baselines/eval.py` (main evaluation script)
- `Falcon/habitat-baselines/habitat_baselines/rl/ppo/falcon_evaluator.py` (evaluator implementation)

You can import and integrate your custom modules, modify the inference pipeline, or adapt the evaluation logic to suit your approach, for instance as sketched below.
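As an illustration only, here is a hypothetical pattern for wiring a custom policy into the evaluation loop; the class, function, and observation-key names are placeholders, not the official API:

```python
import torch

class MyPolicy(torch.nn.Module):
    """Hypothetical policy mapping RGBD observations to a discrete action."""

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> int:
        # ... your model goes here; this stub always moves forward ...
        return 1  # e.g., MOVE_FORWARD

def act(policy: MyPolicy, observations: dict) -> int:
    # Use only the allowed observation keys (see the restrictions below).
    return policy(observations["rgb"], observations["depth"])
```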
⚠️ However, the following restrictions must be respected:
- Do not bypass or alter the metric computation in `falcon_evaluator.py`, including accessing extra environmental information or sensors beyond the allowed observation keys, or circumventing any other validation checks.

These restrictions ensure fair competition while maintaining the scientific integrity of the challenge.
If your question isn't answered here, please reach out to us:
```bibtex
@article{gong2024cognition,
  title   = {From Cognition to Precognition: A Future-Aware Framework for Social Navigation},
  author  = {Gong, Zeying and Hu, Tianshuai and Qiu, Ronghe and Liang, Junwei},
  journal = {arXiv preprint arXiv:2409.13244},
  year    = {2024}
}

@misc{robosense2025track2,
  title        = {RoboSense Challenge 2025: Track 2 - Social Navigation},
  author       = {RoboSense Challenge 2025 Organizers},
  year         = {2025},
  howpublished = {https://robosense2025.github.io/track2}
}
```