👋 Welcome to Track #1: Driving with Language of the 2025 RoboSense Challenge!
In the era of autonomous driving, it is increasingly critical for intelligent agents to understand and act upon language-based instructions. Human drivers naturally interpret complex commands involving spatial, semantic, and temporal cues (e.g., "turn left after the red truck" or "stop at the next gas station on your right"). To enable such capabilities in autonomous systems, vision-language models (VLMs) must be able to perceive dynamic driving scenes, understand natural language commands, and make informed driving decisions accordingly.
🏆 Prize Pool: $2,000 USD (1st: $1,000, 2nd: $600, 3rd: $400) + Innovation Awards
This track evaluates the capability of VLMs to answer high-level driving questions in complex urban environments. Given a question covering perception, prediction, or planning, together with multi-view camera input, participants are expected to answer the question from the visually corrupted images.
Duration: 15 June 2025 – 15 August 2025
In this phase, participants are expected to answer the high-level driving questions given the clean images. Participants can:
- Fine-tune the VLM on custom datasets (including nuScenes, DriveBench, etc.)
- Develop and test their approaches
- Submit results as a JSON file
- Iterate and improve their models based on public leaderboard feedback
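The exact submission schema is defined by the evaluation server; as a hedged sketch, assuming each entry pairs a question identifier with the model's free-form answer, a results JSON file could be assembled like this (the field names and IDs below are illustrative placeholders, not the official format):

```python
import json

# Hypothetical submission structure: the actual schema is defined by the
# track's evaluation server, so the field names here are assumptions.
predictions = [
    {"question_id": "sample_0001", "answer": "The red truck ahead is braking."},
    {"question_id": "sample_0002", "answer": "Turn left at the next intersection."},
]

# Write all predictions to a single JSON file for upload.
with open("submission.json", "w") as f:
    json.dump(predictions, f, indent=2)

# Reload to verify the file round-trips as valid JSON before submitting.
with open("submission.json") as f:
    loaded = json.load(f)
print(len(loaded))  # number of answered questions
```

Validating the file locally with a round-trip load is a cheap way to catch malformed JSON before spending a leaderboard submission on it.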
Ranking Metric: Weighted score combining:
Duration: 15 August 2025 – 15 October 2025
In this phase, participants are expected to answer the high-level driving questions given the corrupted images. Participants can:
- Fine-tune the VLM on custom datasets (including nuScenes, DriveBench, etc.)
- Develop and test their approaches
- Submit results as a JSON file
- Iterate and improve their models based on public leaderboard feedback
Ranking Metric: Weighted score combining:
In this track, we adopt Qwen2.5-VL-7B as our baseline model. Beyond the provided baseline, participants are encouraged to explore alternative strategies to further boost performance:
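To ground the multi-view setup, here is a minimal sketch of how a single driving question over six camera views might be packaged in the chat-message format used by Qwen2.5-VL-style processors. The camera names, file paths, and question text are illustrative placeholders, not the track's actual data layout:

```python
# Sketch of packaging a multi-view driving question in the chat-message
# format used by Qwen2.5-VL-style processors. Camera names, paths, and the
# question below are illustrative placeholders.
camera_views = [
    "CAM_FRONT.jpg", "CAM_FRONT_LEFT.jpg", "CAM_FRONT_RIGHT.jpg",
    "CAM_BACK.jpg", "CAM_BACK_LEFT.jpg", "CAM_BACK_RIGHT.jpg",
]
question = "Is it safe to change into the left lane?"

# One user turn containing all six camera images followed by the question text.
content = [{"type": "image", "image": path} for path in camera_views]
content.append({"type": "text", "text": question})
messages = [{"role": "user", "content": content}]

print(len(messages[0]["content"]))  # six image entries plus one text entry
```

This `messages` structure would then be passed to the model's processor (e.g. via its chat template) to produce the actual inputs for generation.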
In this track, we use ...
| Metric | Metric 1 | Metric 2 | Metric 3 | Metric 4 |
|---|---|---|---|---|
| Baseline Model | x.xx | x.xx | x.xx | x.xx |
We provide the following resources to support the development of models in this track:
| Resource | Link |
|---|---|
| GitHub | https://github.com/robosense2025/track1 |
| Dataset | TBD |
| Registration | Google Form |
| Evaluation Server | TBD |