Overview
This track hosts a competition that evaluates the capability of your policy or VLA (vision-language-action) model on the RoboTwin benchmark. Participants train a policy or VLA on RoboTwin2.0 using the training set provided on Hugging Face, generate videos during evaluation, and submit the generated videos to Hugging Face. We evaluate submissions locally and publish scores and rankings on the leaderboard. Note that scores obtained in simulation may not be the final scores.
Step-by-step Guidance
Step 1. Get the dataset
Dataset: open-gigaai/CVPR2026_RoboTwin_Track (Hugging Face)
- 10 robot manipulation tasks in total.
- Each task was generated with the RoboTwin2.0 simulator and comes with 50 demonstrations collected in clean scenes and 500 collected in cluttered scenes.
- The test bench is the RoboTwin2.0 simulator. Follow the installation guide on the official website to set up your own RoboTwin environment for both training and evaluation.
- From the 50 RoboTwin2.0 tasks, we curated 10 for evaluation. We will evaluate your policy or VLA on the following tasks: hanging_mug, move_stapler_pad, place_fan, handover_mic, open_microwave, place_can_basket, place_dual_shoes, stack_blocks_three, move_can_pot, blocks_ranking_rgb, blocks_ranking_size. You can therefore download only the data for these tasks. The robot embodiment is unified as agilex-aloha, so filter by embodiment as well when downloading.
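If you only want the evaluation tasks for a single embodiment, the Hugging Face Hub client can restrict a download with glob patterns. The sketch below assumes the `huggingface_hub` package is installed and that each relevant file path contains the task name followed by the embodiment name (`agilex-aloha`), for example the hypothetical path `hanging_mug/agilex-aloha_clean_50.zip`. The actual repository layout is not documented here, so inspect the file list and adjust `build_allow_patterns` before relying on it. Task identifiers follow the RoboTwin2.0 spelling (`hanging_mug`, `blocks_ranking_size`).

```python
"""Select only the evaluation tasks for one embodiment from the track dataset.

Assumption: every relevant file path contains the task name and then the
embodiment name, e.g. "hanging_mug/agilex-aloha_clean_50.zip" (hypothetical).
"""
from fnmatch import fnmatch
from typing import Iterable, List

REPO_ID = "open-gigaai/CVPR2026_RoboTwin_Track"
EMBODIMENT = "agilex-aloha"
# Evaluation tasks, spelled as RoboTwin2.0 task names.
EVAL_TASKS = [
    "hanging_mug", "move_stapler_pad", "place_fan", "handover_mic",
    "open_microwave", "place_can_basket", "place_dual_shoes",
    "stack_blocks_three", "move_can_pot", "blocks_ranking_rgb",
    "blocks_ranking_size",
]


def build_allow_patterns(tasks: Iterable[str], embodiment: str) -> List[str]:
    # fnmatch-style globs; "*" also matches "/", so the task can be a directory
    # and the embodiment part of the file name (the layout is an assumption).
    return [f"*{task}*{embodiment}*" for task in tasks]


def select_files(paths: Iterable[str], tasks: Iterable[str], embodiment: str) -> List[str]:
    # Local preview of which repo paths the patterns would keep.
    patterns = build_allow_patterns(tasks, embodiment)
    return [p for p in paths if any(fnmatch(p, pat) for pat in patterns)]


def download(local_dir: str = "robotwin_track_data") -> str:
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        allow_patterns=build_allow_patterns(EVAL_TASKS, EMBODIMENT),
        local_dir=local_dir,
    )
```

Calling `download()` fetches only the matching files into `robotwin_track_data/` (a hypothetical directory name). Running `select_files` on the repository's file list first is a cheap way to confirm the patterns match what you expect.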
Step 2. Training & evaluation
You can follow the official documentation to train your policy, but note that your policy must not exceed 2B parameters; otherwise, your results will be invalid.
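Before submitting, it is worth checking your model against the 2B limit. The following is a minimal sketch, assuming "2B" means 2 × 10^9 parameters, that the limit is inclusive, and that every parameter counts whether frozen or trainable (the rule above does not say either way). It only needs parameter shapes, so with PyTorch you could pass `(p.shape for p in model.parameters())`.

```python
import math
from typing import Iterable, Sequence

# Assumption: "2B" means 2 * 10**9 parameters, and reaching it exactly is allowed.
PARAM_LIMIT = 2_000_000_000


def count_params(shapes: Iterable[Sequence[int]]) -> int:
    # Each tensor contributes the product of its dimensions; a scalar () contributes 1.
    return sum(math.prod(shape) for shape in shapes)


def within_budget(shapes: Iterable[Sequence[int]], limit: int = PARAM_LIMIT) -> bool:
    # True if the total parameter count does not surpass the limit.
    return count_params(shapes) <= limit
```

For example, a single 50,000 × 50,000 weight matrix alone holds 2.5 × 10^9 parameters and already exceeds the limit.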
We compute a progress-aware success rate for long-horizon tasks (e.g., stack_blocks_three), awarding partial credit for each intermediate success checkpoint reached, and a direct success rate for straightforward tasks (e.g., place_fan).
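The exact scoring formula is not specified beyond the description above. The sketch below is one plausible reading, offered only as an assumption: in a long-horizon task, each episode earns the fraction of its ordered checkpoints reached before the first failure (for stack_blocks_three the checkpoints might, hypothetically, be placing each of the three blocks); in a straightforward task, each episode scores 1 or 0; a task's rate is the mean over its episodes. The function names are illustrative.

```python
from statistics import mean
from typing import Sequence, Union

# An episode is either a list of ordered checkpoint results (long-horizon)
# or a single success flag (straightforward task).
Episode = Union[bool, Sequence[bool]]


def progress_score(checkpoints: Sequence[bool]) -> float:
    # Credit only the consecutive run of checkpoints reached from the start;
    # a later checkpoint reached after an earlier one failed earns nothing.
    if not checkpoints:
        raise ValueError("a long-horizon task needs at least one checkpoint")
    reached = 0
    for ok in checkpoints:
        if not ok:
            break
        reached += 1
    return reached / len(checkpoints)


def direct_score(success: bool) -> float:
    return 1.0 if success else 0.0


def task_success_rate(episodes: Sequence[Episode], long_horizon: bool) -> float:
    # Average the per-episode scores for one task.
    if long_horizon:
        return mean(progress_score(ep) for ep in episodes)
    return mean(direct_score(ep) for ep in episodes)
```

Counting only the leading run of checkpoints is a design choice made for this sketch, so that a policy cannot collect credit for a late stage without completing the earlier ones; the official scorer may weight checkpoints differently.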
Step 3. Submission & leaderboard
Submit the videos generated on the test set via Hugging Face only. We will evaluate submissions and update the leaderboard in three rounds.
- To reduce the risk of test-set hacking or overfitting, we provide 3 evaluation rounds.
- Only your best score across all rounds is kept as your final score.
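The best-of-three rule amounts to taking a maximum over the rounds you entered. Here is a small sketch, assuming a skipped round is recorded as `None` and simply ignored (the rules above do not say how missed rounds are treated); `final_score` is an illustrative name, not part of any official tooling.

```python
from typing import Optional, Sequence

MAX_ROUNDS = 3


def final_score(round_scores: Sequence[Optional[float]]) -> Optional[float]:
    # Keep only the best of up to three rounds; rounds without a submission are None.
    if len(round_scores) > MAX_ROUNDS:
        raise ValueError(f"at most {MAX_ROUNDS} evaluation rounds are provided")
    submitted = [s for s in round_scores if s is not None]
    return max(submitted) if submitted else None
```

A weaker later round therefore never lowers your standing.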
All timestamps below are in UTC.