
RoboTwin Track Guide

Evaluate your policy or VLA on the RoboTwin benchmark.

Overview

This track is a competition that evaluates the capability of your policy or VLA on the RoboTwin benchmark. Participants train a model on the RoboTwin 2.0 training set provided on Hugging Face, record videos of the model's rollouts during evaluation, and submit those videos to Hugging Face. We evaluate the submissions locally and publish the scores and rankings on the leaderboard. Note that scores obtained in simulation are not necessarily the final scores.

Step-by-step Guidance

Step 1. Get the dataset

Dataset (hosted on Hugging Face): open-gigaai/CVPR2026_RoboTwin_Track

Training Set
  • 10 robot manipulation tasks in total.
  • Each task's data was generated with the RoboTwin 2.0 simulator: 50 demos in clean scenes and 500 demos in cluttered scenes.
Test Bench
  • The test bench is the RoboTwin 2.0 simulator. Follow the installation instructions on the official website to set up your own RoboTwin environment for both training and evaluation.
The Tasks We Will Evaluate
  • From the 50 RoboTwin 2.0 tasks, we curated 10 for evaluation. We will evaluate your policy or VLA on the following tasks: hanging_mug, move_stapler_pad, place_fan, handover_mic, open_microwave, place_can_basket, place_dual_shoes, stack_blocks_three, move_can_pot, blocks_ranking_rgb, blocks_ranking_size. You can therefore download only the data for these tasks. All evaluation uses the agilex-aloha robot embodiment, so filter for it when downloading.
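The selective download described above can be sketched with the `huggingface_hub` library's `snapshot_download`, which accepts glob-style `allow_patterns`. This is a minimal sketch under assumptions: that open-gigaai/CVPR2026_RoboTwin_Track is a Hugging Face dataset repository, and that its files are laid out as `<task_name>/<embodiment>...` (at the root or under a parent folder). The layout and the helper names `build_allow_patterns` and `download_subset` are hypothetical; check the real file tree on the dataset page and adjust the patterns.

```python
from typing import List, Sequence

# Repository id taken from this guide; repo_type="dataset" below is an assumption.
REPO_ID = "open-gigaai/CVPR2026_RoboTwin_Track"
# Embodiment name as written in this guide; the exact folder/file prefix is an assumption.
EMBODIMENT = "agilex-aloha"


def build_allow_patterns(tasks: Sequence[str], embodiment: str = EMBODIMENT) -> List[str]:
    """Glob patterns that keep only the given tasks for one embodiment.

    Hypothetical layout: <task_name>/<embodiment>*, either at the repo root
    or nested under parent folders (e.g. dataset/<task_name>/<embodiment>*).
    """
    patterns: List[str] = []
    for task in tasks:
        patterns.append(f"{task}/{embodiment}*")    # task folder at the repo root
        patterns.append(f"*/{task}/{embodiment}*")  # task folder nested under a parent
    return patterns


def download_subset(tasks: Sequence[str], local_dir: str = "robotwin_track_data",
                    embodiment: str = EMBODIMENT) -> str:
    """Download only the matching files and return the local snapshot path."""
    # Imported here so build_allow_patterns works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        allow_patterns=build_allow_patterns(tasks, embodiment),
        local_dir=local_dir,
    )
```

For example, `download_subset(["place_fan", "stack_blocks_three"])` would fetch only those two tasks for the agilex-aloha embodiment, provided the assumed layout holds. Filtering through `allow_patterns` avoids downloading tasks and embodiments that are not evaluated.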

Step 2. Training & evaluation

You can follow the official documentation to train your policy, but note that your policy must not exceed 2B parameters; otherwise, your results will be invalid.
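To check the 2B limit before submitting, you can count parameters directly. This sketch assumes that "2B" means 2,000,000,000 parameters, that the limit is inclusive, and that every parameter counts, including frozen ones such as a pretrained vision encoder; the exact counting rule is not stated in this guide. `count_module_params` works with any object exposing PyTorch-style `parameters()` and `numel()`, and `count_from_shapes` covers cases where you only have tensor shapes. All helper names are hypothetical.

```python
import math
from typing import Mapping, Sequence

# Assumption: "2B" means 2 billion parameters and the limit is inclusive.
PARAM_LIMIT = 2_000_000_000


def count_module_params(model) -> int:
    """Total parameters of a PyTorch-style module (trainable and frozen alike)."""
    return sum(p.numel() for p in model.parameters())


def count_from_shapes(shapes: Mapping[str, Sequence[int]]) -> int:
    """Total parameters given a {name: shape} mapping, e.g. read from a checkpoint."""
    # math.prod(()) == 1, so scalar parameters count as one element.
    return sum(math.prod(shape) for shape in shapes.values())


def within_param_limit(n_params: int, limit: int = PARAM_LIMIT) -> bool:
    """True if the parameter count does not exceed the limit."""
    return n_params <= limit
```

With a PyTorch policy you would call `within_param_limit(count_module_params(policy))`; printing `count_module_params(policy) / 1e9` shows the size in billions.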

Evaluation Metrics
Progress-Aware Success Rate

For long-horizon tasks (e.g. stack_blocks_three), we compute a progress-aware success rate that awards partial credit for each intermediate success checkpoint reached. For straightforward tasks (e.g. place_fan), we compute the plain success rate.
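The exact scoring formula is not given here. As an illustration only, one way to make a success rate progress-aware is to give each episode the fraction of its checkpoints that it reached and then average over episodes; a straightforward task is the special case with a single checkpoint, which reduces to the plain success rate. The equal weighting of checkpoints and the names `Episode`, `episode_score` and `task_score` are assumptions, not the official metric.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable


@dataclass
class Episode:
    checkpoints_reached: int  # intermediate success checkpoints completed in this rollout
    num_checkpoints: int      # total checkpoints for the task (1 for straightforward tasks)


def episode_score(ep: Episode) -> float:
    """Fraction of checkpoints reached, assuming equal weight per checkpoint."""
    if ep.num_checkpoints < 1:
        raise ValueError("a task needs at least one checkpoint")
    if not 0 <= ep.checkpoints_reached <= ep.num_checkpoints:
        raise ValueError("checkpoints_reached out of range")
    return ep.checkpoints_reached / ep.num_checkpoints


def task_score(episodes: Iterable[Episode]) -> float:
    """Mean per-episode score, i.e. a progress-aware success rate for one task."""
    return mean(episode_score(ep) for ep in episodes)
```

Under this hypothetical scheme, three rollouts of a three-checkpoint task that reach 3, 1 and 0 checkpoints would score (1 + 1/3 + 0) / 3 ≈ 0.44, whereas a binary success rate would give 1/3.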

Step 3. Submission & leaderboard


Submit your generated test-set videos via Hugging Face only. We will evaluate the submissions and update the leaderboard in three rounds.
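A pre-upload check can catch tasks with no videos before a submission goes out. This sketch assumes videos are MP4 files organized as `<root>/<task_name>/*.mp4` and uploaded with `huggingface_hub.upload_folder`; the folder layout, the file format, the repository type and the target `repo_id` are all hypothetical, so follow whatever submission format the organizers specify.

```python
from pathlib import Path
from typing import Dict, List, Sequence

VIDEO_SUFFIXES = {".mp4"}  # assumption: rollout videos are saved as .mp4 files


def collect_videos(root: str, tasks: Sequence[str]) -> Dict[str, List[Path]]:
    """Map each task to its video files under the hypothetical <root>/<task_name>/ layout."""
    found: Dict[str, List[Path]] = {}
    for task in tasks:
        task_dir = Path(root) / task
        if task_dir.is_dir():
            found[task] = sorted(
                p for p in task_dir.iterdir()
                if p.is_file() and p.suffix in VIDEO_SUFFIXES
            )
        else:
            found[task] = []
    return found


def missing_tasks(found: Dict[str, List[Path]]) -> List[str]:
    """Tasks that have no videos at all."""
    return [task for task, videos in found.items() if not videos]


def submit(root: str, tasks: Sequence[str], repo_id: str, repo_type: str = "dataset") -> None:
    """Refuse to upload if any task has no videos; otherwise upload the video files."""
    missing = missing_tasks(collect_videos(root, tasks))
    if missing:
        raise ValueError(f"no videos for: {', '.join(missing)}")
    # Imported here so the checks above run without huggingface_hub installed.
    from huggingface_hub import upload_folder

    upload_folder(
        repo_id=repo_id,  # hypothetical: use the submission repo given by the organizers
        folder_path=root,
        repo_type=repo_type,  # assumption: submissions go to a dataset-type repo
        allow_patterns=[f"*{suffix}" for suffix in VIDEO_SUFFIXES],
        commit_message="RoboTwin track submission",
    )
```

Running `missing_tasks(collect_videos(root, tasks))` on its own gives a dry run that never touches the network.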

Rules
  • To reduce the risk of hacking or overfitting to the test set, we provide 3 evaluation rounds.
  • Only your best score across all rounds is kept as your final score.
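The best-of-rounds rule amounts to taking a maximum over the rounds in which you were scored. A minimal sketch, assuming higher scores are better and that a round without a submission simply has no score; `final_score` is a hypothetical helper.

```python
from typing import Mapping, Optional

ROUNDS = {1, 2, 3}


def final_score(round_scores: Mapping[int, float]) -> Optional[float]:
    """Best score across evaluated rounds; None if no round was scored.

    round_scores maps a round number to that round's score; rounds
    without a submission are simply left out.
    """
    unknown = set(round_scores) - ROUNDS
    if unknown:
        raise ValueError(f"unknown rounds: {sorted(unknown)}")
    return max(round_scores.values(), default=None)
```

For example, `final_score({1: 0.42, 3: 0.57})` returns 0.57: skipping round 2 does not hurt, and a weaker later round would not lower the result.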
Leaderboard
Hosted on Hugging Face • open-gigaai
3 rounds
  • Submit videos and track ranking updates.
  • Your final score is your best round.
Schedule (UTC)

All timestamps below are in UTC.

Round 1 update
Includes submissions made before Mar 31, 11:00 UTC.
Round 2 update
Includes submissions made from Mar 31, 11:00 to Apr 7, 11:00 UTC.
Round 3 update
Includes submissions made from Apr 7, 11:00 to Apr 14, 11:00 UTC.
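To see which update a submission falls into, convert its timestamp to UTC and compare it with the cutoffs above. This sketch assumes each window is half-open (a submission at exactly 11:00 UTC counts toward the next round) and ignores the year, which the schedule does not state; `round_for` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional

# Round end cutoffs as (month, day, hour, minute) in UTC, from the schedule above.
ROUND_ENDS = [
    (1, (3, 31, 11, 0)),
    (2, (4, 7, 11, 0)),
    (3, (4, 14, 11, 0)),
]


def round_for(submitted_at: datetime) -> Optional[int]:
    """Return the round (1-3) a timezone-aware submission time belongs to, or None if too late."""
    if submitted_at.tzinfo is None or submitted_at.utcoffset() is None:
        raise ValueError("submitted_at must be timezone-aware")
    utc = submitted_at.astimezone(timezone.utc)
    key = (utc.month, utc.day, utc.hour, utc.minute)
    for rnd, end in ROUND_ENDS:
        if key < end:  # half-open window: strictly before the cutoff
            return rnd
    return None
```

Local times must be converted first: 19:30 on Mar 31 in UTC+8 is 11:30 UTC, so a timezone-aware datetime carrying that offset falls into round 2, not round 1.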