🏃🏻‍♂️ Training and Evaluation#

This document guides you through:

  • Minimal validation training

  • Large-scale finetuning (single-node and multi-node)

  • Customizing training with your own YAML config

  • Available models, datasets, and benchmarks

  • Evaluation and benchmarking

Minimal Validation Training#

We provide several built-in policies, including GR00T-N1, GR00T-N1.5, Pi-0, DP-CLIP, and ACT-CLIP. To quickly verify your setup, you can train the DP-CLIP model on the genmanip-demo dataset (300 demonstrations of the instruction “Move the milk carton to the top of the ceramic bowl”). This requires a single GPU with at least 24 GB of memory:

# Launch a single-process run on one node; the config file specifies which model
# to train on which dataset, along with the hyperparameters.
torchrun --nnodes 1 --nproc_per_node 1 \
   scripts/train/train.py \
   --config run_configs/train/dp_clip_genmanip_v1.yaml

😄 When you run the script, it will prompt you to log in to Weights & Biases (WandB). This integration allows you to monitor your training process in real time via the WandB dashboard.
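
If you prefer not to stop at an interactive prompt (e.g., inside a batch job), you can authenticate beforehand or disable syncing; both are standard WandB options:

# Log in ahead of time so training doesn't block on a prompt
wandb login

# Or keep logs local only, without syncing to the WandB servers
export WANDB_MODE=offline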

The script will also automatically download all required models and datasets from Hugging Face into the Hugging Face cache directory (by default located at ~/.cache/huggingface/). If you’re concerned about storage space or want to customize the cache location, you can set the cache directory using an environment variable:

export HF_HOME=your/custom/cache/path
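
To make the custom cache location persist across shell sessions, you can append it to your shell profile; the path below is just an example:

# Example path; choose a directory on a disk with enough free space
echo 'export HF_HOME=/data/hf_cache' >> ~/.bashrc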

💡 Note: The download may take some time depending on your network speed, so please be patient.

⚠️ Common Issues#

  1. Authentication Required: If you see an error related to missing access rights, make sure you’ve logged into Hugging Face CLI:

    huggingface-cli login
    
  2. 403 Forbidden: Gated Repository Access: If you encounter the following error:

    403 Forbidden: Please enable access to public gated repositories in your fine-grained token settings to view this repository.
    

    Then ensure that your Hugging Face access token has the correct fine-grained permissions enabled for accessing gated repositories. You can verify and adjust these in your Hugging Face account’s Access Tokens settings.
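
You can also confirm on the command line which account (and therefore which token) you are currently authenticated as:

huggingface-cli whoami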

Large-Scale Finetuning#

Single Node (Multi-GPU)#

To finetune a built-in model such as Pi-0 on the GenManip dataset using 8 GPUs, you can use the following srun command:

srun --job-name=pi0_genmanip --gres=gpu:8 --ntasks-per-node=1 \
torchrun \
   --nnodes 1 \
   --nproc_per_node 8 \
   scripts/train/train.py \
   --config run_configs/train/pi0_genmanip_v1.yaml
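
If fewer GPUs are available, scale the GPU request and the process count together. For example, the same run on 4 GPUs (a sketch with the same config; note that this halves the effective global batch size unless you adjust the hyperparameters):

srun --job-name=pi0_genmanip --gres=gpu:4 --ntasks-per-node=1 \
torchrun \
   --nnodes 1 \
   --nproc_per_node 4 \
   scripts/train/train.py \
   --config run_configs/train/pi0_genmanip_v1.yaml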

Multi-Node Multi-GPU (Slurm)#

We also provide Slurm scripts for multi-node training.

Step 1: Create train_pi0_genmanip_slurm.sh:

#!/bin/bash
set -e

export PYTHONPATH="$(pwd):$PYTHONPATH"
source .venv/pi0/bin/activate

# Use the first node in the allocation as the rendezvous master
master_addr=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1)

# The NCCL socket interface and InfiniBand HCA are passed as positional arguments
export NCCL_SOCKET_IFNAME=$1
export NCCL_IB_HCA=$2

torchrun \
   --nnodes=$SLURM_NNODES \
   --nproc_per_node=8 \
   --node_rank=$SLURM_PROCID \
   --master_port=29500 --master_addr=$master_addr \
   scripts/train/train.py \
   --config run_configs/train/pi0_genmanip_v1.yaml
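
The script expects the NCCL socket interface and InfiniBand HCA name as its two positional arguments ($1 and $2 above). The device names below are placeholders; check your cluster's actual devices with ip link and ibstat:

# Placeholder device names; replace with your cluster's interface and HCA
bash train_pi0_genmanip_slurm.sh eth0 mlx5_0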

Step 2: Create multinode_submit.slurm:

#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=1   # 1 task (the torchrun launcher) per node
#SBATCH --gpus-per-task=8     # 8 GPUs per task
# Forward the NCCL socket interface and IB HCA names expected by the script
srun bash train_pi0_genmanip_slurm.sh <network_interface> <ib_hca>

Step 3: Start training:

sbatch multinode_submit.slurm
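
After submitting, you can check the job state and follow its log; by default, Slurm writes output to slurm-<jobid>.out in the submission directory:

# Check the job state
squeue -u "$USER"

# Follow the training log (replace <jobid> with the ID printed by sbatch)
tail -f slurm-<jobid>.out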

Customizing Training with Your Own YAML Config#

If you would like to train with your own choice of model and dataset, you can simply create a custom YAML configuration file and pass it to the --config argument in the training script.

For example, to train the pre-registered Pi-0 model on the GenManip dataset, a minimal YAML configuration might look like this:

model_type: pi0                        # Name of a pre-registered model
dataset_path: InternRobotics/InternData-GenmanipTest  # Can be a HuggingFace Hub ID or local path
data_config: genmanip_v1              # Pre-registered dataset configuration
base_model_path: lerobot/pi0          # (Optional) Overrides the model checkpoint path; will default to HF if omitted

💡 Notes:

  • model_type: Must match the name of a model that has already been registered within InternManip.

  • dataset_path: Can be a HuggingFace ID (e.g., InternRobotics/InternData-GenmanipTest) or a local directory where the dataset is downloaded.

  • data_config: Refers to a dataset configuration preset (e.g., for preprocessing or loading behavior), also pre-registered in the codebase.

  • base_model_path: This is optional. If the selected model_type is recognized, InternManip will automatically resolve and download the correct checkpoint from HuggingFace. If you’ve already downloaded a model locally or want to use a custom one, specify its path here directly.

By editing or extending this YAML file, you can quickly try different models, datasets, or training setups — all without modifying the training script.
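
For example, using the supported values listed in the next section, switching to GR00T-N1 on the CALVIN ABC data only requires changing the same four fields (a sketch):

model_type: gr00t_n1
dataset_path: InternRobotics/InternData-Calvin_ABC
data_config: calvin_abc
base_model_path: nvidia/GR00T-N1-2B   # optional; resolved automatically if omitted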

Available Models and Datasets#

When creating your own YAML config file for training or evaluation, you can directly refer to the following officially supported values:

  • Use values from the ${model_type} and ${base_model_path} columns below to populate the corresponding fields in your YAML.

  • Similarly, values from the ${data_config} and ${dataset_path} columns can be used to specify the dataset configuration and loading path.

The following are the supported models along with their HuggingFace IDs:

| ${model_type} | ${base_model_path}   |
|---------------|----------------------|
| pi0           | lerobot/pi0          |
| pi0fast       | pi0fast_base         |
| gr00t_n1      | nvidia/GR00T-N1-2B   |
| gr00t_n1_5    | nvidia/GR00T-N1.5-3B |
| dp_clip       | None                 |
| act_clip      | None                 |

Below are the datasets officially integrated into InternManip:

| ${data_config} | ${dataset_path}                                 |
|----------------|-------------------------------------------------|
| genmanip_v1    | InternRobotics/InternData-GenmanipTest          |
| calvin_abc     | InternRobotics/InternData-Calvin_ABC            |
| google_robot   | InternRobotics/InternData-fractal20220817_data  |
| bridgedata_v2  | InternRobotics/InternData-BridgeV2              |
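
If you prefer to fetch a dataset ahead of time (e.g., for a compute node without internet access) and point dataset_path at the local copy, you can use the Hugging Face CLI. The target directory below is an example, and this assumes the data is hosted as a dataset repo:

# Download to a local directory (example path), then set
# dataset_path: ./data/genmanip_test in your YAML config
huggingface-cli download InternRobotics/InternData-GenmanipTest \
   --repo-type dataset --local-dir ./data/genmanip_test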

Evaluation and Benchmarking (WIP)#

By default, model inference runs in the main loop, in the same process as the environment. You can evaluate Pi-0 on the GenManip benchmark in a single process using the following command:

python scripts/eval/start_evaluator.py \
   --config scripts/eval/config/pi0_on_genmanip.py

The terminal prints SR (Success Rate) information for each episode and task:

{
    "success_episodes": [
        {"task_name": "tasks/...", "episode_name": "010", "episode_sr": 1.0, ...}
    ],
    "failure_episodes": [],
    "success_rate": 1.0
}
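
If you redirect this summary into a file, standard JSON tooling can extract individual fields; the file path below is hypothetical:

# Hypothetical path; print the overall success rate and the number of successes
jq '.success_rate, (.success_episodes | length)' eval_results/summary.json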

You can view the images generated during evaluation in the eval_results directory.

You can modify the evaluation script and its config according to your resource availability and requirements.

Available Benchmarks#

The following benchmarks are currently available for evaluation:

InternManip offers implementations of multiple manipulation policy models—GR00T-N1, GR00T-N1.5, Pi-0, DP-CLIP, and ACT-CLIP—as well as curated datasets including GenManip, Simpler-Env, and CALVIN, all organized in the standardized LeRobot format.

The available ${MODEL}, ${DATASET}, and ${BENCHMARK} combinations are summarized in the following tables; scores and model-weight links are still being filled in (WIP):

CALVIN (ABC-D) Benchmark#

| Model      | Dataset/Benchmark | Score (Main Metric) | Model Weights |
|------------|-------------------|---------------------|---------------|
| gr00t_n1   | calvin_abcd       | TBD                 | TBD           |
| gr00t_n1_5 | calvin_abcd       | TBD                 | TBD           |
| pi0        | calvin_abcd       | TBD                 | TBD           |
| dp_clip    | calvin_abcd       | TBD                 | TBD           |
| act_clip   | calvin_abcd       | TBD                 | TBD           |

Simpler-Env Benchmark#

| Model      | Dataset/Benchmark | Success Rate | Model Weights |
|------------|-------------------|--------------|---------------|
| gr00t_n1   | google_robot      | TBD          | TBD           |
| gr00t_n1_5 | google_robot      | TBD          | TBD           |
| pi0        | google_robot      | TBD          | TBD           |
| dp_clip    | google_robot      | TBD          | TBD           |
| act_clip   | google_robot      | TBD          | TBD           |
| gr00t_n1   | bridgedata_v2     | TBD          | TBD           |
| gr00t_n1_5 | bridgedata_v2     | TBD          | TBD           |
| pi0        | bridgedata_v2     | TBD          | TBD           |
| dp_clip    | bridgedata_v2     | TBD          | TBD           |
| act_clip   | bridgedata_v2     | TBD          | TBD           |

Genmanip Benchmark#

| Model      | Dataset/Benchmark | Success Rate | Model Weights |
|------------|-------------------|--------------|---------------|
| gr00t_n1   | genmanip_v1       | TBD          | TBD           |
| gr00t_n1_5 | genmanip_v1       | TBD          | TBD           |
| pi0        | genmanip_v1       | TBD          | TBD           |
| dp_clip    | genmanip_v1       | TBD          | TBD           |
| act_clip   | genmanip_v1       | TBD          | TBD           |

What’s Next?#

Now that you’ve completed the training and evaluation process, you may want to incorporate your own dataset, model, or benchmark. To do so, please refer to the corresponding customization guides.

Once you’ve set them up, you can follow the same command structure used above; just replace the relevant configuration entries (e.g., --config) with your custom definitions.