# 🏋🏻‍♀️ Training and Evaluation

This document guides you through:

- **[Minimal validation](#minimal-validation-training).** Verify that your environment and setup can successfully train a model on a small dataset.
- **[Large-scale multi-node training](#large-scale-finetuning).** Learn how to finetune models on multiple GPUs and nodes.
- **[Supported models and datasets](#available-models-and-datasets).** Get an overview of the built-in policies and datasets you can use.
- **[Evaluate your trained models](#evaluation-and-benchmarking)** using **closed-loop benchmarking**, allowing you to measure the **success rate (SR)** on various tasks.
- **[Extend the framework](#available-benchmarks)** by adding your own **custom benchmarks**.

## Minimal Validation Training

We provide several built-in policies such as **GR00T-N1**, **GR00T-N1.5**, **Pi-0**, **DP-CLIP**, and **ACT-CLIP**. To quickly verify your setup, you can train the **DP-CLIP** model on the `genmanip-demo` dataset (300 demonstrations of the instruction *"Move the milk carton to the top of the ceramic bowl"*). This requires **1 GPU with at least 24GB memory**:

```bash
# --nproc_per_node: number of processes (GPUs) per node, e.g., 1
# --config: specifies which model to train on which dataset, along with hyperparameters
torchrun --nnodes 1 --nproc_per_node 1 \
  scripts/train/train.py \
  --config run_configs/train/dp_clip_genmanip_v1.yaml
```

> 📌 When you run the script, it will prompt you to log in to Weights & Biases (WandB). This integration allows you to monitor your training process in real time via the WandB dashboard.

The script will also automatically download all required models and datasets from Hugging Face into the Hugging Face cache directory (by default located at `~/.cache/huggingface/`). If you're concerned about storage space or want to customize the cache location, you can set the cache directory using an environment variable:

```bash
export HF_HOME=your/custom/cache/path
```

> 💡 Note: the download may take some time depending on your network speed, so please be patient.

### ⚠️ Common Issues

1. **Authentication required:** If you see an error related to missing access rights, make sure you've logged into the Hugging Face CLI:

   ```bash
   huggingface-cli login
   ```

2. **403 Forbidden (gated repository access):** If you encounter the following error:

   ```text
   403 Forbidden: Please enable access to public gated repositories in your fine-grained token settings to view this repository.
   ```

   ensure that your Hugging Face access token has the fine-grained permission for accessing gated repositories enabled. You can verify and adjust this in your Hugging Face account's [Access Tokens settings](https://huggingface.co/settings/tokens).
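If you prefer to populate the cache before launching training (for example, on a compute node with slow or restricted network access), you can pre-fetch repositories manually with the Hugging Face CLI. A minimal sketch; the dataset ID below is the GenManip test set listed later in this guide, so substitute whichever repository your config actually needs:

```bash
# Pre-fetch a dataset into the Hugging Face cache (respects HF_HOME if set)
huggingface-cli download InternRobotics/InternData-GenmanipTest --repo-type dataset
```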
## Large-Scale Finetuning

### Single Node (Multi-GPU)

To finetune a built-in model such as **Pi-0** on the **GenManip** dataset using **8 GPUs**, you can use the following `srun` command:

```bash
srun --job-name=pi0_genmanip --gres=gpu:8 --ntasks-per-node=1 \
  torchrun \
  --nnodes 1 \
  --nproc_per_node 8 \
  scripts/train/train.py \
  --config run_configs/train/pi0_genmanip_v1.yaml
```

### Multi-Node Multi-GPU (Slurm)

We also provide Slurm scripts for multi-node training.

**Step 1:** Create `train_pi0_genmanip_slurm.sh`:

```bash
#!/bin/bash
set -e

export PYTHONPATH="$(pwd):$PYTHONPATH"
source .venv/pi0/bin/activate

# The first node in the allocation acts as the rendezvous master
master_addr=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1)

# Network interface and InfiniBand HCA for NCCL, passed as script arguments
export NCCL_SOCKET_IFNAME=$1
export NCCL_IB_HCA=$2

torchrun \
  --nnodes=$SLURM_NNODES \
  --nproc_per_node=8 \
  --node_rank=$SLURM_PROCID \
  --master_port=29500 \
  --master_addr=$master_addr \
  scripts/train/train.py \
  --config run_configs/train/pi0_genmanip_v1.yaml
```

**Step 2:** Create `multinode_submit.slurm`:

```bash
#!/bin/bash
#SBATCH -N 2                    # number of nodes
#SBATCH --ntasks-per-node=1     # 1 task per node
#SBATCH --gpus-per-task=8       # 8 GPUs per task

srun bash train_pi0_genmanip_slurm.sh  # pass <NCCL_SOCKET_IFNAME> <NCCL_IB_HCA> as arguments if needed
```

**Step 3:** Start training:

```bash
sbatch multinode_submit.slurm
```

> You can modify these scripts according to your resource availability and requirements.

## Customizing Training with Your Own YAML Config

If you would like to train with your own choice of model and dataset, you can simply create a custom YAML configuration file and pass it to the `--config` argument of the training script. For example, to train the pre-registered **Pi-0** model on the **GenManip** dataset, a minimal YAML configuration might look like this:

```yaml
model_type: pi0                                       # Name of a pre-registered model
dataset_path: InternRobotics/InternData-GenmanipTest  # Can be a HuggingFace Hub ID or local path
data_config: genmanip_v1                              # Pre-registered dataset configuration
base_model_path: lerobot/pi0                          # (Optional) Overrides the model checkpoint path; defaults to HF if omitted
```

**💡 Notes:**

- `model_type`: Must match the name of a model that has already been registered within InternManip.
- `dataset_path`: Can be a HuggingFace ID (e.g., `InternRobotics/InternData-GenmanipTest`) or a local directory where the dataset is downloaded.
- `data_config`: Refers to a dataset configuration preset (e.g., for preprocessing or loading behavior), also pre-registered in the codebase.
- `base_model_path`: Optional. If the selected `model_type` is supported and known, InternManip will automatically resolve and download the correct checkpoint from HuggingFace. If you've already downloaded a model locally or want to use a custom one, you can specify the path here directly.

By editing or extending this YAML file, you can quickly try different models, datasets, or training setups, all without modifying the training script.
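For instance, assuming you saved the configuration above as `run_configs/train/my_pi0_genmanip.yaml` (a hypothetical filename), you would pass it to the same training entry point used throughout this guide:

```bash
# Single-GPU launch with a custom config; adjust --nproc_per_node to your GPU count
torchrun --nnodes 1 --nproc_per_node 1 \
  scripts/train/train.py \
  --config run_configs/train/my_pi0_genmanip.yaml
```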
## Available Models and Datasets

When creating your own YAML config file for training or evaluation, you can directly refer to the following officially supported values:

- Use values from the `${model_type}` and `${base_model_path}` columns below to populate the corresponding fields in your YAML.
- Similarly, values from the `${data_config}` and `${dataset_path}` columns can be used to specify the dataset configuration and loading path.

The following are the supported models along with their HuggingFace IDs:

| `${model_type}` | `${base_model_path}`   |
| --------------- | ---------------------- |
| `pi0`           | `lerobot/pi0`          |
| `pi0fast`       | `pi0fast_base`         |
| `gr00t_n1`      | `nvidia/GR00T-N1-2B`   |
| `gr00t_n1_5`    | `nvidia/GR00T-N1.5-3B` |
| `dp_clip`       | None                   |
| `act_clip`      | None                   |
The supported dataset configurations and their corresponding dataset paths:

| `${data_config}` | `${dataset_path}`                                 |
| ---------------- | ------------------------------------------------- |
| `genmanip_v1`    | `InternRobotics/InternData-GenmanipTest`          |
| `calvin_abc`     | `InternRobotics/InternData-Calvin_ABC`            |
| `google_robot`   | `InternRobotics/InternData-fractal20220817_data`  |
| `bridgedata_v2`  | `InternRobotics/InternData-BridgeV2`              |
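For example, combining one row from each table, a config that finetunes **GR00T-N1** on the **CALVIN (ABC)** data could look like the sketch below. This assumes the same minimal YAML schema shown in the customization section; check the benchmark tables below for which model/data pairings are actually exercised:

```yaml
model_type: gr00t_n1                                # from the ${model_type} column
base_model_path: nvidia/GR00T-N1-2B                 # matching ${base_model_path} entry (optional)
data_config: calvin_abc                             # from the ${data_config} column
dataset_path: InternRobotics/InternData-Calvin_ABC  # matching ${dataset_path} entry
```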
## Available Benchmarks

The following benchmarks are currently available for evaluation:

- **[GenManip](https://arxiv.org/abs/2506.10966)**
- **[CALVIN](https://github.com/mees/calvin)**
- **[Simpler-Env](https://github.com/simpler-env/SimplerEnv)**

InternManip offers implementations of multiple manipulation policy models (**GR00T-N1**, **GR00T-N1.5**, **Pi-0**, **DP-CLIP**, and **ACT-CLIP**) as well as curated datasets covering **GenManip**, **Simpler-Env**, and **CALVIN**, all organized in the standardized **LeRobot** format. The available `${MODEL}`, `${DATASET}`, and `${BENCHMARK}` combinations and their results are summarized in the following tables:

### CALVIN (ABC-D) Benchmark

| Model        | Dataset/Benchmark | Score (Main Metric) | Model Weights |
| ------------ | ----------------- | ------------------- | ------------- |
| `gr00t_n1`   | `calvin_abcd`     |                     |               |
| `gr00t_n1_5` | `calvin_abcd`     |                     |               |
| `pi0`        | `calvin_abcd`     |                     |               |
| `dp_clip`    | `calvin_abcd`     |                     |               |
| `act_clip`   | `calvin_abcd`     |                     |               |

### Simpler-Env Benchmark

| Model        | Dataset/Benchmark | Success Rate | Model Weights |
| ------------ | ----------------- | ------------ | ------------- |
| `gr00t_n1`   | `google_robot`    |              |               |
| `gr00t_n1_5` | `google_robot`    |              |               |
| `pi0`        | `google_robot`    |              |               |
| `dp_clip`    | `google_robot`    |              |               |
| `act_clip`   | `google_robot`    |              |               |
| `gr00t_n1`   | `bridgedata_v2`   |              |               |
| `gr00t_n1_5` | `bridgedata_v2`   |              |               |
| `pi0`        | `bridgedata_v2`   |              |               |
| `dp_clip`    | `bridgedata_v2`   |              |               |
| `act_clip`   | `bridgedata_v2`   |              |               |

### GenManip Benchmark

| Model        | Dataset/Benchmark | Success Rate | Model Weights |
| ------------ | ----------------- | ------------ | ------------- |
| `gr00t_n1`   | `genmanip_v1`     |              |               |
| `gr00t_n1_5` | `genmanip_v1`     |              |               |
| `pi0`        | `genmanip_v1`     |              |               |
| `dp_clip`    | `genmanip_v1`     |              |               |
| `act_clip`   | `genmanip_v1`     |              |               |

## What's Next?

Now that you've completed the training and evaluation process, you may want to incorporate your **own dataset**, **model**, or **benchmark**. To do so, please refer to the following guides:

* 📚 **[How to customize your dataset](../quick_start/add_dataset.md)** - Learn how to prepare and register your dataset for training or evaluation.
* 🧠 **[How to add a model](../quick_start/add_model.md)** - Learn how to integrate your own model into InternManip's training pipeline.
* 🧪 **[How to add your benchmark](../quick_start/add_benchmark.md)** - Learn how to implement and register a new evaluation benchmark.

Once you've set them up, you can follow the same command structures used above; just replace the relevant configuration entries (e.g., `--config`) with your custom definitions.