Training
This guide covers data format conversion and policy training for validating generated simulation data.
Part 1: LMDB to LeRobot Data Conversion
The simulation data generated by InternDataEngine is stored in LMDB format. To use this data for policy training, you need to convert it to LeRobot format.
Step 1: Install LeRobot v2.1
We store the converted data in LeRobot v2.1 format, so install the LeRobot v2.1 repo first.
Step 2: Convert LMDB to LeRobot v2.1
Use the conversion scripts in the policy/lmdb2lerobotv21 directory.
We provide one conversion script per robot platform:
- lmdb2lerobot_lift2_a1.py (script): Lift2 (ARX).
- lmdb2lerobot_split_aloha_a1.py (script): Split Aloha.
- lmdb2lerobot_genie1_a1.py (script): Genie1.
- lmdb2lerobot_franka_a1.py (script): Franka FR3.
- lmdb2lerobot_frankarobotiq_a1.py (script): Franka with Robotiq gripper.
Example usage:
python lmdb2lerobot_lift2_a1.py \
--src_path ${src_path} \
--save_path ${save_path} \
--repo_id ${repo_id} \
--num-threads ${num_threads} \
--num_demos ${num_demos}
Parameters:
- --src_path (str): Path to the source LMDB data directory.
- --save_path (str): Path to save the converted LeRobot dataset.
- --repo_id (str): Dataset repository identifier.
- --num-threads (int): Number of threads for parallel processing.
- --num_demos (int): Number of demonstrations to convert (optional).
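When converting several tasks or robots in a row, it can help to assemble the command programmatically. The sketch below is one possible way to do that, not part of the repository: `build_convert_cmd` is a hypothetical helper, the paths and repo id in the usage section are placeholders, and only the flags shown in the example above are used.

```python
import shlex
import sys
from typing import List, Optional


def build_convert_cmd(
    script: str,
    src_path: str,
    save_path: str,
    repo_id: str,
    num_threads: int,
    num_demos: Optional[int] = None,
) -> List[str]:
    """Assemble the argv list for one conversion run (hypothetical helper)."""
    cmd = [
        sys.executable, script,
        "--src_path", src_path,
        "--save_path", save_path,
        "--repo_id", repo_id,
        "--num-threads", str(num_threads),
    ]
    # --num_demos is optional, so it is only appended when a limit is given.
    if num_demos is not None:
        cmd += ["--num_demos", str(num_demos)]
    return cmd


if __name__ == "__main__":
    # Placeholder values; replace them with your own paths and repo id.
    cmd = build_convert_cmd(
        "lmdb2lerobot_lift2_a1.py",
        "/path/to/lmdb",
        "/path/to/lerobot_v21",
        "my_org/my_dataset",
        num_threads=8,
        num_demos=100,
    )
    # Print the command for inspection; pass `cmd` to subprocess.run(cmd, check=True)
    # from the policy/lmdb2lerobotv21 directory to actually launch it.
    print(shlex.join(cmd))
```

Building the command as a list avoids shell quoting problems when paths contain spaces.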
Step 3: Convert to LeRobot v3.0 (Optional)
If you need LeRobot v3.0 format for training, install LeRobot v3.0, then run the conversion script:
python convertv21_to_v30.py --input_path ${v21_path} --output_path ${v30_path}
The conversion code is available at policy/lmdb2lerobotv21/convertv21_to_v30.py.
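If you have several v2.1 datasets to upgrade, the same command can be generated once per dataset. The following is a minimal sketch under assumptions: `plan_v30_conversions` is a hypothetical helper, and mirroring each dataset's directory name under the v3.0 output root is an illustrative choice, not something the conversion script requires.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def plan_v30_conversions(v21_dirs: Iterable[str], v30_root: str) -> List[List[str]]:
    """Return one convertv21_to_v30.py command per v2.1 dataset directory."""
    commands = []
    for v21_dir in v21_dirs:
        src = Path(v21_dir)
        # Assumption: reuse the dataset's directory name under the v3.0 root.
        dst = Path(v30_root) / src.name
        commands.append([
            sys.executable, "convertv21_to_v30.py",
            "--input_path", str(src),
            "--output_path", str(dst),
        ])
    return commands


if __name__ == "__main__":
    # Placeholder dataset paths for illustration only.
    for cmd in plan_v30_conversions(
        ["/path/to/v21/task_a", "/path/to/v21/task_b"], "/path/to/v30"
    ):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the policy/lmdb2lerobotv21 directory.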
Part 2: Policy Training with π0
As described in the InternData-A1 paper, we validated the data by training π0 with a JAX-based, multi-machine, multi-GPU setup.
The training pipeline we implemented for this runs across multiple nodes and GPUs and supports mixed training on multiple datasets.
Features
- Multi-machine, multi-GPU training: Scale training across multiple nodes
- Multi-dataset mixed training: Train on multiple datasets simultaneously
- JAX-based implementation: High-performance training with JAX/Flax
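To make "multi-dataset mixed training" concrete, the sketch below shows one common mixing scheme: each sample in a batch first picks a dataset with probability proportional to a weight, then picks an item from that dataset. This is an illustration only and is not taken from the openpi-InternData-A1 code; `mixed_batch`, the dataset names, and the weights are all hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def mixed_batch(
    datasets: Dict[str, Sequence],
    weights: Dict[str, float],
    batch_size: int,
    rng: random.Random,
) -> List[Tuple[str, object]]:
    """Draw one batch whose samples come from several datasets.

    Each sample picks a dataset with probability proportional to its weight,
    then picks a uniformly random item from that dataset.
    """
    names = list(datasets)
    probs = [weights[name] for name in names]
    batch = []
    for _ in range(batch_size):
        name = rng.choices(names, weights=probs, k=1)[0]
        batch.append((name, rng.choice(datasets[name])))
    return batch


if __name__ == "__main__":
    # Toy stand-ins for real datasets: lists of episode ids.
    toy = {"sim_lift2": list(range(100)), "sim_franka": list(range(100, 130))}
    batch = mixed_batch(toy, {"sim_lift2": 0.7, "sim_franka": 0.3}, 8, random.Random(0))
    for name, item in batch:
        print(name, item)
```

Weighting by dataset rather than by raw sample count keeps a small dataset from being drowned out by a much larger one.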
Installation, Training, and Deployment
For detailed instructions on installation, training, and deployment, please refer to the openpi-InternData-A1 README.
References
- LeRobot - HuggingFace LeRobot
- InternData-A1 Paper - InternData-A1: A High-Fidelity Synthetic Data Generator for Robotic Manipulation
- openpi-InternData-A1 - JAX-based π0 training code