Dataset Preparation#

We provide high-quality data for training System-1/System-2 models and for evaluation in the Isaac Sim and Habitat Sim environments. The trajectories were collected from the training episodes of R2R and RxR in Matterport3D scenes.

Data and Checkpoints Checklist#

To get started with training and evaluation, prepare the data and checkpoints as follows.

  1. InternVLA-N1 Pretrained Checkpoints

  • Download our latest pretrained InternVLA-N1 checkpoint and move it to the checkpoints/ directory (see the download sketch after this list).

  2. DepthAnything V2 Checkpoints

  • Download the DepthAnything V2 pretrained checkpoint (depth_anything_v2_vits.pth) and move it to the checkpoints/ directory.

  3. InternData-N1 Dataset Episodes

  • Download InternData-N1. You only need the subsets relevant to your chosen task: vln_ce for VLN-CE evaluation in Habitat, and vln_pe for VLN-PE evaluation in InternUtopia.

  4. Scene-N1

  • Download SceneData-N1 (mp3d_ce or mp3d_pe) and extract it into the data/scene_data/ directory.

  5. Embodiments

  • Download the Embodiments assets and place them under data/Embodiments/. These embodiment assets are used by the Isaac Sim environment.
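If you fetch the checkpoints and datasets from the Hugging Face Hub, a short Python script can automate most of the checklist. This is a minimal sketch assuming Hub hosting: the repo IDs and the archive staging directory below are placeholders, not confirmed names, so substitute the actual ones from the project page.

from pathlib import Path
import tarfile

from huggingface_hub import snapshot_download

# 1. InternVLA-N1 checkpoint (placeholder repo ID -- check the project page).
snapshot_download(
    repo_id="InternRobotics/InternVLA-N1",
    local_dir="checkpoints/InternVLA-N1",
)

# 3. Dataset episodes: fetch only the subset your task needs,
#    e.g. vln_ce for Habitat or vln_pe for InternUtopia.
snapshot_download(
    repo_id="InternRobotics/InternData-N1",  # placeholder repo ID
    repo_type="dataset",
    local_dir="data",
    allow_patterns=["vln_ce/*"],             # or ["vln_pe/*"]
)

# 4. Extract the downloaded scene archives into data/scene_data/.
scene_dir = Path("data/scene_data")
scene_dir.mkdir(parents=True, exist_ok=True)
for archive in Path("downloads").glob("mp3d_*.tar.gz"):  # hypothetical staging dir
    with tarfile.open(archive) as tar:
        tar.extractall(scene_dir)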

The final folder structure should look like this:

InternNav/
β”œβ”€β”€ checkpoints/
β”‚   β”œβ”€β”€ InternVLA-N1/
β”‚   β”‚   β”œβ”€β”€ model-00001-of-00004.safetensors
β”‚   β”‚   β”œβ”€β”€ config.json
β”‚   β”‚   └── ...
β”‚   β”œβ”€β”€ InternVLA-N1-S2/
β”‚   β”‚   β”œβ”€β”€ model-00001-of-00004.safetensors
β”‚   β”‚   β”œβ”€β”€ config.json
β”‚   β”‚   └── ...
β”‚   β”œβ”€β”€ depth_anything_v2_vits.pth
β”‚   └── r2r/
β”‚       β”œβ”€β”€ fine_tuned/
β”‚       └── zero_shot/
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ Embodiments/
β”‚   β”œβ”€β”€ scene_data/
β”‚   β”‚   β”œβ”€β”€ mp3d_ce/
β”‚   β”‚   β”‚   └── mp3d/
β”‚   β”‚   β”‚       β”œβ”€β”€ 17DRP5sb8fy/
β”‚   β”‚   β”‚       β”œβ”€β”€ 1LXtFkjw3qL/
β”‚   β”‚   β”‚       └── ...
β”‚   β”‚   └── mp3d_pe/
β”‚   β”‚       β”œβ”€β”€ 17DRP5sb8fy/
β”‚   β”‚       β”œβ”€β”€ 1LXtFkjw3qL/
β”‚   β”‚       └── ...
β”‚   β”œβ”€β”€ vln_n1/
β”‚   β”‚   └── traj_data/
β”‚   β”œβ”€β”€ vln_ce/
β”‚   β”‚   β”œβ”€β”€ raw_data/
β”‚   β”‚   β”‚   β”œβ”€β”€ r2r/
β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ train/
β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ val_seen/
β”‚   β”‚   β”‚   β”‚   β”‚   └── val_seen.json.gz
β”‚   β”‚   β”‚   β”‚   └── val_unseen/
β”‚   β”‚   β”‚   β”‚       └── val_unseen.json.gz
β”‚   β”‚   └── traj_data/
β”‚   └── vln_pe/
β”‚       β”œβ”€β”€ raw_data/    # JSON files defining tasks, navigation goals, and dataset splits
β”‚       β”‚   └── r2r/
β”‚       β”‚       β”œβ”€β”€ train/
β”‚       β”‚       β”œβ”€β”€ val_seen/
β”‚       β”‚       β”‚   └── val_seen.json.gz
β”‚       β”‚       └── val_unseen/
β”‚       └── traj_data/   # training sample data for two types of scenes
β”‚           β”œβ”€β”€ interiornav/
β”‚           β”‚   └── kujiale_xxxx.tar.gz
β”‚           └── r2r/
β”‚               └── trajectory_0/
β”‚                   β”œβ”€β”€ data/
β”‚                   β”œβ”€β”€ meta/
β”‚                   └── videos/
β”œβ”€β”€ internnav/
β”‚   └── ...
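
Before launching training or evaluation, it is worth verifying that the layout above is in place. The following is a small sanity-check sketch; the REQUIRED list covers only the common entries, so extend it for the task you prepared (e.g., vln_pe instead of vln_ce).

from pathlib import Path

# Key paths from the directory tree above; adjust to your setup.
REQUIRED = [
    "checkpoints/InternVLA-N1",
    "checkpoints/depth_anything_v2_vits.pth",
    "data/scene_data",
    "data/vln_ce/raw_data/r2r",  # or data/vln_pe/raw_data/r2r
]

missing = [p for p in REQUIRED if not Path(p).exists()]
if missing:
    print("Missing entries:")
    for p in missing:
        print(f"  - {p}")
else:
    print("All required data and checkpoints are in place.")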