Training and Evaluation#

This document describes how to train and evaluate models for the different systems in InternNav.

Whole-system#

Evaluation#

Before evaluation, download the robot assets from InternUTopiaAssets. The model weights of InternVLA-N1 can be downloaded from InternVLA-N1.

Evaluation on Isaac Sim#

The whole-system evaluation adopts a client-server architecture. On the client side, we specify the corresponding configuration file (e.g., scripts/eval/configs/h1_internvla_n1_cfg.py), which includes settings such as the scenarios to be evaluated, the robots, the models, and the parallelization parameters. The client sends requests to the server, which then submits tasks to the Ray distributed framework according to the configuration file, driving the entire evaluation process.
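
The exact schema depends on the task, but as a rough sketch (the field names below are illustrative placeholders, not the actual schema of h1_internvla_n1_cfg.py), a config bundles the settings described above:

# Hypothetical sketch of an evaluation config; field names are placeholders.
# See scripts/eval/configs/h1_internvla_n1_cfg.py for the real schema.
eval_cfg = dict(
    scene="mp3d",                        # scenarios to be evaluated
    robot="h1",                          # robot embodiment
    model_path="/path/to/InternVLA-N1",  # model weights
    num_workers=8,                       # parallelization for Ray
    output_dir="results/internvla_n1",   # where eval_results.log is written
)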

First, start the Ray server:

ray disable-usage-stats
ray stop
ray start --head
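
Optionally, you can verify that the Ray head node is reachable before launching the server. A minimal check from Python (not part of the official workflow) is:

import ray

# Attach to the running local Ray head node and print the available resources.
ray.init(address="auto")
print(ray.cluster_resources())
ray.shutdown()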

Then change the 'model_path' in the config file to the path of the InternVLA-N1 weights. Start the evaluation server:

INTERNUTOPIA_ASSETS_PATH=/path/to/InternUTopiaAssets MESA_GL_VERSION_OVERRIDE=4.6 python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_cfg.py

Finally, start the client:

python -m internnav.agent.utils.server --config scripts/eval/configs/h1_internvla_n1_cfg.py

The evaluation results will be saved to the eval_results.log file under the output_dir specified in the config file. The whole evaluation process takes about 3 hours on an RTX 4090.

Evaluation on Habitat#

Evaluate on a single GPU:

python scripts/eval/eval_habitat.py --model_path checkpoints/InternVLA-N1 --continuous_traj --output_path result/InternVLA-N1/val_unseen_32traj_8steps

For multi-GPU inference, we currently only support running on SLURM:

./scripts/eval/eval_dual_system.sh

System1#

Training#

Download the training data from Hugging Face and extract it into the data/datasets/ directory.
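
As an example, assuming the data is published as a Hugging Face dataset repository and packaged as .tar.gz archives (the repo id below is a placeholder), it can be fetched and unpacked roughly as follows:

import tarfile
from pathlib import Path

from huggingface_hub import snapshot_download

# Placeholder repo id; replace with the actual Hugging Face dataset repository.
local_dir = snapshot_download(repo_id="<ORG>/<TRAINING_DATA_REPO>", repo_type="dataset")

# Extract every downloaded archive into data/datasets/.
target = Path("data/datasets")
target.mkdir(parents=True, exist_ok=True)
for archive in Path(local_dir).rglob("*.tar.gz"):
    with tarfile.open(archive) as tar:
        tar.extractall(target)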

./scripts/train/start_train.sh --name "$NAME" --model-name navdp

Evaluation#

We support evaluating diverse System-1 baselines separately in NavDP to make them easy to use and deploy. To set up the environment, follow the quick start below:

Step 0: Create the conda environment#

conda create -n isaaclab python=3.10
conda activate isaaclab

Step 1: Install Isaac Sim 4.2#

pip install --upgrade pip
pip install isaacsim==4.2.0.2 isaacsim-extscache-physics==4.2.0.2 isaacsim-extscache-kit==4.2.0.2 isaacsim-extscache-kit-sdk==4.2.0.2 --extra-index-url https://pypi.nvidia.com
# (optional) you can check the installation by running the following
isaacsim omni.isaac.sim.python.kit

Step 2: Install IsaacLab 1.2.0#

git clone https://github.com/isaac-sim/IsaacLab.git
cd IsaacLab/
git checkout tags/v1.2.0
# (optional) you can check the installation by running the following
./isaaclab.sh -p source/standalone/tutorials/00_sim/create_empty.py

Step 3: Install the dependencies for InternVLA-N1(S1)#

git clone https://github.com/OpenRobotLab/NavDP.git
cd NavDP
git checkout navdp_benchmark
pip install -r requirements.txt

Step 4: Start the InternVLA-N1(S1) server#

cd system1_baselines/navdp
python navdp_server.py --port {PORT} --checkpoint {CHECKPOINT_PATH}

Step 5: Run the evaluation#

python eval_pointgoal_wheeled.py --port {PORT} --scene_dir {SCENE_DIR}

System2#

Data Preparation#

Please download the following VLN-CE datasets and place them in the data folder, following the structure shown below.

  1. VLN-CE Episodes

    Download the VLN-CE episodes:

    • r2r (rename R2R_VLNCE_v1/ -> r2r/)

    • rxr (rename RxR_VLNCE_v0/ -> rxr/)

    • envdrop (rename R2R_VLNCE_v1-3_preprocessed/envdrop/ -> envdrop/)

    Extract them into the data/datasets/ directory.

  2. InternData-N1

We provide pre-collected observation-action trajectory data for training. These trajectories were collected using the training episodes from R2R and RxR in the Matterport3D environments. Download InternData-N1 and SceneData-N1. The final folder structure should look like this (an optional sanity check is sketched after the tree):

data/
├── scene_data/
│   ├── mp3d_pe/
│   │   ├── 17DRP5sb8fy/
│   │   ├── 1LXtFkjw3qL/
│   │   └── ...
│   ├── mp3d_ce/
│   └── mp3d_n1/
├── vln_pe/
│   ├── raw_data/
│   │   ├── train/
│   │   ├── val_seen/
│   │   │   └── val_seen.json.gz
│   │   └── val_unseen/
│   │       └── val_unseen.json.gz
│   └── traj_data/
│       └── mp3d/
│           └── trajectory_0/
│               ├── data/
│               ├── meta/
│               └── videos/
├── vln_ce/
│   ├── raw_data/
│   └── traj_data/
└── vln_n1/
    └── traj_data/
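
As an optional sanity check (a minimal sketch based only on the layout above), you can verify that the expected top-level folders exist before training:

from pathlib import Path

# Top-level directories expected by the layout shown above.
expected = [
    "data/scene_data/mp3d_pe",
    "data/scene_data/mp3d_ce",
    "data/scene_data/mp3d_n1",
    "data/vln_pe/raw_data",
    "data/vln_pe/traj_data/mp3d",
    "data/vln_ce/traj_data",
    "data/vln_n1/traj_data",
]
missing = [p for p in expected if not Path(p).is_dir()]
if missing:
    print("Missing directories:", missing)
else:
    print("Data layout looks complete.")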

Training#

Currently, we only support training the small VLN models (CMA, RDP, Seq2Seq) in this repo. For the training of LLM-based VLN models (NaVid, StreamVLN, etc.), please refer to StreamVLN for training details.

# train cma model
./scripts/train/start_train.sh --name cma_train --model cma

# train rdp model
./scripts/train/start_train.sh --name rdp_train --model rdp

# train seq2seq model
./scripts/train/start_train.sh --name seq2seq_train --model seq2seq

Evaluation#

Currently, we only support evaluating a single System2 model on Habitat.

Evaluate on a single GPU:

python scripts/eval/eval_habitat.py --model_path checkpoints/InternVLA-N1-S2 --mode system2 --output_path results/InternVLA-N1-S2/val_unseen

For multi-GPU inference, we currently only support running on SLURM:

./scripts/eval/eval_system2.sh