Evaluation#
This document describes how to evaluate models in InternNav.
InternVLA-N1 (Dual System)#
Model weights of InternVLA-N1 (Dual System) can be downloaded from InternVLA-N1-DualVLN and InternVLA-N1-w-NavDP.
Evaluation on Isaac Sim#
Before evaluation, download the robot assets from InternUTopiaAssets and move them to the data/ directory.
InternNav supports two execution modes for running the model during evaluation.
1) In-Process Mode (use_agent_server = False)#
We now support running the local model and Isaac Sim in the same process, enabling single-GPU evaluation without launching a separate agent service. This mode also supports multi-process execution, where each process hosts its own simulator and local model.
python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
# set config with the following fields
eval_cfg = EvalCfg(
    task=TaskCfg(
        task_settings={
            'use_distributed': False,  # disable Ray-based distributed evaluation
        }
    ),
    eval_settings={
        'use_agent_server': False,  # run the model in the same process as the simulator
    },
)
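The use_agent_server flag decides where model inference happens. A minimal sketch of the dispatch it implies (function and parameter names here are illustrative, not InternNav API):

```python
# Hypothetical sketch of the dispatch implied by `use_agent_server`;
# names are illustrative, not InternNav's actual implementation.

def run_in_process(model, obs):
    # In-process mode: the model lives in the same process as the
    # simulator, so inference is a direct Python call.
    return model(obs)

def run_via_server(client, obs):
    # Agent-server mode: the model runs in another process, so each
    # step is a request/response round trip through a client.
    return client.request_action(obs)

def step(eval_settings, obs, model=None, client=None):
    if eval_settings.get('use_agent_server', False):
        return run_via_server(client, obs)
    return run_in_process(model, obs)
```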
For multi-GPU inference, we currently support environments that expose a torchrun-compatible runtime (e.g., torchrun or Aliyun DLC).
# for torchrun
./scripts/eval/bash/torchrun_eval.sh \
--config scripts/eval/configs/h1_internvla_n1_async_cfg.py
# for alicloud dlc
./scripts/eval/bash/eval_vln_distributed.sh \
internutopia \
--config scripts/eval/configs/h1_internvla_n1_async_cfg.py
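Under torchrun, each worker process receives RANK and WORLD_SIZE environment variables, which can be used to assign every rank a disjoint slice of the episode list. A minimal sketch of that sharding (illustrative; InternNav's actual splitting logic may differ):

```python
# Sketch of torchrun-style episode sharding across ranks.
import os

def shard_episodes(episodes, rank, world_size):
    # Round-robin assignment keeps shard sizes balanced to within one episode.
    return [ep for i, ep in enumerate(episodes) if i % world_size == rank]

def local_shard(episodes):
    # torchrun exports RANK and WORLD_SIZE for every worker it launches.
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    return shard_episodes(episodes, rank, world_size)
```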
2) Agent Server Mode (use_agent_server = True)#
We also support running the model in a separate process. First, change the 'model_path' in the config file to the path of the InternVLA-N1 weights. Then start the evaluation server:
# from one process
conda activate <model_env>
python scripts/eval/start_server.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
Then, start the client to run evaluation:
# from another process
conda activate <internutopia>
MESA_GL_VERSION_OVERRIDE=4.6 python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
# set config with the following fields
eval_cfg = EvalCfg(
    eval_settings={
        'use_agent_server': True,  # run the model in a separate agent-server process
    },
)
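In this mode, each evaluation step is a request/response exchange between the simulator client and the model server. A toy, self-contained sketch of that pattern using only the standard library (threads stand in for the two processes; InternNav's actual protocol differs):

```python
# Toy illustration of the agent-server round trip; not InternNav's protocol.
import json
import socket
import threading

def serve_once(server_sock):
    # Stand-in agent server: read one observation, reply with an action.
    conn, _ = server_sock.accept()
    with conn:
        obs = json.loads(conn.recv(4096).decode())
        action = {"action": "move_forward", "step": obs["step"]}
        conn.sendall(json.dumps(action).encode())

def request_action(port, obs):
    # Evaluation client: send an observation, wait for the action.
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(json.dumps(obs).encode())
        return json.loads(conn.recv(4096).decode())

server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
worker = threading.Thread(target=serve_once, args=(server,))
worker.start()
result = request_action(port, {"step": 3})
worker.join()
server.close()
```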
The evaluation results will be saved in the eval_results.log file under the output_dir specified in the config file. The whole evaluation process takes about 10 hours on an RTX 4090 GPU.
The simulation can be visualized by setting vis_output=True in eval_cfg.
Evaluation on Habitat Sim#
Evaluate on Single-GPU:
python scripts/eval/eval.py --config scripts/eval/configs/habitat_dual_system_cfg.py
For multi-GPU inference, we currently support SLURM as well as environments that expose a torchrun-compatible runtime (e.g., Aliyun DLC).
# for slurm
./scripts/eval/bash/eval_dual_system.sh
# for torchrun
./scripts/eval/bash/torchrun_eval.sh \
--config scripts/eval/configs/habitat_dual_system_cfg.py
# for alicloud dlc
./scripts/eval/bash/eval_vln_distributed.sh \
habitat \
--config scripts/eval/configs/habitat_dual_system_cfg.py
InternVLA-N1 (System 2)#
Model weights of InternVLA-N1 (System2) can be downloaded from InternVLA-N1-System2.
Currently, we only support evaluating a standalone System 2 on Habitat:
Evaluate on Single-GPU:
python scripts/eval/eval.py --config scripts/eval/configs/habitat_s2_cfg.py
# set config with the following fields
eval_cfg = EvalCfg(
    agent=AgentCfg(
        model_name='internvla_n1',
        model_settings={
            "mode": "system2",  # inference mode: dual_system or system2
            "model_path": "checkpoints/<s2_checkpoint>",  # path to model checkpoint
        }
    )
)
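Since the config accepts exactly two inference modes, a small sanity check like the following (a hypothetical helper, not part of InternNav) can catch a typo before a long evaluation run:

```python
# Hypothetical validation of the model_settings fields shown above.
VALID_MODES = {"dual_system", "system2"}

def check_model_settings(settings):
    mode = settings.get("mode")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown inference mode: {mode!r}")
    if not settings.get("model_path"):
        raise ValueError("model_path must point to a checkpoint")
    return settings
```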
For multi-GPU inference, we currently support only SLURM.
./scripts/eval/bash/eval_system2.sh
VN Systems (System 1)#
We support evaluating diverse System-1 baselines separately in NavDP, which makes them easy to use and deploy. A quick start for setting up the environment is provided below:
Step 0: Create the conda environment#
conda create -n isaaclab python=3.10
conda activate isaaclab
Step 1: Install Isaac Sim 4.2#
pip install --upgrade pip
pip install isaacsim==4.2.0.2 isaacsim-extscache-physics==4.2.0.2 isaacsim-extscache-kit==4.2.0.2 isaacsim-extscache-kit-sdk==4.2.0.2 --extra-index-url https://pypi.nvidia.com
# (optional) you can check the installation by running the following
isaacsim omni.isaac.sim.python.kit
Step 2: Install IsaacLab 1.2.0#
git clone https://github.com/isaac-sim/IsaacLab.git
cd IsaacLab/
git checkout tags/v1.2.0
# (optional) you can check the installation by running the following
./isaaclab.sh -p source/standalone/tutorials/00_sim/create_empty.py
Step 3: Install the dependencies for InternVLA-N1(S1)#
git clone https://github.com/OpenRobotLab/NavDP.git
cd NavDP
git checkout navdp_benchmark
pip install -r requirements.txt
Step 4: Start the InternVLA-N1(S1) server#
cd system1_baselines/navdp
python navdp_server.py --port {PORT} --checkpoint {CHECKPOINT_path}
Step 5: Running the Evaluation#
python eval_pointgoal_wheeled.py --port {PORT} --scene_dir {SCENE_DIR}
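Point-goal runs like the one above are typically scored by success rate and SPL (success weighted by path length). A self-contained sketch of those metrics (the 0.5 m success radius is an assumption for illustration, not necessarily this benchmark's value):

```python
# Illustrative point-goal metrics; thresholds are assumptions.
import math

def euclidean(a, b):
    return math.dist(a, b)

def pointgoal_metrics(path, goal, success_radius=0.5):
    # Success: the agent ends within `success_radius` of the goal.
    success = euclidean(path[-1], goal) <= success_radius
    # SPL: success weighted by (shortest path length / actual path length).
    actual = sum(euclidean(path[i], path[i + 1]) for i in range(len(path) - 1))
    shortest = euclidean(path[0], goal)
    if actual > 0:
        spl = float(success) * shortest / max(actual, shortest)
    else:
        spl = float(success)
    return {"success": success, "spl": spl}
```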
Single-System VLN Baselines#
We provide three small single-system VLN baselines (Seq2Seq, CMA, RDP) for evaluation in the InternUtopia (Isaac Sim) environment.
Download the baseline models:
# ddppo-models
$ mkdir -p checkpoints/ddppo-models
$ wget -P checkpoints/ddppo-models https://dl.fbaipublicfiles.com/habitat/data/baselines/v1/ddppo/ddppo-models/gibson-4plus-mp3d-train-val-test-resnet50.pth
# longclip-B
$ huggingface-cli download --include 'longclip-B.pt' --local-dir-use-symlinks False --resume-download Beichenzhang/LongCLIP-B --local-dir checkpoints/clip-long
# download r2r finetuned baseline checkpoints
$ git clone https://huggingface.co/InternRobotics/VLN-PE && mv VLN-PE/r2r checkpoints/
Start Evaluation:
# Please modify the first line of the bash file to your own conda path
# seq2seq model
./scripts/eval/bash/start_eval.sh --config scripts/eval/configs/h1_seq2seq_cfg.py
# cma model
./scripts/eval/bash/start_eval.sh --config scripts/eval/configs/h1_cma_cfg.py
# rdp model
./scripts/eval/bash/start_eval.sh --config scripts/eval/configs/h1_rdp_cfg.py
The evaluation results will be saved in the eval_results.log file in the output_dir of the config file.