# Evaluation

This document describes how to evaluate models in **InternNav**.

## InternVLA-N1 (Dual System)

Model weights of InternVLA-N1 (Dual System) can be downloaded from [InternVLA-N1-DualVLN](https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN) and [InternVLA-N1-w-NavDP](https://huggingface.co/InternRobotics/InternVLA-N1-w-NavDP).

---

### Evaluation on Isaac Sim

Before evaluation, download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory.

[UPDATE] We now support running the local model and Isaac Sim in a single process.

Evaluate on a single GPU:

```bash
python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
```

For multi-GPU inference, we currently support environments that expose a torchrun-compatible runtime (e.g., torchrun or Aliyun DLC).

```bash
# for torchrun
./scripts/eval/bash/torchrun_eval.sh \
    --config scripts/eval/configs/h1_internvla_n1_async_cfg.py

# for Aliyun DLC
./scripts/eval/bash/eval_vln_distributed.sh \
    internutopia \
    --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
```

The whole-system evaluation adopts a client-server architecture. On the client side, we specify the corresponding configuration (`*_cfg.py`), which includes settings such as the scenarios to be evaluated, the robots, the models, and the parallelization parameters. The client sends requests to the server, which then submits tasks to the Ray distributed framework based on that config file, driving the entire evaluation process.

First, change `model_path` in the config file to the path of the InternVLA-N1 weights (a sketch of the relevant config fields appears at the end of this Dual System section).

Start the evaluation server:

```bash
# from one process
conda activate
python scripts/eval/start_server.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
```

Then, start the client to run the evaluation:

```bash
# from another process
conda activate
MESA_GL_VERSION_OVERRIDE=4.6 python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
```

The evaluation results are saved to the `eval_results.log` file in the `output_dir` of the config file. The whole evaluation takes about 10 hours on an RTX 4090 GPU. The simulation can be visualized by setting `vis_output=True` in `eval_cfg`.

### Evaluation on Habitat Sim

Evaluate on a single GPU:

```bash
python scripts/eval/eval.py --config scripts/eval/configs/habitat_dual_system_cfg.py
```

For multi-GPU inference, we currently support SLURM as well as environments that expose a torchrun-compatible runtime (e.g., Aliyun DLC).

```bash
# for SLURM
./scripts/eval/bash/eval_dual_system.sh

# for torchrun
./scripts/eval/bash/torchrun_eval.sh \
    --config scripts/eval/configs/habitat_dual_system_cfg.py

# for Aliyun DLC
./scripts/eval/bash/eval_vln_distributed.sh \
    habitat \
    --config scripts/eval/configs/habitat_dual_system_cfg.py
```
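For reference, the Isaac Sim and Habitat Sim dual-system configs above follow the same `EvalCfg`/`AgentCfg` pattern as the System 2 example in the next section. Below is a minimal sketch of the fields discussed above; the weight path is a placeholder, and the exact placement of `output_dir` and `vis_output` inside the config is an assumption, so treat the shipped config files as authoritative.

```python
# Minimal sketch of a dual-system config such as h1_internvla_n1_async_cfg.py,
# mirroring the EvalCfg/AgentCfg pattern from the System 2 example below.
# The weight path is a placeholder; the exact placement of output_dir and
# vis_output is an assumption based on the prose above.
eval_cfg = EvalCfg(
    agent=AgentCfg(
        model_name='internvla_n1',
        model_settings={
            "mode": "dual_system",  # inference mode: dual_system or system2
            "model_path": "checkpoints/InternVLA-N1-DualVLN",  # downloaded weights
        },
    ),
    # output_dir="results/internvla_n1",  # eval_results.log is written here
    # vis_output=True,                    # save a visualization of the rollout
)
```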
## InternVLA-N1 (System 2)

Model weights of InternVLA-N1 (System 2) can be downloaded from [InternVLA-N1-System2](https://huggingface.co/InternRobotics/InternVLA-N1-System2).

Currently, we only support evaluating a standalone System 2 on Habitat.

Evaluate on a single GPU:

```bash
python scripts/eval/eval.py --config scripts/eval/configs/habitat_s2_cfg.py
```

Set the config with the following fields:

```python
eval_cfg = EvalCfg(
    agent=AgentCfg(
        model_name='internvla_n1',
        model_settings={
            "mode": "system2",  # inference mode: dual_system or system2
            "model_path": "checkpoints/",  # path to model checkpoint
        },
    ),
)
```

For multi-GPU inference, we currently only support SLURM.

```bash
./scripts/eval/bash/eval_system2.sh
```

## VLN Systems (System 1)

We support evaluating diverse System-1 baselines separately in [NavDP](https://github.com/InternRobotics/NavDP/tree/navdp_benchmark) to make them easy to use and deploy. To set up the environment, follow the quick start below.

#### Step 0: Create the conda environment

```bash
conda create -n isaaclab python=3.10
conda activate isaaclab
```

#### Step 1: Install Isaac Sim 4.2

```bash
pip install --upgrade pip
pip install isaacsim==4.2.0.2 isaacsim-extscache-physics==4.2.0.2 isaacsim-extscache-kit==4.2.0.2 isaacsim-extscache-kit-sdk==4.2.0.2 --extra-index-url https://pypi.nvidia.com

# (optional) you can check the installation by running the following
isaacsim omni.isaac.sim.python.kit
```

#### Step 2: Install IsaacLab 1.2.0

```bash
git clone https://github.com/isaac-sim/IsaacLab.git
cd IsaacLab/
git checkout tags/v1.2.0

# (optional) you can check the installation by running the following
./isaaclab.sh -p source/standalone/tutorials/00_sim/create_empty.py
```

#### Step 3: Install the dependencies for InternVLA-N1 (S1)

```bash
git clone https://github.com/OpenRobotLab/NavDP.git
cd NavDP
git checkout navdp_benchmark
pip install -r requirements.txt
```

#### Step 4: Start the InternVLA-N1 (S1) server

```bash
cd system1_baselines/navdp
python navdp_server.py --port {PORT} --checkpoint {CHECKPOINT_path}
```

#### Step 5: Run the evaluation

```bash
python eval_pointgoal_wheeled.py --port {PORT} --scene_dir {SCENE_DIR}
```

## Single-System VLN Baselines

We provide three small single-system VLN baselines (Seq2Seq, CMA, RDP) for evaluation in the InternUtopia (Isaac Sim) environment.

Download the baseline models:

```bash
# ddppo-models
mkdir -p checkpoints/ddppo-models
wget -P checkpoints/ddppo-models https://dl.fbaipublicfiles.com/habitat/data/baselines/v1/ddppo/ddppo-models/gibson-4plus-mp3d-train-val-test-resnet50.pth

# longclip-B
huggingface-cli download --include 'longclip-B.pt' --local-dir-use-symlinks False --resume-download Beichenzhang/LongCLIP-B --local-dir checkpoints/clip-long

# download the R2R fine-tuned baseline checkpoints
git clone https://huggingface.co/InternRobotics/VLN-PE && mv VLN-PE/r2r checkpoints/
```

Start the evaluation:

```bash
# Please modify the first line of each bash file to your own conda path

# Seq2Seq model
./scripts/eval/bash/start_eval.sh --config scripts/eval/configs/h1_seq2seq_cfg.py

# CMA model
./scripts/eval/bash/start_eval.sh --config scripts/eval/configs/h1_cma_cfg.py

# RDP model
./scripts/eval/bash/start_eval.sh --config scripts/eval/configs/h1_rdp_cfg.py
```

The evaluation results will be saved to the `eval_results.log` file in the `output_dir` of the config file.
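The baseline configs (`h1_seq2seq_cfg.py`, `h1_cma_cfg.py`, `h1_rdp_cfg.py`) follow the same `EvalCfg`/`AgentCfg` pattern shown in the System 2 section above. As a rough sketch only — the `model_name` value and the checkpoint path below are illustrative assumptions, so check the shipped config files for the authoritative fields:

```python
# Hypothetical sketch of a baseline config such as h1_cma_cfg.py, mirroring
# the EvalCfg/AgentCfg pattern used elsewhere in this document. The
# model_name value and checkpoint path are illustrative assumptions.
eval_cfg = EvalCfg(
    agent=AgentCfg(
        model_name='cma',
        model_settings={
            "model_path": "checkpoints/r2r/cma",  # assumed location of the downloaded R2R checkpoint
        },
    ),
)
```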