Installation Guide#

This page provides detailed guidance on simulation environment setup and quantitative model evaluation. If you want to reproduce the results of the technical report, you should follow this page. However, for inference-only usage, such as deploying InternVLA-N1 on your own robot or a self-built dataset, you can follow this simpler guideline to set up the environment and run inference with the model.

Prerequisites#

InternNav works across most hardware setups. Just note the following exceptions:

  • Benchmarks based on Isaac Sim, such as the VN and VLN-PE benchmarks, must run on NVIDIA RTX series GPUs (e.g., RTX 4090).

Simulation Requirements#

  • OS: Ubuntu 20.04/22.04

  • GPU Compatibility:

GPU Model                                | Training & Inference | Simulation (VLN-CE) | Simulation (VN) | Simulation (VLN-PE)
NVIDIA RTX Series (Driver: 535.216.01+) | ✅                   | ✅                  | ✅              | ✅
NVIDIA V/A/H100                          | ✅                   | ✅                  | ❌              | ❌
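
You can confirm your GPU model and driver version before choosing a setup, for example:

nvidia-smi --query-gpu=name,driver_version --format=csv,noheader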

Note

We provide a flexible installation tool for users who want to use InternNav for different purposes. You can install the training/inference environment and the individual simulation environments independently.

Model-Specific Requirements#

Models                     | Minimum GPU (Training) | Minimum GPU (Inference) | System RAM (Train/Inference)
StreamVLN & InternVLA-N1   | A100                   | RTX 4090 / A100         | 80GB / 24GB
NavDP (VN Models)          | RTX 4090 / A100        | RTX 3060 / A100         | 16GB / 2GB
CMA (VLN-PE Small Models)  | RTX 4090 / A100        | RTX 3060 / A100         | 8GB / 1GB

Quick Installation#

Clone the InternNav repository:

git clone https://github.com/InternRobotics/InternNav.git --recursive
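
If you cloned without the --recursive flag, you can fetch the submodules afterwards:

cd InternNav
git submodule update --init --recursive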

Our toolchain provides two Python environment solutions to accommodate different usage scenarios with the InternNav-N1 series model:

  • For quick trials and evaluations of the InternVLA-N1 model, we recommend using the Habitat environment. This option allows you to quickly test and evaluate the InternVLA-N1 models with minimal configuration.

  • If you require high-fidelity rendering, training capabilities, and physical property evaluations within the environment, we suggest using the Isaac Sim environment. This solution provides enhanced graphical rendering and more accurate physics simulations for comprehensive testing.

Choose the environment that best fits your specific needs. Note that both environments support training the System 1 model (NavDP).

Isaac Sim Environment#

Prerequisite#

  • Ubuntu 20.04, 22.04

  • Python 3.10.16 (3.10.* should be ok)

  • NVIDIA Omniverse Isaac Sim 4.5.0

  • NVIDIA GPU (RTX 2070 or higher)

  • NVIDIA GPU Driver (recommended version 535.216.01+)

  • PyTorch 2.5.1, 2.6.0 (recommended)

  • CUDA 11.8, 12.4 (recommended)

Before proceeding with the installation, ensure that you have Isaac Sim 4.5.0 and Conda installed.
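
A quick sanity check before installing (the driver version should meet the recommendation above):

conda --version
nvidia-smi --query-gpu=driver_version --format=csv,noheader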

Pull our latest Docker image with everything you need

$ docker pull crpi-mdum1jboc8276vb5.cn-beijing.personal.cr.aliyuncs.com/iros-challenge/internnav:v1.2

Run the container

$ xhost +local:root # Allow the container to access the display

$ cd PATH/TO/INTERNNAV/

$ docker run --name internnav -it --rm --gpus all --network host \
  -e "ACCEPT_EULA=Y" \
  -e "PRIVACY_CONSENT=Y" \
  -e "DISPLAY=${DISPLAY}" \
  --entrypoint /bin/bash \
  -w /root/InternNav \
  -v /tmp/.X11-unix/:/tmp/.X11-unix \
  -v ${PWD}:/root/InternNav \
  -v ${HOME}/docker/isaac-sim/cache/kit:/isaac-sim/kit/cache:rw \
  -v ${HOME}/docker/isaac-sim/cache/ov:/root/.cache/ov:rw \
  -v ${HOME}/docker/isaac-sim/cache/pip:/root/.cache/pip:rw \
  -v ${HOME}/docker/isaac-sim/cache/glcache:/root/.cache/nvidia/GLCache:rw \
  -v ${HOME}/docker/isaac-sim/cache/computecache:/root/.nv/ComputeCache:rw \
  -v ${HOME}/docker/isaac-sim/logs:/root/.nvidia-omniverse/logs:rw \
  -v ${HOME}/docker/isaac-sim/data:/root/.local/share/ov/data:rw \
  -v ${HOME}/docker/isaac-sim/documents:/root/Documents:rw \
  -v ${PWD}/data/scene_data/mp3d_pe:/isaac-sim/Matterport3D/data/v1/scans:rw \
  crpi-mdum1jboc8276vb5.cn-beijing.personal.cr.aliyuncs.com/iros-challenge/internnav:v1.2
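
If you need an additional shell inside the running container (for example, to monitor logs while a job is running), you can attach to it:

$ docker exec -it internnav /bin/bash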

Conda Installation from Scratch#

conda create -n <env> python=3.10 libxcb=1.14

# Install InternUtopia through pip (2.1.1 or 2.2.0 recommended).
conda activate <env>
pip install internutopia

# Configure the conda environment.
python -m internutopia.setup_conda_pypi
conda deactivate && conda activate <env>

For InternUtopia installation, you can find more detailed documentation in the InternUtopia repository.

# Install PyTorch based on your CUDA version
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118

# Install other deps
pip install -r requirements/isaac_requirements.txt
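
After the installation, a quick check that PyTorch was built against the expected CUDA version and can see the GPU:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"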

If you need to train or evaluate models on Habitat without physics simulation, we recommend the following, simpler environment setup.

Habitat Environment#

Prerequisite#

  • Python 3.9

  • PyTorch 2.6.0

  • CUDA 12.4

  • GPU: NVIDIA A100 or higher (optional for VLA training)

conda create -n <env> python=3.9
conda activate <env>

Install habitat sim and habitat lab:

conda install habitat-sim==0.2.4 withbullet headless -c conda-forge -c aihabitat
git clone --branch v0.2.4 https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
pip install -e habitat-lab  # install habitat_lab
pip install -e habitat-baselines # install habitat_baselines

Install pytorch and other requirements:

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements/habitat_requirements.txt
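
A similar sanity check for the Habitat environment, assuming the installs above completed without errors:

python -c "import habitat, habitat_sim, torch; print(habitat_sim.__version__, torch.__version__, torch.cuda.is_available())"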

Verification#

Data/Checkpoints Preparation#

To get started, we need to prepare the data and checkpoints.

  1. InternVLA-N1 pretrained Checkpoints

  • Download our latest pretrained checkpoint of InternVLA-N1 and move it to the checkpoints directory; it will be used by the inference and visualization scripts later.

  2. DepthAnything v2 Checkpoints

  • Download the DepthAnything v2 pretrained checkpoint and move it to the checkpoints directory.

  3. InternData-N1 Dataset Episodes

  • Download the InternData-N1 dataset and extract it into the data/vln_ce/ and data/vln_pe/ directories.

  4. Scene-N1

  • Download the SceneData-N1 for mp3d_ce and extract it into the data/scene_data/ directory.

  5. Embodiments

  6. Baseline models

# ddppo-models
$ mkdir -p checkpoints/ddppo-models
$ wget -P checkpoints/ddppo-models https://dl.fbaipublicfiles.com/habitat/data/baselines/v1/ddppo/ddppo-models/gibson-4plus-mp3d-train-val-test-resnet50.pth
# longclip-B
$ huggingface-cli download --include 'longclip-B.pt' --local-dir-use-symlinks False --resume-download Beichenzhang/LongCLIP-B --local-dir checkpoints/clip-long
# download r2r finetuned baseline checkpoints
$ git clone https://huggingface.co/InternRobotics/VLN-PE && mv VLN-PE/r2r checkpoints/

The final folder structure should look like this:

InternNav/
├── data/
│   ├── scene_data/
│   │   ├── mp3d_ce/
│   │   │   └── mp3d/
│   │   │       ├── 17DRP5sb8fy/
│   │   │       ├── 1LXtFkjw3qL/
│   │   │       └── ...
│   │   └── mp3d_pe/
│   │       ├── 17DRP5sb8fy/
│   │       ├── 1LXtFkjw3qL/
│   │       └── ...
│   ├── vln_ce/
│   │   ├── raw_data/
│   │   │   ├── r2r
│   │   │   │   ├── train
│   │   │   │   ├── val_seen
│   │   │   │   │   └── val_seen.json.gz
│   │   │   │   └── val_unseen
│   │   │   │       └── val_unseen.json.gz
│   │   └── traj_data/
│   └── vln_pe/
│       ├── raw_data/    # JSON files defining tasks, navigation goals, and dataset splits
│       │   └── r2r/
│       │       ├── train/
│       │       ├── val_seen/
│       │       │   └── val_seen.json.gz
│       │       └── val_unseen/
│       └── traj_data/   # training sample data for two types of scenes
│           ├── interiornav/
│           │   └── kujiale_xxxx.tar.gz
│           └── r2r/
│               └── trajectory_0/
│                   ├── data/
│                   ├── meta/
│                   └── videos/
├── checkpoints/
│   ├── InternVLA-N1/
│   │   ├── model-00001-of-00004.safetensors
│   │   ├── config.json
│   │   └── ...
│   ├── InternVLA-N1-S2
│   │   ├── model-00001-of-00004.safetensors
│   │   ├── config.json
│   │   └── ...
│   ├── depth_anything_v2_vits.pth
│   ├── r2r
│   │   ├── fine_tuned
│   │   └── zero_shot
├── internnav/
│   └── ...
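
To verify that your local layout matches, you can list the top levels of data/ and checkpoints/ and compare them against the tree above (the tree utility may need to be installed separately):

tree -L 3 data checkpoints
# or, without tree installed:
find data checkpoints -maxdepth 3 -type d | sort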

Gradio demo#

Currently the Gradio demo is only available in the Habitat environment. Replace the 'model_path' variable in 'vln_gradio_backend.py' with the path to the InternVLA-N1 checkpoint.

conda activate <habitat-env>
python3 scripts/demo/vln_gradio_backend.py

Find the IP address of the server node (e.g., the node allocated by Slurm). Then change BACKEND_URL in the Gradio client (navigation_ui.py) to the server's IP address and start the Gradio client:

python scripts/demo/navigation_ui.py

Note that it is better to run the Gradio client on a machine with a graphical user interface (GUI), and make sure there is proper network connectivity between the client and the server. Download the Gradio scene assets from Hugging Face and extract them into the scene_assets directory of the client, for example:
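
The exact repository id of the scene assets is not listed here; as a sketch, the same huggingface-cli tool used above can fetch them once you substitute the real repository for the placeholders:

huggingface-cli download <hf-org>/<gradio-scene-assets> --local-dir scene_assets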

Then open a browser and enter the Gradio address (such as http://0.0.0.0:5700) to reach the navigation interface. Click the 'Start Navigation Simulation' button to send a VLN request to the backend. The backend will submit a task to the Ray server and simulate the VLN task with the InternVLA-N1 models. After about 2 minutes, the VLN task finishes and returns a result video, which is displayed in the Gradio interface.

🎉 Congratulations! You have successfully installed InternNav.

InternData-N1 Dataset Preparation#

Due to network throttling restrictions on HuggingFace, InternData-N1 has not been fully uploaded yet. Please wait patiently for several days.

We also provide high-quality data for training System 1/System 2 models and for evaluation in the Isaac Sim environment. To set up the dataset, please follow the steps below:

  1. Download Datasets

  2. Directory Structure

After downloading, organize the datasets into the following structure:

data/
├── scene_data/
│   ├── mp3d_pe/
│   │   ├── 17DRP5sb8fy/
│   │   ├── 1LXtFkjw3qL/
│   │   └── ...
│   ├── mp3d_ce/
│   │   ├── mp3d/
│   │   │   ├── 17DRP5sb8fy/
│   │   │   ├── 1LXtFkjw3qL/
│   │   │   └── ...
│   └── mp3d_n1/
├── vln_pe/
│   ├── raw_data/
│   │   ├── train/
│   │   ├── val_seen/
│   │   │   └── val_seen.json.gz
│   │   └── val_unseen/
│   │       └── val_unseen.json.gz
│   └── traj_data/
│       └── mp3d/
│           ├── 17DRP5sb8fy/
│           ├── 1LXtFkjw3qL/
│           └── ...
├── vln_ce/
│   ├── raw_data/
│   │   ├── r2r
│   │   │   ├── train
│   │   │   ├── val_seen
│   │   │   │   └── val_seen.json.gz
│   │   │   └── val_unseen
│   │   │       └── val_unseen.json.gz
│   └── traj_data/
└── vln_n1/
    └── traj_data/