Installation Guide#

😄 Don't worry: both Quick Installation and Dataset Preparation are beginner-friendly.

A detailed technical report will be released in about two weeks.

Prerequisites#

InternNav works across most hardware setups. Just note the following exceptions:

  • Benchmarks based on Isaac Sim, such as the VN and VLN-PE benchmarks, must run on NVIDIA RTX series GPUs (e.g., RTX 4090).

Simulation Requirements#

  • OS: Ubuntu 20.04/22.04

  • GPU Compatibility:

| GPU Model | Training & Inference | Simulation: VLN-CE | Simulation: VN | Simulation: VLN-PE |
|---|---|---|---|---|
| NVIDIA RTX Series (Driver: 535.216.01+) | ✅ | ✅ | ✅ | ✅ |
| NVIDIA V/A/H100 | ✅ | ✅ | ❌ | ❌ |
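
To check a machine against the driver requirement in the table, the version reported by nvidia-smi can be compared with 535.216.01. The following is a minimal sketch, not part of InternNav: the function names are hypothetical, and it assumes nvidia-smi is on the PATH and supports the `--query-gpu` option.

```python
import subprocess

MIN_DRIVER = "535.216.01"  # minimum driver version listed in the table


def version_tuple(version):
    """Turn a dotted version such as '535.216.01' into (535, 216, 1)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_meets_minimum(installed, minimum=MIN_DRIVER):
    """Compare two dotted driver versions numerically, field by field."""
    return version_tuple(installed) >= version_tuple(minimum)


def query_gpus():
    """Return (gpu_name, driver_version) pairs reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    pairs = []
    for line in out.splitlines():
        if line.strip():
            # Split on the last comma so a comma in the GPU name cannot break parsing.
            name, driver = line.rsplit(",", 1)
            pairs.append((name.strip(), driver.strip()))
    return pairs


def report():
    """Print one line per GPU stating whether its driver is recent enough."""
    for name, driver in query_gpus():
        status = "OK" if driver_meets_minimum(driver) else "older than " + MIN_DRIVER
        print(f"{name}: driver {driver} ({status})")
```

Calling `report()` on the target machine prints the result for each GPU. The sketch only compares driver versions; whether the GPU itself is an RTX model still has to be read from the printed name.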

Note

We provide a flexible installation tool for users who want to use InternNav for different purposes. The training and inference environment and each simulation environment can be installed independently.

Model-Specific Requirements#

| Models | Minimum GPU (Training) | Minimum GPU (Inference) | System RAM (Train/Inference) |
|---|---|---|---|
| StreamVLN & InternVLA-N1 | A100 | RTX 4090 / A100 | 80GB / 24GB |
| NavDP (VN Models) | RTX 4090 / A100 | RTX 3060 / A100 | 16GB / 2GB |
| CMA (VLN-PE Small Models) | RTX 4090 / A100 | RTX 3060 / A100 | 8GB / 1GB |
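
The RAM column can be turned into a quick pre-flight check. The sketch below is illustrative only: the dictionary keys and function names are hypothetical, the figures are copied from the table, and `total_ram_gb` relies on `os.sysconf`, so it assumes a Linux host such as the Ubuntu versions listed above.

```python
import os

# System RAM in GB per model as (training, inference), copied from the table.
RAM_GB = {
    "StreamVLN & InternVLA-N1": (80, 24),
    "NavDP": (16, 2),
    "CMA": (8, 1),
}


def required_ram_gb(model, mode):
    """Look up the RAM figure for a model; mode is 'train' or 'inference'."""
    if mode not in ("train", "inference"):
        raise ValueError("mode must be 'train' or 'inference'")
    train, inference = RAM_GB[model]
    return train if mode == "train" else inference


def total_ram_gb():
    """Total physical memory in GiB, read through os.sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def has_enough_ram(model, mode, available_gb=None):
    """Return True if the available RAM covers the table's requirement."""
    if available_gb is None:
        available_gb = total_ram_gb()
    return available_gb >= required_ram_gb(model, mode)
```

For example, `has_enough_ram("NavDP", "inference")` checks the current machine against the 2GB figure. Note that the operating system usually reports slightly less memory than the nominal size of the installed modules, so a borderline result deserves a second look.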

Quick Installation#

Clone the InternNav repository:

git clone https://github.com/InternRobotics/InternNav.git --recursive

Our toolchain provides two Python environment solutions to accommodate different usage scenarios with the InternNav-N1 series model:

  • For quick trials and evaluations of the InternNav-N1 model, we recommend using the Habitat environment. This option lets you quickly test and evaluate the InternVLA-N1 models with minimal configuration.

  • If you require high-fidelity rendering, training capabilities, and physical property evaluations within the environment, we suggest using the Isaac Sim environment. This solution provides enhanced graphical rendering and more accurate physics simulations for comprehensive testing.

Choose the environment that best fits your specific needs to optimize your experience with the InternNav-N1 model. Note that both environments support the training of the system1 model NavDP.

Isaac Sim Environment#

Prerequisite#

  • Ubuntu 20.04, 22.04

  • Python 3.10.16 (3.10.* should be ok)

  • NVIDIA Omniverse Isaac Sim 4.5.0

  • NVIDIA GPU (RTX 2070 or higher)

  • NVIDIA GPU Driver (recommended version 535.216.01+)

  • PyTorch 2.5.1, 2.6.0 (recommended)

  • CUDA 11.8, 12.4 (recommended)

Before proceeding with the installation, ensure that you have Isaac Sim 4.5.0 and Conda installed.

Conda installation#

conda create -n <env> python=3.10 libxcb=1.14

# Install InternUtopia through pip (2.1.1 and 2.2.0 recommended).
conda activate <env>
pip install internutopia

# Configure the conda environment.
python -m internutopia.setup_conda_pypi
conda deactivate && conda activate <env>

For InternUtopia installation, you can find more detailed docs in InternUtopia.

# Install PyTorch based on your CUDA version
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118

# Install other deps
pip install -r requirements/isaac_requirements.txt
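
The PyTorch command above targets CUDA 11.8 (the `cu118` wheel index). For the other CUDA version listed in the prerequisites, only the suffix of the index URL changes. The helper below is a hypothetical sketch of that mapping; it assumes wheels for the chosen PyTorch release exist for the given CUDA version, which is not true for every combination.

```python
PYTORCH_WHEEL_BASE = "https://download.pytorch.org/whl"


def pytorch_index_url(cuda_version):
    """Map a CUDA version such as '12.4' to its wheel index URL (suffix cu124)."""
    major, minor = cuda_version.split(".")[:2]
    return f"{PYTORCH_WHEEL_BASE}/cu{major}{minor}"


def pip_install_command(cuda_version, torch="2.5.1", torchvision="0.20.1",
                        torchaudio="2.5.1"):
    """Build the pip command; the default versions match the command above."""
    return (
        f"pip install torch=={torch} torchvision=={torchvision} "
        f"torchaudio=={torchaudio} --index-url {pytorch_index_url(cuda_version)}"
    )


# Print the command for each CUDA version named in the prerequisites.
print(pip_install_command("11.8"))
print(pip_install_command("12.4"))
```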

If you only need to train or evaluate models on Habitat, without physics simulation, we recommend the following setup, which is easier to install.

Habitat Environment#

Prerequisite#

  • Python 3.9

  • PyTorch 2.1.2

  • CUDA 12.4

  • GPU: NVIDIA A100 or higher (optional for VLA training)

conda create -n <env> python=3.9
conda activate <env>

Install habitat sim and habitat lab:

conda install habitat-sim==0.2.4 withbullet headless -c conda-forge -c aihabitat
git clone --branch v0.2.4 https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
pip install -e habitat-lab  # install habitat_lab
pip install -e habitat-baselines # install habitat_baselines

Install pytorch and other requirements:

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements/habitat_requirements.txt
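
After the Habitat installation, a quick sanity check is to confirm that the main packages can be located by the interpreter of the activated environment. This is a minimal sketch rather than an official verification script; the module list is an assumption based on the packages installed above (`habitat_sim`, `habitat`, `habitat_baselines`, `torch`).

```python
import importlib.util

# Top-level modules expected after the Habitat setup (assumed list).
HABITAT_MODULES = ["torch", "habitat_sim", "habitat", "habitat_baselines"]


def missing_modules(names):
    """Return the modules from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(names=HABITAT_MODULES):
    """Print which modules are missing, plus PyTorch's CUDA status if present."""
    missing = missing_modules(names)
    print("missing modules:", missing if missing else "none")
    if "torch" not in missing:
        import torch  # imported lazily so the check also runs without PyTorch

        print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())
```

Run `report()` inside the activated conda environment. `find_spec` only locates a module without importing it, so a package with a broken native dependency can still pass this check and fail on an actual import.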

Verification#

Data/Checkpoints Preparation#

To get started, we need to prepare the data and checkpoints.

  1. InternVLA-N1 pretrained checkpoints: Download our latest pretrained InternVLA-N1 checkpoint and move it to the checkpoints directory. It is used by the script below to run inference with visualization results.

  2. DepthAnything v2 checkpoints: Download the DepthAnything v2 pretrained checkpoint and move it to the checkpoints directory.

  3. InternData-N1 VLN-CE episodes: Download InternData-N1 for vln-ce and extract it into the data/vln_ce/ directory.

  4. Scene-N1: Download SceneData-N1 for mp3d_ce and extract it into the data/scene_data/ directory.

The final folder structure should look like this:

InternNav/
├── data/
│   ├── vln_ce/
│   │   ├── raw_data/
│   │   │   ├── r2r
│   │   │   │   ├── train
│   │   │   │   ├── val_seen
│   │   │   │   │   └── val_seen.json.gz
│   │   │   │   └── val_unseen
│   │   │   │       └── val_unseen.json.gz
│   │   └── traj_data/
│   ├── scene_data/
│   │   ├── mp3d_ce/
│   │   │   ├── mp3d/
│   │   │   │   ├── 17DRP5sb8fy/
│   │   │   │   ├── 1LXtFkjw3qL/
│   │   │   │   └── ...
├── src/
│   ├── ...
├── checkpoints/
│   ├── InternVLA-N1/
│   │   ├── model-00001-of-00004.safetensors
│   │   ├── config.json
│   │   ├── ...
│   ├── InternVLA-N1-S2
│   │   ├── model-00001-of-00004.safetensors
│   │   ├── config.json
│   │   ├── ...
│   ├── depth_anything_v2_vits.pth
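
Before launching the demo, it can help to confirm that the layout above is in place. The sketch below checks a handful of paths relative to the repository root. The `EXPECTED` list is an assumption that covers only a few entries from the tree, and `missing_paths` is a hypothetical helper, not part of InternNav.

```python
from pathlib import Path

# A few paths from the folder structure, relative to the InternNav root (assumed subset).
EXPECTED = [
    "data/vln_ce/raw_data/r2r/val_seen/val_seen.json.gz",
    "data/vln_ce/raw_data/r2r/val_unseen/val_unseen.json.gz",
    "data/vln_ce/traj_data",
    "data/scene_data/mp3d_ce/mp3d",
    "checkpoints/InternVLA-N1/config.json",
    "checkpoints/InternVLA-N1-S2/config.json",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

For example, `missing_paths(".")` run from the repository root returns an empty list when every listed file and directory is present.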

Gradio demo#

Currently, the Gradio demo is only available in the Habitat environment. Replace the 'model_path' variable in 'vln_ray_backend.py' with the path to the InternVLA-N1 checkpoint.

conda activate <habitat-env>
srun -p {partition_name} --cpus-per-task 16 --gres gpu:1 python3 scripts/eval/vln_ray_backend.py

Find the IP address of the node allocated by Slurm. Then change BACKEND_URL in the Gradio client (navigation_ui.py) to the server's IP address, and start the Gradio client:

python scripts/eval/navigation_ui.py

Note that it is better to run the Gradio client on a machine with a graphical user interface (GUI), as long as there is proper network connectivity between the client and the server. Download the Gradio scene assets from Hugging Face and extract them into the scene_assets directory of the client. Then open a browser and enter the Gradio address (such as http://0.0.0.0:5700). The interface is shown in the screenshot (img.png).
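
Connectivity between the client machine and the backend node can be checked with a plain TCP connection attempt before starting the UI. This is a minimal sketch: `can_connect` is a hypothetical helper, and the host and port must be taken from the BACKEND_URL you configured, since the backend port is not stated in this guide.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

A successful connection only shows that the port is reachable; it does not confirm that the process listening there is the VLN backend.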

Click the 'Start Navigation Simulation' button to send a VLN request to the backend. The backend submits a task to the Ray server and simulates the VLN task with the InternVLA-N1 models. After about one minute, the task finishes and returns a result video, which is displayed in the Gradio interface (img.png).

🎉 Congratulations! You have successfully installed InternNav.

InternData-N1 Dataset Preparation#

Due to network throttling restrictions on HuggingFace, InternData-N1 has not been fully uploaded yet. Please wait patiently for several days.

We also provide high-quality data for training system1/system2 and for evaluation in the Isaac Sim environment. To set up the dataset, please follow the steps below:

  1. Download Datasets

  2. Directory Structure

After downloading, organize the datasets into the following structure:

data/
├── scene_data/
│   ├── mp3d_pe/
│   │   ├── 17DRP5sb8fy/
│   │   ├── 1LXtFkjw3qL/
│   │   └── ...
│   ├── mp3d_ce/
│   │   ├── mp3d/
│   │   │   ├── 17DRP5sb8fy/
│   │   │   ├── 1LXtFkjw3qL/
│   │   │   └── ...
│   └── mp3d_n1/
├── vln_pe/
│   ├── raw_data/
│   │   ├── train/
│   │   ├── val_seen/
│   │   │   └── val_seen.json.gz
│   │   └── val_unseen/
│   │       └── val_unseen.json.gz
│   └── traj_data/
│       └── mp3d/
│           ├── 17DRP5sb8fy/
│           ├── 1LXtFkjw3qL/
│           └── ...
├── vln_ce/
│   ├── raw_data/
│   │   ├── r2r
│   │   │   ├── train
│   │   │   ├── val_seen
│   │   │   │   └── val_seen.json.gz
│   │   │   └── val_unseen
│   │   │       └── val_unseen.json.gz
│   └── traj_data/
└── vln_n1/
    └── traj_data/
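
Once the files are in place, the episode archives such as val_seen.json.gz can be opened to confirm that the download and extraction were not truncated. The sketch below assumes only that these files are gzip-compressed JSON, as the extension suggests. It makes no assumption about the episode schema, and both function names are hypothetical.

```python
import gzip
import json


def load_episode_file(path):
    """Decompress a .json.gz file and parse it as JSON."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def summarize(obj):
    """Describe the top level: container sizes or value type names per key."""
    if isinstance(obj, dict):
        return {
            key: len(value) if isinstance(value, (list, dict)) else type(value).__name__
            for key, value in obj.items()
        }
    return {"<top-level list>": len(obj)}
```

For example, `summarize(load_episode_file("data/vln_pe/raw_data/val_seen/val_seen.json.gz"))` shows the top-level keys and how many entries each holds. A corrupted archive typically raises an error during loading instead.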