Installation Guide#

😄 Don't worry: both Quick Installation and Dataset Preparation are beginner-friendly.

Prerequisites#

InternNav works across most hardware setups. Just note the following exceptions:

  • Benchmarks based on Isaac Sim, such as the VN and VLN-PE benchmarks, must run on NVIDIA RTX series GPUs (e.g., RTX 4090).

Simulation Requirements#

  • OS: Ubuntu 20.04/22.04

  • GPU Compatibility:

| GPU Model | Training & Inference | Simulation (VLN-CE) | Simulation (VN) | Simulation (VLN-PE) |
| --- | --- | --- | --- | --- |
| NVIDIA RTX Series (Driver: 535.216.01+) | ✅ | ✅ | ✅ | ✅ |
| NVIDIA V/A/H100 | ✅ | ✅ | ❌ | ❌ |
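
Before installing, you can check your GPU model, driver version, and available memory with standard NVIDIA tooling (requires the NVIDIA driver to be installed):

nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv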

Note

We provide a flexible installation tool for users who want to use InternNav for different purposes. Users can choose to install the training and inference environment and each individual simulation environment independently.

Model-Specific Requirements#

| Models | Minimum GPU (Training) | Minimum GPU (Inference) | System RAM (Train/Inference) |
| --- | --- | --- | --- |
| StreamVLN & InternVLA-N1 | A100 | RTX 4090 / A100 | 80GB / 24GB |
| NavDP (VN Models) | RTX 4090 / A100 | RTX 3060 / A100 | 16GB / 2GB |
| CMA (VLN-PE Small Models) | RTX 4090 / A100 | RTX 3060 / A100 | 8GB / 1GB |

Quick Installation#

Our toolchain provides two Python environment solutions to accommodate different usage scenarios with the InternNav-N1 series model:

  • For quick trials and evaluations of the InternVLA-N1 model, we recommend using the Habitat environment. This option allows you to quickly test and evaluate the InternVLA-N1 models with minimal configuration.

  • If you require high-fidelity rendering, training capabilities, and physical property evaluations within the environment, we suggest using the Isaac Sim environment. This solution provides enhanced graphical rendering and more accurate physics simulations for comprehensive testing.

Choose the environment that best fits your specific needs to optimize your experience with the InternNav-N1 model. Note that both environments support the training of the system1 model NavDP.

Isaac Sim Environment#

Prerequisite#

  • Ubuntu 20.04, 22.04

  • Conda

  • Python 3.10.16 (3.10.* should be ok)

  • NVIDIA Omniverse Isaac Sim 4.5.0

  • NVIDIA GPU (RTX 2070 or higher)

  • NVIDIA GPU Driver (recommended version 535.216.01+)

  • PyTorch 2.5.1, 2.6.0 (recommended)

  • CUDA 11.8, 12.4 (recommended)

  • Docker (Optional)

  • NVIDIA Container Toolkit (Optional)

Before proceeding with the installation, ensure that you have Isaac Sim 4.5.0 and Conda installed.

To help you get started quickly, we've prepared a Docker image pre-configured with Isaac Sim 4.5 and InternUtopia. You can pull the image and run evaluations in the container using the following commands:

docker pull registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
docker run -it --name internutopia-container registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
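
If the container needs direct access to your GPUs, which requires the optional NVIDIA Container Toolkit listed above, a typical invocation adds the --gpus flag (a minimal sketch; adjust mounts and other flags to your setup):

docker run -it --gpus all --name internutopia-container registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0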

Conda installation#

$ conda create -n <env> python=3.10 libxcb=1.14

# Install InternUtopia through pip (2.1.1 and 2.2.0 recommended).
$ conda activate <env>
$ pip install internutopia

# Configure the conda environment.
$ python -m internutopia.setup_conda_pypi
$ conda deactivate && conda activate <env>

For InternUtopia installation, you can find more detailed docs in InternUtopia.

# Install PyTorch based on your CUDA version
$ pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118

# Install other deps
$ pip install -r isaac_requirements.txt
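
As a quick sanity check (a minimal sketch; run it inside the activated environment), verify that PyTorch detects CUDA and that InternUtopia imports cleanly:

$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
$ python -c "import internutopia; print('InternUtopia OK')"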

If you need to train or evaluate models on Habitat without physics simulation, we recommend the following simpler environment setup.

Habitat Environment#

Prerequisite#

  • Python 3.9

  • PyTorch 2.1.2

  • CUDA 12.4

  • GPU: NVIDIA A100 or higher (optional for VLA training)

conda install habitat-sim==0.2.4 withbullet headless -c conda-forge -c aihabitat
git clone --branch v0.2.4 https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
pip install -e habitat-lab  # install habitat_lab
pip install -e habitat-baselines # install habitat_baselines
pip install -r habitat_requirements.txt
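
A quick import check (a minimal sketch) confirms that both habitat-sim and habitat-lab are installed correctly:

python -c "import habitat_sim, habitat; print('Habitat environment OK')"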

Verification#

Please download our latest pretrained checkpoint of InternVLA-N1 and run the script below to perform inference with visualized results. Move the checkpoint to the checkpoints directory and download the VLN-CE dataset from Hugging Face. The final folder structure should look like this:

InternNav/
|-- data/
|   |-- datasets/
|   |   |-- vln/
|   |   |-- vln_datasets/
|   |-- scene_datasets/
|   |   |-- hm3d/
|   |   |-- mp3d/

|-- src/
|   |-- ...

|-- checkpoints/
|   |-- InternVLA-N1/
|   |   |-- model-00001-of-00004.safetensors
|   |   |-- config.json
|   |   |-- ...
|   |-- InternVLA-N1-S2/
|   |   |-- model-00001-of-00004.safetensors
|   |   |-- config.json
|   |   |-- ...
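
If you use the Hugging Face CLI, the downloads can be placed directly into the folders above (a sketch; the repository IDs below are placeholders, substitute the ones published on the project page):

# Replace <checkpoint_repo_id> and <dataset_repo_id> with the actual Hugging Face repository IDs
huggingface-cli download <checkpoint_repo_id> --local-dir checkpoints/InternVLA-N1
huggingface-cli download <dataset_repo_id> --repo-type dataset --local-dir data/datasets/vln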

Replace the 'model_path' variable in 'vln_ray_backend.py' with the path to the InternVLA-N1 checkpoint, then launch the backend:

srun -p {partition_name} --cpus-per-task 16 --gres gpu:1 python3 scripts/eval/vln_ray_backend.py
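
Once the backend job is running, you can look up which node Slurm allocated and resolve its IP address, for example (a sketch; exact commands depend on your cluster, and <node_name> is a placeholder for the node shown by squeue):

# List your jobs and the nodes they are running on
squeue -u $USER -o "%i %N"
# Resolve the node name to an IP address
getent hosts <node_name>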

Find the IP address of the node allocated by Slurm as shown above, then change the BACKEND_URL in the Gradio client (navigation_ui.py) to the server's IP address and start the Gradio client:

python navigation_ui.py

Note that it's better to run the Gradio client on a machine with a graphical user interface (GUI), but make sure there is proper network connectivity between the client and the server. Then open a browser and go to the Gradio address (such as http://0.0.0.0:5700). You should see the interface shown below.

Click the 'Start Navigation Simulation' button to send a VLN request to the backend. The backend submits a task to the Ray server and simulates the VLN task with the InternVLA-N1 models. After about 3 minutes, the task finishes and returns a result video, which is displayed in the Gradio interface.

🎉 Congratulations! You have successfully installed InternNav.

Dataset Preparation#

We also prepare high-quality data for training the system1/system2 models. To set up the training dataset, please follow the steps below:

  1. Download Datasets

  2. Directory Structure

After downloading, organize the datasets into the following structure:

data/
├── scene_data/
│   ├── mp3d_pe/
│   │   ├── 17DRP5sb8fy/
│   │   ├── 1LXtFkjw3qL/
│   │   └── ...
│   ├── mp3d_ce/
│   └── mp3d_n1/
├── vln_pe/
│   ├── raw_data/
│   │   ├── train/
│   │   ├── val_seen/
│   │   │   └── val_seen.json.gz
│   │   └── val_unseen/
│   │       └── val_unseen.json.gz
│   └── traj_data/
│       └── mp3d/
│           └── trajectory_0/
│               ├── data/
│               ├── meta/
│               └── videos/
├── vln_ce/
│   ├── raw_data/
│   └── traj_data/
└── vln_n1/
    └── traj_data/
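
To double-check the layout, listing the data directory from the repository root should show the four top-level folders above (a minimal sketch):

ls data/
# Expected: scene_data  vln_ce  vln_n1  vln_pe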