Installation Guide#
Don't worry: both Quick Installation and Dataset Preparation are beginner-friendly.
Prerequisites#
InternNav works across most hardware setups. Just note the following exceptions:
Benchmarks based on Isaac Sim, such as the VN and VLN-PE benchmarks, must run on NVIDIA RTX series GPUs (e.g., RTX 4090).
Simulation Requirements#
OS: Ubuntu 20.04/22.04
GPU Compatibility:
| GPU | Model Training & Inference | Simulation (VLN-CE) | Simulation (VN) | Simulation (VLN-PE) |
| --- | --- | --- | --- | --- |
| NVIDIA RTX Series (Driver: 535.216.01+) | ✅ | ✅ | ✅ | ✅ |
| NVIDIA V/A/H100 | ✅ | ✅ | ❌ | ❌ |
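To confirm your GPU and driver meet these requirements, a quick check with standard nvidia-smi is:

# Print GPU model, driver version, and memory (driver should be 535.216.01+ for RTX simulation)
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv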
Note
We provide a flexible installation tool for users who want to use InternNav for different purposes. You can choose to install the training and inference environment and each simulation environment independently.
Model-Specific Requirements#
| Models | Minimum GPU (Training) | Minimum GPU (Inference) | System RAM (Train/Inference) |
| --- | --- | --- | --- |
| StreamVLN & InternVLA-N1 | A100 | RTX 4090 / A100 | 80GB / 24GB |
| NavDP (VN Models) | RTX 4090 / A100 | RTX 3060 / A100 | 16GB / 2GB |
| CMA (VLN-PE Small Models) | RTX 4090 / A100 | RTX 3060 / A100 | 8GB / 1GB |
Quick Installation#
Our toolchain provides two Python environment solutions to accommodate different usage scenarios with the InternNav-N1 series model:
For quick trials and evaluations of the InternVLA-N1 model, we recommend using the Habitat environment. This option allows you to quickly test and evaluate the InternVLA-N1 models with minimal configuration.
If you require high-fidelity rendering, training capabilities, and physical property evaluations within the environment, we suggest using the Isaac Sim environment. This solution provides enhanced graphical rendering and more accurate physics simulations for comprehensive testing.
Choose the environment that best fits your specific needs to optimize your experience with the InternNav-N1 model. Note that both environments support the training of the system1 model NavDP.
Isaac Sim Environment#
Prerequisite#
Ubuntu 20.04, 22.04
Conda
Python 3.10.16 (3.10.* should be ok)
NVIDIA Omniverse Isaac Sim 4.5.0
NVIDIA GPU (RTX 2070 or higher)
NVIDIA GPU Driver (recommended version 535.216.01+)
PyTorch 2.5.1, 2.6.0 (recommended)
CUDA 11.8, 12.4 (recommended)
Docker (Optional)
NVIDIA Container Toolkit (Optional)
Before proceeding with the installation, ensure that you have Isaac Sim 4.5.0 and Conda installed.
To help you get started quickly, we've prepared a Docker image pre-configured with Isaac Sim 4.5 and InternUtopia. You can pull the image and run evaluations in the container using the following commands:
docker pull registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
docker run -it --name internutopia-container registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
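Isaac Sim needs GPU access inside the container. If you use the NVIDIA Container Toolkit (listed above as optional), a GPU-enabled run would look like this sketch:

# Sketch: the same container started with GPU passthrough via the NVIDIA Container Toolkit
docker run -it --gpus all --name internutopia-container registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0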
Conda installation#
$ conda create -n <env> python=3.10 libxcb=1.14
# Install InternUtopia through pip (2.1.1 and 2.2.0 recommended).
$ conda activate <env>
$ pip install internutopia
# Configure the conda environment.
$ python -m internutopia.setup_conda_pypi
$ conda deactivate && conda activate <env>
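Before installing the remaining dependencies, a minimal import check (assuming the internutopia package name from the pip install above) verifies the environment:

$ python -c "import internutopia; print('internutopia imported successfully')"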
For InternUtopia installation, you can find more detailed docs in the InternUtopia repository.
# Install PyTorch based on your CUDA version
$ pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
# Install other deps
$ pip install -r isaac_requirements.txt
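A quick way to confirm that PyTorch was installed with working CUDA support:

$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"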
If you need to train or evaluate models on Habitat without physics simulation, we recommend the following simpler environment setup.
Habitat Environment#
Prerequisite#
Python 3.9
PyTorch 2.1.2
CUDA 12.4
GPU: NVIDIA A100 or higher (optional for VLA training)
conda install habitat-sim==0.2.4 withbullet headless -c conda-forge -c aihabitat
git clone --branch v0.2.4 https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
pip install -e habitat-lab # install habitat_lab
pip install -e habitat-baselines # install habitat_baselines
pip install -r habitat_requirements.txt
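To sanity-check the Habitat setup, both packages should import in the same environment (a minimal check, assuming the standard module names):

python -c "import habitat_sim, habitat; print('habitat-sim', habitat_sim.__version__)"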
Verification#
Please download our latest pretrained InternVLA-N1 checkpoint and move it to the checkpoints directory, then run the script below to perform inference with visualized results. Also download the VLN-CE dataset from Hugging Face. The final folder structure should look like this:
InternNav/
|-- data/
| |-- datasets/
| | |-- vln/
| | | |-- vln_datasets/
| |-- scene_datasets/
| | |-- hm3d/
| | |-- mp3d/
|-- src/
| |-- ...
|-- checkpoints/
| |-- InternVLA-N1/
| | |-- model-00001-of-00004.safetensors
| | |-- config.json
| | |-- ...
| |-- InternVLA-N1-S2
| | |-- model-00001-of-00004.safetensors
| | |-- config.json
| | |-- ...
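If you fetch the checkpoints from Hugging Face, a download sketch with huggingface-cli looks like the following; the repository IDs are placeholders, so substitute the actual ones from the model pages:

# Sketch: download the checkpoints (replace <org> with the actual organization name)
huggingface-cli download <org>/InternVLA-N1 --local-dir checkpoints/InternVLA-N1
huggingface-cli download <org>/InternVLA-N1-S2 --local-dir checkpoints/InternVLA-N1-S2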
Replace the `model_path` variable in `vln_ray_backend.py` with the path of the InternVLA-N1 checkpoint.
srun -p {partition_name} --cpus-per-task 16 --gres gpu:1 python3 scripts/eval/vln_ray_backend.py
Find the IP address of the node allocated by Slurm, then change BACKEND_URL in the Gradio client (navigation_ui.py) to the server's IP address. Start the Gradio client:
python navigation_ui.py
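If you are unsure which node Slurm allocated, the following sketch uses standard Slurm and glibc tools to find the backend's IP for BACKEND_URL (the node name is a placeholder):

# Look up the node running the backend job, then resolve its IP address
squeue -u $USER -o "%i %N"   # job id and allocated node name
getent hosts <node_name>     # prints the node's IP for BACKEND_URL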
Note that it's best to run the Gradio client on a machine with a graphical user interface (GUI), and make sure there is network connectivity between the client and the server. Then open a browser and enter the Gradio address (such as http://0.0.0.0:5700). You should see the interface shown below.
Click the "Start Navigation Simulation" button to send a VLN request to the backend. The backend submits a task to the Ray server and simulates the VLN task with the InternVLA-N1 models. After about 3 minutes, the task finishes and returns a result video, which appears in the Gradio interface as shown below.
🎉 Congratulations! You have successfully installed InternNav.
Dataset Preparation#
We also provide high-quality data for training the system1/system2 models. To set up the training dataset, please follow the steps below:
Download Datasets
Download InternData-N1 for vln_pe/, vln_ce/, and vln_n1/.
Download SceneData-N1 for scene_data/ (see the download sketch below).
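If you prefer the command line, here is a download sketch with huggingface-cli; the dataset repository IDs and target paths are placeholders, so check the dataset pages for the exact names:

# Sketch: download the datasets from Hugging Face (replace <org> with the actual organization)
huggingface-cli download <org>/InternData-N1 --repo-type dataset --local-dir data/
huggingface-cli download <org>/SceneData-N1 --repo-type dataset --local-dir data/scene_data/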
Directory Structure
After downloading, organize the datasets into the following structure:
data/
├── scene_data/
│   ├── mp3d_pe/
│   │   ├── 17DRP5sb8fy/
│   │   ├── 1LXtFkjw3qL/
│   │   └── ...
│   ├── mp3d_ce/
│   └── mp3d_n1/
├── vln_pe/
│   ├── raw_data/
│   │   ├── train/
│   │   ├── val_seen/
│   │   │   └── val_seen.json.gz
│   │   └── val_unseen/
│   │       └── val_unseen.json.gz
│   └── traj_data/
│       └── mp3d/
│           └── trajectory_0/
│               ├── data/
│               ├── meta/
│               └── videos/
├── vln_ce/
│   ├── raw_data/
│   └── traj_data/
└── vln_n1/
    └── traj_data/
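After organizing the data, a quick sanity check that the top-level directories are in place:

# Verify the expected top-level dataset directories exist
for d in data/scene_data data/vln_pe data/vln_ce data/vln_n1; do
  [ -d "$d" ] && echo "ok: $d" || echo "missing: $d"
done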