Create Your Model and Agent#

Development Overview#

The evaluation code adopts a client-server architecture. On the client side, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which runs model inference and returns the predictions to the client.

The InternNav project adopts a modular design, allowing developers to easily add new navigation algorithms. The main components include:

  • Model: Implements the specific neural network architecture and inference logic

  • Agent: Serves as a wrapper for the Model, handling environment interaction and data preprocessing

  • Config: Defines configuration parameters for the model and training

Custom Model#

A Model is the concrete implementation of your algorithm. Implement your model under baselines/models. A model should ideally inherit from the base model and implement the following key methods, as sketched after the list:

  • forward(train_batch) -> dict(output, loss)

  • inference(obs_batch, state) -> output_for_agent
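
A minimal skeleton of these two methods (class, field, and module names here are illustrative assumptions, not the project's actual API; follow the base model under baselines/models for the exact interface):

import torch
import torch.nn as nn

class MyModel(nn.Module):  # hypothetical; inherit the project's base model instead
    def __init__(self, config):
        super().__init__()
        self.encoder = nn.Linear(config.obs_dim, config.hidden_size)
        self.policy_head = nn.Linear(config.hidden_size, config.num_actions)

    def forward(self, train_batch):
        # Training entry point: return the prediction together with the loss.
        features = torch.relu(self.encoder(train_batch['obs']))
        logits = self.policy_head(features)
        loss = nn.functional.cross_entropy(logits, train_batch['actions'])
        return {'output': logits, 'loss': loss}

    def inference(self, obs_batch, state):
        # Evaluation entry point called by the Agent; returns whatever the
        # Agent needs to choose an action (here, raw logits).
        with torch.no_grad():
            features = torch.relu(self.encoder(obs_batch))
            return self.policy_head(features)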

Create a Custom Config Class#

In the model file, define a Config class that inherits from PretrainedConfig. A reference implementation is CMAModelConfig in cma_model.py.
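
A minimal sketch modeled on that pattern (the class name, model_type string, and hyperparameter fields are illustrative; see CMAModelConfig for a real example):

from transformers import PretrainedConfig

class MyModelConfig(PretrainedConfig):  # hypothetical name
    model_type = 'my_model'  # illustrative identifier

    def __init__(self, obs_dim=512, hidden_size=512, num_actions=4, **kwargs):
        # obs_dim / hidden_size / num_actions are example hyperparameters;
        # define whatever your architecture needs.
        super().__init__(**kwargs)
        self.obs_dim = obs_dim
        self.hidden_size = hidden_size
        self.num_actions = num_actions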

Registration and Integration#

In internnav/model/__init__.py:

  • Add the new model to get_policy.

  • Add the new model’s configuration to get_config.
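
The exact signatures of get_policy and get_config follow the existing code; a plausible sketch of the additions (module and names are hypothetical):

# internnav/model/__init__.py (sketch; mirror the dispatch style already used there)
from .my_model import MyModel, MyModelConfig  # hypothetical module and names

def get_policy(policy_name):
    if policy_name == 'MyModel':  # name referenced by model_name in the configs
        return MyModel
    raise ValueError(f'unknown policy: {policy_name}')

def get_config(policy_name):
    if policy_name == 'MyModel':
        return MyModelConfig
    raise ValueError(f'unknown config: {policy_name}')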

Create a Custom Agent#

The Agent handles interaction with the environment, data preprocessing/postprocessing, and calls the Model for inference. A custom Agent usually inherits from Agent and implements the following key methods:

  • reset(): Resets the Agent’s internal state (e.g., RNN states, action history). Called at the start of each episode.

  • inference(obs): Receives environment observations obs, performs preprocessing (e.g., tokenizing instructions, padding), calls the model for inference, and returns an action.

  • step(obs): The external interface, usually calls inference, and can include logging or timing.

Example: CMAAgent

At each step, the agent receives an observation from the environment.

For the VLN benchmark under internutopia:

action = self.agent.step(obs)

obs has the following format:

obs = [{
    'globalgps': [X, Y, Z],          # robot position
    'globalrotation': [X, Y, Z, W],  # robot orientation as a quaternion
    'rgb': rgb_image,                # np.ndarray of shape (256, 256, 3)
    'depth': depth_image,            # np.ndarray of shape (256, 256, 1)
}]

action has the following format:

action = List[int]                      # one action per environment
# 0: stop
# 1: move forward
# 2: turn left
# 3: turn right
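
Putting the two formats together, a minimal hypothetical agent could look like this (batching and the model call are illustrative, not the project's actual base-class interface):

import numpy as np
import torch

class MyAgent:  # would inherit the project's base Agent class
    def __init__(self, model):
        self.model = model
        self.state = None  # e.g. RNN hidden state carried across steps

    def reset(self):
        # Called at the start of each episode: clear per-episode state.
        self.state = None

    def inference(self, obs):
        # Batch the per-environment observations documented above.
        rgb = np.stack([o['rgb'] for o in obs])      # (num_envs, 256, 256, 3)
        depth = np.stack([o['depth'] for o in obs])  # (num_envs, 256, 256, 1)
        logits = self.model.inference({'rgb': rgb, 'depth': depth}, self.state)
        # One discrete action id (0-3) per environment.
        return torch.as_tensor(logits).argmax(dim=-1).tolist()

    def step(self, obs):
        # External interface used by the benchmark: action = agent.step(obs)
        return self.inference(obs)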

Create a Trainer#

The Trainer manages the training loop, including data loading, forward pass, loss calculation, and backpropagation. A custom trainer usually inherits from the Base Trainer and implements:

  • train_epoch(): Runs one training epoch (batch iteration, forward pass, loss calculation, parameter update).

  • eval_epoch(): Evaluates the model on the validation set and records metrics.

  • save_checkpoint(): Saves model weights, optimizer state, and training progress.

  • load_checkpoint(): Loads pretrained models or resumes training.

Example: CMATrainer shows how to handle sequence data, compute action loss, and implement imitation learning.
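
A condensed sketch of these four methods (the base class, data loaders, and attribute names are illustrative assumptions):

import torch

class MyTrainer:  # would inherit the project's base trainer
    def train_epoch(self):
        self.model.train()
        for batch in self.train_loader:  # hypothetical data loader
            result = self.model(batch)   # forward() returns output and loss
            self.optimizer.zero_grad()
            result['loss'].backward()
            self.optimizer.step()

    def eval_epoch(self):
        self.model.eval()
        with torch.no_grad():
            for batch in self.val_loader:
                result = self.model(batch)
                # ...accumulate validation metrics from result...

    def save_checkpoint(self, path):
        torch.save({'model': self.model.state_dict(),
                    'optimizer': self.optimizer.state_dict()}, path)

    def load_checkpoint(self, path):
        ckpt = torch.load(path)
        self.model.load_state_dict(ckpt['model'])
        self.optimizer.load_state_dict(ckpt['optimizer'])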

Training Data#

The training data is under data/vln_pe/traj_data. Our dataset provides trajectory data collected from the H1 robot as it navigates through the task environment. Each observation in the trajectory is paired with its corresponding action.

You may also incorporate external datasets to improve model generalization.

Evaluation Data#

The evaluation episodes are under raw_data/val. For each task, the model must guide the robot from the given start position and rotation to the target position, following the language instruction.

Set the Corresponding Configuration#

Refer to existing training configuration files for customization.

Configuration files should define:

  • ExpCfg (experiment config)

  • EvalCfg (evaluation config)

  • IlCfg (imitation learning config)

Ensure your configuration is imported and registered in __init__.py.

Key parameters include:

  • name: Experiment name

  • model_name: Must match the name used during model registration

  • batch_size: Batch size

  • lr: Learning rate

  • epochs: Number of training epochs

  • dataset_*_root_dir: Dataset paths

  • lmdb_features_dir: Feature storage path
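
A hypothetical training config using these parameters (the import path, class nesting, and values are illustrative assumptions; copy an existing cfg file for the real structure):

from internnav.configs import ExpCfg, IlCfg  # import path is an assumption

exp_cfg = ExpCfg(
    name='my_experiment',
    model_name='MyModel',  # must match the name registered in get_policy
    il=IlCfg(
        batch_size=32,
        lr=1e-4,
        epochs=20,
        lmdb_features_dir='data/my_features',  # illustrative path
        # plus the dataset_*_root_dir paths for your data
    ),
)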

Refer to existing evaluation config files for customization.

Main fields:

  • name: Evaluation experiment name

  • model_name: Must match the name used during training

  • ckpt_to_load: Path to the model checkpoint

  • task: Task settings (number of environments, scene, robots)

  • dataset: Dataset to load (r2r or interiornav)

  • split: Dataset split (val_seen, val_unseen, test, etc.)
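
A hypothetical evaluation config covering these fields (structure and values are illustrative assumptions; copy an existing eval cfg file for the real layout):

from internnav.configs import EvalCfg  # import path is an assumption

eval_cfg = EvalCfg(
    name='my_eval',
    model_name='MyModel',                     # must match the training model name
    ckpt_to_load='checkpoints/my_model.pth',  # illustrative checkpoint path
    task=...,                                 # task settings: env count, scene, robots
    dataset='r2r',                            # or 'interiornav'
    split='val_unseen',
)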