# Create Your Model and Agent
## Development Overview
The evaluation code adopts a client-server architecture. On the client side, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which runs model inference and returns the predictions to the client.
The InternNav project adopts a modular design, allowing developers to easily add new navigation algorithms. The main components include:
- Model: Implements the specific neural network architecture and inference logic
- Agent: Serves as a wrapper for the Model, handling environment interaction and data preprocessing
- Config: Defines configuration parameters for the model and training
## Custom Model
A Model is the concrete implementation of your algorithm. Implement your model under `baselines/models`. A model should ideally inherit from the base model and implement the following key methods:

- `forward(train_batch) -> dict(output, loss)`
- `inference(obs_batch, state) -> output_for_agent`
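As a minimal sketch, a custom model might look like the following. The real base class and exact signatures live under `baselines/models`, so the class name, layer choices, and batch keys here are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Sketch of a custom model; layer choices and feature shapes are illustrative.
class MyNavModel(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.encoder = nn.Linear(config.rgb_dim, config.hidden_dim)
        self.action_head = nn.Linear(config.hidden_dim, config.num_actions)

    def forward(self, train_batch):
        # Training path: predict action logits and compute the loss.
        features = torch.relu(self.encoder(train_batch['rgb'].flatten(1).float()))
        logits = self.action_head(features)
        loss = nn.functional.cross_entropy(logits, train_batch['actions'])
        return {'output': logits, 'loss': loss}

    @torch.no_grad()
    def inference(self, obs_batch, state):
        # Inference path used by the Agent: no loss, one action per env.
        features = torch.relu(self.encoder(obs_batch['rgb'].flatten(1).float()))
        logits = self.action_head(features)
        return logits.argmax(dim=-1)
```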
### Create a Custom Config Class
In the model file, define a Config class that inherits from `PretrainedConfig`. A reference implementation is `CMAModelConfig` in `cma_model.py`.
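A minimal sketch, assuming `PretrainedConfig` comes from Hugging Face `transformers` (as in the CMA reference implementation); the field names and defaults are illustrative:

```python
from transformers import PretrainedConfig

class MyNavModelConfig(PretrainedConfig):
    # model_type identifies this config; fields below match the model sketch above.
    model_type = 'my_nav_model'

    def __init__(self, rgb_dim=256 * 256 * 3, hidden_dim=512, num_actions=4, **kwargs):
        super().__init__(**kwargs)
        self.rgb_dim = rgb_dim
        self.hidden_dim = hidden_dim
        self.num_actions = num_actions
```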
### Registration and Integration
In `internnav/model/__init__.py`:

- Add the new model to `get_policy`.
- Add the new model's configuration to `get_config`.
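In practice this usually means extending the lookup logic in that file; the exact shape of `get_policy`/`get_config` may differ, so treat this as a sketch with a hypothetical registered name:

```python
# internnav/model/__init__.py (sketch; the real lookup logic may differ)
from .my_nav_model import MyNavModel, MyNavModelConfig

def get_policy(policy_name):
    if policy_name == 'MyNavPolicy':  # hypothetical registered name
        return MyNavModel
    ...  # existing models (CMA, etc.)

def get_config(policy_name):
    if policy_name == 'MyNavPolicy':
        return MyNavModelConfig
    ...  # existing configs
```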
## Create a Custom Agent
The Agent handles interaction with the environment, data preprocessing/postprocessing, and calls the Model for inference.
A custom Agent usually inherits from `Agent` and implements the following key methods (a combined sketch follows the observation and action formats below):

- `reset()`: Resets the Agent's internal state (e.g., RNN states, action history). Called at the start of each episode.
- `inference(obs)`: Receives environment observations `obs`, performs preprocessing (e.g., tokenizing instructions, padding), calls the model for inference, and returns an action.
- `step(obs)`: The external interface; usually calls `inference` and can include logging or timing.
Example: `CMAAgent`

At each step, the agent receives an observation from the environment. For the VLN benchmark under InternUtopia:

```python
action = self.agent.step(obs)
```
`obs` has the format:

```python
obs = [{
    'globalgps': [X, Y, Z],          # robot location
    'globalrotation': [X, Y, Z, W],  # robot orientation as a quaternion
    'rgb': np.array(...),            # RGB camera image, shape (256, 256, 3)
    'depth': np.array(...),          # depth image, shape (256, 256, 1)
}]
```
`action` has the format:

```python
action = List[int]  # one action per environment
# 0: stop
# 1: move forward
# 2: turn left
# 3: turn right
```
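Putting these pieces together, here is a minimal agent sketch. The `Agent` base-class import path, the model loading, and the preprocessing are assumptions; only the `reset`/`inference`/`step` contract and the obs/action formats come from this section:

```python
import numpy as np
import torch

from internnav.agent import Agent  # assumed import path for the base class
from internnav.model import get_config, get_policy

class MyNavAgent(Agent):
    def __init__(self, config):
        super().__init__(config)
        model_cls = get_policy(config.model_name)
        self.model = model_cls(get_config(config.model_name)())
        self.model.eval()
        self.state = None  # e.g., RNN hidden state carried across steps

    def reset(self):
        # Clear per-episode state (RNN states, action history, ...).
        self.state = None

    def inference(self, obs):
        # Batch the raw per-env observations into model inputs.
        batch = {
            'rgb': torch.as_tensor(np.stack([o['rgb'] for o in obs])),
            'depth': torch.as_tensor(np.stack([o['depth'] for o in obs])),
        }
        actions = self.model.inference(batch, self.state)
        return [int(a) for a in actions]  # one discrete action per env

    def step(self, obs):
        # External interface; add logging or timing here if needed.
        return self.inference(obs)
```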
## Create a Trainer
The Trainer manages the training loop, including data loading, forward pass, loss calculation, and backpropagation.
A custom trainer usually inherits from the base `Trainer` and implements:

- `train_epoch()`: Runs one training epoch (batch iteration, forward pass, loss calculation, parameter update).
- `eval_epoch()`: Evaluates the model on the validation set and records metrics.
- `save_checkpoint()`: Saves model weights, optimizer state, and training progress.
- `load_checkpoint()`: Loads pretrained weights or resumes training.
Example: `CMATrainer` shows how to handle sequence data, compute the action loss, and implement imitation learning.
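As a rough sketch of that shape (not the actual `CMATrainer`; the base-class import, attribute names, and dataloader setup are assumptions, while the `forward(train_batch) -> {'output', 'loss'}` contract matches the Model section above):

```python
import torch

from internnav.trainer import Trainer  # assumed import path for the base class

class MyNavTrainer(Trainer):
    def train_epoch(self, epoch):
        self.model.train()
        total_loss = 0.0
        for batch in self.train_loader:        # assumed DataLoader attribute
            self.optimizer.zero_grad()
            result = self.model.forward(batch)  # {'output': ..., 'loss': ...}
            result['loss'].backward()
            self.optimizer.step()
            total_loss += result['loss'].item()
        return total_loss / len(self.train_loader)

    def save_checkpoint(self, path, epoch):
        # Persist weights, optimizer state, and progress for resuming.
        torch.save({
            'epoch': epoch,
            'model': self.model.state_dict(),
            'optimizer': self.optimizer.state_dict(),
        }, path)
```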
## Training Data
The training data is under `data/vln_pe/traj_data`. Our dataset provides trajectory data collected from the H1 robot as it navigates through the task environments.
Each observation in the trajectory is paired with its corresponding action.
You may also incorporate external datasets to improve model generalization.
## Evaluation Data
In `raw_data/val`, each task requires the model to guide the robot from its start position and rotation to the target position, following the language instruction.
## Set the Corresponding Configuration
Refer to existing training configuration files for customization:

- CMA Model Config: `cma_exp_cfg`
Configuration files should define:

- `ExpCfg` (experiment config)
- `EvalCfg` (evaluation config)
- `IlCfg` (imitation learning config)

Ensure your configuration is imported and registered in `__init__.py`.
Key parameters include:

- `name`: Experiment name
- `model_name`: Must match the name used during model registration
- `batch_size`: Batch size
- `lr`: Learning rate
- `epochs`: Number of training epochs
- `dataset_*_root_dir`: Dataset paths
- `lmdb_features_dir`: Feature storage path
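As a sketch, a training config might be assembled as follows. The import path, field nesting, and values are assumptions; consult `cma_exp_cfg` for the authoritative layout:

```python
from internnav.configs import ExpCfg, IlCfg  # assumed import path

exp_cfg = ExpCfg(
    name='my_cma_experiment',  # experiment name
    model_name='cma',          # must match the registered model name
    il=IlCfg(                  # imitation-learning settings (illustrative nesting)
        batch_size=32,
        lr=1e-4,
        epochs=50,
        lmdb_features_dir='data/vln_pe/lmdb_features',  # illustrative path
    ),
)
```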
Refer to existing evaluation config files for customization:

- CMA Model Evaluation Config: `h1_cma_cfg.py`
Main fields:

- `name`: Evaluation experiment name
- `model_name`: Must match the name used during training
- `ckpt_to_load`: Path to the model checkpoint
- `task`: Defines the task settings: number of environments, scene, robots
- `dataset`: Loads the r2r or interiornav dataset
- `split`: Dataset split (`val_seen`, `val_unseen`, `test`, etc.)
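And a sketch of an evaluation config built from the fields above (the import, values, and `task` nesting are illustrative; `h1_cma_cfg.py` is the authoritative reference):

```python
from internnav.configs import EvalCfg  # assumed import path

eval_cfg = EvalCfg(
    name='h1_cma_eval',                       # evaluation experiment name
    model_name='cma',                         # must match the training name
    ckpt_to_load='checkpoints/cma/ckpt.pth',  # illustrative checkpoint path
    task=dict(env_num=1, scene='mp3d', robot='h1'),  # illustrative task settings
    dataset='r2r',                            # or 'interiornav'
    split='val_unseen',                       # val_seen / val_unseen / test
)
```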