# Customizing Models and Agents in InternNav

This tutorial provides a detailed guide to registering new agents and models within the InternNav framework.

---

## Development Overview

The evaluation code follows a client-server architecture. On the client side, we specify the configuration (`*.cfg`), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which runs the model to produce a prediction and returns the response to the client.

The InternNav project adopts a modular design, allowing developers to easily add new navigation algorithms. The main components include:

- **Model**: Implements the specific neural network architecture and inference logic
- **Agent**: Serves as a wrapper for the Model, handling environment interaction and data preprocessing
- **Config**: Defines configuration parameters for the model and training

## Supported Models

- InternVLA-N1
- CMA (Cross-Modal Attention)
- RDP (Recurrent Diffusion Policy)
- NaVid (RSS 2024)
- Seq2Seq Policy

## Custom Model

A Model is the concrete implementation of your algorithm. Implement the model under `baselines/models`. A model should ideally inherit from the base model and implement the following key methods:

- `forward(train_batch) -> dict(output, loss)`
- `inference(obs_batch, state) -> output_for_agent`

## Create a Custom Config Class

In the model file, define a `Config` class that inherits from `PretrainedConfig`.
A reference implementation is `CMAModelConfig` in [`cma_policy.py`](https://github.com/InternRobotics/InternNav/blob/main/internnav/model/cma/cma_policy.py).

## Registration and Integration

In [`internnav/model/__init__.py`](https://github.com/InternRobotics/InternNav/blob/main/internnav/model/__init__.py):

- Add the new model to `get_policy`.
- Add the new model's configuration to `get_config`.
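The model, config, and lookup steps above can be sketched without any framework dependencies. The following is a minimal illustration only, not InternNav's actual code: `MyModel`, `MyModelConfig`, the `_POLICIES` and `_CONFIGS` tables, and the name `MyModel_Policy` are hypothetical, and the real `get_policy` and `get_config` may be structured differently. A real config would inherit from `PretrainedConfig`; a plain dataclass stands in for it here so the sketch runs on its own.

```python
from dataclasses import dataclass


@dataclass
class MyModelConfig:
    """Hypothetical config; a real one would inherit from PretrainedConfig."""

    model_name: str = "my_model"
    hidden_size: int = 512


class MyModel:
    """Hypothetical model exposing the two key methods listed above."""

    def __init__(self, config: MyModelConfig):
        self.config = config

    def forward(self, train_batch: dict) -> dict:
        # Training path: return both the prediction and a loss value.
        output = [0 for _ in train_batch["obs"]]
        return {"output": output, "loss": 0.0}

    def inference(self, obs_batch: list, state=None) -> list:
        # Evaluation path: return what the agent consumes,
        # here one placeholder action id per observation.
        return [1 for _ in obs_batch]


# Assumed lookup tables; the real get_policy / get_config may differ.
_POLICIES = {"MyModel_Policy": MyModel}
_CONFIGS = {"MyModel_Policy": MyModelConfig}


def get_policy(policy_name: str):
    """Return the model class registered under policy_name."""
    if policy_name not in _POLICIES:
        raise ValueError(f"Unknown policy: {policy_name}")
    return _POLICIES[policy_name]


def get_config(policy_name: str):
    """Return the config class registered under policy_name."""
    if policy_name not in _CONFIGS:
        raise ValueError(f"Unknown config for policy: {policy_name}")
    return _CONFIGS[policy_name]


# Resolve both classes by name, then build the model from its config.
policy_cls = get_policy("MyModel_Policy")
config_cls = get_config("MyModel_Policy")
model = policy_cls(config_cls(hidden_size=256))
actions = model.inference([{"rgb": None}, {"rgb": None}], state=None)
```

Keeping the model and its config under the same name means a single `policy_name` string in a config file is enough to construct both.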
## Create a Custom Agent

The Agent handles interaction with the environment, data preprocessing/postprocessing, and calls the Model for inference.
A custom Agent usually inherits from [`Agent`](https://github.com/InternRobotics/InternNav/blob/main/internnav/agent/base.py) and implements the following key methods:

- `reset()`: Resets the Agent's internal state (e.g., RNN states, action history). Called at the start of each episode.
- `inference(obs)`: Receives environment observations `obs`, performs preprocessing (e.g., tokenizing instructions, padding), calls the model for inference, and returns an action.
- `step(obs)`: The external interface. It usually calls `inference` and can include logging or timing.

Example: [`CMAAgent`](https://github.com/InternRobotics/InternNav/blob/main/internnav/agent/cma_agent.py)

At each step, the agent receives an observation from the environment.

For the VLN benchmark under InternUtopia:

```
action = self.agent.step(obs)
```

**obs** has the following format:

```
obs = [{
    'globalgps': [X, Y, Z],          # robot location
    'globalrotation': [X, Y, Z, W],  # robot orientation as a quaternion
    'rgb': np.array(256, 256, 3),    # RGB camera image
    'depth': np.array(256, 256, 1),  # depth image
    'instruction': str,              # language instruction for the navigation task
}]
```

**action** has the following format:

```
action = List[int]  # one action per environment
# 0: stop
# 1: move forward
# 2: turn left
# 3: turn right
```

## Registration

The agent must be registered with `internnav.agent` so that it can be referenced by name in configs.

```
from internnav.agent.base import Agent
from internnav.configs.agent import AgentCfg


@Agent.register('cma')
class NewAgent(Agent):
    def __init__(self, agent_config: AgentCfg):
        ...
```

Make sure you also import it inside `internnav/agent/__init__.py`:

```
# import the agent module so that the register decorator takes effect
from internnav.agent.internvla_n1_agent import InternVLAN1Agent
```

## Agent and Model Initialization

Refer to existing **evaluation** config files for customization:

```
agent_cfg = AgentCfg(
    server_host='localhost',
    server_port=8023,
    model_name='internvla_n1',
    ckpt_path='',
    model_settings={
        'policy_name': 'InternVLAN1_Policy',
        'state_encoder': None,
    },
)
```

## Typical Usage Example

```
from internnav.configs.agent import AgentCfg
# AgentClient must be imported as well (its import path is not shown in this guide)

cfg = AgentCfg(server_host="127.0.0.1", server_port=8087)
client = AgentClient(cfg)

# step once
obs = [{"rgb": ..., "depth": ..., "instruction": "go to kitchen"}]
action = client.step(obs)
print("Predicted action:", action)

# reset agent
client.reset()
```
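To show how the agent interface, name registration, and the `step` / `reset` calls fit together, here is a self-contained sketch. It is an illustration under stated assumptions, not the InternNav implementation: the `Agent` class below is a stand-in for `internnav.agent.base.Agent`, the `agents` registry dict and the `ForwardThenStopAgent` class are hypothetical, and no server or model is involved. It also shows why the import in `internnav/agent/__init__.py` matters: a class decorator only runs when the module defining the class is imported.

```python
from typing import Callable, Dict, List, Type


class Agent:
    """Stand-in base class with a name registry (not the real InternNav Agent)."""

    agents: Dict[str, Type["Agent"]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type["Agent"]], Type["Agent"]]:
        def decorator(agent_cls: Type["Agent"]) -> Type["Agent"]:
            # Runs when the decorated class is defined, i.e. at import time.
            cls.agents[name] = agent_cls
            return agent_cls

        return decorator

    def reset(self) -> None:
        raise NotImplementedError

    def inference(self, obs: List[dict]) -> List[int]:
        raise NotImplementedError

    def step(self, obs: List[dict]) -> List[int]:
        # External interface; a real agent could add logging or timing here.
        return self.inference(obs)


# Discrete action ids as listed in the action format above.
STOP, FORWARD, TURN_LEFT, TURN_RIGHT = 0, 1, 2, 3


@Agent.register("forward_then_stop")
class ForwardThenStopAgent(Agent):
    """Toy agent: moves forward for max_steps steps, then stops."""

    def __init__(self, max_steps: int = 2):
        self.max_steps = max_steps
        self.steps_taken = 0

    def reset(self) -> None:
        # Clear per-episode state at the start of each episode.
        self.steps_taken = 0

    def inference(self, obs: List[dict]) -> List[int]:
        action = FORWARD if self.steps_taken < self.max_steps else STOP
        self.steps_taken += 1
        # One action per environment in the observation batch.
        return [action for _ in obs]


# Look the agent up by its registered name, as a config would.
agent = Agent.agents["forward_then_stop"](max_steps=2)
obs = [{"rgb": None, "depth": None, "instruction": "go to kitchen"}]
trajectory = [agent.step(obs) for _ in range(3)]

agent.reset()
after_reset = agent.step(obs)
```

In this sketch the agent keeps its own step counter, so calling `reset()` between episodes is what prevents state from one episode leaking into the next.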