🥇 Add a New Benchmark#

This guide walks you through adding a custom agent and custom evaluation benchmark to the InternManip framework.

1. Implement Your Model Agent#

To support a new model in InternManip, define a subclass of BaseAgent. You must implement two core methods:

  • step(): given an observation, returns an action.

  • reset(): resets internal states, if needed.

Example: Define a Custom Agent

from internmanip.agent.base import BaseAgent
from internmanip.configs import AgentCfg

class MyCustomAgent(BaseAgent):
    def __init__(self, config: AgentCfg):
        super().__init__(config)
        # Custom model initialization here

    def step(self, obs):
        # Implement the forward logic: map the observation to an action
        action = ...
        return action

    def reset(self):
        # Optional: reset internal state
        pass

Register Your Agent

In internmanip/agent/base.py, register your agent in the AgentRegistry:

class AgentRegistry(Enum):
    ...
    CUSTOM = "MyCustomAgent"

    @property
    def value(self):
        if self.name == "CUSTOM":
            from internmanip.agent.my_custom_agent import MyCustomAgent
            return MyCustomAgent
        ...

2. Create a New Evaluator#

To add support for a new evaluation environment, inherit from the Evaluator base class and implement the required methods:

from typing import Any, List

from internmanip.evaluator.base import Evaluator
from internmanip.configs import EvalCfg

class CustomEvaluator(Evaluator):

    def __init__(self, config: EvalCfg):
        super().__init__(config)
        # Custom initialization logic
        ...

    @classmethod
    def _get_all_episodes_setting_data(cls, episodes_config_path) -> List[Any]:
        """Get all episodes setting data from the given path."""
        ...

    def eval(self):
        """The default entrypoint of the evaluation pipeline."""
        ...
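
The episode-loading format is up to your benchmark. As an illustration, here is a minimal sketch of _get_all_episodes_setting_data that assumes one JSON file per episode under episodes_config_path; adapt it to however your benchmark stores episode settings:

import json
from pathlib import Path
from typing import Any, List

class CustomEvaluator(Evaluator):  # continuing the stub above
    @classmethod
    def _get_all_episodes_setting_data(cls, episodes_config_path) -> List[Any]:
        """Load one settings dict per episode from JSON files under the given path."""
        episode_files = sorted(Path(episodes_config_path).glob("*.json"))
        return [json.loads(f.read_text()) for f in episode_files]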

3. Register the Evaluator#

Register the new evaluator in the EvaluatorRegistry in internmanip/evaluator/base.py:

# In internmanip/evaluator/base.py
class EvaluatorRegistry(Enum):
    ...
    CUSTOM = "CustomEvaluator"  # Add new evaluator

    @property
    def value(self):
        if self.name == "CUSTOM":
            from internmanip.evaluator.custom_evaluator import CustomEvaluator
            return CustomEvaluator
        ...

4. Create Configuration Files#

Create a configuration file that ties the new agent and evaluator together:

# scripts/eval/configs/custom_agent_on_custom_bench.py
from internmanip.configs import *
from pathlib import Path

eval_cfg = EvalCfg(
    eval_type="custom_bench",  # Corresponds to the name registered in EvaluatorRegistry
    agent=AgentCfg(
        agent_type="custom_agent", # Corresponds to the name registered in AgentRegistry
        model_name_or_path="path/to/model",
        model_kwargs={...},
        server_cfg=ServerCfg(  # Optional server configuration
            server_host="localhost",
            server_port=5000,
        ),
    ),
    env=EnvCfg(
        env_type="custom_env", # Corresponds to the name registered in EnvWrapperRegistry
        config_path="path/to/env_config.yaml",
        env_settings=CustomEnvSettings(...)
    ),
    logging_dir="logs/eval/custom",
    distributed_cfg=DistributedCfg( # Optional distributed configuration
        num_workers=4,
        ray_head_ip="auto",  # Use "auto" for local machine
        include_dashboard=True,
        dashboard_port=8265,
    )
)

5. Launch the Evaluator#

python scripts/eval/start_evaluator.py \
  --config scripts/eval/configs/custom_agent_on_custom_bench.py

Use --distributed for Ray-based multi-GPU evaluation and --server for client-server mode.
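
For example, assuming both flags are accepted by the same entrypoint:

# Ray-based multi-GPU evaluation
python scripts/eval/start_evaluator.py \
  --config scripts/eval/configs/custom_agent_on_custom_bench.py \
  --distributed

# Client-server mode
python scripts/eval/start_evaluator.py \
  --config scripts/eval/configs/custom_agent_on_custom_bench.py \
  --server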