🥇 Add a New Benchmark#

This guide walks you through adding a custom benchmark to the InternManip framework: defining your own Agent and Evaluator classes, registering them, and launching an evaluation.

1. Define a Custom Agent#

In the updated design, an Agent is tied to the benchmark (evaluation environment) rather than to a specific policy model. It is responsible for interfacing between the environment and the control policy, handling observation preprocessing and action postprocessing, and coordinating resets.

All agents must inherit from BaseAgent and implement the following two methods:

  • step(): given an observation, returns an action.

  • reset(): resets internal states, if needed.

Example: Define a Custom Agent

from internmanip.agent.base import BaseAgent
from internmanip.configs import AgentCfg

class MyCustomAgent(BaseAgent):
    def __init__(self, config: AgentCfg):
        super().__init__(config)
        # Custom model initialization here

    def step(self, obs):
        # Implement forward logic here
        return action

    def reset(self):
        # Optional: reset internal state
        pass
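
A slightly more concrete, purely illustrative sketch of step(): it preprocesses the raw observation, queries the policy, and postprocesses the result into an environment action, as described above. The observation keys ("rgb", "state") and the self.policy.predict call are assumptions, not framework APIs:

import numpy as np

from internmanip.agent.base import BaseAgent

class MyCustomAgent(BaseAgent):
    ...

    def step(self, obs):
        # Hypothetical preprocessing: observation keys depend on your benchmark
        image = np.asarray(obs["rgb"], dtype=np.float32) / 255.0
        state = np.asarray(obs["state"], dtype=np.float32)
        # self.policy.predict stands in for your model's inference call
        raw_action = self.policy.predict({"image": image, "state": state})
        # Postprocess into the action format the environment expects
        return np.asarray(raw_action).reshape(-1)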

Register Your Agent

In internmanip/agent/base.py, register your agent in the AgentRegistry:

class AgentRegistry(Enum):
    ...
    CUSTOM_AGENT = "MyCustomAgent"

    @property
    def value(self):
        if self.name == "CUSTOM_AGENT":
            from internmanip.agent.my_custom_agent import MyCustomAgent
            return MyCustomAgent
        ...
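
As a quick, illustrative sanity check, you can resolve the registered class via standard Enum name lookup on the registry shown above:

from internmanip.agent.base import AgentRegistry

# Enum lookup by member name; the value property lazily imports and returns the class
agent_cls = AgentRegistry["CUSTOM_AGENT"].value
assert agent_cls.__name__ == "MyCustomAgent"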

2. Create a New Evaluator#

To add support for a new evaluation environment, inherit from the Evaluator base class and implement the required methods:

from typing import Any, List

from internmanip.evaluator.base import Evaluator
from internmanip.configs import EvalCfg

class CustomEvaluator(Evaluator):

    def __init__(self, config: EvalCfg):
        super().__init__(config)
        # Custom initialization logic
        ...

    @classmethod
    def _get_all_episodes_setting_data(cls, episodes_config_path) -> List[Any]:
        """Get all episodes setting data from the given path."""
        ...

    def eval(self):
        """The default entrypoint of the evaluation pipeline."""
        ...
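
For illustration only, here is one way the two methods could be fleshed out, assuming episode settings live in per-episode JSON files. The file layout, the self.config attribute access, and the _eval_single_episode helper are assumptions rather than framework APIs:

import json
from pathlib import Path
from typing import Any, List

from internmanip.evaluator.base import Evaluator

class CustomEvaluator(Evaluator):
    ...

    @classmethod
    def _get_all_episodes_setting_data(cls, episodes_config_path) -> List[Any]:
        # Assumed layout: one JSON file per episode under episodes_config_path
        episode_files = sorted(Path(episodes_config_path).glob("*.json"))
        return [json.loads(p.read_text()) for p in episode_files]

    def eval(self):
        # Attribute names are assumptions; adapt them to how __init__ stores the config
        episodes = self._get_all_episodes_setting_data(self.config.env.config_path)
        for episode in episodes:
            self._eval_single_episode(episode)  # hypothetical per-episode helper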

3. Register the Evaluator#

Register the new evaluator in EvaluatorRegistry under internmanip/evaluator/base.py:

# In internmanip/evaluator/base.py
class EvaluatorRegistry(Enum):
    ...
    CUSTOM_BENCH = "CustomEvaluator"  # Add the new evaluator

    @property
    def value(self):
        if self.name == "CUSTOM_BENCH":
            from internmanip.evaluator.custom_evaluator import CustomEvaluator
            return CustomEvaluator
        ...

4. Create a Configuration File#

Create a configuration file that ties the agent, environment, and evaluator together:

# scripts/eval/configs/custom_agent_on_custom_bench.py
from internmanip.configs import *
from pathlib import Path

eval_cfg = EvalCfg(
    eval_type="custom_bench",  # Corresponds to the name registered in EvaluatorRegistry
    agent=AgentCfg(
        agent_type="custom_agent", # Corresponds to the name registered in AgentRegistry
        base_model_path="path/to/model",
        agent_settings={...},
        model_kwargs={
            'HF_cache_dir': None,
        },
        server_cfg=ServerCfg(  # Optional server configuration
            server_host="localhost",
            server_port=5000,
        ),
    ),
    env=EnvCfg(
        env_type="custom_env", # Corresponds to the name registered in EnvWrapperRegistry
        config_path="path/to/env_config.yaml",
        env_settings=CustomEnvSettings(...)
    ),
    logging_dir="logs/eval/custom",
    distributed_cfg=DistributedCfg( # Optional distributed configuration
        num_workers=4,
        ray_head_ip="auto",  # Use "auto" for local machine
        include_dashboard=True,
        dashboard_port=8265,
    )
)

5. Launch the Evaluator#

python scripts/eval/start_evaluator.py \
  --config scripts/eval/configs/custom_agent_on_custom_bench.py

💡 Use --server for client-server mode and --distributed for Ray-based multi-GPU evaluation (WIP).
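
For example, reusing the config from step 4 with the flags described above:

# Client-server mode: the agent talks to a policy server (see server_cfg in the config)
python scripts/eval/start_evaluator.py \
  --config scripts/eval/configs/custom_agent_on_custom_bench.py \
  --server

# Ray-based multi-GPU evaluation (WIP), driven by distributed_cfg
python scripts/eval/start_evaluator.py \
  --config scripts/eval/configs/custom_agent_on_custom_bench.py \
  --distributed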