Environment#

This document provides a guide to environment installation, configuration parameters, and I/O specifications.

Table of Contents#

  1. Calvin

  2. Simpler-env

  3. Genmanip

Calvin (WIP)#

Simpler-env (WIP)#

Genmanip#

1. Environment Dependency Installation (Conda)#

Step 1: Verify that Anaconda and Isaac Sim 4.5.0 are installed. You can download Isaac Sim 4.5.0 from here and extract it to your target directory.

Step 2: Run the installation script:

./install.sh --genmanip

# !Note
# - When prompted, provide the Isaac Sim 4.5 folder path
# - After installation, activate the Conda environment: `conda activate genmanip`

2. Evaluation Environment Configuration Parameters#

Evaluation requires a configuration file that declares `eval_cfg.env.env_settings` as a `GenmanipEnvSettings` instance:

eval_cfg = EvalCfg(
    eval_type="genmanip",
    agent=AgentCfg(...),
    env=EnvCfg(
        env_type="genmanip",
        env_settings=GenmanipEnvSettings(
            dataset_path="path/to/genmanip/benchmark_data",
            eval_tasks=[],
            res_save_path="path/to/save/results",
            is_save_img=False,
            robot_type="aloha_split",
            gripper_type="panda",
            franka_camera_enable=FrankaCameraEnable(
                realsense=False, obs_camera=False, obs_camera_2=False
            ),
            aloha_split_camera_enable=AlohaSplitCameraEnable(
                top_camera=False, left_camera=False, right_camera=False
            ),
            depth_obs=False,
            max_step=1000,
            max_success_step=100,
            env_num=1,
            physics_dt=1/30,
            rendering_dt=1/30,
            headless=True,
            ray_distribution=None,
        )
    ),
    ...
)

Core Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| dataset_path | str | None | If None, the benchmark data is automatically downloaded from the Hugging Face dataset URL on the first run. |
| eval_tasks | list[str] | [] | Relative paths to task folders under dataset_path. If [], all tasks under dataset_path are detected automatically. |
| res_save_path | str | None | Directory where evaluation results are stored (disabled if None). |
| is_save_img | bool | False | Enable per-step multi-camera image capture (requires disk space). |
| robot_type | enum | "aloha_split" | Robot selection ("franka" or "aloha_split"). |
| gripper_type | enum | "panda" | End-effector selection when robot_type is "franka" ("panda" or "robotiq"). |
| franka_camera_enable | FrankaCameraEnable | FrankaCameraEnable(realsense=False, obs_camera=False, obs_camera_2=False) | Camera activation config. realsense: ego-view gripper cam; obs_camera: fixed rear-view cam; obs_camera_2: fixed front-view cam. Only takes effect when robot_type is "franka". |
| aloha_split_camera_enable | AlohaSplitCameraEnable | AlohaSplitCameraEnable(top_camera=False, left_camera=False, right_camera=False) | Camera activation config. top_camera: ego-view top head cam; left_camera: ego-view left gripper cam; right_camera: ego-view right gripper cam. Only takes effect when robot_type is "aloha_split". |
| depth_obs | bool | False | Generate depth maps for active cameras. |
| max_step | int | 1000 | Episode termination step threshold. |
| headless | bool | True | Disable the GUI. |
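For example, to evaluate the franka robot with only the wrist realsense camera active and depth output enabled, the settings might look like the sketch below, which reuses only the fields documented above (class imports as in the eval_cfg example):

env_settings = GenmanipEnvSettings(
    dataset_path="path/to/genmanip/benchmark_data",
    robot_type="franka",
    gripper_type="panda",
    franka_camera_enable=FrankaCameraEnable(
        realsense=True, obs_camera=False, obs_camera_2=False
    ),
    depth_obs=True,  # depth maps are generated only for active cameras
)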

Advanced Parameters

| Parameter | Description |
| --- | --- |
| max_success_step | Early termination upon task completion. |
| physics_dt | Physics simulation timestep. |
| rendering_dt | Render interval. |
| env_num | Number of concurrent environment instances in a single Isaac Sim process. |
| ray_distribution | Multi-process config (RayDistributionCfg). proc_num: process count; gpu_num_per_proc: GPUs per process. e.g. RayDistributionCfg(proc_num=2, gpu_num_per_proc=0.5, head_address=None, working_dir=None) |

Concurrency Note: When env_num > 1 or ray_distribution.proc_num > 1, environment outputs become multi-instance (one entry per environment instance). Agents must process batched observations and return batched actions, as sketched below.
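A minimal sketch of this batched contract, assuming the franka setup and a placeholder policy (the real agent interface depends on your AgentCfg implementation; observation and action formats follow Section 3):

def act_batch(observations):
    """Return one action per environment instance, same length and order."""
    actions = []
    for obs in observations:
        if obs is None:  # invalid instance -> empty action (see None Handling Protocol)
            actions.append([])
            continue
        # Placeholder policy: hold the arm still and open the gripper
        # (franka ActionFormat2; see Section 3.1).
        actions.append({
            "arm_action": [0.0] * 7,
            "gripper_action": -1,  # -1 -> open, 1 -> close
        })
    return actions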

3. I/O Specifications: Environment Outputs and Action Data Formats#

3.1 When robot_type is franka#

Observation Structure

observations: List[Dict] = [
    {
        "robot": {
            "robot_pose": Tuple[array, array], # (position, oritention(quaternion: (w, x, y, z)))
            "joints_state": {
                "positions": array, # (9,) or (13,) -> panda or robotiq
                "velocities": array # (9,) or (13,) -> panda or robotiq
            },
            "eef_pose": Tuple[array, array], # (position, oritention(quaternion: (w, x, y, z)))
            "sensors": {
                "realsense": {
                    "rgb": array, # uint8 (480, 640, 3)
                    "depth": array, # float32 (480, 640)
                },
                "obs_camera": {
                    "rgb": array,
                    "depth": array,
                },
                "obs_camera_2": {
                    "rgb": array,
                    "depth": array,
                },
            },
            "instruction": str,
            "metric": {
                "task_name": str,
                "episode_name": str,
                "episode_sr": int,
                "first_success_step": int,
                "episode_step": int
            },
            "step": int,
            "render": bool
        }
    },
    ...
]
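For example, reading the wrist camera and instruction of the first environment instance might look like this (a sketch; assumes realsense=True in franka_camera_enable and depth_obs=True):

obs = observations[0]  # first environment instance
rgb = obs["robot"]["sensors"]["realsense"]["rgb"]      # uint8, (480, 640, 3)
depth = obs["robot"]["sensors"]["realsense"]["depth"]  # float32, (480, 640)
instruction = obs["robot"]["instruction"]              # natural-language task description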

Action Space Specifications

Agents must output a `List[Union[List[float], dict]] = [action_1, action_2, ...]` of the same length as the input observations.

actions: List[Union[List[float], dict]] = [
    action_1,
    action_2,
    ...
]

Each `action_x` may take any of the following formats:

ActionFormat1:

List[float] # (9,) or (13,) -> panda or robotiq

ActionFormat2:

{
    'arm_action': List[float], # (7,)
    'gripper_action': Union[List[float], int], # (2,) or (6,) -> panda or robotiq || -1 or 1 -> open or close
}

ActionFormat3:

{
    'eef_position': List[float], # (3,) -> (x, y, z)
    'eef_orientation': List[float], # (4,) -> (quaternion: (w, x, y, z))
    'gripper_action': Union[List[float], int], # (2,) or (6,) -> panda or robotiq || -1 or 1 -> open or close
}
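For illustration, the same "hold the arm, open the gripper" command in each format, with placeholder values and a panda gripper:

arm = [0.0, -0.78, 0.0, -2.36, 0.0, 1.57, 0.78]  # placeholder 7-DoF arm joint targets

action_fmt1 = arm + [0.04, 0.04]                         # (9,): 7 arm joints + 2 gripper joints
action_fmt2 = {"arm_action": arm, "gripper_action": -1}  # -1 -> open, 1 -> close
action_fmt3 = {
    "eef_position": [0.4, 0.0, 0.3],          # placeholder (x, y, z) target
    "eef_orientation": [1.0, 0.0, 0.0, 0.0],  # identity quaternion (w, x, y, z)
    "gripper_action": -1,
}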

3.2 When robot_type is aloha_split#

Observation Structure

observations: List[Dict] = [
    {
        "robot": {
            "robot_pose": Tuple[array, array], # (position, oritention(quaternion: (w, x, y, z)))
            "joints_state": {
                "positions": array, # (28,)
                "velocities": array # (28,)
            },
            "left_eef_pose": Tuple[array, array], # (position, oritention(quaternion: (w, x, y, z))) -> left gripper eef pose
            "right_eef_pose": Tuple[array, array], # (position, oritention(quaternion: (w, x, y, z))) -> right gripper eef pose
            "sensors": {
                "top_camera": {
                    "rgb": array, # uint8 (480, 640, 3)
                    "depth": array, # float32 (480, 640)
                },
                "left_camera": {
                    "rgb": array,
                    "depth": array,
                },
                "right_camera": {
                    "rgb": array,
                    "depth": array,
                },
            },
            "instruction": str,
            "metric": {
                "task_name": str,
                "episode_name": str,
                "episode_sr": int,
                "first_success_step": int,
                "episode_step": int
            },
            "step": int,
            "render": bool
        }
    },
    ...
]

Note the following when using `observations[i]["robot"]["joints_state"]["positions"]`.

The aloha_split robot's dof names, in order, are: ['mobile_translate_x', 'mobile_translate_y', 'mobile_rotate', 'fl_steering_joint', 'fr_steering_joint', 'rl_steering_joint', 'rr_steering_joint', 'lifting_joint', 'fl_wheel', 'fr_wheel', 'rl_wheel', 'rr_wheel', 'fl_joint1', 'fr_joint1', 'fl_joint2', 'fr_joint2', 'fl_joint3', 'fr_joint3', 'fl_joint4', 'fr_joint4', 'fl_joint5', 'fr_joint5', 'fl_joint6', 'fr_joint6', 'fl_joint7', 'fl_joint8', 'fr_joint7', 'fr_joint8'].

Thus:

  • Left arm joint indices: [12, 14, 16, 18, 20, 22]

  • Left gripper joint indices: [24, 25]

  • Right arm joint indices: [13, 15, 17, 19, 21, 23]

  • Right gripper joint indices: [26, 27]

Example: to read the joint position values of the left arm:

left_arm_joint_indices = [12, 14, 16, 18, 20, 22]
# Index into the first environment instance; observations is a List[Dict].
left_arm_joint_positions = [observations[0]['robot']['joints_state']['positions'][idx] for idx in left_arm_joint_indices]
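Since positions is an array, NumPy fancy indexing expresses the same lookup more concisely:

import numpy as np

positions = np.asarray(observations[0]['robot']['joints_state']['positions'])  # (28,)
left_arm_joint_positions = positions[[12, 14, 16, 18, 20, 22]]                 # (6,)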

Action Space Specifications

Agents must output a `List[dict] = [action_1, action_2, ...]` of the same length as the input observations.

actions: List[dict] = [
    action_1,
    action_2,
    ...
]

Each `action_x` may take any of the following formats:

ActionFormat1:

{
    'left_arm_action': List[float], # (6,)
    'left_gripper_action': Union[List[float], int], # (2,) || -1 or 1 -> open or close
    'right_arm_action': List[float], # (6,)
    'right_gripper_action': Union[List[float], int], # (2,) || -1 or 1 -> open or close
}

ActionFormat2:

{
    'left_eef_position': List[float], # (3,) -> (x, y, z)
    'left_eef_orientation': List[float], # (4,) -> (quaternion: (w, x, y, z))
    'left_gripper_action': Union[List[float], int], # (2,) || -1 or 1 -> open or close
    'right_eef_position': List[float], # (3,) -> (x, y, z)
    'right_eef_orientation': List[float], # (4,) -> (quaternion: (w, x, y, z))
    'right_gripper_action': Union[List[float], int], # (2,)|| -1 or 1 -> open or close
}
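For illustration, a joint-space command (ActionFormat1) that holds both arms still and opens both grippers, with placeholder values:

action = {
    'left_arm_action': [0.0] * 6,   # placeholder left-arm joint targets
    'left_gripper_action': -1,      # -1 -> open, 1 -> close
    'right_arm_action': [0.0] * 6,  # placeholder right-arm joint targets
    'right_gripper_action': -1,
}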

None Handling Protocol: If an observation element is None or contains invalid values, the corresponding action must be [].