Environment#
This document covers environment installation, configuration parameters, and I/O specifications.
Table of Contents#
Calvin (WIP)#
Simpler-env (WIP)#
Genmanip#
1. Environment Dependency Installation (Conda)#
Step 1: Verify that Anaconda and Isaac Sim 4.5.0 are installed. You can download Isaac Sim 4.5.0 from here and extract it to a target directory.
Step 2: Execute installation script:
./install.sh --genmanip
# !Note
# - When prompted, provide Isaac Sim 4.5 folder path
# - Post-installation, activate Conda environment: `conda activate genmanip`
2. Evaluation Environment Configuration Parameters#
Evaluation requires a configuration file declaring `eval_cfg.env.env_settings` as a `GenmanipEnvSettings` instance:
eval_cfg = EvalCfg(
eval_type="genmanip",
agent=AgentCfg(...),
env=EnvCfg(
env_type="genmanip",
env_settings=GenmanipEnvSettings(
dataset_path="path/to/genmanip/benchmark_data",
eval_tasks=[],
res_save_path="path/to/save/results",
is_save_img=False,
robot_type="aloha_split",
gripper_type="panda",
franka_camera_enable=FrankaCameraEnable(
realsense=False, obs_camera=False, obs_camera_2=False
),
aloha_split_camera_enable=AlohaSplitCameraEnable(
top_camera=False, left_camera=False, right_camera=False
),
depth_obs=False,
max_step=1000,
max_success_step=100,
env_num=1,
physics_dt=1/30,
rendering_dt=1/30,
headless=True,
ray_distribution=None,
)
),
...
)
Core Parameters

Parameter | Type | Default | Description
---|---|---|---
`dataset_path` | str | None | If None, the benchmark data is automatically downloaded from the Hugging Face dataset URL on the first run.
`eval_tasks` | list[str] | [] | Relative paths of the task folders under `dataset_path`.
`res_save_path` | str | None | Evaluation results storage directory (disabled if None).
`is_save_img` | bool | False | Enable per-step multi-camera image capture (requires disk space).
`robot_type` | enum | "aloha_split" | Robot selection (`franka` / `aloha_split`).
`gripper_type` | enum | "panda" | End-effector selection when `robot_type` is `franka` (`panda` / `robotiq`).
`franka_camera_enable` | FrankaCameraEnable | | Camera activation config: `realsense`, `obs_camera`, `obs_camera_2`.
`aloha_split_camera_enable` | AlohaSplitCameraEnable | | Camera activation config: `top_camera`, `left_camera`, `right_camera`.
`depth_obs` | bool | False | Generate depth maps for active cameras.
`max_step` | int | 1000 | Episode termination step threshold.
`headless` | bool | True | Disable the GUI.
Advanced Parameters

Parameter | Scope
---|---
`max_success_step` | Early termination upon task completion
`physics_dt` | Physics simulation timestep
`rendering_dt` | Render interval
`env_num` | Concurrent environment instances in one Isaac Sim process
`ray_distribution` | Multi-process config (includes `proc_num`)
Concurrency Note: When `env_num > 1` or `ray_distribution.proc_num > 1`, environment outputs become multi-instance tensors. Agents must process batched observations and return batched actions.
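The batching contract above can be sketched as follows. This is a minimal illustration, not framework API: `act` and its zero-joint placeholder policy are hypothetical, and the 9-dof command assumes the franka robot with a panda gripper.

```python
# Hypothetical batched agent: with env_num * proc_num parallel instances, the
# environment yields one observation per instance and expects one action per
# instance, in the same order.
def act(observations):
    actions = []
    for obs in observations:
        # Placeholder policy: a 9-dof zero joint command
        # (franka + panda, ActionFormat1 in Section 3.1).
        joint_cmd = [0.0] * 9
        actions.append(joint_cmd)
    return actions
```

The key invariant is that the returned list has exactly one action per input observation, preserving order.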
3. I/O Specifications: Environment Outputs and Action Data Formats#
3.1 When robot_type is franka#
Observation Structure
observations: List[Dict] = [
{
"robot": {
"robot_pose": Tuple[array, array], # (position, oritention(quaternion: (w, x, y, z)))
"joints_state": {
"positions": array, # (9,) or (13,) -> panda or robotiq
"velocities": array # (9,) or (13,) -> panda or robotiq
},
"eef_pose": Tuple[array, array], # (position, oritention(quaternion: (w, x, y, z)))
"sensors": {
"realsense": {
"rgb": array, # uint8 (480, 640, 3)
"depth": array, # float32 (480, 640)
},
"obs_camera": {
"rgb": array,
"depth": array,
},
"obs_camera_2": {
"rgb": array,
"depth": array,
},
},
"instruction": str,
"metric": {
"task_name": str,
"episode_name": str,
"episode_sr": int,
"first_success_step": int,
"episode_step": int
},
"step": int,
"render": bool
}
},
...
]
Action Space Specifications
Agents must output `List[Union[List[float], dict]] = [action_1, action_2, ...]` of the same length as the input observations.
actions: List[Union[List[float], dict]] = [
action_1,
action_2,
...
]
Each `action_x` may use any of the following formats:
ActionFormat1:
List[float] # (9,) or (13,) -> panda or robotiq
ActionFormat2:
{
'arm_action': List[float], # (7,)
'gripper_action': Union[List[float], int], # (2,) or (6,) -> panda or robotiq || -1 or 1 -> open or close
}
ActionFormat3:
{
'eef_position': List[float], # (3,) -> (x, y, z)
'eef_orientation': List[float], # (4,) -> (quaternion: (w, x, y, z))
'gripper_action': Union[List[float], int], # (2,) or (6,) -> panda or robotiq || -1 or 1 -> open or close
}
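As a minimal illustration of ActionFormat3, the following hypothetical helper (`make_eef_action` is not part of the framework) assembles an end-effector action from a target pose; the pose values in the usage line are made up for the example.

```python
def make_eef_action(position, orientation_wxyz, close_gripper):
    # ActionFormat3: end-effector pose target plus a discrete gripper command.
    # close_gripper=True -> 1 (close), False -> -1 (open), per the spec above.
    return {
        "eef_position": list(position),             # (3,) -> (x, y, z)
        "eef_orientation": list(orientation_wxyz),  # (4,) quaternion (w, x, y, z)
        "gripper_action": 1 if close_gripper else -1,
    }

# Example target: an arbitrary reachable pose with an identity orientation.
action = make_eef_action([0.4, 0.0, 0.3], [1.0, 0.0, 0.0, 0.0], close_gripper=False)
```

One such dict is returned per observation when using this format.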
3.2 When robot_type is aloha_split#
Observation Structure
observations: List[Dict] = [
{
"robot": {
"robot_pose": Tuple[array, array], # (position, oritention(quaternion: (w, x, y, z)))
"joints_state": {
"positions": array, # (28,)
"velocities": array # (28,)
},
"left_eef_pose": Tuple[array, array], # (position, oritention(quaternion: (w, x, y, z))) -> left gripper eef pose
"right_eef_pose": Tuple[array, array], # (position, oritention(quaternion: (w, x, y, z))) -> right gripper eef pose
"sensors": {
"top_camera": {
"rgb": array, # uint8 (480, 640, 3)
"depth": array, # float32 (480, 640)
},
"left_camera": {
"rgb": array,
"depth": array,
},
"right_camera": {
"rgb": array,
"depth": array,
},
},
"instruction": str,
"metric": {
"task_name": str,
"episode_name": str,
"episode_sr": int,
"first_success_step": int,
"episode_step": int
},
"step": int,
"render": bool
}
},
...
]
Note the following when using `observations[i]["robot"]["joints_state"]["positions"]`. The aloha_split robot's dof names (in order) are as follows:
['mobile_translate_x', 'mobile_translate_y', 'mobile_rotate', 'fl_steering_joint', 'fr_steering_joint', 'rl_steering_joint', 'rr_steering_joint', 'lifting_joint', 'fl_wheel', 'fr_wheel', 'rl_wheel', 'rr_wheel', 'fl_joint1', 'fr_joint1', 'fl_joint2', 'fr_joint2', 'fl_joint3', 'fr_joint3', 'fl_joint4', 'fr_joint4', 'fl_joint5', 'fr_joint5', 'fl_joint6', 'fr_joint6', 'fl_joint7', 'fl_joint8', 'fr_joint7', 'fr_joint8']
Thus:
The left arm joint indices: [12, 14, 16, 18, 20, 22]
The left gripper joint indices: [24, 25]
The right arm joint indices: [13, 15, 17, 19, 21, 23]
The right gripper joint indices: [26, 27]
Example: to read the joint position values of the left arm for the i-th observation:
left_arm_joint_indices = [12, 14, 16, 18, 20, 22]
left_arm_joint_positions = [observations[i]['robot']['joints_state']['positions'][idx] for idx in left_arm_joint_indices]
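The same indexing pattern extends to all four joint groups. The helper below is a hypothetical convenience (not framework API) whose index lists are taken directly from the dof-name order above; it operates on the (28,) `positions` array of a single observation.

```python
# Index groups derived from the aloha_split dof-name order listed above.
LEFT_ARM = [12, 14, 16, 18, 20, 22]
LEFT_GRIPPER = [24, 25]
RIGHT_ARM = [13, 15, 17, 19, 21, 23]
RIGHT_GRIPPER = [26, 27]

def split_joint_positions(positions):
    # positions: the (28,) joints_state["positions"] array of one observation.
    def pick(idxs):
        return [positions[i] for i in idxs]
    return {
        "left_arm": pick(LEFT_ARM),
        "left_gripper": pick(LEFT_GRIPPER),
        "right_arm": pick(RIGHT_ARM),
        "right_gripper": pick(RIGHT_GRIPPER),
    }
```

Keeping the index lists in one place avoids silently mixing up the interleaved left/right joint ordering.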
Action Space Specifications
Agents must output `List[dict] = [action_1, action_2, ...]` of the same length as the input observations.
actions: List[dict] = [
action_1,
action_2,
...
]
Each `action_x` may use any of the following formats:
ActionFormat1:
{
'left_arm_action': List[float], # (6,)
'left_gripper_action': Union[List[float], int], # (2,) || -1 or 1 -> open or close
'right_arm_action': List[float], # (6,)
'right_gripper_action': Union[List[float], int], # (2,) || -1 or 1 -> open or close
}
ActionFormat2:
{
'left_eef_position': List[float], # (3,) -> (x, y, z)
'left_eef_orientation': List[float], # (4,) -> (quaternion: (w, x, y, z))
'left_gripper_action': Union[List[float], int], # (2,) || -1 or 1 -> open or close
'right_eef_position': List[float], # (3,) -> (x, y, z)
'right_eef_orientation': List[float], # (4,) -> (quaternion: (w, x, y, z))
'right_gripper_action': Union[List[float], int], # (2,)|| -1 or 1 -> open or close
}
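A minimal sketch of building an ActionFormat1 action for aloha_split; `make_aloha_action` is a hypothetical helper, not framework API, and the discrete gripper encoding (-1 open, 1 close) follows the spec above.

```python
def make_aloha_action(left_arm, right_arm, left_open=True, right_open=True):
    # ActionFormat1 for aloha_split: 6-dof joint targets per arm plus
    # discrete gripper commands (-1 -> open, 1 -> close).
    assert len(left_arm) == 6 and len(right_arm) == 6
    return {
        "left_arm_action": list(left_arm),
        "left_gripper_action": -1 if left_open else 1,
        "right_arm_action": list(right_arm),
        "right_gripper_action": -1 if right_open else 1,
    }
```

The gripper fields could equally carry a (2,) list of joint values instead of the discrete command, per the format definition above.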
None Handling Protocol: if an observation element is None or contains invalid values, the corresponding action must be [].
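This protocol can be enforced with a small wrapper around any per-observation policy. `guard_actions` is a hypothetical sketch, assuming `policy` returns a valid action for a non-None observation.

```python
def guard_actions(observations, policy):
    # Per the None Handling Protocol: a None (invalid) observation must map
    # to an empty action [] at the same index; valid ones go to the policy.
    return [[] if obs is None else policy(obs) for obs in observations]
```

This keeps the action list aligned index-for-index with the observation list even when some environment instances have terminated.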