# 📦 Add a New Dataset
This section explains how to register a custom dataset with the InternManip framework. The process involves two main steps: ensuring the dataset follows the expected format, and registering it in code.
## Dataset Structure
All datasets must follow the LeRobotDataset Format to ensure compatibility with the data loaders and training pipelines. The expected structure is:
```
<your_dataset_root>                  # Root directory of your dataset
│
├── data                             # Structured episode data in .parquet format
│   │
│   ├── chunk-000                    # Episodes 000000 - 000999
│   │   ├── episode_000000.parquet
│   │   ├── episode_000001.parquet
│   │   └── ...
│   │
│   ├── chunk-001                    # Episodes 001000 - 001999
│   │   └── ...
│   │
│   ├── ...
│   │
│   └── chunk-00n                    # Follows the same convention (1,000 episodes per chunk)
│       └── ...
│
├── meta                             # Metadata and statistical information
│   ├── episodes.jsonl               # Per-episode metadata (length, subtask, etc.)
│   ├── info.json                    # Dataset-level information
│   ├── tasks.jsonl                  # Task definitions
│   ├── modality.json                # Key dimensions and mapping information for each modality
│   └── stats.json                   # Global dataset statistics (mean, std, min, max, quantiles)
│
└── videos                           # Multi-view videos for each episode
    │
    ├── chunk-000                    # Videos for episodes 000000 - 000999
    │   ├── observation.images.head        # Head (main front-view) camera
    │   │   ├── episode_000000.mp4
    │   │   └── ...
    │   ├── observation.images.hand_left   # Left hand camera
    │   └── observation.images.hand_right  # Right hand camera
    │
    ├── chunk-001                    # Videos for episodes 001000 - 001999
    │
    ├── ...
    │
    └── chunk-00n                    # Follows the same naming and structure
```
💡 Note: For more detailed tutorials, please refer to the Dataset section.
This separation of raw data, video files, and metadata makes it easier to standardize transformations and modality handling across different datasets.
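The chunking convention above boils down to: episode index `N` lives in chunk `N // 1000`, with three-digit chunk ids and six-digit episode ids. As a quick illustration (this helper is not part of the framework, just a sketch of the naming rule):

```python
from pathlib import Path


def episode_paths(root: str, episode_index: int,
                  camera: str = "observation.images.head") -> dict:
    """Map an episode index to its expected parquet and video paths.

    Follows the LeRobotDataset layout: 1,000 episodes per chunk,
    chunk ids zero-padded to 3 digits, episode ids to 6 digits.
    """
    chunk = f"chunk-{episode_index // 1000:03d}"
    episode = f"episode_{episode_index:06d}"
    base = Path(root)
    return {
        "data": base / "data" / chunk / f"{episode}.parquet",
        "video": base / "videos" / chunk / camera / f"{episode}.mp4",
    }


# Episode 1234 falls into chunk-001:
print(episode_paths("/datasets/my_robot", 1234)["data"])
# /datasets/my_robot/data/chunk-001/episode_001234.parquet
```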
## Implementation Steps
### Register a Dataset Class
Create a new dataset class under `internmanip/datasets/`, inheriting from `LeRobotDataset`:
```python
from internmanip.datasets import LeRobotDataset


class CustomDataset(LeRobotDataset):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def load_data(self):
        # Implement custom data loading logic here
        pass
```
This class defines how to read your dataset’s raw files and convert them into a standardized format for training.
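For a concrete sense of what a custom `load_data` might parse, recall that `meta/episodes.jsonl` stores one JSON object per episode (length, subtask, etc.). The snippet below writes and reads a tiny example file; the exact field names beyond `length` and `subtask` are assumptions for illustration:

```python
import json
import tempfile
from pathlib import Path

# Build a throwaway meta/ directory with a two-episode episodes.jsonl.
meta_dir = Path(tempfile.mkdtemp()) / "meta"
meta_dir.mkdir(parents=True)

records = [
    {"episode_index": 0, "length": 412, "subtask": "pick_cube"},   # hypothetical values
    {"episode_index": 1, "length": 388, "subtask": "place_cube"},
]
with open(meta_dir / "episodes.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# A load_data implementation could iterate per-episode metadata like this:
episodes = [json.loads(line) for line in open(meta_dir / "episodes.jsonl")]
total_frames = sum(ep["length"] for ep in episodes)
print(total_frames)  # 800
```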
### Define a Data Configuration
Each dataset needs a data configuration class that specifies modalities, keys, and transformations.
Create a new configuration file under `internmanip/configs/data_configs/`. Here’s a minimal example:
```python
# BaseDataConfig, ModalityConfig, and the transform classes used below are
# provided by the InternManip framework; import them from the corresponding
# framework modules.


class CustomDataConfig(BaseDataConfig):
    """Data configuration for the custom dataset."""

    video_keys = ["video.rgb"]
    state_keys = ["state.pos"]
    action_keys = ["action.delta_pos"]
    language_keys = ["annotation.instruction"]

    # Temporal indices
    observation_indices = [0]          # Current timestep for observations
    action_indices = list(range(16))   # Future timesteps for actions (0-15)

    def modality_config(self) -> dict[str, ModalityConfig]:
        """Define modality configurations."""
        return {
            "video": ModalityConfig(self.observation_indices, self.video_keys),
            "state": ModalityConfig(self.observation_indices, self.state_keys),
            "action": ModalityConfig(self.action_indices, self.action_keys),
            "language": ModalityConfig(self.observation_indices, self.language_keys),
        }

    def transform(self):
        """Define preprocessing pipelines."""
        return [
            # Video preprocessing
            VideoToTensor(apply_to=self.video_keys),
            VideoResize(apply_to=self.video_keys, height=224, width=224),
            # State preprocessing
            StateActionToTensor(apply_to=self.state_keys),
            StateActionTransform(
                apply_to=self.state_keys,
                normalization_modes={"state.pos": "mean_std"},
            ),
            # Action preprocessing
            StateActionToTensor(apply_to=self.action_keys),
            StateActionTransform(
                apply_to=self.action_keys,
                normalization_modes={"action.delta_pos": "mean_std"},
            ),
            # Concatenate modalities
            ConcatTransform(
                video_concat_order=self.video_keys,
                state_concat_order=self.state_keys,
                action_concat_order=self.action_keys,
            ),
        ]
```
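The final `ConcatTransform` step merges the per-key arrays of each modality into a single tensor, in the listed key order. Conceptually it behaves like the simplified stand-in below (not the framework's actual implementation); the `state.gripper` key is hypothetical:

```python
import numpy as np


def concat_modality(sample: dict, concat_order: list[str]) -> np.ndarray:
    """Concatenate per-key feature arrays along the last axis, in concat_order."""
    return np.concatenate([sample[key] for key in concat_order], axis=-1)


# Two state keys with 3 and 4 dims yield a single 7-dim state vector
# per timestep (here 1 timestep, since observation_indices = [0]).
sample = {
    "state.pos": np.zeros((1, 3)),
    "state.gripper": np.ones((1, 4)),   # hypothetical extra key
}
state = concat_modality(sample, ["state.pos", "state.gripper"])
print(state.shape)  # (1, 7)
```

Key order matters: the model sees the concatenated vector, so `concat_order` must stay consistent between training and inference.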
### Register Your Config
Finally, register your custom config by adding it to `DATA_CONFIG_MAP`:

```python
DATA_CONFIG_MAP = {
    ...,  # existing configs
    "custom": CustomDataConfig(),
}
```
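The registry itself is just a name-to-instance dictionary, so the framework can resolve the name you pass on the command line to your config object. A minimal stand-in (using a placeholder class, not the real one) to show the lookup pattern:

```python
class CustomDataConfig:
    """Placeholder for the real config class defined earlier."""
    video_keys = ["video.rgb"]


DATA_CONFIG_MAP = {"custom": CustomDataConfig()}

# Selected at runtime via --data_config custom:
data_config = DATA_CONFIG_MAP["custom"]
print(data_config.video_keys)  # ['video.rgb']
```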
💡 Tips: Adjust the key names (`video_keys`, `state_keys`, etc.) and `normalization_modes` based on your dataset. For multi-view video or multi-joint actions, just add more keys and update the transforms accordingly.
This config sets up how to load and process different modalities, and ensures compatibility with the training framework.
## What’s Next?
After registration, you can use your dataset by setting `--dataset_path <path>` and `--data_config custom` in the training YAML file.