# πŸ“¦ Add a New Dataset

This section explains how to register and add a custom dataset with the InternManip framework. The process involves two main steps: **[ensuring the dataset format](#dataset-structure)** and **[registering it in code](#implementation-steps)**.

## Dataset Structure

All datasets must follow the [LeRobotDataset format](https://github.com/huggingface/lerobot) to ensure compatibility with the data loaders and training pipelines. The expected structure is:

```
# Root directory of your dataset
β”‚
β”œβ”€β”€ data                      # Structured episode data in .parquet format
β”‚   β”‚
β”‚   β”œβ”€β”€ chunk-000             # Episodes 000000 - 000999
β”‚   β”‚   β”œβ”€β”€ episode_000000.parquet
β”‚   β”‚   β”œβ”€β”€ episode_000001.parquet
β”‚   β”‚   └── ...
β”‚   β”‚
β”‚   β”œβ”€β”€ chunk-001             # Episodes 001000 - 001999
β”‚   β”‚   └── ...
β”‚   β”‚
β”‚   β”œβ”€β”€ ...
β”‚   β”‚
β”‚   └── chunk-00n             # Follows the same convention (1,000 episodes per chunk)
β”‚       └── ...
β”‚
β”œβ”€β”€ meta                      # Metadata and statistical information
β”‚   β”œβ”€β”€ episodes.jsonl        # Per-episode metadata (length, subtask, etc.)
β”‚   β”œβ”€β”€ info.json             # Dataset-level information
β”‚   β”œβ”€β”€ tasks.jsonl           # Task definitions
β”‚   β”œβ”€β”€ modality.json         # Key dimensions and mapping information for each modality
β”‚   └── stats.json            # Global dataset statistics (mean, std, min, max, quantiles)
β”‚
└── videos                    # Multi-view videos for each episode
    β”œβ”€β”€ chunk-000             # Videos for episodes 000000 - 000999
    β”‚   β”œβ”€β”€ observation.images.head        # Head (main front-view) camera
    β”‚   β”‚   β”œβ”€β”€ episode_000000.mp4
    β”‚   β”‚   └── ...
    β”‚   β”œβ”€β”€ observation.images.hand_left   # Left hand camera
    β”‚   └── observation.images.hand_right  # Right hand camera
    β”‚
    β”œβ”€β”€ chunk-001             # Videos for episodes 001000 - 001999
    β”œβ”€β”€ ...
    └── chunk-00n             # Follows the same naming and structure
```

> πŸ’‘ Note: For more detailed tutorials, please refer to the [Dataset](../tutorials/dataset.md) section.

This separation of raw data, video files, and metadata makes it easier to standardize transformations and modality handling across different datasets.

## Implementation Steps

### Register a Dataset Class

Create a new dataset class under `internmanip/datasets/`, inheriting from `LeRobotDataset`:

```python
from internmanip.datasets import LeRobotDataset


class CustomDataset(LeRobotDataset):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def load_data(self):
        # Implement custom data loading logic here,
        # e.g. reading the .parquet episodes and the per-episode videos
        # from the directory layout shown above.
        pass
```

This class defines how to read your dataset’s raw files and convert them into a standardized format for training.
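Before moving on, it can help to smoke-test the new class by loading one sample. The sketch below is illustrative, not the framework's exact API: the constructor arguments (`repo_id`, `root`) mirror the upstream LeRobotDataset interface and may differ in your subclass, and the dataset identifier and path are placeholders.

```python
# Illustrative smoke test for the new dataset class (sketch).
# `repo_id` and `root` follow the upstream LeRobotDataset constructor;
# adjust them to whatever arguments your subclass actually takes.
from internmanip.datasets import CustomDataset

dataset = CustomDataset(
    repo_id="my_org/custom_dataset",   # hypothetical dataset identifier
    root="/path/to/dataset_root",      # directory laid out as shown above
)

print(len(dataset))    # total number of samples
sample = dataset[0]    # one standardized sample
print(sample.keys())   # should expose the video/state/action/language keys
```

If this loads without errors and the keys match your metadata, the dataset class is wired up correctly and you can define its data configuration next.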
### Define a Data Configuration

Each dataset needs a data configuration class that specifies modalities, keys, and transformations. Create a new configuration file under `internmanip/configs/data_configs/`. Here’s a minimal example:

```python
# BaseDataConfig, ModalityConfig, and the transform classes used below are
# provided by the framework; import them from the existing modules under
# internmanip/configs/data_configs/.
class CustomDataConfig(BaseDataConfig):
    """Data configuration for the custom dataset."""

    video_keys = ["video.rgb"]
    state_keys = ["state.pos"]
    action_keys = ["action.delta_pos"]
    language_keys = ["annotation.instruction"]

    # Temporal indices
    observation_indices = [0]          # Current timestep for observations
    action_indices = list(range(16))   # Future timesteps for actions (0-15)

    def modality_config(self) -> dict[str, ModalityConfig]:
        """Define modality configurations."""
        return {
            "video": ModalityConfig(self.observation_indices, self.video_keys),
            "state": ModalityConfig(self.observation_indices, self.state_keys),
            "action": ModalityConfig(self.action_indices, self.action_keys),
            "language": ModalityConfig(self.observation_indices, self.language_keys),
        }

    def transform(self):
        """Define preprocessing pipelines."""
        return [
            # Video preprocessing
            VideoToTensor(apply_to=self.video_keys),
            VideoResize(apply_to=self.video_keys, height=224, width=224),
            # State preprocessing
            StateActionToTensor(apply_to=self.state_keys),
            StateActionTransform(
                apply_to=self.state_keys,
                normalization_modes={"state.pos": "mean_std"},
            ),
            # Action preprocessing
            StateActionToTensor(apply_to=self.action_keys),
            StateActionTransform(
                apply_to=self.action_keys,
                normalization_modes={"action.delta_pos": "mean_std"},
            ),
            # Concatenate modalities
            ConcatTransform(
                video_concat_order=self.video_keys,
                state_concat_order=self.state_keys,
                action_concat_order=self.action_keys,
            ),
        ]
```

### Register Your Config

Finally, register your custom config by adding it to `DATA_CONFIG_MAP`:

```python
DATA_CONFIG_MAP = {
    # ... existing dataset configs ...
    "custom": CustomDataConfig(),
}
```

> πŸ’‘ Tips: Adjust the key names (`video_keys`, `state_keys`, etc.) and `normalization_modes` based on your dataset. For multi-view video or multi-joint actions, just add more keys and update the transforms accordingly.

This config defines how each modality is loaded and preprocessed, and ensures compatibility with the training framework.

### What's Next?

After registration, you can use your dataset by passing `--dataset_path` (pointing at your dataset root) and `--data_config custom` to the training script, or by setting the corresponding fields in the training YAML file.
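If you want to double-check the registration before launching a full training run, a quick sketch like the following can help. The import path for `DATA_CONFIG_MAP` is an assumption; look it up wherever the map is actually defined in `internmanip/configs/data_configs/`.

```python
# Quick registration check (illustrative sketch; the import path is an assumption).
from internmanip.configs.data_configs import DATA_CONFIG_MAP

data_config = DATA_CONFIG_MAP["custom"]

# Modality keys and temporal indices, as declared in CustomDataConfig.
modality_configs = data_config.modality_config()
print(list(modality_configs))   # expected: ['video', 'state', 'action', 'language']

# Ordered preprocessing pipeline: tensor conversion, resize, normalization, concat.
transforms = data_config.transform()
print([type(t).__name__ for t in transforms])
```

If the lookup succeeds and the modalities and transforms match what you declared, the config is registered and ready to be referenced as `--data_config custom`.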