SIM1 Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds

* Equal contribution · Corresponding author

  • Yunsong Zhou *1
  • Hangxu Liu *12
  • Xuekun Jiang *1
  • Xing Shen *1
  • Yuanzhen Zhou 1
  • Hui Wang 13
  • Baole Fang 1
  • Yang Tian 14
  • Mulin Yu 1
  • Qiaojun Yu 1
  • Li Ma 1
  • Hengjie Li 1
  • Hanqing Wang 1
  • Jia Zeng 1
  • Jiangmiao Pang 1

1 Shanghai AI Lab

2 Fudan University

3 Shanghai Jiao Tong University

4 Peking University

Published April 9, 2026
Report 2604.08544
Repository SIM1

SIM1: The Same One as Reality

"What I cannot create, I do not understand."

Richard Feynman

Robotics is advancing at astonishing speed. Machines now sort parcels at warehouse tempo, weld with micron-level precision, and assist in surgery with superhuman steadiness. Yet give that same robot a wrinkled shirt and ask it to fold, and the illusion breaks. Cloth is not just another object category. It is a moving state space. Every grasp reshapes the surface. Every pull rewrites the geometry. Every contact changes what happens next. Compared with rigid manipulation, deformable tasks demand vastly broader coverage of states, recoveries, and interaction modes, making them among the most data-hungry problems in all of robotics.

That is exactly where the field hits a wall. Robot foundation models such as π0 have already shown the core scaling law: more demonstrations unlock more capability. But cloth needs far more data than pick-and-place or rigid assembly, because success depends on seeing not one canonical pose, but thousands of messy, partially folded, self-occluded, failure-prone configurations. Collecting that data in the real world is brutally expensive: expert operators, costly hardware, and roughly 100 trajectories per day. The moment you ask for the volume required for robust generalization, the economics collapse.

In principle, simulation should be the escape hatch. Run forever, scale cheaply, generate trajectories by the million. And indeed, simulation-driven data generation has shown promise for rigid-object manipulation. But cloth has remained the exception. Prior attempts break on exactly the dimensions that matter most for deformables: geometry that drifts from reality, contact and friction that destabilize long-horizon behavior, and robot motions that no longer resemble the structure of human demonstrations. The result is a frustrating compromise the community has learned to live with: use simulation to help, but trust real data when it counts.

One simulation, spanning multiple tasks and scenes: these 50× replays summarize 45 hours of generated trajectories from SIM1, with physically credible cloth dynamics at scale.


SIM1 begins by rejecting that compromise. Our premise is not that synthetic data should be cheaper; it is that synthetic data must become faithful. When the simulator is geometrically precise, dynamically stable, and behaviorally aligned with real demonstrations, simulation stops being a rough pre-training proxy and becomes a real source of supervision. That is the idea behind SIM1: close the loop between human demonstrations and high-fidelity cloth simulation tightly enough that generated trajectories can serve as a direct, zero-shot substitute for real-world data.

From just 200 human demonstrations, SIM1 produces over 10,000 synthetic trajectories at 27× lower cost and 6.8× higher throughput. Policies trained entirely on this synthetic data achieve 90% zero-shot success on real robots — and outperform real-data baselines by over 50% when the environment changes. For cloth manipulation, this is the turning point: simulation is no longer a warm-up. It is the engine.

Scan It, Simulate It, Scale It

Three Bridges Between Virtual and Real

The gap between simulation and reality isn't one problem — it's three. The scene looks wrong. The physics feel wrong. The motions act wrong. SIM1 closes all three with a unified real-to-sim-to-real pipeline.

First, we make the virtual world look like the real one. Real garments are laser-scanned at sub-millimeter resolution, producing digital twins so precise they preserve individual wrinkles and texture weave. The robot, the table, the lighting — everything is reconstructed or imported at true-to-life scale. This isn't approximate modeling. It's digital forensics.

Second, we make the virtual world feel like the real one. Our custom physics solver ensures that when a simulated robot pulls cloth, the fabric drapes, stretches, and resists exactly as it would on a physical table. We calibrate by running identical motions on the real and simulated robot simultaneously, then tuning parameters until the two are visually indistinguishable.
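The calibration loop described above can be sketched in a few lines. The snippet below is a toy stand-in, not SIM1's actual interface: the simulator call, the parameter names, and the dynamics are all illustrative, but the structure is the one the text describes: replay the same motion in reality and simulation, compare tracked cloth-marker trajectories, and search physics parameters until the discrepancy is minimal.

```python
# Hypothetical calibration sketch: grid-search cloth parameters to minimize
# the gap between real and simulated marker trajectories. simulate_markers is
# a toy stand-in for rolling out the same motion in the simulator.
import itertools
import numpy as np

def simulate_markers(stiffness: float, friction: float) -> np.ndarray:
    """Stand-in for a simulator rollout; returns marker positions (T, M, 3)."""
    t = np.linspace(0, 1, 50)[:, None, None]
    drape = np.exp(-stiffness * t) * (1.0 - friction)   # toy cloth response
    return np.broadcast_to(drape, (50, 8, 3)).copy()

# "Real" trajectory recorded while the physical robot replays the motion
# (synthesized here from known parameters so the example is self-contained).
real_markers = simulate_markers(stiffness=3.0, friction=0.4)

def trajectory_rmse(sim: np.ndarray, real: np.ndarray) -> float:
    return float(np.sqrt(np.mean((sim - real) ** 2)))

# Coarse grid search; a real calibration would refine with e.g. CMA-ES.
grid = itertools.product(np.linspace(1, 5, 9), np.linspace(0.1, 0.7, 7))
best = min(grid, key=lambda p: trajectory_rmse(simulate_markers(*p), real_markers))
print(best)  # recovers (3.0, 0.4) on this toy problem
```

On the toy problem the search recovers the generating parameters exactly; on real hardware the stopping criterion is the visual indistinguishability the text mentions.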

Third, we make the virtual robot act like a real one. Rather than scripting rigid pick-and-place sequences, we teach a diffusion model to generate fluid, human-like trajectories. A learned quality filter discards the rare implausible motion, and visual randomization — different materials, lighting, camera angles — ensures that trained policies can't memorize appearances. The result is a data engine that converts a weekend of human demonstration into months of diverse, photorealistic training data.
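The visual-randomization step mentioned above can be sketched as a per-trajectory configuration sampler. The field names below are illustrative assumptions, not SIM1's actual rendering API; the point is only that each rendered rollout draws its own materials, lighting, and camera pose so a policy cannot memorize appearances.

```python
# Hypothetical domain-randomization sampler: every generated trajectory gets
# its own render configuration. All field names here are illustrative.
import random

def sample_render_config(rng: random.Random) -> dict:
    return {
        "cloth_texture": rng.choice(["denim", "plaid", "plain", "striped"]),
        "light_intensity": rng.uniform(0.5, 1.5),        # relative exposure
        "light_azimuth_deg": rng.uniform(0.0, 360.0),
        "camera_jitter_m": [rng.gauss(0.0, 0.02) for _ in range(3)],
    }

rng = random.Random(42)
configs = [sample_render_config(rng) for _ in range(3)]
for c in configs:
    print(c["cloth_texture"], round(c["light_intensity"], 2))
```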

The pipeline begins with real garments and teleoperated demonstrations, reconstructs them into a high-fidelity simulation, generates large-scale synthetic trajectories with aligned physics and behavior, and finally trains policies that transfer zero-shot back to physical robots.

Teaching Simulators to Respect Cloth

Pull a real shirt from one corner, and the entire fabric responds instantly — ripples propagate, folds cascade, the cloth moves as one connected surface. Existing physics engines can't reproduce this in real time. Forces travel too slowly through the mesh, particles lag behind the gripper, and the result is a jittery, stretchy mess that looks nothing like real fabric. This is why sim-to-real for cloth has remained a pipe dream.

SIM1's solver fixes this with an elegant idea: give the cloth a nervous system. Whenever any mesh edge stretches beyond a physical threshold, a corrective spring instantly activates and snaps the fabric back into shape. When one part is pulled, the correction signal races across the entire surface within a single simulation step. The naive VBD solver (left) produces chaotic, localized deformation. Our deformation-stable solver (right) keeps the mesh coherent and physically plausible. Drag the particles yourself to feel the difference.

Naive VBD slow global propagation

Few, weak global passes—the mesh lags: distant particles respond only after many frames.

Deformation-stable adaptive edge coupling

Strain-gated switches (warm dots on edges) engage extra local projections—deformation equilibrates fast.
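The strain-gated idea can be illustrated with a toy position-based projection on a 1D particle chain. This is a conceptual sketch, not SIM1's actual VBD-based solver: edges stretched past a threshold keep receiving corrective projections within the same step, so a pull at one end equilibrates across the chain far faster than a fixed number of global passes.

```python
# Toy illustration of strain-gated edge correction (not SIM1's solver).
# Naive relaxation uses a fixed number of global Gauss-Seidel sweeps; the
# gated version keeps projecting over-stretched edges until all are in gate.
import numpy as np

REST = 1.0   # rest length of every edge
GATE = 1.05  # strain threshold that activates extra corrective projections

def project_edge(x, i):
    """Project edge (i, i+1) back to its rest length (equal-mass particles)."""
    d = x[i + 1] - x[i]
    length = np.linalg.norm(d)
    corr = 0.5 * (length - REST) * d / length
    x[i] += corr
    x[i + 1] -= corr

def strains(x):
    return np.linalg.norm(np.diff(x, axis=0), axis=1) / REST

def relax_naive(x, sweeps=4):
    """Fixed global passes: the disturbance propagates one edge per sweep."""
    for _ in range(sweeps):
        for i in range(len(x) - 1):
            project_edge(x, i)
    return x

def relax_gated(x, max_passes=200):
    """Strain gate: over-stretched edges keep receiving projections until
    every edge is back within the gate, all inside one simulation step."""
    for _ in range(max_passes):
        hot = np.flatnonzero(strains(x) > GATE)
        if hot.size == 0:
            break
        for i in hot:
            project_edge(x, i)
    return x

# Chain of 10 particles at rest; yank the last particle 3 units outward.
chain = np.stack([np.arange(10.0), np.zeros(10), np.zeros(10)], axis=1)
chain[-1, 0] += 3.0

naive_strain = strains(relax_naive(chain.copy())).max()
gated_strain = strains(relax_gated(chain.copy())).max()
print(naive_strain > GATE, gated_strain <= GATE)  # True True
```

After the same yank, the fixed-sweep version leaves edges stretched well past the gate, while the gated version settles every edge within threshold: the 1D analogue of the mesh-coherence difference shown in the widget.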

The interactive comparison above shows why the solver matters; the gallery below shows what that fidelity buys us. SIM1 reproduces deformable motion with convincing realism across twisting, fluttering, folding, and impact, while remaining stable under rigid-deformable contact. Even during aggressive gripper interaction, the cloth stays coherent and the contact remains penetration-free.

Together, these examples mark the difference between an approximate cloth animation and a solver that can actually support learning. SIM1 does not just make deformables look plausible in isolation; it preserves the contact, stability, and non-penetrating interaction patterns that policies must rely on before they ever touch a real robot.

A Data Factory, Not a Data Collection

Collecting robot data the traditional way is artisanal — one skilled operator, one demonstration at a time, one task per session. SIM1 turns this craft into manufacturing.

From 200 teleoperated demonstrations, we extract the essential vocabulary of manipulation: grasps, lifts, folds, releases. These atomic interactions are preserved exactly as the human performed them — they're too delicate to synthesize. The creative work — deciding how to connect one action to the next — is handled by a diffusion model that generates smooth, natural-looking transitions. The pipeline assembles these fragments into complete, novel trajectories that have never been performed but are physically valid. Each is rendered with randomized materials, lighting, and camera angles. The left panel shows raw traces from human demonstrations; the right shows one of thousands of synthesized trajectories — the data factory at work.

Simulation joint traces (A, B, C)

Solid / dashed lines: streams. Cut points: interacting poses; longer segments: moving segments.

Generated trajectories (one highlighted, many possible)

Bright curve and dots: the highlighted trajectory instance. Faint curves: other valid samples from the same stitching and gap-fill procedure. The pipeline scales to arbitrarily many trajectories.

From the same interaction vocabulary, SIM1 can assemble many distinct long-horizon rollouts, preserving task structure while expanding coverage far beyond what a human operator could manually record.
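The stitching step above can be sketched as follows. SIM1 uses a diffusion model for the gap fill; in this self-contained toy a minimum-jerk blend stands in for it, and the "atomic segments" are random stand-ins for human-recorded joint trajectories. Only the assembly structure is the point: keep recorded fragments verbatim, generate smooth transitions between them.

```python
# Hypothetical sketch of trajectory assembly: recorded atomic segments are
# preserved exactly; gaps between them are filled with generated transitions
# (a minimum-jerk blend here, a diffusion model in SIM1).
import numpy as np

def minimum_jerk(q0, q1, steps):
    """Smooth joint-space transition from q0 to q1 (zero end velocities)."""
    s = np.linspace(0, 1, steps)[:, None]
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5   # classic minimum-jerk profile
    return q0 + blend * (q1 - q0)

def assemble(segments, gap_steps=20):
    """Stitch atomic segments, filling each gap with a generated transition."""
    out = [segments[0]]
    for seg in segments[1:]:
        out.append(minimum_jerk(out[-1][-1], seg[0], gap_steps))
        out.append(seg)
    return np.concatenate(out)

rng = np.random.default_rng(0)
# Stand-ins for human-recorded atomic interactions (T x 7 joint trajectories).
grasp = rng.standard_normal((30, 7)).cumsum(axis=0) * 0.01
lift  = rng.standard_normal((25, 7)).cumsum(axis=0) * 0.01 + 0.5
fold  = rng.standard_normal((40, 7)).cumsum(axis=0) * 0.01 - 0.2

traj = assemble([grasp, lift, fold])
print(traj.shape)  # (135, 7): 30 + 20 + 25 + 20 + 40 steps
```

Resampling the transitions (or, in SIM1's case, the diffusion model) from the same segment vocabulary yields arbitrarily many distinct, continuous long-horizon rollouts.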

Does Synthetic Data Look Real?

A synthetic dataset is only useful if the actions it teaches actually resemble what a real human operator would do. To find out, we project every robot action — from real demonstrations, teleoperated simulation, and SIM1's generated data — into a shared 3D space. The result speaks for itself: SIM1's synthetic actions don't just overlap with real demonstrations — they surround and extend them, covering a richer manifold of behaviors while staying firmly grounded in the same distribution. This isn't noise being mistaken for data. It's genuine manipulation skill, manufactured at scale. Drag to rotate, scroll to zoom.

In other words, SIM1 does not generate actions that merely look plausible in isolation. It produces trajectories that remain anchored to the same behavioral structure as real demonstrations, while broadening the coverage needed for robust policy learning.
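A check of this kind can be sketched with PCA; the post does not specify the exact embedding, so PCA is an assumption here, and the action sets are random stand-ins. The structure matches the text: project all sources into one shared low-dimensional space, then verify that the synthetic set stays centered on the real distribution while covering a broader region.

```python
# Hypothetical coverage check: project real and synthetic robot actions into
# a shared 3D space (PCA as a stand-in embedding) and compare their spread.
import numpy as np

rng = np.random.default_rng(7)
real = rng.standard_normal((500, 14))               # stand-in real demo actions
synthetic = rng.standard_normal((5000, 14)) * 1.3   # broader synthetic set

# PCA on the pooled actions: center, take the top-3 right singular vectors.
pooled = np.vstack([real, synthetic])
mean = pooled.mean(axis=0)
_, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
basis = vt[:3].T                                    # (14, 3) projection

real_3d = (real - mean) @ basis
syn_3d = (synthetic - mean) @ basis

# Synthetic points should surround and extend the real ones: wider spread
# along every shared axis, same center.
print(np.std(syn_3d, axis=0) > np.std(real_3d, axis=0))
```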

"When simulation becomes faithful enough, real data no longer has to carry the full burden of learning."

15 synthetic trajectories ≈ 1 real demo · 27× cheaper · 6.8× faster · 90% zero-shot on real robots

Seeing Is Believing

The central question is simple: can simulation-trained policies really replace real-data training for deformable manipulation? We answer this from three angles at once: matched-budget transfer, robustness under distribution shift, and how performance scales as more synthetic data is generated.

Under equal budgets, simulation already comes surprisingly close to real-world supervision. In the representative π0.5 setting, 200 real demonstrations reach 97% success, while the corresponding simulation-trained policy reaches 87%, leaving only a 10-point gap. Once the environment changes, however, the picture flips: in spatial, texture, and lighting shifts, simulation-trained policies consistently hold up better because SIM1 exposes the model to variation that limited real collection cannot cover.

The strongest sanity check comes from training π0.5 from scratch. Real data alone collapses to 0% success, while SIM1's synthetic data alone reaches 76%. That result is important: it shows the gain is not merely inherited from pretrained priors. It comes from the data distribution itself.

The bar summary below makes this comparison explicit across model scales and evaluation settings. It shows that simulation is already competitive in-domain under matched budgets, and becomes decisively stronger once robustness matters.

Bar chart: success rate for Real Data, Sim Teleoperated Data, and Sim Generated Data across π0.5 (scratch), π0.5, and π0.

The next question is not just whether synthetic data works, but how it scales. The two scaling plots unpack this story for π0.5: the left chart tracks the in-domain regime, while the right chart shows the harder texture-randomized setting where visual and frictional statistics shift away from training.

These curves reveal the core asymmetry between real and synthetic supervision. Real data is stronger in the extreme low-data regime, but synthetic data scales far more effectively: performance keeps rising as more generated trajectories are added, while real-data gains saturate. In the representative in-domain setting, one real demonstration is worth roughly 15 synthetic trajectories near saturation; under texture generalization, that equivalence tightens to roughly 5 synthetic trajectories per real sample.
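These equivalences can be turned into a back-of-envelope cost comparison using the post's reported 27× cost ratio; the function name below is illustrative, and the arithmetic is only a sketch over the stated numbers.

```python
# Back-of-envelope: if one real demo is worth k synthetic trajectories and a
# synthetic trajectory costs 1/27 of a real one, matching a real demo with
# synthetic data costs k/27 of the real demo's price.
COST_RATIO = 27  # real cost / synthetic cost per trajectory (from the post)

def relative_cost(k_synth_per_real: float) -> float:
    """Cost of matching one real demo with synthetic data (real demo = 1.0)."""
    return k_synth_per_real / COST_RATIO

print(round(relative_cost(15), 2))  # in-domain: ~0.56 of a real demo's cost
print(round(relative_cost(5), 2))   # texture shift: ~0.19 of a real demo's cost
```

Even at the looser in-domain equivalence, synthetic supervision is cheaper per unit of capability, and the advantage widens under distribution shift.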

Numbers alone do not show what zero-shot transfer actually looks like on a robot. The videos below therefore move from the representative long-horizon T-shirt folding benchmark to harder deployments on garments whose material, shape, and appearance differ substantially from the training distribution.

The top row highlights successful real-robot execution on the benchmark T-shirt task from grasping through folding completion. The bottom row emphasizes generalization: SIM1-trained policies remain effective on out-of-domain garments, including cases with material, size, and geometry configurations not seen during training, where real-data baselines become markedly less reliable.

Try It Yourself

SIM1's simulation runs in real time on GPU. Below is the same teleoperation interface our operators use to record the demonstrations that seed the entire pipeline. This is where the data journey begins — from here, SIM1 handles the rest.

One More Thing

We are also building a next-generation deformable simulator based on IPC, designed for high-frame-rate, high-fidelity cloth and contact simulation. The goal is not just visual realism, but robust, penetration-free interaction under aggressive rigid-deformable contact at speeds practical for large-scale data generation.

This system is still under active development. We will share quantitative results, videos, and technical details as soon as the simulator is ready.

IPC-Based Simulator Preview

Citation

@misc{zhou2026sim1physicsalignedsimulatorzeroshot,
      title={SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds}, 
      author={Yunsong Zhou and Hangxu Liu and Xuekun Jiang and Xing Shen and Yuanzhen Zhou and Hui Wang and Baole Fang and Yang Tian and Mulin Yu and Qiaojun Yu and Li Ma and Hengjie Li and Hanqing Wang and Jia Zeng and Jiangmiao Pang},
      year={2026},
      eprint={2604.08544},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2604.08544}, 
}
