🏆 IROS 2025 Challenge

Welcome to the IROS 2025 Challenge of Multimodal Robot Learning in InternUtopia and Real World! InternManip provides the official baseline and evaluation toolkit for Track: Vision-Language Manipulation in Open Tabletop Environments, featured at the IROS 2025 Workshop.

🚀 Challenge Overview

In this challenge, participants will develop end-to-end policies that fuse vision and language to control robots in a simulated, physics-based environment. Models are trained using the InternManip framework and the GenManip dataset, and are evaluated in a closed-loop benchmark on unseen private scenes.

This repository serves as the starter kit and evaluation toolkit—you can use it to:

Implement your own policy models
Train them on GenManip public data
Submit them via Docker for final evaluation

📚 More Information

You can find more information about the competition here, including resources, the schedule, and rewards.

🛠️ Guided Tutorial

We’ve provided a concise guided tutorial for participants, divided into three parts: Environment Setup, Local Development & Testing, and Packaging & Submission.

😄 Good luck, and we look forward to your innovations!