EBench Docs

Run Evaluation

1. Start the server

python ray_eval_server.py --host 0.0.0.0 --port 8087 --no_save_process

Or with a local Isaac Sim installation:

/isaac-sim/python.sh ray_eval_server.py --host 0.0.0.0 --port 8087 --no_save_process
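The later steps need the server to be reachable before a client connects. The following is a minimal sketch of a readiness check, assuming the server accepts plain TCP connections on the port passed via --port. The helper name wait_for_port is hypothetical and not part of EBench. A successful connection shows only that something is listening on the port, not that the evaluator has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it assumes the evaluation server listens on plain TCP.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Open and immediately close a connection; only reachability matters.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example call (8087 matches the --port value used above):
# wait_for_port("127.0.0.1", 8087, timeout_s=30.0)
```

Polling with a deadline avoids a fixed sleep, which would be either too short on a slow start or wastefully long on a fast one.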

2. Submit a task

From the client environment, submit a benchmark job:

gmp submit ebench/generalist/test_mini --run_id my_first_run

Available task settings:

Task setting	Description
`ebench/mobile_manip/<split>`	Pick-and-place with mobile base
`ebench/table_top_manip/<split>`	Dexterous tabletop tasks
`ebench/generalist/<split>`	Mixed tasks across categories

Splits: val_train, val_unseen, test_mini

Examples:

Submit all tasks at once: gmp submit ebench --run_id full_run
Submit every mobile_manip task in the test_mini split: gmp submit ebench/mobile_manip/test_mini --run_id evaluate_mobile_manip
Submit only collect_coffee_beans from table_top_manip: gmp submit ebench/table_top_manip/test_mini/collect_coffee_beans --run_id evaluate_only_one_task
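The examples above follow three path shapes: ebench alone, ebench/<category>/<split>, and ebench/<category>/<split>/<task>. The sketch below composes those paths and the matching gmp submit argument list. The function names task_path and submit_command are hypothetical, and the category and split lists are copied from this page, so they may be incomplete. The code only builds the command and does not run it.

```python
from typing import List, Optional

# Values copied from the table and split list on this page (assumed complete).
CATEGORIES = ("mobile_manip", "table_top_manip", "generalist")
SPLITS = ("val_train", "val_unseen", "test_mini")


def task_path(category: Optional[str] = None, split: Optional[str] = None,
              task: Optional[str] = None) -> str:
    """Build a task setting in one of the three shapes shown in the examples."""
    if category is None and split is None and task is None:
        return "ebench"  # whole benchmark
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    parts = ["ebench", category, split]
    if task is not None:
        parts.append(task)  # single task inside the split
    return "/".join(parts)


def submit_command(run_id: str, category: Optional[str] = None,
                   split: Optional[str] = None,
                   task: Optional[str] = None) -> List[str]:
    """Return the argument list for `gmp submit`, without executing it."""
    return ["gmp", "submit", task_path(category, split, task), "--run_id", run_id]


print(" ".join(submit_command("evaluate_only_one_task", "table_top_manip",
                              "test_mini", "collect_coffee_beans")))
```

On a machine where gmp is installed, the returned list can be passed to subprocess.run(cmd, check=True).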

3. Connect your model

For a quick connectivity check, run the built-in baseline:

gmp eval -a r5a -g lift2 --worker_ids 0

For your own model, see Integrate Your Own Model.

4. Check results

gmp status

Results are saved to saved/eval_results/<task>/<run_id>/.
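To locate the output of a run from a script, the directory layout above can be expressed with pathlib. This is a sketch only: results_dir and list_result_files are hypothetical helpers, the exact value substituted for <task> is an assumption, and this page does not describe the files inside the directory, so the code merely lists them.

```python
from pathlib import Path
from typing import List

# Root directory as stated above, relative to the working directory.
RESULTS_ROOT = Path("saved/eval_results")


def results_dir(task: str, run_id: str, root: Path = RESULTS_ROOT) -> Path:
    """Return <root>/<task>/<run_id>; `task` is used verbatim (assumption)."""
    return root / task / run_id


def list_result_files(task: str, run_id: str,
                      root: Path = RESULTS_ROOT) -> List[Path]:
    """List every file under the run directory, or [] if it does not exist."""
    directory = results_dir(task, run_id, root)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.rglob("*") if p.is_file())


print(results_dir("collect_coffee_beans", "my_first_run"))
```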

When the server and client run on different machines, pass --host <ip> --port <port> to every gmp command. See the GMP CLI reference for all options.
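When gmp is driven from a script, the remote-server options can be appended in one place so that no command is missed. The helper below is a minimal sketch. The name with_remote is hypothetical, and the address in the example is a documentation placeholder rather than a real server.

```python
from typing import List, Optional, Sequence


def with_remote(argv: Sequence[str], host: Optional[str] = None,
                port: Optional[int] = None) -> List[str]:
    """Return a copy of argv with --host/--port appended when they are given."""
    cmd = list(argv)  # copy so the caller's list is not modified
    if host is not None:
        cmd += ["--host", host]
    if port is not None:
        cmd += ["--port", str(port)]
    return cmd


# Placeholder address from the documentation range; replace with the server's IP.
print(" ".join(with_remote(["gmp", "status"], host="192.0.2.10", port=8087)))
```

Leaving host and port unset returns the command unchanged, which suits the single-machine setup.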