Learning robust policies through domain randomization

New techniques for training policies that transfer from simulation to real hardware.

The sim-to-real gap remains one of the biggest challenges in robotics AI. Policies that perform flawlessly in simulation often fail catastrophically on real hardware. Domain randomization, which varies simulation parameters during training, has emerged as one of the most widely used approaches to bridging this gap.

In this post, we share new techniques we've developed for domain randomization that significantly improve sim-to-real transfer.

The problem with naive randomization

Traditional domain randomization treats all parameters equally: friction, mass, sensor noise, and everything else are varied uniformly across wide ranges. While this can produce robust policies, it also makes training harder and can prevent policies from learning precise behaviors.
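To make that concrete, here is a minimal sketch of the uniform baseline, assuming a simulator that accepts a flat dictionary of physics parameters. The parameter names and ranges are illustrative, not values from our experiments.

```python
import numpy as np

# Naive domain randomization: every parameter is sampled uniformly from
# a fixed, wide range before each episode. Names and ranges below are
# illustrative placeholders.
PARAM_RANGES = {
    "friction":     (0.1, 2.0),   # contact friction coefficient
    "mass_scale":   (0.5, 1.5),   # multiplier on nominal link masses
    "sensor_noise": (0.0, 0.05),  # std. dev. of additive observation noise
}

def sample_sim_params(rng: np.random.Generator) -> dict:
    """Sample every parameter uniformly; all are treated equally."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

rng = np.random.default_rng(seed=0)
episode_params = sample_sim_params(rng)  # resample before each training episode
```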

The key insight is that not all parameters matter equally for transfer. Some parameters (like exact lighting conditions) have minimal impact on policy behavior, while others (like contact friction) are critical.
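One simple way to see the difference, shown here purely for illustration (a one-at-a-time sensitivity probe, not our adaptive method), is to perturb each parameter in isolation and measure how much the policy's return drops. The `evaluate_policy` stand-in below uses invented numbers chosen so that friction matters far more than lighting.

```python
# Illustrative one-at-a-time sensitivity probe. evaluate_policy is a
# stand-in with invented reward numbers, not a real rollout.

def evaluate_policy(params: dict) -> float:
    """Placeholder for rolling out a trained policy under these parameters."""
    return (1.0
            - 5.0 * abs(params["friction"] - 1.0)
            - 0.1 * abs(params["lighting"] - 1.0))

def sensitivity(nominal: dict, deltas: dict) -> dict:
    """Return the reward drop caused by perturbing each parameter alone."""
    baseline = evaluate_policy(nominal)
    drops = {}
    for name, delta in deltas.items():
        perturbed = dict(nominal)
        perturbed[name] += delta
        drops[name] = baseline - evaluate_policy(perturbed)
    return drops

nominal = {"friction": 1.0, "lighting": 1.0}
print(sensitivity(nominal, {"friction": 0.2, "lighting": 0.2}))
# {'friction': 1.0, 'lighting': ~0.02}: friction dominates transfer risk
```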

Adaptive domain randomization

We've developed an adaptive approach that automatically identifies which parameters matter most for a given task. The system starts with narrow parameter ranges and gradually expands them based on policy performance.
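The sketch below captures this expand-on-success loop under stated assumptions: the growth factor, success threshold, and single-parameter scope are illustrative stand-ins for the actual per-parameter schedule.

```python
import numpy as np

# Expand-on-success randomization: each range starts narrow around its
# nominal value and widens only while the policy keeps succeeding.

class AdaptiveRange:
    """A parameter range that starts narrow and widens as the policy copes."""

    def __init__(self, nominal, max_halfwidth, init_halfwidth=1e-3, growth=1.2):
        self.nominal = nominal
        self.halfwidth = init_halfwidth
        self.max_halfwidth = max_halfwidth
        self.growth = growth

    def sample(self, rng: np.random.Generator) -> float:
        return rng.uniform(self.nominal - self.halfwidth,
                           self.nominal + self.halfwidth)

    def update(self, success_rate: float, threshold: float = 0.9) -> None:
        # Widen the range only while the policy succeeds at the current
        # difficulty; otherwise hold it fixed so learning can catch up.
        if success_rate >= threshold:
            self.halfwidth = min(self.halfwidth * self.growth, self.max_halfwidth)

rng = np.random.default_rng(seed=0)
friction = AdaptiveRange(nominal=1.0, max_halfwidth=0.9)
for epoch in range(50):
    params = {"friction": friction.sample(rng)}
    # ... train on episodes with these params, then evaluate ...
    success_rate = 0.95  # placeholder for the measured evaluation result
    friction.update(success_rate)
```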

Key features of our approach:

Results

In our experiments across manipulation and locomotion tasks, adaptive domain randomization achieves:

Built into RebelAI

These techniques are built directly into the RebelAI platform. When you train a policy, the system automatically applies adaptive domain randomization based on your robot and task definitions.
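For illustration only, a training call might look like the snippet below; every identifier in it is a hypothetical placeholder rather than RebelAI's actual API.

```python
# Hypothetical usage only: rebelai, Policy, and train() are illustrative
# placeholders, not RebelAI's documented interface.
import rebelai  # placeholder client library

policy = rebelai.Policy(robot="my_arm", task="bin_picking")
policy.train()  # adaptive domain randomization is applied automatically
```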

We're continuing to research new approaches to sim-to-real transfer. If you're interested in collaborating or have challenging transfer problems, reach out to us.