Learning robust policies through domain randomization

New techniques for training policies that transfer from simulation to real hardware.

The sim-to-real gap remains one of the biggest challenges in robotics AI. Policies that perform flawlessly in simulation often fail catastrophically on real hardware. Domain randomization, which varies simulation parameters during training, has emerged as one of the most widely used approaches to bridging this gap.

In this post, we share new techniques we've developed for domain randomization that significantly improve sim-to-real transfer.

The problem with naive randomization

Traditional domain randomization treats all parameters equally: friction, mass, sensor noise, and everything else are varied uniformly across wide ranges. While this can produce robust policies, it also makes training harder and can prevent policies from learning precise behaviors.
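To make that concrete, here is a minimal sketch of the uniform baseline, assuming a simulator that accepts a flat dictionary of physics parameters. The parameter names and ranges are illustrative, not values from our experiments.

```python
import numpy as np

# Naive domain randomization: every parameter is sampled uniformly from
# a fixed, wide range before each episode. Names and ranges below are
# illustrative placeholders.
PARAM_RANGES = {
    "friction":     (0.1, 2.0),   # contact friction coefficient
    "mass_scale":   (0.5, 1.5),   # multiplier on nominal link masses
    "sensor_noise": (0.0, 0.05),  # std. dev. of additive observation noise
}

def sample_sim_params(rng: np.random.Generator) -> dict:
    """Sample every parameter uniformly; all are treated equally."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

rng = np.random.default_rng(seed=0)
episode_params = sample_sim_params(rng)  # resample before each training episode
```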

The key insight is that not all parameters matter equally for transfer. Some parameters (like exact lighting conditions) have minimal impact on policy behavior, while others (like contact friction) are critical.
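One simple way to see the difference, shown here purely for illustration (a one-at-a-time sensitivity probe, not our adaptive method), is to perturb each parameter in isolation and measure how much the policy's return drops. The `evaluate_policy` stand-in below uses invented numbers chosen so that friction matters far more than lighting.

```python
# Illustrative one-at-a-time sensitivity probe. evaluate_policy is a
# stand-in with invented reward numbers, not a real rollout.

def evaluate_policy(params: dict) -> float:
    """Placeholder for rolling out a trained policy under these parameters."""
    return (1.0
            - 5.0 * abs(params["friction"] - 1.0)
            - 0.1 * abs(params["lighting"] - 1.0))

def sensitivity(nominal: dict, deltas: dict) -> dict:
    """Return the reward drop caused by perturbing each parameter alone."""
    baseline = evaluate_policy(nominal)
    drops = {}
    for name, delta in deltas.items():
        perturbed = dict(nominal)
        perturbed[name] += delta
        drops[name] = baseline - evaluate_policy(perturbed)
    return drops

nominal = {"friction": 1.0, "lighting": 1.0}
print(sensitivity(nominal, {"friction": 0.2, "lighting": 0.2}))
# {'friction': 1.0, 'lighting': ~0.02}: friction dominates transfer risk
```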

Adaptive domain randomization

We've developed an adaptive approach that automatically identifies which parameters matter most for a given task. The system starts with narrow parameter ranges and gradually expands them based on policy performance.
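The sketch below captures this expand-on-success loop under stated assumptions: the growth factor, success threshold, and single-parameter scope are illustrative stand-ins for the actual per-parameter schedule.

```python
import numpy as np

# Expand-on-success randomization: each range starts narrow around its
# nominal value and widens only while the policy keeps succeeding.

class AdaptiveRange:
    """A parameter range that starts narrow and widens as the policy copes."""

    def __init__(self, nominal, max_halfwidth, init_halfwidth=1e-3, growth=1.2):
        self.nominal = nominal
        self.halfwidth = init_halfwidth
        self.max_halfwidth = max_halfwidth
        self.growth = growth

    def sample(self, rng: np.random.Generator) -> float:
        return rng.uniform(self.nominal - self.halfwidth,
                           self.nominal + self.halfwidth)

    def update(self, success_rate: float, threshold: float = 0.9) -> None:
        # Widen the range only while the policy succeeds at the current
        # difficulty; otherwise hold it fixed so learning can catch up.
        if success_rate >= threshold:
            self.halfwidth = min(self.halfwidth * self.growth, self.max_halfwidth)

rng = np.random.default_rng(seed=0)
friction = AdaptiveRange(nominal=1.0, max_halfwidth=0.9)
for epoch in range(50):
    params = {"friction": friction.sample(rng)}
    # ... train on episodes with these params, then evaluate ...
    success_rate = 0.95  # placeholder for the measured evaluation result
    friction.update(success_rate)
```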

Key features of our approach:

Results

In our experiments across manipulation and locomotion tasks, adaptive domain randomization achieves:

Built into RebelAI

These techniques are built directly into the RebelAI platform. When you train a policy, the system automatically applies adaptive domain randomization based on your robot and task definitions.
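For illustration only, a training call might look like the snippet below; every identifier in it is a hypothetical placeholder rather than RebelAI's actual API.

```python
# Hypothetical usage only: rebelai, Policy, and train() are illustrative
# placeholders, not RebelAI's documented interface.
import rebelai  # placeholder client library

policy = rebelai.Policy(robot="my_arm", task="bin_picking")
policy.train()  # adaptive domain randomization is applied automatically
```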

We're continuing to research new approaches to sim-to-real transfer. If you're interested in collaborating or have challenging transfer problems, reach out to us.