Recent work has demonstrated that a promising strategy for teaching robots a wide range of complex skills is to train them on a curriculum of progressively more challenging environments. However, developing an effective curriculum of environment distributions currently requires significant expertise, which must be repeated for every new domain. Our key insight is that environments are often naturally represented as code. Thus, we probe whether effective environment curriculum design can be achieved and automated via code generation by large language models (LLMs). In this paper, we introduce Eurekaverse, an unsupervised environment design algorithm that uses LLMs to sample progressively more challenging, diverse, and learnable environments for skill training. We validate Eurekaverse’s effectiveness in the domain of quadrupedal parkour learning, in which a quadruped robot must traverse a variety of obstacle courses. The automatic curriculum designed by Eurekaverse enables gradual learning of complex parkour skills in simulation and successfully transfers to the real world, outperforming policies trained on courses manually designed by humans.
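At a high level, Eurekaverse alternates between LLM-based environment generation and policy training. The sketch below is one plausible realization of such a loop, assuming the LLM is re-prompted with previously kept environments so that difficulty ramps up across iterations; every helper here (query_llm_for_terrain_code, compile_terrain, train_policy, learnability) is a hypothetical stand-in, not the paper’s actual implementation.

import random

# Schematic sketch of an LLM-driven environment curriculum loop.
# All helpers are placeholders: the real system prompts an LLM for
# terrain-generation code, runs it in a physics simulator, and trains
# a quadruped policy with reinforcement learning.

def query_llm_for_terrain_code(context, iteration):
    """Placeholder for an LLM call that returns terrain-generation code,
    conditioned on previously kept environments."""
    return f"# terrain program {iteration}, seeded by {len(context)} prior envs"

def compile_terrain(code):
    """Placeholder: in practice, execute the generated code in the
    simulator and discard programs that crash or yield invalid terrain."""
    return {"code": code}

def train_policy(policy, envs):
    """Placeholder for a reinforcement-learning update on the new envs."""
    return policy

def learnability(policy, env):
    """Placeholder score; the idea is to favor terrains that are neither
    trivially easy nor impossible for the current policy."""
    return random.random()

def curriculum_loop(num_iterations=5, envs_per_iteration=8, keep=3):
    policy, kept_envs = None, []
    for it in range(num_iterations):
        codes = [query_llm_for_terrain_code(kept_envs, it)
                 for _ in range(envs_per_iteration)]
        envs = [e for e in map(compile_terrain, codes) if e is not None]
        policy = train_policy(policy, envs)
        # Keep the most learnable environments as context for the next
        # round, so each generation builds on the last.
        kept_envs = sorted(envs, key=lambda e: learnability(policy, e))[-keep:]
    return policy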
In this section, we highlight the exceptional performance of our Eurekaverse policy on real-world parkour courses. Our policy can robustly traverse large gaps, steep ramps, and even a deformable yoga ball.
Across four obstacle types, the Eurekaverse-trained policy significantly outperforms a policy trained on human-designed environments from prior work.
Eurekaverse generates a diverse set of terrains for parkour learning in simulation. Each iteration of generation increases difficulty and complexity, forming an adaptive curriculum for effective training. Below the renders, we also show examples of the generated terrain code.
Eurekaverse-generated terrain code.
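To give a concrete flavor of what such programs can look like, here is a hypothetical terrain function in the same spirit; we assume a simple 2D NumPy heightfield (heights in meters) for illustration, whereas the actual generated code targets the simulator’s own terrain API and may differ.

import numpy as np

# Hypothetical example of a generated obstacle course: stepping-stone
# platforms whose gaps and heights scale with a difficulty parameter.
# The heightfield representation is an assumption for illustration.

def set_terrain(length_m=12.0, width_m=4.0, resolution_m=0.05, difficulty=0.5):
    """Return a (rows, cols) heightfield for a stepping-stone course."""
    rows = int(length_m / resolution_m)
    cols = int(width_m / resolution_m)
    height_field = np.zeros((rows, cols))

    platform_len = int(1.0 / resolution_m)                  # 1 m platforms
    gap_len = int((0.2 + 0.6 * difficulty) / resolution_m)  # 0.2-0.8 m gaps
    platform_height = 0.2 + 0.3 * difficulty                # 0.2-0.5 m tall

    x = int(1.5 / resolution_m)  # flat spawn region before the first platform
    while x + platform_len < rows:
        height_field[x:x + platform_len, :] = platform_height
        x += platform_len + gap_len

    return height_field

In the full system, many such programs are generated per iteration, each increasing in difficulty as the curriculum progresses.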
In our simulated parkour benchmark, the Eurekaverse policy outperforms a human-designed baseline from prior work: the baseline learns quickly but plateaus, whereas our policy continuously improves and approaches the performance of an oracle trained directly on the test terrains.
Below are some failure cases of our policy, which point to several avenues for improvement. For instance, our sim-to-real transfer still leaves a noticeable gap between simulated and real-world behavior. Additionally, future work could incorporate visual feedback during environment generation, potentially improving spatial reasoning and environment diversity.
@inproceedings{liang2024eurekaverse,
  title     = {Eurekaverse: Environment Curriculum Generation via Large Language Models},
  author    = {William Liang and Sam Wang and Hung-Ju Wang and Osbert Bastani and Dinesh Jayaraman and Yecheng Jason Ma},
  booktitle = {Conference on Robot Learning (CoRL)},
  year      = {2024},
}