Recent work has demonstrated that a promising strategy for teaching robots a wide range of complex skills is to train them on a curriculum of progressively more challenging environments. However, developing an effective curriculum of environment distributions currently requires significant expertise, which must be repeated for every new domain. Our key insight is that environments are often naturally represented as code. Thus, we probe whether effective environment curriculum design can be achieved and automated via code generation by large language models (LLMs). In this paper, we introduce Eurekaverse, an unsupervised environment design algorithm that uses LLMs to sample progressively more challenging, diverse, and learnable environments for skill training. We validate Eurekaverse’s effectiveness in the domain of quadrupedal parkour learning, in which a quadruped robot must traverse a variety of obstacle courses. The automatic curriculum designed by Eurekaverse enables gradual learning of complex parkour skills in simulation and successfully transfers to the real world, outperforming policies trained on courses manually designed by humans.
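At a high level, Eurekaverse alternates between LLM-based environment generation and policy training. The sketch below is one plausible realization of such a loop, assuming the LLM is re-prompted with previously kept environments so that difficulty ramps up across iterations; every helper here (query_llm_for_terrain_code, compile_terrain, train_policy, learnability) is a hypothetical stand-in, not the paper’s actual implementation.

import random

# Schematic sketch of an LLM-driven environment curriculum loop.
# All helpers are placeholders: the real system prompts an LLM for
# terrain-generation code, runs it in a physics simulator, and trains
# a quadruped policy with reinforcement learning.

def query_llm_for_terrain_code(context, iteration):
    """Placeholder for an LLM call that returns terrain-generation code,
    conditioned on previously kept environments."""
    return f"# terrain program {iteration}, seeded by {len(context)} prior envs"

def compile_terrain(code):
    """Placeholder: in practice, execute the generated code in the
    simulator and discard programs that crash or yield invalid terrain."""
    return {"code": code}

def train_policy(policy, envs):
    """Placeholder for a reinforcement-learning update on the new envs."""
    return policy

def learnability(policy, env):
    """Placeholder score; the idea is to favor terrains that are neither
    trivially easy nor impossible for the current policy."""
    return random.random()

def curriculum_loop(num_iterations=5, envs_per_iteration=8, keep=3):
    policy, kept_envs = None, []
    for it in range(num_iterations):
        codes = [query_llm_for_terrain_code(kept_envs, it)
                 for _ in range(envs_per_iteration)]
        envs = [e for e in map(compile_terrain, codes) if e is not None]
        policy = train_policy(policy, envs)
        # Keep the most learnable environments as context for the next
        # round, so each generation builds on the last.
        kept_envs = sorted(envs, key=lambda e: learnability(policy, e))[-keep:]
    return policy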
In this section, we highlight the exceptional performance of our Eurekaverse policy on real-world parkour courses. Our policy can robustly traverse large gaps, steep ramps, and even a deformable yoga ball.
Across four obstacle types, the Eurekaverse-trained policy significantly outperforms a policy trained on human-designed environments from prior work.
Eurekaverse generates a diverse set of terrains for parkour learning in simulation. Each iteration of generation increases difficulty and complexity, forming an adaptive curriculum for effective training. Below the renders, we also show examples of the generated terrain code.
Eurekaverse-generated terrain code.
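To give a concrete flavor of what such programs can look like, here is a hypothetical terrain function in the same spirit; we assume a simple 2D NumPy heightfield (heights in meters) for illustration, whereas the actual generated code targets the simulator’s own terrain API and may differ.

import numpy as np

# Hypothetical example of a generated obstacle course: stepping-stone
# platforms whose gaps and heights scale with a difficulty parameter.
# The heightfield representation is an assumption for illustration.

def set_terrain(length_m=12.0, width_m=4.0, resolution_m=0.05, difficulty=0.5):
    """Return a (rows, cols) heightfield for a stepping-stone course."""
    rows = int(length_m / resolution_m)
    cols = int(width_m / resolution_m)
    height_field = np.zeros((rows, cols))

    platform_len = int(1.0 / resolution_m)                  # 1 m platforms
    gap_len = int((0.2 + 0.6 * difficulty) / resolution_m)  # 0.2-0.8 m gaps
    platform_height = 0.2 + 0.3 * difficulty                # 0.2-0.5 m tall

    x = int(1.5 / resolution_m)  # flat spawn region before the first platform
    while x + platform_len < rows:
        height_field[x:x + platform_len, :] = platform_height
        x += platform_len + gap_len

    return height_field

In the full system, many such programs are generated per iteration, each increasing in difficulty as the curriculum progresses.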
In our simulated parkour benchmark, the Eurekaverse policy outperforms a human-designed baseline from prior work: the baseline learns quickly but plateaus, whereas our policy continuously improves and approaches the performance of an oracle trained directly on the test terrains.
Below are some failure cases of our policy, which point to several avenues for improvement. For instance, our sim-to-real transfer still leaves a noticeable gap between simulated and real-world behavior. Additionally, future work could incorporate visual feedback during environment generation, potentially improving spatial reasoning and environment diversity.
@inproceedings{liang2024eurekaverse,
  title     = {Eurekaverse: Environment Curriculum Generation via Large Language Models},
  author    = {William Liang and Sam Wang and Hung-Ju Wang and Osbert Bastani and Dinesh Jayaraman and Yecheng Jason Ma},
  booktitle = {Conference on Robot Learning (CoRL)},
  year      = {2024},
}