Submit to BALROG

All official submissions to the BALROG leaderboard are maintained in the BALROG/experiments repository.

If you are interested in submitting your agent to the BALROG leaderboard, please do the following:

1. Fork and clone the BALROG/experiments repository.
2. In the LLM or VLM directory, create a new folder named with the submission date and your agent's name (e.g. submissions/LLM/2024-09-21_balrog_gpt4o).
3. Copy the logs from your agent's evaluation run into this folder. Include the following files:
   - babaisai: babaisai folder, containing summary and trajectory logs
   - babyai: babyai folder, containing summary and trajectory logs
   - crafter: crafter folder, containing summary and trajectory logs
   - minihack: minihack folder, containing summary and trajectory logs
   - nle: nethack folder, containing summary and trajectory logs
   - textworld: textworld folder, containing summary and trajectory logs
   - summary.json: summary of the evaluation outcomes for all environments
   NOTE: You shouldn't have to create any of the files above; they should be generated automatically by the BALROG evaluation.
4. Add a metadata.yaml file, which controls how your result is shown on the website. Please include the following fields:
   - name: the name of your leaderboard entry
   - oss: true if your agent (model + strategy) is open-source
   - site: URL/link to more information about your agent
   - verified: false (see below for result verification)
   - date: submission date as a string (e.g. "2024-12-09")
5. Add a README.md with anything you'd like to share about your agent.
6. Run: python submit.py <path-to-submission>
7. Create a pull request to the BALROG/experiments repository with the new folder.
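As a rough pre-flight check before running submit.py, the folder layout and metadata.yaml fields described above can be sketched in Python. This is a hypothetical helper, not part of BALROG: the function names (submission_dir, metadata_yaml, missing_entries) are made up for illustration, and the expected folder names are taken from the file list above. The official submit.py remains the authority on what a valid submission looks like.

```python
import datetime
import json
import tempfile
from pathlib import Path

# Hypothetical checker, NOT the official submit.py.
# Per-environment log folders from the list above (note: NLE logs live in "nethack").
ENV_FOLDERS = ["babaisai", "babyai", "crafter", "minihack", "nethack", "textworld"]
# Auto-generated summary plus the two files you write yourself.
REQUIRED_FILES = ["summary.json", "metadata.yaml", "README.md"]


def submission_dir(root: Path, kind: str, date: str, agent: str) -> Path:
    """Build a path such as submissions/LLM/2024-09-21_balrog_gpt4o under root."""
    if kind not in ("LLM", "VLM"):
        raise ValueError("kind must be 'LLM' or 'VLM'")
    datetime.date.fromisoformat(date)  # raises ValueError unless date is YYYY-MM-DD
    return root / "submissions" / kind / f"{date}_{agent}"


def metadata_yaml(name: str, oss: bool, site: str, date: str) -> str:
    """Render metadata.yaml without PyYAML.

    json.dumps yields double-quoted strings, which YAML also accepts.
    'verified' always starts as false; the BALROG team sets it after review.
    """
    lines = [
        f"name: {json.dumps(name)}",
        f"oss: {'true' if oss else 'false'}",
        f"site: {json.dumps(site)}",
        "verified: false",
        f"date: {json.dumps(date)}",
    ]
    return "\n".join(lines) + "\n"


def missing_entries(folder: Path) -> list[str]:
    """Return the expected folders/files that are absent from a submission folder."""
    missing = [f"{d}/" for d in ENV_FOLDERS if not (folder / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (folder / f).is_file()]
    return missing


if __name__ == "__main__":
    # Demo in a throwaway directory: only metadata.yaml is written,
    # so the environment folders, summary.json and README.md are reported missing.
    with tempfile.TemporaryDirectory() as tmp:
        sub = submission_dir(Path(tmp), "LLM", "2024-09-21", "balrog_gpt4o")
        sub.mkdir(parents=True)
        meta = metadata_yaml("BALROG GPT-4o", False, "https://example.com", "2024-09-21")
        (sub / "metadata.yaml").write_text(meta)
        print(missing_entries(sub))
```

In practice you would point missing_entries at your real submission folder after copying in the evaluation logs; an empty list only means the expected names are present, not that their contents are valid.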

You can refer to this tutorial for a quick overview of how to evaluate your agent on BALROG.

The "verified" checkmark ✓ indicates that we (the BALROG team) received access to your agent and were able to reproduce a selection of its results.

If you are interested in receiving the "verified" checkmark ✓ on your submission, please do the following: