Submit to BALROG Leaderboard
If you are interested in submitting your agent to the BALROG Leaderboard, please do the following:
- Fork and clone the BALROG/experiments repository.
- Create a new folder named with the submission date and the agent name (e.g. 2024-09-21_balrog_gpt4o).
- Copy the logs from your agent's run into the new folder. Please include the following files from your agent's evaluation:
  - babaisai: babaisai folder, containing summary and trajectory logs
  - babyai: babyai folder, containing summary and trajectory logs
  - crafter: crafter folder, containing summary and trajectory logs
  - minihack: minihack folder, containing summary and trajectory logs
  - nle: nethack folder, containing summary and trajectory logs
  - textworld: textworld folder, containing summary and trajectory logs
  - summary.json: Summary of the evaluation outcomes for all environments

  NOTE: You shouldn't have to create any of these files. They should be generated automatically by the BALROG evaluation.
- metadata.yaml: Metadata for how the result is shown on the website. Please include the following fields:
  - name: The name of your leaderboard entry
  - oss: true if your agent (model + strategy) is open-source
  - site: URL/link to more information about your agent
  - verified: false (see below for results verification)
- README.md: Include anything you'd like to share about your agent here!
- Run python submit.py <path-to-submission>
- Create a pull request to the BALROG/experiments repository with the new folder.
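
Before running submit.py, it can help to sanity-check the folder yourself. The sketch below builds the folder name, renders the four metadata.yaml fields, and reports any entry from the list above that is missing. It is an independent pre-flight check, not a description of what submit.py does, and the helper names (submission_folder_name, metadata_yaml, missing_entries) are hypothetical. It also assumes the environment folders are named exactly after the keys in the list above (for example nle).

```python
from datetime import date
from pathlib import Path

# Entries the steps above list as part of a submission folder.
# Assumption: folder names match the list keys exactly (e.g. "nle").
ENV_FOLDERS = ["babaisai", "babyai", "crafter", "minihack", "nle", "textworld"]
REQUIRED_FILES = ["summary.json", "metadata.yaml", "README.md"]


def submission_folder_name(agent_name: str, day: date) -> str:
    """Build a folder name such as 2024-09-21_balrog_gpt4o."""
    return f"{day.isoformat()}_{agent_name}"


def metadata_yaml(name: str, oss: bool, site: str) -> str:
    """Render the four metadata.yaml fields as text.

    Simplification: values are wrapped in double quotes without escaping,
    so this assumes they contain no double quotes or backslashes.
    New submissions start with verified set to false.
    """
    return (
        f'name: "{name}"\n'
        f"oss: {str(oss).lower()}\n"
        f'site: "{site}"\n'
        "verified: false\n"
    )


def missing_entries(submission_dir: Path) -> list[str]:
    """Return the expected folders and files that are absent."""
    missing = [d for d in ENV_FOLDERS if not (submission_dir / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (submission_dir / f).is_file()]
    return missing
```

For example, calling missing_entries(Path("2024-09-21_balrog_gpt4o")) on a complete submission folder returns an empty list; anything it returns is worth fixing before opening the pull request.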
You can refer to this tutorial for a quick overview of how to evaluate your agent on BALROG.
Verify Your Results
The Verified check ✓ indicates that we (the BALROG team) received access to your agent and were able to reproduce a selection of the results.
If you are interested in receiving the "verified" checkmark ✓ on your submission, please do the following:
- Create an issue
- In the issue, provide us with instructions on how to run your agent on BALROG.
- We will run your agent on a random subset of BALROG and verify the results.