@AkshitaB This is an attempt at merging my old workflow from the `run_lm_eval.py` script in catwalk with the new steps and functionality in this repo. Feel free to leave it in a branch for now if it's not suitable to be merged. From my viewpoint, the main advantages of running it this way are:
To run this in beaker requires a somewhat clunky gantry command, as well as uploading the config files (either to beaker or NFS) if those are used instead of direct parameters (which are also supported); a rough example of the command is sketched below. But that's easy enough once it's in your workflow. My gantry example uses my OLMoEvalLatest beaker image, which includes the OLMo code and which I try to keep up to date.
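The invocation I have in mind looks roughly like this. The workspace, cluster, image owner, config path, and entrypoint are all placeholders, and the exact flag spellings may differ depending on your gantry version:

```shell
# Illustrative sketch only: workspace/cluster/image names, the entrypoint, and the
# config path are placeholders, not the actual values used in this PR.
gantry run \
  --workspace ai2/my-workspace \
  --cluster ai2/some-cluster \
  --gpus 1 \
  --beaker-image my-user/OLMoEvalLatest \
  --env-secret GITHUB_TOKEN=GITHUB_TOKEN \
  --env-secret GDRIVE_SERVICE_ACCOUNT_JSON=GDRIVE_SERVICE_ACCOUNT_JSON \
  -- python run_lm_eval.py --config-file configs/example_config.jsonnet
```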
Since this is an internal repo, `GITHUB_TOKEN` is needed to run gantry. Also, `GDRIVE_SERVICE_ACCOUNT_JSON` is needed to upload to google sheets. (I tweaked the google sheet upload a bit, behind a `simple_pipeline` argument, to add the beaker ID, remove the tango stuff, and add an `all_metrics` column; see the example in my "OLMo-evals-testing" sheet, and the rough sketch of the upload below.)
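For context, this is roughly the shape of that upload. It's an illustrative sketch only (using gspread, with made-up function and field names), not the actual code in this PR:

```python
import json
import os

import gspread


def upload_simple_pipeline_row(sheet_title: str, beaker_experiment_id: str, metrics: dict) -> None:
    """Append one result row with the beaker ID and a JSON 'all_metrics' column.

    Hypothetical helper: the real simple_pipeline upload in the PR may be organized differently.
    """
    # Authenticate with the service-account JSON passed in via the environment.
    creds = json.loads(os.environ["GDRIVE_SERVICE_ACCOUNT_JSON"])
    client = gspread.service_account_from_dict(creds)
    worksheet = client.open(sheet_title).sheet1
    # One row per run: beaker ID first, then a primary metric, then everything
    # serialized into the extra "all_metrics" column.
    worksheet.append_row([
        beaker_experiment_id,
        metrics.get("primary_metric", ""),
        json.dumps(metrics),
    ])
```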
I'm totally open to suggestions for how to refactor this to accomplish the main advantages above without being quite as hacky. :)