add beaker docs
AkshitaB committed Dec 5, 2023
1 parent 7df251a commit f9a7df4
Showing 3 changed files with 84 additions and 4 deletions.
25 changes: 22 additions & 3 deletions ADVANCED.md
@@ -33,17 +33,36 @@ gcloud config set project <your-project>
* Run with a `gs://` workspace.

```commandline
-tango --settings tango.yml run configs/evaluation_template.jsonnet --workspace gs://my-gs-workspace
+tango --settings tango.yml run configs/example_config.jsonnet --workspace gs://my-gs-workspace
```

This will create a `tango` workspace in a Google Cloud bucket.

-💡 See `tango.yml` to set this as the default option.
+💡 See [`tango.yml`](tango.yml) to set this as the default option.
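
Once the run completes, step results can be read back from the remote workspace in Python. A minimal sketch, assuming the workspace URL above; the step name is the one from the README example and should be replaced with a step name from your own run:

```python
# Minimal sketch: read a step result back from the remote gs:// workspace.
# The step name below is the README example's; substitute one from your run.
from tango import Workspace

workspace = Workspace.from_url("gs://my-gs-workspace")
result = workspace.step_result("outputs_pythia-1bstep140000_gen_tasks_drop")
print(result)
```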

### Troubleshooting

If an error causes your Google Cloud workspace to end up in a bad state (e.g., you get errors saying that a step should not be in the completed state), you can clear the workspace with

```commandline
python scripts/empty_workspace.py my-gs-workspace
```


## Run without Tango

The [`llm_eval/run_lm_eval.py`](llm_eval/run_lm_eval.py) script provides a way to run an evaluation as a single
job with an associated result set. Arguments can be provided either in a config file (an example is found
in `configs/run_lm_eval_example.jsonnet`) or as direct arguments (see the documentation in the script). E.g.,

```commandline
python -m llm_eval.run_lm_eval --config_file configs/run_lm_eval_example.jsonnet
```
or
```commandline
python -m llm_eval.run_lm_eval --model lm::pretrained=EleutherAI/pythia-160m,revision=step140000 \
--task arc_challenge arc_easy --split validation \
--full_output_file predictions.jsonl --metrics_file metrics.json --model_max_length 2048 \
--max_batch_tokens 4096 --num_recorded_inputs 3 --num_shots 0 --gsheet OLMo-evals-testing
```
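
Both invocations write plain JSON / JSON Lines output files, which makes quick inspection easy. A minimal sketch for peeking at the results, assuming `metrics.json` holds a single JSON object and `predictions.jsonl` holds one JSON record per line (the exact fields vary by task):

```python
# Minimal sketch: peek at the files produced by run_lm_eval above.
# Assumes metrics.json is a single JSON object and predictions.jsonl
# has one JSON record per line; record fields vary by task.
import json

with open("metrics.json") as f:
    print(json.load(f))

with open("predictions.jsonl") as f:
    first = json.loads(next(f))
    print(sorted(first.keys()))
```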

59 changes: 59 additions & 0 deletions BEAKER.md
@@ -0,0 +1,59 @@

# Run on Beaker

## Option 1: Run full pipeline in an interactive session

* Start an interactive session with the required number of GPUs
* Run as you would locally, e.g.,

```commandline
tango --settings tango.yml run configs/example_config.jsonnet --workspace gs://my-gs-workspace
```

## Option 2: Run each step as a separate Beaker job

This will run each step of the pipeline as a separate Beaker experiment. The compute resources required for each step are provisioned separately.

<details>
<summary>Creating a Beaker image (❗Update each time the catwalk / tango version is updated).</summary>

This is done so that each individual step does not need to install catwalk, tango, and other libraries, which can be slow.

[Reference](https://beaker-docs.apps.allenai.org/interactive/images.html#building-custom-images)

```commandline
beaker session create --gpus 1 --image beaker://ai2/cuda11.5-cudnn8-dev-ubuntu20.04 --bare --save-image
conda create -n eval-env python=3.10
conda activate eval-env
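# (assumption: the repository has already been cloned and you are in its root)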
pip install -e '.[dev]'
exit
beaker image rename <image-id> llm_eval_image
```
</details>

```commandline
tango --settings tango-in-beaker.yml run configs/example_config.jsonnet --workspace gs://my-gs-workspace
```

💡 See [`tango-in-beaker.yml`](tango-in-beaker.yml) for all configurable options.

## Option 3: Run full pipeline in a single Beaker job

Note: Use with `llm_eval/run_lm_eval.py`. See details [here](ADVANCED.md#run-without-tango).

Use [beaker-gantry](https://github.com/allenai/beaker-gantry), e.g.,

```commandline
gantry run --gpus 1 --venv base --workspace ai2/lm-eval --cluster ai2/aristo-cirrascale \
--beaker-image oyvindt/OLMoEvalLatest \
--env 'HF_DATASETS_CACHE=/net/nfs.cirrascale/aristo/oyvindt/hf_datasets_cache' -- \
python llm_eval/run_lm_eval.py \
--model lm::pretrained=EleutherAI/pythia-160m,revision=step140000 \
--task arc_challenge arc_easy boolq --split validation \
--full_output_file /results/predictions.jsonl --metrics_file /results/metrics.json \
--model_max_length 2048 --max_batch_tokens 4096 --num_recorded_inputs 3 \
--num_shots 0 --gsheet OLMo-evals-testing
```
or reference a config file via `--config_file`, stored either on NFS or in a Beaker dataset (which can be mounted
in the gantry command).

4 changes: 3 additions & 1 deletion README.md
@@ -27,7 +27,7 @@ The current `task_sets` can be found at [configs/task_sets](configs/task_sets).
The configuration can be run as follows:

```commandline
-tango --settings tango.yml run configs/evaluation_template.jsonnet --workspace my-eval-workspace
+tango --settings tango.yml run configs/example_config.jsonnet --workspace my-eval-workspace
```

This executes all the steps defined in the config and saves their outputs in a local `tango` workspace called `my-eval-workspace`. If you add a new task_set or model to your config and run the same command again, it will reuse the previous outputs and only compute the new ones.
@@ -55,5 +55,7 @@ result = workspace.step_result("outputs_pythia-1bstep140000_gen_tasks_drop")

* [Save output to google sheet](ADVANCED.md#save-output-to-google-sheet)
* [Use a remote workspace](ADVANCED.md#use-a-remote-workspace)
* [Run without Tango (useful for debugging)](ADVANCED.md#run-without-tango)
* [Run on Beaker](BEAKER.md)

