add beaker docs
AkshitaB committed Dec 5, 2023
1 parent 7df251a commit f9a7df4
Showing 3 changed files with 84 additions and 4 deletions.
25 changes: 22 additions & 3 deletions ADVANCED.md
@@ -33,17 +33,36 @@ gcloud config set project <your-project>
* Run with a `gs://` workspace.

```commandline
-tango --settings tango.yml run configs/evaluation_template.jsonnet --workspace gs://my-gs-workspace
+tango --settings tango.yml run configs/example_config.jsonnet --workspace gs://my-gs-workspace
```

This will create a `tango` workspace in a Google Cloud bucket.

-💡 See `tango.yml` to set this as the default option.
+💡 See [`tango.yml`](tango.yml) to set this as the default option.
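
Once the run completes, step results can be read back from the remote workspace in Python. A minimal sketch, assuming the workspace URL above; the step name is the one from the README example and should be replaced with a step name from your own run:

```python
# Minimal sketch: read a step result back from the remote gs:// workspace.
# The step name below is the README example's; substitute one from your run.
from tango import Workspace

workspace = Workspace.from_url("gs://my-gs-workspace")
result = workspace.step_result("outputs_pythia-1bstep140000_gen_tasks_drop")
print(result)
```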

### Troubleshooting

If an error causes your Google Cloud workspace to end up in a bad state (e.g., you get errors saying that a step should not be in the completed state), you can clear the workspace with

```commandline
python scripts/empty_workspace.py my-gs-workspace
```


## Run without Tango

The [`llm_eval/run_lm_eval.py`](llm_eval/run_lm_eval.py) script provides a way to run an evaluation as a single
job with an associated result set. Arguments can be provided either in a config file (an example is found
in `configs/run_lm_eval_example.jsonnet`) or as direct arguments (see the documentation in the script). E.g.,

```commandline
python -m llm_eval.run_lm_eval --config_file configs/run_lm_eval_example.jsonnet
```
or
```commandline
python -m llm_eval.run_lm_eval --model lm::pretrained=EleutherAI/pythia-160m,revision=step140000 \
--task arc_challenge arc_easy --split validation \
--full_output_file predictions.jsonl --metrics_file metrics.json --model_max_length 2048 \
--max_batch_tokens 4096 --num_recorded_inputs 3 --num_shots 0 --gsheet OLMo-evals-testing
```
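
Both invocations write plain JSON / JSON Lines output files, which makes quick inspection easy. A minimal sketch for peeking at the results, assuming `metrics.json` holds a single JSON object and `predictions.jsonl` holds one JSON record per line (the exact fields vary by task):

```python
# Minimal sketch: peek at the files produced by run_lm_eval above.
# Assumes metrics.json is a single JSON object and predictions.jsonl
# has one JSON record per line; record fields vary by task.
import json

with open("metrics.json") as f:
    print(json.load(f))

with open("predictions.jsonl") as f:
    first = json.loads(next(f))
    print(sorted(first.keys()))
```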

59 changes: 59 additions & 0 deletions BEAKER.md
@@ -0,0 +1,59 @@

# Run on Beaker

## Option 1: Run full pipeline in an interactive session

* Start an interactive session with the required number of GPUs
* Run as you would locally, e.g.,

```commandline
tango --settings tango.yml run configs/example_config.jsonnet --workspace gs://my-gs-workspace
```

## Option 2: Run each step as a separate Beaker job

This will run each step of the pipeline as a separate Beaker experiment. The compute resources required for each step are provisioned separately.

<details>
<summary>Creating a Beaker image (❗Update each time the catwalk / tango version is updated).</summary>

This is done so that each individual step does not need to install catwalk, tango, and other libraries, which can be slow.

[Reference](https://beaker-docs.apps.allenai.org/interactive/images.html#building-custom-images)

```commandline
beaker session create --gpus 1 --image beaker://ai2/cuda11.5-cudnn8-dev-ubuntu20.04 --bare --save-image
conda create -n eval-env python=3.10
conda activate eval-env
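# (assumption: the repository has already been cloned and you are in its root)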
pip install -e '.[dev]'
exit
beaker image rename <image-id> llm_eval_image
```
</details>

```commandline
tango --settings tango-in-beaker.yml run configs/example_config.jsonnet --workspace gs://my-gs-workspace
```

💡 See [`tango-in-beaker.yml`](tango-in-beaker.yml) for all configurable options.

## Option 3: Run full pipeline in a single Beaker job

Note: Use with `llm_eval/run_lm_eval.py`. See details [here](ADVANCED.md#run-without-tango).

Use [beaker-gantry](https://github.com/allenai/beaker-gantry), e.g.,

```commandline
gantry run --gpus 1 --venv base --workspace ai2/lm-eval --cluster ai2/aristo-cirrascale \
--beaker-image oyvindt/OLMoEvalLatest \
--env 'HF_DATASETS_CACHE=/net/nfs.cirrascale/aristo/oyvindt/hf_datasets_cache' -- \
python llm_eval/run_lm_eval.py \
--model lm::pretrained=EleutherAI/pythia-160m,revision=step140000 \
--task arc_challenge arc_easy boolq --split validation \
--full_output_file /results/predictions.jsonl --metrics_file /results/metrics.json \
--model_max_length 2048 --max_batch_tokens 4096 --num_recorded_inputs 3 \
--num_shots 0 --gsheet OLMo-evals-testing
```
or reference a config file via `--config_file`, stored either on NFS or in a Beaker dataset (which can be mounted
in the gantry command).

4 changes: 3 additions & 1 deletion README.md
@@ -27,7 +27,7 @@ The current `task_sets` can be found at [configs/task_sets](configs/task_sets).
The configuration can be run as follows:

```commandline
-tango --settings tango.yml run configs/evaluation_template.jsonnet --workspace my-eval-workspace
+tango --settings tango.yml run configs/example_config.jsonnet --workspace my-eval-workspace
```

This executes all the steps defined in the config and saves their outputs in a local `tango` workspace called `my-eval-workspace`. If you add a new task_set or model to your config and run the same command again, it will reuse the previous outputs and only compute the new ones.
@@ -55,5 +55,7 @@ result = workspace.step_result("outputs_pythia-1bstep140000_gen_tasks_drop")

* [Save output to google sheet](ADVANCED.md#save-output-to-google-sheet)
* [Use a remote workspace](ADVANCED.md#use-a-remote-workspace)
* [Run without Tango (useful for debugging)](ADVANCED.md#run-without-tango)
* [Run on Beaker](BEAKER.md)

