scicode bench
neginraoof authored Jan 30, 2025
1 parent c734d38 commit e689451
Showing 1 changed file with 10 additions and 9 deletions.
19 changes: 10 additions & 9 deletions README.md
@@ -13,19 +13,19 @@ Evalchemy is a unified and easy-to-use toolkit for evaluating language models, f
- [vLLM models](https://blog.vllm.ai/2023/06/20/vllm.html): High-performance inference and serving engine with PagedAttention technology
```bash
python -m eval.eval \
- --model vllm \
- --tasks alpaca_eval \
- --model_args "pretrained=meta-llama/Meta-Llama-3-8B-Instruct" \
- --batch_size 16 \
- --output_path logs
+ --model vllm \
+ --tasks alpaca_eval \
+ --model_args "pretrained=meta-llama/Meta-Llama-3-8B-Instruct" \
+ --batch_size 16 \
+ --output_path logs
```
- [OpenAI models](https://openai.com/): Full support for OpenAI's model lineup
```bash
python -m eval.eval \
- --model openai-chat-completions \
- --tasks alpaca_eval \
- --model_args "model=gpt-4o-mini-2024-07-18,num_concurrent=32" \
- --batch_size 16 \
+ --model openai-chat-completions \
+ --tasks alpaca_eval \
+ --model_args "model=gpt-4o-mini-2024-07-18,num_concurrent=32" \
+ --batch_size 16 \
--output_path logs
```

@@ -97,6 +97,7 @@ huggingface-cli login
- **Arena-Hard-Auto** (Coming soon): [Automatic evaluation tool for instruction-tuned LLMs](https://github.com/lmarena/arena-hard-auto)
- **SWE-Bench** (Coming soon): [Evaluating large language models on real-world software issues](https://github.com/princeton-nlp/SWE-bench)
- **SafetyBench** (Coming soon): [Evaluating the safety of LLMs](https://github.com/thu-coai/SafetyBench)
+ - **SciCode Bench** (Coming soon): [Evaluate language models in generating code for solving realistic scientific research problems](https://github.com/scicode-bench/SciCode)
- **Berkeley Function Calling Leaderboard** (Coming soon): [Evaluating ability of LLMs to use APIs](https://gorilla.cs.berkeley.edu/blogs/13_bfcl_v3_multi_turn.html)

