Releases: empirical-run/empirical

v0.4.0

09 Apr 04:45
b632035

What's Changed

Full Changelog: https://github.com/empirical-run/empirical/compare/@empiricalrun/cli@0.3.1...@empiricalrun/cli@0.4.0

v0.3.1

06 Apr 06:42
bcf4572

What's Changed

Full Changelog: https://github.com/empirical-run/empirical/compare/@empiricalrun/cli@0.3.0...@empiricalrun/cli@0.3.1

v0.3.0

04 Apr 09:28
e674d26

Empirical is a CLI and web UI for developers to test different LLMs, prompts and other model configurations — across all the scenarios that matter. Try it out with the quick start →

This is our first open source release, and it lays out the core primitives and capabilities of the product.

Capabilities

Configuration

  • Empirical has a declarative configuration file for your tests in empiricalrc.json (see an example)
    • Each config has three parts: model providers (what to test), datasets (the scenarios to test against), and scorers (which measure output quality)
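To make the three parts concrete, here is an illustrative sketch of what such a config could look like. The field names and values below are assumptions for illustration only, not Empirical's actual schema; refer to the linked example for the real format.

```json
{
  "providers": [
    { "provider": "openai", "model": "gpt-3.5-turbo" },
    { "provider": "anthropic", "model": "claude-3-haiku" }
  ],
  "dataset": {
    "samples": [
      { "inputs": { "question": "What is the capital of France?" } }
    ]
  },
  "scorers": [
    { "type": "llm-critic", "criteria": "Answer is factually correct" }
  ]
}
```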

Model providers

  • Empirical can run tests against off-the-shelf LLMs and custom models or apps (e.g. RAG)
    • Off-the-shelf LLMs: Models hosted by OpenAI, Anthropic, Mistral, Google and Fireworks are supported today
    • Custom models or apps: You can write a Python script that behaves as an entry point to run tests against custom models or RAG applications
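A minimal sketch of what such an entry-point script could look like. The function name `execute` and its input/output shapes are assumptions made for illustration, not Empirical's documented interface; a real script would call your model or RAG pipeline where the stub below returns a canned answer.

```python
# Hypothetical entry-point script for testing a custom model or RAG app.
# NOTE: the function name and signature are illustrative assumptions,
# not Empirical's actual interface.

def execute(inputs: dict) -> dict:
    """Receive the test sample's inputs and return the model's output."""
    question = inputs.get("question", "")
    # A real implementation would invoke your custom model or RAG
    # pipeline here; this stub stands in for that call.
    answer = f"Stub answer for: {question}"
    return {"value": answer}

if __name__ == "__main__":
    print(execute({"question": "What is the capital of France?"}))
```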

Datasets

Scorers

Web UI

  • Review results and compare model configurations side-by-side in our web UI

Continuous evaluation in CI

Get in touch

File an issue or join our Discord — we look forward to hearing from you ^_^