Releases · empirical-run/empirical
v0.4.0
What's Changed
- fix: run config gets hidden on opening after scroll by @KaustubhKumar05 in #92
- feat: add interactivity to the UI by @saikatmitra91 in #85
- docs: for configuration file, minor fixes elsewhere by @arjunattam in #95
- feat: add support for output metadata by @saikatmitra91 in #100
Full Changelog: https://github.com/empirical-run/empirical/compare/@empiricalrun/cli@0.3.1...@empiricalrun/cli@0.4.0
v0.3.1
What's Changed
- fix: top nav bar in UI should be sticky by @KaustubhKumar05 in #75
- feat: show output fetch errors in UI by @KaustubhKumar05 in #90
Full Changelog: https://github.com/empirical-run/empirical/compare/@empiricalrun/cli@0.3.0...@empiricalrun/cli@0.3.1
v0.3.0
Empirical is a CLI and web UI for developers to test different LLMs, prompts and other model configurations — across all the scenarios that matter. Try it out with the quick start →
This is our first open-source release, and it lays out the core primitives and capabilities of the product.
Capabilities
Configuration
- Empirical has a declarative configuration file for your tests in `empiricalrc.json` (see an example)
- Each config has 3 parts: model providers (what to test), datasets (scenarios to test), scorers (measure output quality)
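As a rough sketch of the shape (the field names and scorer name here are illustrative assumptions, not the documented schema; see the linked example for the real format):

```json
{
  "runs": [
    {
      "provider": "openai",
      "model": "gpt-3.5-turbo",
      "prompt": "Extract the name from this message: {{user_message}}"
    }
  ],
  "dataset": {
    "samples": [
      { "inputs": { "user_message": "Hi, I'm John Doe" } }
    ]
  },
  "scorers": [
    { "type": "is-json" }
  ]
}
```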
Model providers
- Empirical can run tests against off-the-shelf LLMs and custom models or apps (e.g. RAG)
- Off-the-shelf LLMs: Models hosted by OpenAI, Anthropic, Mistral, Google and Fireworks are supported today
- Custom models or apps: You can write a Python script that acts as an entry point, so you can run tests against custom models or RAG applications (see the sketch below)
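For example, a custom entry point might look like this minimal sketch. The `execute` function name and its input/output shape are assumptions for illustration, not the documented contract:

```python
# custom_model.py: hypothetical entry point for testing a custom RAG app.
# The `execute` name and its input/output shape are assumptions.

def retrieve_documents(question: str) -> list[str]:
    # Placeholder retrieval step; a real app would query a vector store.
    return ["Empirical is a CLI and web UI for testing LLM configurations."]

def generate_answer(question: str, context: list[str]) -> str:
    # Placeholder generation step; a real app would call an LLM here.
    return f"Answer derived from {len(context)} retrieved document(s)."

def execute(inputs: dict) -> dict:
    # Each call receives the inputs of one dataset sample.
    question = inputs["question"]
    context = retrieve_documents(question)
    return {"value": generate_answer(question, context)}
```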
Datasets
- Specify scenarios to test as samples in the configuration, or import them from a file (see the sketch below)
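Inline samples appear in the config sketch above; importing from a file might look like this (the `path` key is an assumed name, not confirmed schema):

```json
{
  "dataset": { "path": "scenarios.jsonl" }
}
```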
Scorers
- Measure the quality of your output with built-in scoring functions, or write your own scorers as LLM prompts or Python functions
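A custom scorer written in Python might look like the sketch below. The `evaluate` function name and the returned shape are assumptions for illustration, not the documented scorer interface:

```python
# scorer.py: hypothetical custom scorer.
# The `evaluate` name and return shape are assumptions.

def evaluate(output: str, inputs: dict) -> dict:
    # Score 1 if the expected name shows up in the model output, else 0.
    expected = inputs.get("expected_name", "")
    passed = bool(expected) and expected.lower() in output.lower()
    return {
        "score": 1 if passed else 0,
        "message": "" if passed else f"expected '{expected}' in the output",
    }
```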
Web UI
- Review results and compare model configurations side-by-side in our web UI
- This brings the Empirical playground and comparison pages to your local environment
Continuous evaluation in CI
- Run your tests in GitHub Actions and get results reported as a PR comment
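A minimal workflow might look like this sketch. The `npx @empiricalrun/cli run` invocation is inferred from the package name in the changelog links above and is an assumption, as is the secret it needs:

```yaml
# .github/workflows/empirical.yml: illustrative sketch only.
name: Empirical tests
on: [pull_request]
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Run Empirical
        # Assumed invocation; the package name comes from the changelog links.
        run: npx @empiricalrun/cli run
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```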
Get in touch
File an issue or join our Discord — we look forward to hearing from you ^_^