User Guide#

benchtool is a python library with a CLI for creating, managing, and running AI benchmarks.

Key terms#

A benchmark is comprised of several tasks, structured data, and documentation.
A task consists of a prompt, optionally a template with variations, a means of scoring, and a reference value if necessary for scoring
a scoring function must have an API that takes 2 inputs, response and refernce, though reference may be unused and passed as None

The easiest way to get familar is to run the demo benchmarks that are available in the repository

Clone the repository then explore the benchtools/demos folder

Contents: