Overview

benchtools is a Python library with a CLI for creating, managing, and running AI benchmarks.

Key terms

  • A benchmark consists of several tasks, structured data, and documentation.

  • A task consists of a prompt, optionally a template with variations, a means of scoring, and a reference value if scoring requires one.

  • A scoring function must accept two inputs, response and reference; reference may be unused, in which case it is passed as None.
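The two-input scoring contract described above can be sketched as follows. This is a minimal illustration, not the library's own code: the function names `exact_match` and `is_nonempty`, the string inputs, and the float return value are all assumptions made for the example. Only the `(response, reference)` argument pair comes from the description above.

```python
from typing import Optional


def exact_match(response: str, reference: Optional[str]) -> float:
    """Hypothetical scorer: 1.0 when response equals reference.

    The comparison ignores case and surrounding whitespace.
    The float return type is an assumption for this sketch.
    """
    if reference is None:
        raise ValueError("exact_match needs a reference value")
    return 1.0 if response.strip().lower() == reference.strip().lower() else 0.0


def is_nonempty(response: str, reference: Optional[str] = None) -> float:
    """Hypothetical scorer that ignores the reference entirely.

    It still accepts the reference argument so that it has the same
    two-input shape as every other scoring function.
    """
    return 1.0 if response.strip() else 0.0


if __name__ == "__main__":
    print(exact_match(" Paris ", "paris"))
    print(is_nonempty("some answer", None))
```

Giving every scorer the same two-argument shape means that whatever runs the tasks can call any of them the same way, passing None when a task has no reference value.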

Sample benchmarks

The easiest way to get familiar with the library is to run the demo benchmarks available in the repository.

Clone the repository, then explore the benchtools/demos folder.