BenchTools#

A library for building and running benchmarks

Install#

You can install directly from GitHub or after cloning.

Direct install#
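Since the repository URL appears in the clone instructions below, a direct install can use pip's standard VCS support (a sketch; the URL is taken from the clone step):

```shell
pip install git+https://github.com/ml4sts/benchtools.git
```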

By clone#

You can clone first

git clone https://github.com/ml4sts/benchtools.git

and then install

pip install ./benchtools

(possibly pip3)

If you clone in order to develop, you may want to install with pip’s -e option

pip install -e benchtools

To update, pull and install again.
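Concretely, for a clone installed with -e, updating might look like this (the directory name is assumed from the clone step):

```shell
cd benchtools
git pull
pip install -e .
```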

Usage#

BenchTools allows you to express templated tasks in two ways:

  • a yaml format listing the tasks, each with a values key

  • a folder for each task, containing a txt file with the template and a csv file of values for variations of the task

A benchmark can consist of tasks that all use a single format above, or a mixture of meta-tasks, each represented as a folder, with the specific tasks inside in one of the forms above.
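As an illustration only (apart from the values key, the key names here are assumptions, not a documented schema), a yaml task list might look like:

```yaml
# hypothetical sketch; only the `values` key is documented above
tasks:
  - name: addition
    template: "What is {a} + {b}?"
    values:
      - {a: 1, b: 2}
      - {a: 5, b: 7}
```

In the folder form, the same task would instead be a directory holding a txt file with the template and a csv file whose columns supply the placeholder values.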

There are two main ways to use BenchTools, and you can mix and match between them.

benchtools#

BenchTools is a tool that helps researchers set up benchmarks.

Usage

benchtools [OPTIONS] COMMAND [ARGS]...

add-task#

Set up a new task.

Usage

benchtools add-task [OPTIONS] TASK_NAME

Options

-p, --benchmark-path <benchmark_path>#

The path to the benchmark repository where the task will be added.

-s, --task-source <task_source>#

Required The relative path to content that already exists.

-t, --task-type <task_type>#

Required The format of the task content being added: folders (txt template plus csv values) or list (yaml with a values key).

Options:

folders | list

Arguments

TASK_NAME#

Required argument
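Putting the options together, a hypothetical invocation (all names and paths are illustrative):

```shell
# add a task named `arithmetic` from existing folder-format content
benchtools add-task arithmetic -p ./my-benchmark -s tasks/arithmetic -t folders
```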

init#

Initializes a new benchmark.

BENCHMARK_NAME is required; if not provided, it is requested interactively.

This command creates the folder for the benchmark.

Usage

benchtools init [OPTIONS] [BENCHMARK_NAME]

Options

-p, --path <path>#

The path where the new benchmark repository will be placed

-a, --about <about>#

Benchmark description. Content will go in the about.md file.

--no-git#

Don’t make the benchmark a git repository. Default is False.

Arguments

BENCHMARK_NAME#

Optional argument
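For example, a sketch of initializing a benchmark (the name, path, and description are illustrative):

```shell
benchtools init my-benchmark -p ./benchmarks -a "A small demo benchmark"
```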

run#

Run the benchmark and generate logs.

Usage

benchtools run [OPTIONS] BENCHMARK_PATH

Options

-r, --runner-type <runner_type>#

The engine that will run your LLM.

Options:

ollama | openai | aws

-m, --model <model>#

The LLM to be benchmarked.

-a, --api-url <api_url>#

The API URL used to access the runner engine.

-l, --log-path <log_path>#

The path to a log directory.

Arguments

BENCHMARK_PATH#

Required argument. The path to the benchmark repository where all the tasks reside.
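As a sketch, running a benchmark against a local ollama model (the model name and URL are assumptions, not documented defaults):

```shell
benchtools run ./my-benchmark -r ollama -m llama3 -a http://localhost:11434 -l ./logs
```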

run-task#

Run a single task and generate logs.

Usage

benchtools run-task [OPTIONS] BENCHMARK_PATH TASK_NAME

Options

-r, --runner-type <runner_type>#

The engine that will run your LLM.

Options:

ollama | openai | aws

-m, --model <model>#

The LLM to be benchmarked.

-a, --api-url <api_url>#

The API URL used to access the runner engine.

-l, --log-path <log_path>#

The path to a log directory.

Arguments

BENCHMARK_PATH#

Required argument. The path to the benchmark repository where all the tasks reside.

TASK_NAME#

Required argument. The name of the specific task you would like to run.
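A hypothetical single-task run (the task and model names are illustrative):

```shell
benchtools run-task ./my-benchmark arithmetic -r openai -m gpt-4o -l ./logs
```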