BenchTools

A library for building and running benchmarks.

Install

You can install directly from GitHub or from a local clone.

Direct install

pip install git+https://github.com/ml4sts/benchtools.git

From a clone

Clone the repository first:

git clone https://github.com/ml4sts/benchtools.git

and then install

pip install benchtools

(On some systems the command is pip3.)

If you cloned in order to develop, you may want to install with pip’s -e option:

pip install -e benchtools

To update, pull and install again.

Usage

benchtools lets you express templated tasks in two ways:

a yaml format that lists the tasks with a values key
a folder per task, containing a txt file with the template and a csv file of values for the variations of the task

A benchmark can consist of tasks that all use one of the formats above, or of a mixture of meta-tasks, each represented as a folder that holds specific tasks in one of those formats.
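To make the idea of a templated task concrete, here is a minimal sketch of how one template and a table of values could expand into one prompt per variation. The `{a}` placeholder syntax, the column names, and the `expand` helper are assumptions made for illustration; they are not taken from the benchtools source, which may use a different syntax.

```python
import csv
import io

# Hypothetical task content. The placeholder syntax and column names are
# assumptions for illustration, not the format benchtools requires.
TEMPLATE = "What is {a} + {b}?"
VALUES_CSV = "a,b\n1,2\n10,20\n"


def expand(template: str, values_csv: str) -> list[str]:
    """Fill the template once per CSV row, giving one prompt per variation."""
    rows = csv.DictReader(io.StringIO(values_csv))
    return [template.format(**row) for row in rows]


prompts = expand(TEMPLATE, VALUES_CSV)
for prompt in prompts:
    print(prompt)
```

Each row of the values table yields one concrete task, which is why a single template file plus a csv can stand in for many closely related prompts.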

The fastest way to get familiar is to install a demo, such as listbench (yaml format) or folderbench (folder format), as follows:

benchtools demo install -n BENCHMARK_NAME

To run the demo right after installing it, add -r:

benchtools demo install -n BENCHMARK_NAME -r
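If you want to script the demo installation, the sketch below builds the command line for both demos named above. It uses only the -n, -t, and -r options documented for the demo install command; the `demo_install_cmd` helper is a hypothetical convenience, not part of benchtools, and the sketch only prints the commands rather than running them.

```python
import shlex


def demo_install_cmd(demo_name, target_dir=None, run=False):
    """Build the argument list for `benchtools demo install` (hypothetical helper)."""
    cmd = ["benchtools", "demo", "install", "-n", demo_name]
    if target_dir is not None:
        cmd += ["-t", target_dir]  # target directory for the demo
    if run:
        cmd.append("-r")  # run the demo after installing
    return cmd


commands = [demo_install_cmd(name, run=True) for name in ("listbench", "folderbench")]
for cmd in commands:
    print(shlex.join(cmd))
```

To execute a command instead of printing it, the same list can be passed to `subprocess.run` once benchtools is installed.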


benchtools

BenchTools is a tool that helps researchers set up benchmarks.

Usage

benchtools [OPTIONS] COMMAND [ARGS]...

add-task

Set up a new task.

Usage

benchtools add-task [OPTIONS] TASK_NAME

Options

-p, --bench-path <bench_path>: The path to the benchmark repository where the task will be added.

-s, --task-source <task_source>: Required. The relative path to content that already exists.

-t, --task-type <task_type>

Required. The type of the task content being added: folders (a template with a csv of values) or list (yml).

Options: folders | list

Arguments

TASK_NAME: Required argument

demo

Demo benchmarks packaged with benchtools.

Usage

benchtools demo [OPTIONS] COMMAND [ARGS]...

install

Install and optionally run a demo.

Usage

benchtools demo install [OPTIONS]

Options

-n, --demo-name <demo_name>

-t, --target-dir <target_dir>: target directory for the demo

-r, --run: optionally, run the demo after installing

list

List available demos.

Usage

benchtools demo list [OPTIONS]

Options

-c, --concept: include concept descriptions in list

init

Initializes a new benchmark.

A benchmark name is required; if it is not provided as an argument, it is requested interactively.

This command creates the folder for the benchmark.

Usage

benchtools init [OPTIONS] [BENCHMARK_NAME]

Options

-p, --path <path>: The path where the new benchmark repository will be placed.

-a, --about <about>: Benchmark description. Content will go in the about.md file.

--no-git: Don’t make the benchmark a git repository. Default is False.

Arguments

BENCHMARK_NAME: Optional argument

run

Run the benchmark, generate logs, and optionally score.

Usage

benchtools run [OPTIONS] [BENCHMARK_PATH]

Options

-r, --runner-type <runner_type>

The engine that will run your LLM.

Options: ollama | openai | bedrock

-m, --model <model>: The LLM to be benchmarked.

-a, --api <api>: The API base URL required to access the runner engine.

-l, --log-path <log_path>: The path to a log directory.

-s, --score: Flag to score each task while running.

-R, --runner-file <runner_file>: Use a runner.yml configuration; if provided, it overrides the other options.

Arguments

BENCHMARK_PATH: Optional argument
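As an illustration of how the run options combine, the following sketch assembles a `benchtools run` command and rejects runner types outside the three listed choices. The `run_cmd` helper, the benchmark path `my_benchmark`, and the model name `example-model` are hypothetical placeholders; only the flags themselves come from the option list above.

```python
RUNNER_TYPES = ("ollama", "openai", "bedrock")  # choices listed for --runner-type


def run_cmd(bench_path, runner_type, model, log_path=None, score=False):
    """Build the argument list for `benchtools run` (hypothetical helper)."""
    if runner_type not in RUNNER_TYPES:
        raise ValueError(f"runner_type must be one of {RUNNER_TYPES}, got {runner_type!r}")
    cmd = ["benchtools", "run", "-r", runner_type, "-m", model]
    if log_path is not None:
        cmd += ["-l", log_path]  # directory where logs are written
    if score:
        cmd.append("-s")  # score each task while running
    cmd.append(bench_path)
    return cmd


# "my_benchmark" and "example-model" are placeholders, not real names.
cmd = run_cmd("my_benchmark", "ollama", "example-model", score=True)
print(" ".join(cmd))
```

Checking the runner type before building the command gives an early, readable error in a script instead of a failure from the CLI itself.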

run-task

Run a single task and generate logs.

BENCHMARK_PATH is the path to the benchmark repository where all the tasks reside. TASK_NAME is the name of the specific task you would like to run.

Usage

benchtools run-task [OPTIONS] BENCHMARK_PATH TASK_NAME

Options

-r, --runner-type <runner_type>

The engine that will run your LLM.

Options: ollama | openai | bedrock

-m, --model <model>: The LLM to be benchmarked.

-a, --api-url <api_url>: The API call required to access the runner engine.

-l, --log-path <log_path>: The path to a log directory.

Arguments

BENCHMARK_PATH: Required argument

TASK_NAME: Required argument

score

Score the runs of a benchmark.

Parameters:

benchmark-path: The path to the benchmark repository where all the tasks reside.

Usage

benchtools score [OPTIONS] [BENCHMARK_PATH]

Options

-r, --result-id <result_id>: runs to score: ‘last’, ‘all’, or specific ids

-c, --csv: save a csv of the eval in addition to json

-C, --collate: collate scores rather than recomputing them

Arguments

BENCHMARK_PATH: Optional argument
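Putting the commands together, the sketch below prints one possible end-to-end sequence: create a benchmark, add a task, run it, and score the last run. All names and paths (`mybench`, `tasks/addition`, `addition`, `example-model`) are hypothetical, and the assumption that init creates `./mybench` in the current directory is not confirmed by the reference above. The sketch only prints the commands; it does not execute them.

```python
import shlex

bench_name = "mybench"    # hypothetical benchmark name
bench_path = "./mybench"  # assumed location of the folder that init creates

steps = [
    # Create the benchmark with a short description for about.md.
    ["benchtools", "init", "-a", "A small example benchmark", bench_name],
    # Add an existing folder-format task (the source path is a placeholder).
    ["benchtools", "add-task", "-p", bench_path, "-s", "tasks/addition", "-t", "folders", "addition"],
    # Run the whole benchmark with a placeholder model name.
    ["benchtools", "run", "-r", "ollama", "-m", "example-model", bench_path],
    # Score the most recent run and also save a csv.
    ["benchtools", "score", "-r", "last", "-c", bench_path],
]

script = "\n".join(shlex.join(step) for step in steps)
print(script)
```

Keeping each step as an argument list avoids quoting mistakes, and `shlex.join` renders a shell-safe line for copying into a terminal.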