CLI#

We can initialize a benchmark without any tasks and add them afterwards:

cd demos
benchtool init testbench -a "to test a simple example" --no-git
cd testbench
benchtool add-task -s ../new_test/ -t folders FillIn
cd ..
benchtool run testbench

benchtool#

BenchTools is a tool that helps researchers set up benchmarks.

Usage

benchtool [OPTIONS] COMMAND [ARGS]...

add-task#

Set up a new task.

Usage

benchtool add-task [OPTIONS] TASK_NAME

Options

-p, --benchmark-path <benchmark_path>#

The path to the benchmark repository where the task will be added.

-s, --task-source <task_source>#

Required The relative path to content that already exists.

-t, --task-type <task_type>#

Required The type of the task content being added.

Options:

folders | list

Arguments

TASK_NAME#

Required argument
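For example, assuming a benchmark repository in the current directory and a hypothetical content folder ../my_content/ (the paths and task name below are illustrative, not from the source):

```shell
# Add a task named MyTask built from existing folder-based content.
benchtool add-task -s ../my_content/ -t folders MyTask

# Target a benchmark elsewhere with -p/--benchmark-path.
benchtool add-task -p ./mybench -s ../my_content/ -t list AnotherTask
```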

init#

Initializes a new benchmark.

If BENCHMARK_NAME is not provided, it is requested interactively.

This command creates the folder for the benchmark.

Usage

benchtool init [OPTIONS] [BENCHMARK_NAME]

Options

-p, --path <path>#

The path where the new benchmark repository will be placed

-a, --about <about>#

Benchmark description. Content will go in the about.md file.

--no-git#

Don’t make the benchmark a git repository. Default is False.

Arguments

BENCHMARK_NAME#

Optional argument
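As a sketch, a non-interactive initialization might look like this (the benchmark name, path, and description are illustrative):

```shell
# Create a benchmark named mybench under ./benchmarks,
# with a description written to about.md and no git repository.
benchtool init mybench -p ./benchmarks -a "A short description" --no-git
```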

run#

Runs the benchmark and generates logs.

Usage

benchtool run [OPTIONS] BENCHMARK_PATH

Options

-r, --runner-type <runner_type>#

The engine that will run your LLM.

Options:

ollama | openai | aws

-m, --model <model>#

The LLM to be benchmarked.

-a, --api-url <api_url>#

The API URL used to access the runner engine.

-l, --log-path <log_path>#

The path to a log directory.

Arguments

BENCHMARK_PATH#

Required argument. The path to the benchmark repository where all the tasks reside.
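For example, a full run against a locally served model might look like this (the model name and benchmark path are illustrative; http://localhost:11434 is the conventional local Ollama address):

```shell
# Run every task in ./mybench with a local Ollama model,
# writing logs to ./logs.
benchtool run -r ollama -m llama3 -a http://localhost:11434 -l ./logs ./mybench
```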

run-task#

Runs a single task and generates logs.

Usage

benchtool run-task [OPTIONS] BENCHMARK_PATH TASK_NAME

Options

-r, --runner-type <runner_type>#

The engine that will run your LLM.

Options:

ollama | openai | aws

-m, --model <model>#

The LLM to be benchmarked.

-a, --api-url <api_url>#

The API URL used to access the runner engine.

-l, --log-path <log_path>#

The path to a log directory.

Arguments

BENCHMARK_PATH#

Required argument. The path to the benchmark repository where all the tasks reside.

TASK_NAME#

Required argument. The name of the specific task you would like to run.
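To run just one task rather than the whole benchmark, an invocation might look like this (the benchmark path, task name, and model are illustrative):

```shell
# Run only the task MyTask from ./mybench via the openai runner.
# Depending on the runner, -a/--api-url may also be required.
benchtool run-task -r openai -m gpt-4o -l ./logs ./mybench MyTask
```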