CLI#

benchtool#

benchtool helps researchers set up, run, and score benchmarks.

Usage

benchtool [OPTIONS] COMMAND [ARGS]...
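
For example, to list the available commands and global options (assuming the standard --help flag, which CLIs of this style normally provide):

benchtool --help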

add-task#

Set up a new task.

Usage

benchtool add-task [OPTIONS] TASK_NAME

Options

-p, --bench-path <bench_path>#

The path to the benchmark repository where the task will be added.

-s, --task-source <task_source>#

Required The relative path to existing content to add as the task.

-t, --task-type <task_type>#

Required The type of the task content being added.

Options:

folders | list

Arguments

TASK_NAME#

Required argument
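
For example, a hypothetical invocation that adds a task named parse-logs from existing folder content (all names and paths here are illustrative):

benchtool add-task -p ./my-bench -s tasks/parse-logs -t folders parse-logs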

init#

Initializes a new benchmark.

BENCHMARK_NAME is required; if it is not provided, it is requested interactively.

This command creates the folder for the benchmark.

Usage

benchtool init [OPTIONS] [BENCHMARK_NAME]

Options

-p, --path <path>#

The path where the new benchmark repository will be placed

-a, --about <about>#

Benchmark description. The content will go in the about.md file.

--no-git#

Don’t make the benchmark a git repository. Default is False.

Arguments

BENCHMARK_NAME#

Optional argument
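
For example, a hypothetical invocation that creates a benchmark named my-bench without initializing a git repository (the name, path, and description are illustrative):

benchtool init -p ./benchmarks -a "A benchmark for log-parsing tasks" --no-git my-bench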

run#

Run the benchmark, generate logs, and optionally score the results.

Usage

benchtool run [OPTIONS] [BENCHMARK_PATH]

Options

-r, --runner-type <runner_type>#

The engine that will run your LLM.

Options:

ollama | openai | bedrock

-m, --model <model>#

The LLM to be benchmarked.

-a, --api <api>#

The API base URL required to access the runner engine.

-l, --log-path <log_path>#

The path to a log directory.

-s, --score#

Flag to score each task while running.

-R, --runner-file <runner_file>#

Use a runner.yml configuration file; if provided, it overrides the other options.

Arguments

BENCHMARK_PATH#

Optional argument
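
For example, a hypothetical invocation that runs a benchmark on a model served by Ollama and scores each task as it runs (the model name, URL, and paths are illustrative):

benchtool run -r ollama -m llama3 -a http://localhost:11434 -l ./logs -s ./my-bench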

run-task#

Run a single task and generate logs.

Usage

benchtool run-task [OPTIONS] BENCHMARK_PATH TASK_NAME

Options

-r, --runner-type <runner_type>#

The engine that will run your LLM.

Options:

ollama | openai | bedrock

-m, --model <model>#

The LLM to be benchmarked.

-a, --api-url <api_url>#

The API base URL required to access the runner engine.

-l, --log-path <log_path>#

The path to a log directory.

Arguments

BENCHMARK_PATH#

Required argument. The path to the benchmark repository where all the tasks reside.

TASK_NAME#

Required argument. The name of the specific task you would like to run.
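
For example, a hypothetical invocation that runs a single task against an OpenAI-served model (the model name, paths, and task name are illustrative):

benchtool run-task -r openai -m gpt-4o -l ./logs ./my-bench parse-logs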

score#

Score benchmark runs and generate evaluation results.

Usage

benchtool score [OPTIONS] [BENCHMARK_PATH]

Options

-r, --result-id <result_id>#

Runs to score: ‘last’, ‘all’, or specific result IDs.

-c, --csv#

Save a CSV of the evaluation in addition to the JSON.

-C, --collate#

Collate scores rather than recomputing them.

Arguments

BENCHMARK_PATH#

Optional argument. The path to the benchmark repository where all the tasks reside.
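
For example, a hypothetical invocation that scores the most recent run and also saves a CSV of the evaluation (the path is illustrative):

benchtool score -r last -c ./my-bench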