CLI#

benchtool#

benchtool helps researchers set up, run, and score benchmarks.

Usage

benchtool [OPTIONS] COMMAND [ARGS]...
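
For example, to list the available commands and global options (assuming the standard --help flag, which CLIs of this style normally provide):

benchtool --help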

add-task#

Set up a new task.

Usage

benchtool add-task [OPTIONS] TASK_NAME

Options

-p, --bench-path <bench_path>#

The path to the benchmark repository where the task will be added.

-s, --task-source <task_source>#

Required The relative path to existing content to add as the task.

-t, --task-type <task_type>#

Required The type of the task content being added.

Options:

folders | list

Arguments

TASK_NAME#

Required argument
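
For example, a hypothetical invocation that adds a task named parse-logs from existing folder content (all names and paths here are illustrative):

benchtool add-task -p ./my-bench -s tasks/parse-logs -t folders parse-logs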

init#

Initializes a new benchmark.

BENCHMARK_NAME is required; if it is not provided, it is requested interactively.

This command creates the folder for the benchmark.

Usage

benchtool init [OPTIONS] [BENCHMARK_NAME]

Options

-p, --path <path>#

The path where the new benchmark repository will be placed

-a, --about <about>#

Benchmark description. The content will go in the about.md file.

--no-git#

Don’t make the benchmark a git repository. Default is False.

Arguments

BENCHMARK_NAME#

Optional argument
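
For example, a hypothetical invocation that creates a benchmark named my-bench without initializing a git repository (the name, path, and description are illustrative):

benchtool init -p ./benchmarks -a "A benchmark for log-parsing tasks" --no-git my-bench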

run#

Run the benchmark, generate logs, and optionally score the results.

Usage

benchtool run [OPTIONS] [BENCHMARK_PATH]

Options

-r, --runner-type <runner_type>#

The engine that will run your LLM.

Options:

ollama | openai | bedrock

-m, --model <model>#

The LLM to be benchmarked.

-a, --api <api>#

The API base URL required to access the runner engine.

-l, --log-path <log_path>#

The path to a log directory.

-s, --score#

Flag to score each task while running.

-R, --runner-file <runner_file>#

Use a runner.yml configuration file; if provided, it overrides the other options.

Arguments

BENCHMARK_PATH#

Optional argument
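
For example, a hypothetical invocation that runs a benchmark on a model served by Ollama and scores each task as it runs (the model name, URL, and paths are illustrative):

benchtool run -r ollama -m llama3 -a http://localhost:11434 -l ./logs -s ./my-bench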

run-task#

Run a single task and generate logs.

Usage

benchtool run-task [OPTIONS] BENCHMARK_PATH TASK_NAME

Options

-r, --runner-type <runner_type>#

The engine that will run your LLM.

Options:

ollama | openai | bedrock

-m, --model <model>#

The LLM to be benchmarked.

-a, --api-url <api_url>#

The API base URL required to access the runner engine.

-l, --log-path <log_path>#

The path to a log directory.

Arguments

BENCHMARK_PATH#

Required argument. The path to the benchmark repository where all the tasks reside.

TASK_NAME#

Required argument. The name of the specific task you would like to run.
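
For example, a hypothetical invocation that runs a single task against an OpenAI-served model (the model name, paths, and task name are illustrative):

benchtool run-task -r openai -m gpt-4o -l ./logs ./my-bench parse-logs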

score#

Score benchmark runs and generate evaluation results.

Usage

benchtool score [OPTIONS] [BENCHMARK_PATH]

Options

-r, --result-id <result_id>#

Runs to score: ‘last’, ‘all’, or specific result IDs.

-c, --csv#

Save a CSV of the evaluation in addition to the JSON.

-C, --collate#

Collate scores rather than recomputing them.

Arguments

BENCHMARK_PATH#

Optional argument. The path to the benchmark repository where all the tasks reside.
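
For example, a hypothetical invocation that scores the most recent run and also saves a CSV of the evaluation (the path is illustrative):

benchtool score -r last -c ./my-bench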