BenchTools as a Python Library

A tiny example

We can create a tiny benchmark programmatically:

from benchtools import Bench

tiny_bench = Bench('Tiniest Demo', concept='the simplest test')

We can also create a simple task programmatically:

from benchtools import Task

tt = Task('greeting', 'Hello there', 'hi', 'contains')
response = tt.run()
tt.score(response)
tiny_bench.add_task(tt)
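
The last argument above, 'contains', appears to name the scoring method. As a rough illustration only, here is a minimal standalone sketch of what a substring-style check could look like. The function name `contains_score` and the case-insensitive comparison are assumptions made for this example; this is not the benchtools implementation.

```python
def contains_score(response: str, reference: str) -> bool:
    """Return True when the reference text appears anywhere in the response."""
    # Assumption for this sketch: compare case-insensitively,
    # so "Hi" in a response matches the reference "hi".
    return reference.lower() in response.lower()


# A response that greets back passes; one without the reference text fails.
print(contains_score("Hi! Nice to meet you.", "hi"))
print(contains_score("Good morning.", "hi"))
```

Substring matching is lenient: under this sketch a response such as "nothing" would also pass, because it contains the letters "hi".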

There are multiple ways to create a Task object, for example by loading one from files:

add_task = Task.from_txt_csv('../../demos/folderbench/tasks/add')
tiny_bench.add_task(add_task)

For demo purposes we delete the folder, if it exists, before running.

%%bash
rm -rf tiniest_demo
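
Outside a notebook, where the %%bash cell magic is not available, the same cleanup can be done with the Python standard library. This is a sketch; the helper name `remove_demo_folder` is made up for this example, and the default folder name is taken from the cell above.

```python
import shutil
from pathlib import Path


def remove_demo_folder(path="tiniest_demo"):
    """Delete the folder and its contents; do nothing if it is missing."""
    folder = Path(path)
    # ignore_errors=True mirrors `rm -rf`: a missing folder raises no error.
    shutil.rmtree(folder, ignore_errors=True)
    # Report whether the folder is gone afterwards.
    return not folder.exists()


remove_demo_folder()
```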

We create a new folder in the file system to store the benchmark:

tiny_bench.initialize_dir()
tiny_bench.run()
pre_built_yml = Bench.from_yaml('../../demos/listbench')
pre_built_yml.written

We can access individual tasks:

pre_built_yml.tasks['product'].variant_values
pre_built_yml.run()
demo_bench = Bench.from_yaml('../../demos/listbench')

Creating a Benchmark object

class benchtools.runner.BenchRunner(runner_type='ollama', model='gemma3:1b', api_url=None)

A BenchRunner holds information about how a task is going to be run.
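
To make the idea of "holding run information" concrete, the sketch below defines a small standalone container with the same three parameters and defaults as the signature above. `RunnerSettings` is a hypothetical stand-in written for this example, not the benchtools class, and "some-other-model" is a placeholder name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunnerSettings:
    # Hypothetical stand-in; defaults copied from the BenchRunner signature.
    runner_type: str = "ollama"
    model: str = "gemma3:1b"
    api_url: Optional[str] = None


# With no arguments, every field falls back to its default.
default_settings = RunnerSettings()

# Override only the field that differs; the others keep their defaults.
custom_settings = RunnerSettings(model="some-other-model")

print(default_settings)
print(custom_settings)
```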

Benchmark class