BenchTools as a Python Library
A tiny example
We can create a tiny benchmark programmatically:
from benchtools import Bench
tiny_bench = Bench('Tiniest Demo', concept='the simplest test')
We can also create a simple task programmatically:
from benchtools import Task
tt = Task('greeting', 'Hello there', 'hi', 'contains')
response = tt.run()
tt.score(response)
tiny_bench.add_task(tt)
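To make the scoring step more concrete, here is a minimal sketch of what a 'contains' check could look like. It assumes that the third argument to Task ('hi') is the reference answer and that 'contains' selects a substring match; the function name contains_score and the case-insensitive comparison are hypothetical choices for illustration, not BenchTools' actual implementation.

```python
# Hypothetical stand-in for a 'contains' scorer; not BenchTools' own code.
def contains_score(response: str, reference: str) -> bool:
    """Return True when the reference text appears in the response.

    The comparison ignores case and surrounding whitespace in the
    reference, which is an assumption made for this sketch.
    """
    return reference.strip().lower() in response.lower()


print(contains_score("Hi! How can I help?", "hi"))  # True
print(contains_score("Good morning.", "hi"))        # False
```

A plain substring test like this also matches words that merely contain the reference (for example "This"), so a stricter scorer might compare whole words instead.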
There are multiple ways to create a Task object, for example with Task.from_txt_csv:
add_task = Task.from_txt_csv('../../demos/folderbench/tasks/add')
tiny_bench.add_task(add_task)
For demo purposes we delete the folder, if it exists, before running.
%%bash
rm -rf tiniest_demo
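If the %%bash cell magic is not available (for example when running as a plain script or on Windows), the same cleanup can be sketched with the standard library. The folder name is taken from the shell command above; the variable name demo_dir is only illustrative.

```python
import shutil
from pathlib import Path

# Folder name taken from the shell command above.
demo_dir = Path("tiniest_demo")

# ignore_errors=True mirrors `rm -rf`: no exception is raised
# when the folder does not exist.
shutil.rmtree(demo_dir, ignore_errors=True)
```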
We create a new folder in the file system to store the benchmark, then run it:
tiny_bench.initialize_dir()
tiny_bench.run()
pre_built_yml = Bench.from_yaml('../../demos/listbench')
pre_built_yml.written
We can access individual tasks:
pre_built_yml.tasks['product'].variant_values
pre_built_yml.run()
demo_bench = Bench.from_yaml('../../demos/listbench')
Creating a Benchmark object
- class benchtools.runner.BenchRunner(runner_type='ollama', model='gemma3:1b', api_url=None)
A BenchRunner holds information about how a task is going to be run.
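As a rough illustration of the kind of information such an object carries, the sketch below uses a small dataclass with the same three arguments and defaults as the signature above. RunnerConfig is a hypothetical stand-in and not part of BenchTools; the example URL is only Ollama's usual local address, used here as a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunnerConfig:
    """Hypothetical stand-in mirroring the documented BenchRunner arguments."""

    runner_type: str = "ollama"    # which backend executes the task
    model: str = "gemma3:1b"       # model identifier passed to the backend
    api_url: Optional[str] = None  # optional endpoint; None means "use the default"


# Defaults match the documented signature.
default_cfg = RunnerConfig()

# Overriding only the endpoint (illustrative value).
local_cfg = RunnerConfig(api_url="http://localhost:11434")

print(default_cfg)
print(local_cfg)
```

Keeping these settings in one object means the same task definitions can be rerun against a different model or endpoint by swapping only the runner.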