Example Files#

Task List#

An example tasks.yml file:

tasks.yml#
- name: product
  template: "find the product of {a} and {b}"
  values:
    a: [2,3,5]
    b: [3,4,5]
  reference: calculated
  scorer: "calculated_answer"
  format: "IntAnswer"
- name: product_combination
  template: "find the product of {a} and {b}"
  values:
    a: [2,3,5]
    b: [3,4,5]
  reference: calculated
  value_combinations: combinations
  scorer:
    - calculated_answer
    - add_justify
  format: "IntJustification"
- name: symbol
  template: "what is the name for the following symbol? {symb}"
  values:
    symb: ["@","$","#"]
    exmeta: ['internet', 'world', 'internet']
  reference: ["at", "dollar sign", "pound"]
  scorer: "contains"
- name: symbol_dir
  template: "what direction does the {symb} symbol point? and what is its name "
  values:
    symb: ["^","<",">"]
    name: [caret, less, greater]
    direction: ["up", "left", "right"]
  scorer: check_name_dir
  reference: calculated
  format: NameSource
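
To make the `values` and `value_combinations` keys more concrete, the sketch below fills the `product` template in two ways. It rests on an assumption that the file above does not confirm: that value lists are paired position by position by default, and that `value_combinations: combinations` expands them to every pairing. `expand_values` is a hypothetical helper written for this illustration, not part of the library.

```python
from itertools import product

template = "find the product of {a} and {b}"
values = {"a": [2, 3, 5], "b": [3, 4, 5]}


def expand_values(values, combine=False):
    """Hypothetical helper: turn a dict of lists into one dict per prompt."""
    keys = list(values)
    # assumption: lists are paired position by position unless
    # all combinations are requested
    rows = product(*values.values()) if combine else zip(*values.values())
    return [dict(zip(keys, row)) for row in rows]


paired = [template.format(**row) for row in expand_values(values)]
combined = [template.format(**row) for row in expand_values(values, combine=True)]

print(len(paired), paired[0])
print(len(combined), combined[-1])
```

Under that assumption, the paired form yields three prompts and the combined form yields nine, one for each pairing of `a` and `b`.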

Folder specified Task#

A folder-specified task comprises a text file for the template:

template.txt#
what is {a} + {b}?

and a CSV file with the values for the template fields:

values.csv#
a,b,reference
2,3,5
4,5,9
8,9,17
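
As a rough illustration of how the two files fit together, the sketch below fills the template once per CSV row. The file contents are inlined as strings so the example is self-contained, and treating the `reference` column as the expected answer rather than as a template field is an assumption. It shows the idea only, not the library's actual loading code.

```python
import csv
import io

# stand-ins for the contents of template.txt and values.csv
template_text = "what is {a} + {b}?"
values_csv = "a,b,reference\n2,3,5\n4,5,9\n8,9,17\n"

prompts = []
for row in csv.DictReader(io.StringIO(values_csv)):
    # assumption: the reference column holds the expected answer
    # and is not substituted into the template
    reference = row.pop("reference")
    prompts.append((template_text.format(**row), reference))

for prompt, reference in prompts:
    print(prompt, "->", reference)
```

Note that `csv.DictReader` returns every value as a string, so a scorer comparing against a numeric answer would need to convert the reference first.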

Runner Specification#

Single run settings#

runner.yml#
runner_type: ollama
model: 'llama3.2'

Specifying multiple models:#

multiple_models.yml#
runner_type: ollama
model:
  - 'llama3.2'
  - 'gemma3'

Custom Response format#

Custom classes should be modeled on the existing response classes. Use pydantic.BaseModel for the response formats and enum.Enum to restrict options.

custom_response.py#
from pydantic import BaseModel
from enum import Enum

To constrain a field to a fixed set of options, create an Enum class for it:

custom_response.py#
class Direction(str, Enum):
    left = "left"
    right = "right"
    up = "up"

That class can then be used as the type of a field in a response:

custom_response.py#
class NameSource(BaseModel):
    name: str
    direction: Direction
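
To see what the enum constraint buys, the sketch below builds a `NameSource` from two JSON strings, one valid and one with a direction outside the enum. The JSON strings are made-up inputs for illustration. Constructing the model from `json.loads` output is just one way to validate a response, not necessarily how the library does it.

```python
import json
from enum import Enum

from pydantic import BaseModel, ValidationError


class Direction(str, Enum):
    left = "left"
    right = "right"
    up = "up"


class NameSource(BaseModel):
    name: str
    direction: Direction


# a well-formed response is parsed into typed fields
good = NameSource(**json.loads('{"name": "less", "direction": "left"}'))
print(good.name, good.direction.value)

# a direction outside the enum is rejected by validation
try:
    NameSource(**json.loads('{"name": "less", "direction": "down"}'))
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```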

Custom Scorer#

For custom scoring, add a file custom_scorer.py. There can be multiple functions in one file. A function should take two inputs, the response and the reference.

If reference is set to "calculated" in tasks.yml (list-style) or info.yml (folder-style), then the reference passed to the scorer is a dictionary of the values used for that prompt, with the prompt_id added.

custom_scorer.py#
import json


def calculated_answer(response, values):
    '''
    example function for calculating the correct answer from the values
    '''
    # parse the response object
    response_object = json.loads(response)
    # compute the answer
    ref = values['a'] * values['b']
    # compare the computed reference with the model's answer
    return int(ref == response_object['answer'])


def check_name_dir(response, values):
    '''
    check the name and direction fields separately, one score per field
    '''
    # parse the response object
    response_object = json.loads(response)
    return {'name': values['name'] == response_object['name'],
            'direction': values['direction'] == response_object['direction']}


def add_justify(response, values):
    '''
    check whether the justification mentions 'add'
    '''
    # parse the response object
    response_object = json.loads(response)
    return int('add' in response_object['justification'])

The function can return either a scalar numerical value or a dictionary for multiple values.
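
A scorer can be tried out on its own before it is wired into a task. The sketch below repeats the logic of `calculated_answer` from custom_scorer.py and calls it by hand. The response strings and the values dictionary are made up for illustration, and the form of the `prompt_id` entry is an assumption.

```python
import json


def calculated_answer(response, values):
    # same logic as the scorer in custom_scorer.py
    response_object = json.loads(response)
    return int(values['a'] * values['b'] == response_object['answer'])


# hypothetical inputs: a JSON response string and the values dictionary,
# including a prompt_id entry as added to a calculated reference
values = {"a": 2, "b": 3, "prompt_id": 0}

score = calculated_answer(json.dumps({"answer": 6}), values)
wrong = calculated_answer(json.dumps({"answer": 7}), values)
print(score, wrong)
```

Here the first call scores 1 because 2 * 3 matches the answer, and the second scores 0.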