Example Files

Task List

An example tasks.yml file:

tasks.yml

- name: product
  template: "find the product of {a} and {b}"
  values:
     a: [2,3,5]
     b: [3,4,5]
  reference: calculated
  scorer: "calculated_answer"
  format: "IntAnswer"
- name: product_combination
  template: "find the product of {a} and {b}"
  values:
     a: [2,3,5]
     b: [3,4,5]
  reference: calculated
  value_combinations: combinations
  scorer: 
    - calculated_answer
    - add_justify
  format: "IntJustification"
- name: symbol
  template: "what is the name for the following symbol? {symb}"
  values: 
     symb: ["@","$","#"]
     exmeta: ['internet', 'world', 'internet']
  reference: ["at", "dollar sign", "pound"]
  scorer: "contains"
- name: symbol_dir
  template: "what direction does the {symb} symbol point, and what is its name?"
  values: 
     symb: ["^","<",">"]
     name: [caret, less, greater]
     direction: ["up", "left", "right"]
  scorer: check_name_dir
  reference: calculated
  format: NameSource
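Each task expands its `values` lists into one prompt per set of values. The sketch below makes two assumptions that the tool's own code has not confirmed. First, by default the lists are paired position by position, which matches how `reference` lines up with `symb` in the `symbol` task. Second, `value_combinations: combinations` pairs every value of one field with every value of the others. `expand_prompts` is a hypothetical helper, not part of the tool.

```python
from itertools import product


def expand_prompts(template: str, values: dict, mode: str = "zip") -> list[str]:
    """Fill `template` once per set of values (hypothetical helper).

    mode="zip" pairs the lists position by position (assumed default);
    mode="combinations" takes every combination (Cartesian product).
    """
    keys = list(values)
    if mode == "zip":
        rows = zip(*(values[k] for k in keys))
    elif mode == "combinations":
        rows = product(*(values[k] for k in keys))
    else:
        raise ValueError(f"unknown mode: {mode}")
    # str.format ignores unused keyword arguments, so extra keys such as
    # `exmeta` or `direction` can travel with a prompt without appearing in it.
    return [template.format(**dict(zip(keys, row))) for row in rows]


template = "find the product of {a} and {b}"
values = {"a": [2, 3, 5], "b": [3, 4, 5]}
print(len(expand_prompts(template, values)))                  # 3
print(len(expand_prompts(template, values, "combinations")))  # 9
print(expand_prompts(template, values)[0])  # find the product of 2 and 3
```

Under these assumptions, `product` yields 3 prompts and `product_combination` yields 9.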

Folder-specified Task

A folder-specified task consists of a text file holding the template:

template.txt

what is {a} + {b}?

and a CSV file holding the values for the template fields, with a reference column giving the expected answer:

values.csv

a,b,reference
2,3,5
4,5,9
8,9,17
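This is a rough sketch of how the two files could combine: each CSV row fills the template's fields, and the `reference` column is held back as the expected answer. Two things here are assumptions: that `reference` is excluded from the template fields, and that values are kept as the strings the CSV reader returns. `load_folder_task` is a hypothetical helper, and the file contents are inlined so the example runs on its own.

```python
import csv
import io

# Contents of template.txt and values.csv, inlined for a self-contained example.
TEMPLATE = "what is {a} + {b}?"
VALUES_CSV = """a,b,reference
2,3,5
4,5,9
8,9,17
"""


def load_folder_task(template: str, csv_text: str) -> list[tuple[str, str]]:
    """Return (prompt, reference) pairs, one per CSV row (hypothetical helper)."""
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed: "reference" holds the expected answer, not a template field.
        reference = row.pop("reference")
        pairs.append((template.format(**row), reference))
    return pairs


for prompt, reference in load_folder_task(TEMPLATE, VALUES_CSV):
    print(prompt, "->", reference)
# what is 2 + 3? -> 5
# what is 4 + 5? -> 9
# what is 8 + 9? -> 17
```

Because `csv.DictReader` yields strings, a scorer comparing these references with numeric answers would need to convert one side first.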

Runner Specification

Single run settings

runner.yml

runner_type: ollama
model: 'llama3.2'

Specifying multiple models

multiple_models.yml

runner_type: ollama
model: 
 - 'llama3.2'
 - 'gemma3'
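Because `model` may be either a single string or a list, any code reading these files has to accept both shapes. This is a minimal sketch assuming the two forms mean the same thing and that the tasks are run once per listed model. `models_from_config` is a hypothetical name, and the dictionaries stand in for the parsed YAML.

```python
def models_from_config(config: dict) -> list[str]:
    """Normalize the `model` entry to a list of model names (hypothetical helper)."""
    model = config["model"]
    if isinstance(model, str):
        return [model]  # single-run form: model: 'llama3.2'
    return list(model)  # multiple-model form: a YAML list


# The two runner files above, as they would look once parsed from YAML.
single = {"runner_type": "ollama", "model": "llama3.2"}
multiple = {"runner_type": "ollama", "model": ["llama3.2", "gemma3"]}

for cfg in (single, multiple):
    for name in models_from_config(cfg):
        print(cfg["runner_type"], name)
```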

Custom Response format

Custom response classes should follow the pattern of the existing classes in responses: use pydantic.BaseModel for the response formats and enum.Enum to restrict options.

custom_response.py

from pydantic import BaseModel
from enum import Enum

To constrain a field to a fixed set of options, create an Enum class for it:

custom_response.py

class Direction(str, Enum):
    left = "left"
    right = "right"
    up = "up"
That enum can then be used as the type of a field in a response class:

custom_response.py

class NameSource(BaseModel):
    name: str
    direction: Direction
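The `format` entries in tasks.yml name classes like these (`IntAnswer`, `IntJustification`, `NameSource`). The sketch below assumes `IntAnswer` and `IntJustification` look roughly like this. Their field names are inferred from the keys the example scorers read (`answer`, `justification`), so the real definitions may differ. It also shows `NameSource` accepting a valid JSON reply and rejecting a direction outside the enum. It requires pydantic.

```python
import json
from enum import Enum

from pydantic import BaseModel, ValidationError


class Direction(str, Enum):
    left = "left"
    right = "right"
    up = "up"


class NameSource(BaseModel):
    name: str
    direction: Direction


# Hypothetical versions of the IntAnswer and IntJustification formats; field
# names are inferred from the keys the example scorers read.
class IntAnswer(BaseModel):
    answer: int


class IntJustification(BaseModel):
    answer: int
    justification: str


# A JSON reply that fits the NameSource format; "up" becomes Direction.up.
reply = NameSource(**json.loads('{"name": "caret", "direction": "up"}'))
print(reply.direction is Direction.up)  # True

# A direction outside the enum fails validation.
try:
    NameSource(name="caret", direction="down")
except ValidationError:
    print("invalid direction rejected")
```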

Custom Scorer

For custom scoring, add a file named custom_scorer.py. It can contain multiple scoring functions, and each function takes two arguments: the model's response and the reference.

If reference is set to "calculated" in tasks.yml (for list-style tasks) or in info.yml (for folder-style tasks), the reference passed to the scorer is a dictionary of the template values for that prompt, with prompt_id added. The example functions below name this argument values.

custom_scorer.py

import json


def calculated_answer(response, values):
    '''
    Example scorer that calculates the correct answer from the template values.
    '''
    # parse the JSON response from the model
    response_object = json.loads(response)
    # compute the expected answer from the values
    ref = values['a'] * values['b']
    # score 1 if the model's answer matches, otherwise 0
    return int(ref == response_object['answer'])


def check_name_dir(response, values):
    '''
    Example scorer that returns separate scores for the name and the direction.
    '''
    # parse the JSON response from the model
    response_object = json.loads(response)
    return {'name': int(values['name'] == response_object['name']),
            'direction': int(values['direction'] == response_object['direction'])}


def add_justify(response, values):
    '''
    Example scorer that checks whether the justification mentions "add".
    '''
    response_object = json.loads(response)
    return int('add' in response_object['justification'])

A scorer can return either a single numeric value or, when it produces several scores (as check_name_dir does), a dictionary mapping each score name to its value.
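Since a scorer may return either shape, and a task may list several scorers (as `product_combination` does), whatever collects the scores has to merge them. The sketch below shows one possible way. `run_scorers` and its `scorer.key` naming scheme are hypothetical, not the tool's actual output format, and the `prompt_id` values are made up.

```python
import json
from typing import Callable


def run_scorers(response: str, values: dict, scorers: list[Callable]) -> dict:
    """Apply each scorer and merge the results into one flat dict (hypothetical).

    A scalar result is stored under the scorer's name; a dict result adds
    one entry per key, named "<scorer>.<key>".
    """
    results = {}
    for scorer in scorers:
        score = scorer(response, values)
        if isinstance(score, dict):
            for key, value in score.items():
                results[f"{scorer.__name__}.{key}"] = value
        else:
            results[scorer.__name__] = score
    return results


# Condensed versions of the example scorers.
def calculated_answer(response, values):
    return int(values["a"] * values["b"] == json.loads(response)["answer"])


def add_justify(response, values):
    return int("add" in json.loads(response)["justification"])


def check_name_dir(response, values):
    parsed = json.loads(response)
    return {"name": int(values["name"] == parsed["name"]),
            "direction": int(values["direction"] == parsed["direction"])}


# Two scalar scorers, as listed for the product_combination task.
product_values = {"a": 3, "b": 4, "prompt_id": "example-1"}
product_reply = '{"answer": 12, "justification": "I multiplied 3 by 4"}'
print(run_scorers(product_reply, product_values, [calculated_answer, add_justify]))
# {'calculated_answer': 1, 'add_justify': 0}

# One dict-returning scorer, as in the symbol_dir task.
dir_values = {"symb": "^", "name": "caret", "direction": "up", "prompt_id": "example-2"}
dir_reply = '{"name": "caret", "direction": "up"}'
print(run_scorers(dir_reply, dir_values, [check_name_dir]))
# {'check_name_dir.name': 1, 'check_name_dir.direction': 1}
```

Prefixing dictionary keys with the scorer name keeps scores from different scorers from overwriting each other when two of them return the same key.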