Scoring and Evaluation
You can set the scorer to one of the built-in scorers or to a custom one.
To use a custom scoring function, include a file named custom_scorer.py that defines a function taking two arguments:
response: a string, formatted according to the format key
reference: either the reference answer provided or, when reference: calculated is set in the setup, the values used to calculate it
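As a minimal sketch, a custom_scorer.py could look like the following. The function name score and the exact-match logic are assumptions for illustration; this page does not state which function name the tool looks up.

```python
# custom_scorer.py
# Minimal sketch. The function name `score` is an assumption,
# not a documented requirement of the tool.


def score(response: str, reference) -> float:
    """Return 1.0 if the response matches the reference exactly
    (ignoring surrounding whitespace and letter case), else 0.0."""
    # str() lets this also accept a non-string reference value.
    expected = str(reference).strip().lower()
    return 1.0 if response.strip().lower() == expected else 0.0
```

For example, score(" Paris ", "paris") returns 1.0, while score("London", "paris") returns 0.0. Returning a single number keeps the scorer simple when one metric is enough.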
The scoring function can return either a single value, which is stored under a score key, or a dictionary, whose keys are retained as they are.
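To illustrate the two return shapes, the sketch below pairs a scorer that returns a dictionary with a hypothetical normalize_result helper that mimics the storage rule described above: a single value is wrapped under a score key, and a dictionary's keys are kept unchanged. Both function names and the word-overlap metric are assumptions; the tool's own storage code is not shown in this document.

```python
from typing import Any, Dict, Union


def score(response: str, reference: str) -> Dict[str, float]:
    """Hypothetical scorer that returns several metrics at once,
    based on word overlap between response and reference."""
    resp_words = set(response.lower().split())
    ref_words = set(reference.lower().split())
    overlap = len(resp_words & ref_words)
    # Guard against empty inputs to avoid division by zero.
    precision = overlap / len(resp_words) if resp_words else 0.0
    recall = overlap / len(ref_words) if ref_words else 0.0
    return {"precision": precision, "recall": recall}


def normalize_result(result: Union[float, int, bool, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper mirroring the documented rule:
    a dictionary keeps its keys; a single value goes under "score"."""
    if isinstance(result, dict):
        return dict(result)
    return {"score": result}
```

With this sketch, normalize_result(0.5) gives {"score": 0.5}, and normalize_result(score("red apple", "red pear")) gives {"precision": 0.5, "recall": 0.5}. Returning a dictionary is useful when one scorer should report several metrics side by side.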
Tip
See the product task in the listbench demo for an example of custom scoring.
Warning
Custom scoring is not fully implemented for folder tasks: there is not yet a way to specify a custom scorer for them and load it.