This evaluator uses a language model to score a payload numerically according to how likely the user is to be satisfied with it, from 0.0 (not satisfied at all) to 1.0 (completely satisfied).
Configuration
| Option | Description | Type | Default | Required | Constraints |
| --- | --- | --- | --- | --- | --- |
| Model | The specific language model used for evaluation | ai.NamespacedModel | openai/gpt-3.5-turbo | true | |
| Instructions | Evaluation criteria as provided to the LLM | string | "You are an LLM evaluator. Please score from 0.0 to 1.0 how likely the user is to be satisfied with this answer, from 0.0 being not satisfied at all to 1.0 being completely satisfied." | true | MinLength: 1 |
| MaxTokens | The maximum number of tokens allowed in the prompt | int | 8192 | true | Min: 1 |
| PassingThreshold | The minimum score required to consider the evaluation passed | float64 | 0.5 | true | |
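To make the options listed above concrete, the following is a minimal sketch of how they might be represented and validated in Go. The `Config` struct, `DefaultConfig`, and `Validate` are hypothetical names introduced for illustration and are not the evaluator's actual API; `Model` is a plain string here as a stand-in for `ai.NamespacedModel`, and the check that `Model` is non-empty is an assumption drawn from the option being marked as required.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical stand-in for the evaluator's configuration.
// Field names follow the documented options; the struct itself is assumed.
type Config struct {
	Model            string  // stand-in for ai.NamespacedModel
	Instructions     string  // evaluation criteria given to the LLM
	MaxTokens        int     // maximum number of tokens allowed in the prompt
	PassingThreshold float64 // minimum score for the evaluation to pass
}

// DefaultConfig returns the documented default value for each option.
func DefaultConfig() Config {
	return Config{
		Model: "openai/gpt-3.5-turbo",
		Instructions: "You are an LLM evaluator. Please score from 0.0 to 1.0 " +
			"how likely the user is to be satisfied with this answer, " +
			"from 0.0 being not satisfied at all to 1.0 being completely satisfied.",
		MaxTokens:        8192,
		PassingThreshold: 0.5,
	}
}

// Validate applies the documented constraints: Instructions has
// MinLength 1 and MaxTokens has Min 1. Requiring a non-empty Model
// is an assumption based on the option being required.
func (c Config) Validate() error {
	if c.Model == "" {
		return errors.New("Model is required")
	}
	if len(c.Instructions) < 1 {
		return errors.New("Instructions must have length of at least 1")
	}
	if c.MaxTokens < 1 {
		return fmt.Errorf("MaxTokens must be at least 1, got %d", c.MaxTokens)
	}
	return nil
}
```

Calling `DefaultConfig().Validate()` returns `nil` under these assumptions, while setting `MaxTokens` to 0 or `Instructions` to an empty string returns an error.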
Metrics
This evaluator does not report specific metrics beyond the binary outcome of the evaluation (passed or failed).
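The binary outcome can be pictured as a comparison between the model's score and `PassingThreshold`. The sketch below is illustrative only: `Passed` is a hypothetical helper, and treating a score equal to the threshold as passing is an assumption based on the threshold being described as the minimum score required.

```go
package main

// Passed reports whether a score in the range 0.0 to 1.0 meets the
// configured threshold. The inclusive comparison is an assumption:
// the threshold is documented as the minimum score required to pass.
func Passed(score, passingThreshold float64) bool {
	return score >= passingThreshold
}
```

With the default threshold of 0.5, a score of 0.7 would count as passed and a score of 0.3 as failed.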