Score LLM-as-Judge (Modelmetry)
Last updated
This evaluator uses a language model to score a payload numerically according to how likely the user is to be satisfied with it, from 0.0 (not satisfied at all) to 1.0 (completely satisfied).
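The page does not say how the numeric score is read out of the model's reply. The sketch below assumes the judge answers with a bare or labeled decimal number. Under that assumption, a hypothetical `extract_score` helper takes the first number in the reply and clamps it to the documented 0.0 to 1.0 range:

```python
import re

# Hypothetical helper (not part of Modelmetry): matches an optionally signed
# integer or decimal number such as "0.8", "1", or "-0.2".
_NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def extract_score(reply: str) -> float:
    """Return the first number found in `reply`, clamped to [0.0, 1.0]."""
    match = _NUMBER.search(reply)
    if match is None:
        raise ValueError(f"no numeric score found in reply: {reply!r}")
    score = float(match.group())
    # Keep the result inside the documented scoring range.
    return min(1.0, max(0.0, score))


print(extract_score("Score: 0.8"))  # 0.8
```

Clamping keeps a slightly out-of-range reply (for example "1.2") usable. Raising an error on a reply with no number makes a malformed judge answer visible instead of silently scoring it.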
Model
The specific language model used for evaluation.
Type: ai.NamespacedModel
Default: openai/gpt-3.5-turbo
Required: true
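`ai.NamespacedModel` appears to combine a provider namespace with a model name, as in the default `openai/gpt-3.5-turbo`. The sketch below assumes the two parts are separated by the first `/`. `NamespacedModel` and `parse_model` here are hypothetical Python stand-ins, not Modelmetry's actual type:

```python
from typing import NamedTuple


class NamespacedModel(NamedTuple):
    """Hypothetical stand-in for ai.NamespacedModel: provider plus model name."""
    provider: str
    name: str


def parse_model(value: str) -> NamespacedModel:
    # Assumption: the namespace is everything before the first "/".
    provider, sep, name = value.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {value!r}")
    return NamespacedModel(provider, name)


model = parse_model("openai/gpt-3.5-turbo")
print(model.provider, model.name)  # openai gpt-3.5-turbo
```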
Instructions
Evaluation criteria as provided to the LLM.
Type: string
Default: "You are an LLM evaluator. Please score from 0.0 to 1.0 how likely the user is to be satisfied with this answer, from 0.0 being not satisfied at all to 1.0 being completely satisfied."
Required: true
Constraint: MinLength 1
MaxTokens
The maximum number of tokens allowed in the prompt.
Type: int
Default: 8192
Required: true
Constraint: Min 1
PassingThreshold
The minimum score required for the evaluation to pass.
Type: float64
Default: 0.5
Required: true
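Taken together, the options above can be mirrored in a small configuration object. This is a hypothetical sketch. The `ScoreJudgeConfig` class and its snake_case field names are illustrative, and the defaults are copied from the options above. It enforces only the constraints listed there: Instructions MinLength 1 and MaxTokens Min 1.

```python
from dataclasses import dataclass

# Default judge prompt, copied verbatim from the Instructions option.
DEFAULT_INSTRUCTIONS = (
    "You are an LLM evaluator. Please score from 0.0 to 1.0 how likely the user "
    "is to be satisfied with this answer, from 0.0 being not satisfied at all "
    "to 1.0 being completely satisfied."
)


@dataclass
class ScoreJudgeConfig:
    # Hypothetical mirror of the documented options; names are illustrative.
    model: str = "openai/gpt-3.5-turbo"
    instructions: str = DEFAULT_INSTRUCTIONS
    max_tokens: int = 8192
    passing_threshold: float = 0.5

    def __post_init__(self) -> None:
        # Constraints listed above: Instructions MinLength 1, MaxTokens Min 1.
        if len(self.instructions) < 1:
            raise ValueError("instructions must contain at least 1 character")
        if self.max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")


config = ScoreJudgeConfig(passing_threshold=0.7)
print(config.max_tokens, config.passing_threshold)  # 8192 0.7
```

No range constraint is listed for PassingThreshold, so the sketch does not enforce one.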
This evaluator does not report specific metrics beyond the binary outcome of the evaluation (passed or failed).
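The binary outcome comes from comparing the judge's score with PassingThreshold. The sketch below assumes that a score exactly equal to the threshold passes, which fits the threshold being the minimum score required. The function name `evaluate` is hypothetical:

```python
def evaluate(score: float, passing_threshold: float = 0.5) -> bool:
    """Return True (passed) when the score meets or exceeds the threshold."""
    # Assumption: equality counts as passing, since PassingThreshold is
    # documented as the minimum score required to pass.
    return score >= passing_threshold


print(evaluate(0.72))  # True
print(evaluate(0.3))   # False
```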