Score LLM-as-Judge (Modelmetry)
This evaluator uses a language model to assign a payload a numeric score reflecting how likely the user is to be satisfied with it. Scores range from 0.0 (not satisfied at all) to 1.0 (completely satisfied).
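To make the 0.0 to 1.0 scale concrete, here is a minimal sketch of how a judge model's text reply could be turned into a score in that range. How Modelmetry actually parses the reply is not documented here; the function name `parse_score` and the choice to clamp out-of-range values are assumptions made purely for illustration.

```python
import re


def parse_score(reply: str) -> float:
    """Hypothetical helper: pull the first number out of a judge reply.

    Out-of-range values are clamped to [0.0, 1.0]; this clamping is an
    assumption, not documented Modelmetry behavior.
    """
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no numeric score found in reply: {reply!r}")
    return min(1.0, max(0.0, float(match.group())))


# Example replies a judge model might plausibly return.
in_range = parse_score("Score: 0.8")
too_high = parse_score("1.7")
too_low = parse_score("-0.3")
```

Under these assumptions, a reply that contains no number raises an error rather than silently defaulting to a score.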
Configuration
Option | Description | Type | Default | Required | Constraints |
---|---|---|---|---|---|
Model | The specific language model used for evaluation | | | | |
Instructions | Evaluation criteria as provided to the LLM | | "You are an LLM evaluator. Please score from 0.0 to 1.0 how likely the user is to be satisfied with this answer, from 0.0 being not satisfied at all to 1.0 being completely satisfied." | | |
MaxTokens | The maximum number of tokens allowed in the prompt | | | | |
PassingThreshold | The minimum score required to consider the evaluation passed | | | | |
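The sketch below shows how the four options might fit together, and how PassingThreshold, as the minimum score required to pass, turns a score into a pass or fail result. The class `ScoreJudgeConfig`, the function `has_passed`, and the values for the model name, token limit, and threshold are hypothetical placeholders, not Modelmetry's API or its defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreJudgeConfig:
    """Hypothetical container mirroring the options in the table above."""

    model: str
    instructions: str
    max_tokens: int
    passing_threshold: float


def has_passed(score: float, config: ScoreJudgeConfig) -> bool:
    # The threshold is described as the minimum passing score,
    # so a score exactly equal to it is treated as a pass.
    return score >= config.passing_threshold


# Placeholder values for illustration only; they are not documented defaults.
config = ScoreJudgeConfig(
    model="example-model",
    instructions=(
        "You are an LLM evaluator. Please score from 0.0 to 1.0 how likely "
        "the user is to be satisfied with this answer, from 0.0 being not "
        "satisfied at all to 1.0 being completely satisfied."
    ),
    max_tokens=1024,
    passing_threshold=0.7,
)

passed_at_threshold = has_passed(0.7, config)
passed_below_threshold = has_passed(0.69, config)
```

With these placeholder values, a score of 0.7 passes and a score of 0.69 fails.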
Metrics
This evaluator does not report specific metrics beyond the binary outcome of the evaluation (passed or failed).