Score LLM-as-Judge (Modelmetry)

This evaluator uses a language model to assign a payload a numeric score reflecting how likely the user is to be satisfied with it, ranging from 0.0 (not satisfied at all) to 1.0 (completely satisfied).

Configuration

| Option | Description | Type | Default | Required | Constraints |
| --- | --- | --- | --- | --- | --- |
| Model | The specific language model used for evaluation | `ai.NamespacedModel` | `openai/gpt-3.5-turbo` | true | |
| Instructions | Evaluation criteria as provided to the LLM | `string` | "You are an LLM evaluator. Please score from 0.0 to 1.0 how likely the user is to be satisfied with this answer, from 0.0 being not satisfied at all to 1.0 being completely satisfied." | true | MinLength: 1 |
| MaxTokens | The maximum number of tokens allowed in the prompt | `int` | `8192` | true | Min: 1 |
| PassingThreshold | The minimum score required to consider the evaluation passed | `float64` | `0.5` | true | |
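As a rough illustration of how these options fit together, the sketch below models the configuration as a Go struct and applies the passing threshold to a score. The `Config` type and its field names are hypothetical, chosen to mirror the table above; they are not the library's actual API.

```go
package main

import "fmt"

// Config mirrors the options in the table above.
// Field names are illustrative, not Modelmetry's actual API.
type Config struct {
	Model            string  // namespaced model identifier, e.g. "openai/gpt-3.5-turbo"
	Instructions     string  // evaluation criteria given to the LLM
	MaxTokens        int     // maximum number of tokens allowed in the prompt
	PassingThreshold float64 // minimum score required to pass
}

func main() {
	cfg := Config{
		Model:            "openai/gpt-3.5-turbo",
		Instructions:     "You are an LLM evaluator. Please score from 0.0 to 1.0 how likely the user is to be satisfied with this answer.",
		MaxTokens:        8192,
		PassingThreshold: 0.5,
	}

	// The LLM returns a score in [0.0, 1.0]; a score at or above the
	// threshold counts as passed (the binary outcome noted under Metrics).
	score := 0.72
	passed := score >= cfg.PassingThreshold
	fmt.Println(passed)
}
```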

Metrics

This evaluator does not report specific metrics beyond the binary outcome of the evaluation (passed or failed).
