Score LLM-as-Judge (Modelmetry)

This evaluator uses a language model to assign a payload a numeric score reflecting how likely the user is to be satisfied with it, ranging from 0.0 (not satisfied at all) to 1.0 (completely satisfied).

Configuration

| Option | Description | Type | Default | Required | Constraints |
| --- | --- | --- | --- | --- | --- |
| Model | The specific language model used for evaluation | `ai.NamespacedModel` | `openai/gpt-3.5-turbo` | true | |
| Instructions | Evaluation criteria as provided to the LLM | `string` | "You are an LLM evaluator. Please score from 0.0 to 1.0 how likely the user is to be satisfied with this answer, from 0.0 being not satisfied at all to 1.0 being completely satisfied." | true | MinLength: 1 |
| MaxTokens | The maximum number of tokens allowed in the prompt | `int` | `8192` | true | Min: 1 |
| PassingThreshold | The minimum score required to consider the evaluation passed | `float64` | `0.5` | true | |
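As a rough illustration of how these options fit together, the sketch below models the configuration as a Go struct and applies the passing threshold to a score. The `Config` type and its field names are hypothetical, chosen to mirror the table above; they are not the library's actual API.

```go
package main

import "fmt"

// Config mirrors the options in the table above.
// Field names are illustrative, not Modelmetry's actual API.
type Config struct {
	Model            string  // namespaced model identifier, e.g. "openai/gpt-3.5-turbo"
	Instructions     string  // evaluation criteria given to the LLM
	MaxTokens        int     // maximum number of tokens allowed in the prompt
	PassingThreshold float64 // minimum score required to pass
}

func main() {
	cfg := Config{
		Model:            "openai/gpt-3.5-turbo",
		Instructions:     "You are an LLM evaluator. Please score from 0.0 to 1.0 how likely the user is to be satisfied with this answer.",
		MaxTokens:        8192,
		PassingThreshold: 0.5,
	}

	// The LLM returns a score in [0.0, 1.0]; a score at or above the
	// threshold counts as passed (the binary outcome noted under Metrics).
	score := 0.72
	passed := score >= cfg.PassingThreshold
	fmt.Println(passed)
}
```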

Metrics

This evaluator does not report specific metrics beyond the binary outcome of the evaluation (passed or failed).
