Boolean LLM-as-Judge (Modelmetry)

This evaluator asks a large language model (LLM) for a boolean (true/false) verdict on the input, following the configured instructions and subject to a maximum token limit.
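As a rough illustration of the idea, the sketch below shows one way a boolean LLM judge could be wired up. It is not Modelmetry's implementation: the `judge` and `parse_verdict` functions, the prompt layout, and the `complete` callable (a stand-in for whatever sends a prompt to a model and returns its text reply) are all hypothetical names chosen for this example.

```python
from typing import Callable


def parse_verdict(reply: str) -> bool:
    """Map a model reply onto a boolean; raise if it is neither true nor false."""
    # Tolerate surrounding whitespace, quotes, and a trailing period.
    token = reply.strip().strip(".\"'").lower()
    if token == "true":
        return True
    if token == "false":
        return False
    raise ValueError(f"expected True or False, got: {reply!r}")


def judge(
    instructions: str,
    input_text: str,
    output_text: str,
    complete: Callable[[str], str],  # hypothetical: prompt in, model text out
) -> bool:
    """Build a judging prompt, call the model, and return its boolean verdict."""
    prompt = (
        f"{instructions}\n\n"
        f"Input:\n{input_text}\n\n"
        f"Output:\n{output_text}\n\n"
        "Answer with exactly one word: True or False."
    )
    return parse_verdict(complete(prompt))
```

Passing the model call in as a plain callable keeps the sketch independent of any particular client library, and raising on an unparseable reply avoids silently treating a malformed answer as a pass or a fail.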

Configuration

| Option | Description | Type | Default | Required | Constraints |
| --- | --- | --- | --- | --- | --- |
| Model | The namespaced model identifier used for evaluation | string | openai/gpt-4o-mini | true | |
| Instructions | Directions given to the LLM to ensure responses adhere to the task's requirements | string | "You are an LLM evaluator. We need the guarantee that the output answers what is being asked on the input, please evaluate as False if it doesn't." | true | MinLength: 1 |
| MaxTokens | The maximum number of tokens allowed in the entire prompt, to prevent excessive input processing | int | 8192 | true | Min: 1 |
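The options above can be mirrored in code for local validation before a configuration is submitted. The following is a minimal sketch only: the `BooleanJudgeConfig` class and its field names are hypothetical and are not part of any Modelmetry SDK. The defaults and the two constraints (MinLength: 1 and Min: 1) are taken from the options listed above.

```python
from dataclasses import dataclass

# Default instructions, copied verbatim from the documented default.
DEFAULT_INSTRUCTIONS = (
    "You are an LLM evaluator. We need the guarantee that the output answers "
    "what is being asked on the input, please evaluate as False if it doesn't."
)


@dataclass(frozen=True)
class BooleanJudgeConfig:
    """Hypothetical container for the three documented options."""

    model: str = "openai/gpt-4o-mini"
    instructions: str = DEFAULT_INSTRUCTIONS
    max_tokens: int = 8192

    def __post_init__(self) -> None:
        # Instructions constraint: MinLength: 1
        if len(self.instructions) < 1:
            raise ValueError("instructions must contain at least 1 character")
        # MaxTokens constraint: Min: 1
        if self.max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")
```

For example, `BooleanJudgeConfig(max_tokens=4096)` keeps the default model and instructions, while `BooleanJudgeConfig(instructions="")` raises a `ValueError`.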

Metrics

This evaluator does not report specific metrics; its result is the boolean verdict on whether the evaluated content meets the conditions given in the instructions.
