Boolean LLM-as-Judge (Modelmetry)
This evaluator uses a large language model (LLM) to make a boolean (true/false) judgment about the input, following the instructions you provide and subject to a maximum token count.
Configuration
Model
Description: The namespaced model identifier used for evaluation
Type: string
Example: openai/gpt-4o-mini
Required: true

Instructions
Description: Directions given to the LLM to ensure responses adhere to the task's requirements
Type: string
Example: "You are an LLM evaluator. We need the guarantee that the output answers what is being asked on the input, please evaluate as False if it doesn't."
Required: true
Constraints: MinLength: 1

MaxTokens
Description: The maximum number of tokens allowed in the entire prompt, to prevent excessive input processing
Type: int
Example: 8192
Required: true
Constraints: Min: 1
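
The three settings and their constraints (MinLength: 1 for Instructions, Min: 1 for MaxTokens) can be sketched as a small validated structure. This is only an illustration: the class name BooleanJudgeConfig and its snake_case fields are hypothetical and are not part of any Modelmetry SDK. The values shown are the ones listed in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BooleanJudgeConfig:
    """Hypothetical container for the settings described above (not an SDK type)."""

    model: str         # namespaced identifier, e.g. "openai/gpt-4o-mini"
    instructions: str  # constraint from the docs: MinLength: 1
    max_tokens: int    # constraint from the docs: Min: 1

    def __post_init__(self) -> None:
        # All three settings are listed with "true", so empty values are rejected.
        if not self.model:
            raise ValueError("Model must not be empty")
        if len(self.instructions) < 1:
            raise ValueError("Instructions must contain at least 1 character")
        if self.max_tokens < 1:
            raise ValueError("MaxTokens must be at least 1")


config = BooleanJudgeConfig(
    model="openai/gpt-4o-mini",
    instructions=(
        "You are an LLM evaluator. We need the guarantee that the output "
        "answers what is being asked on the input, please evaluate as False "
        "if it doesn't."
    ),
    max_tokens=8192,
)
```

Validating at construction time means a misconfigured evaluator fails before any model call is attempted, rather than partway through an evaluation run.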
Metrics
This evaluator does not report any metrics; it returns only whether the evaluated content meets the conditions given in the instructions.
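
Because the result is a single true/false verdict rather than a score, a caller working with a judge model's raw text reply needs to map that reply to a boolean. The sketch below shows one way to do so. It assumes the judge answers with the single word "True" or "False"; how Modelmetry itself parses the model's reply is not documented here, and the function name parse_boolean_verdict is hypothetical.

```python
def parse_boolean_verdict(raw: str) -> bool:
    """Map a judge model's raw text reply to True or False.

    Assumption: the judge was asked to reply with the single word
    "True" or "False". Any other reply is rejected rather than guessed.
    """
    # Drop surrounding whitespace, then stray quotes or a trailing period.
    verdict = raw.strip().strip(".\"'").lower()
    if verdict == "true":
        return True
    if verdict == "false":
        return False
    raise ValueError(f"Unrecognized verdict: {raw!r}")
```

Rejecting unrecognized replies, instead of defaulting to False, keeps a malformed model response from being silently recorded as a failed evaluation.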