@eva-llm/eva-parser
v1.0.2
# eva-parser

A converter from Promptfoo test formats into EVA-LLM ecosystem `eva-run` tasks.

NOTE: It supports a restricted subset of the Promptfoo format and extends it with its own features (see the examples below).
## Quick Start

```shell
npm i @eva-llm/eva-parser
```

```js
import { parsePromptfoo } from '@eva-llm/eva-parser';

const evaTests = parsePromptfoo(promptfooYamlContent);
```
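As a quick orientation, here is a hypothetical combined config assembled only from the fragments documented below (the assumption that `vars` and `assert` can sit on the same test item is illustrative, not confirmed by this README):

```yaml
# Hypothetical combined Promptfoo-style input for parsePromptfoo,
# assembled from the documented fragments below (illustrative only).
providers:
  - openai:gpt-4.1-mini

prompts:
  - What is the capital of {{country}}

test:
  - vars:
      country: France
    assert:
      - type: contains
        value: Paris
        case_sensitive: false
```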
## Supported Promptfoo Items

### Providers
```yaml
providers:
  - openai:gpt-5-mini
  - openai:gpt-4.1-mini
```

```yaml
providers:
  - id: openai:gpt-5.2
    config:
      temperature: 0
```
### Prompts

```yaml
prompts:
  - Hello, how are you?
  - What is the capital of France?
```

```yaml
prompts:
  - What is the capital of {{country}}
```
### Repeat

```yaml
test:
  times: 50 # optional (default 1), eva-run specific, used for AI Metrology statistics
```
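To make the statistics angle concrete, a minimal sketch of aggregating repeated runs into a pass rate (an assumption about what eva-run does with `times`; the function name is hypothetical):

```typescript
// Hypothetical sketch: with `times: 50`, eva-run presumably repeats the test
// and aggregates pass/fail outcomes into a pass rate for statistics.
function passRate(outcomes: boolean[]): number {
  const passed = outcomes.filter(Boolean).length;
  return passed / outcomes.length;
}

console.log(passRate([true, true, false, true])); // → 0.75
```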
### Variables

```yaml
test:
  - vars:
      country: France
```
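A minimal sketch of how `{{country}}`-style placeholders in prompts may be filled from a test's `vars` (assuming simple string substitution; `renderPrompt` is a hypothetical name, not part of this package's API):

```typescript
// Hypothetical sketch of {{var}} substitution: placeholders are replaced
// from the vars map; unknown placeholders are left untouched (assumption).
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in vars ? vars[name] : match
  );
}

console.log(renderPrompt("What is the capital of {{country}}", { country: "France" }));
// → What is the capital of France
```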
### Asserts

NOTE: All LLM asserts natively support Dark Teaming to measure Epistemic Honesty via Symmetry Deviation, and extend the Promptfoo format with the field `must_fail`.

#### b-eval (binary g-eval, eva-llm specific)
```yaml
test:
  - assert:
      - type: b-eval
        value: answer is coherent to question # can be an array as well
        threshold: 0.5 # optional (default is 0.5 in eva-run)
        provider: # optional (default is the test provider)
          - id: openai:gpt-4.1-mini
            config:
              temperature: 0 # optional (default is 0 in eva-run, the de facto standard for better judging)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
        must_fail: true # optional (default false, eva-run specific) - Dark Teaming field
        answer_only: true # optional (default false, eva-run specific) - analyze only the LLM answer, without prompt involvement
```
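One plausible reading of `threshold` and `must_fail`, sketched in TypeScript (an assumption about eva-run's verdict logic, not documented behavior; `verdict` is a hypothetical name):

```typescript
// Hypothetical sketch: a judged score passes when it reaches the threshold,
// and must_fail (Dark Teaming) inverts the verdict (assumption).
function verdict(score: number, threshold = 0.5, mustFail = false): boolean {
  const passed = score >= threshold;
  return mustFail ? !passed : passed;
}

console.log(verdict(0.8));            // → true
console.log(verdict(0.8, 0.5, true)); // → false
```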
#### g-eval

```yaml
test:
  - assert:
      - type: g-eval
        value: answer is coherent to question # can be an array as well
        threshold: 0.5 # optional (default is 0.5 in eva-run)
        provider: # optional (default is the test provider)
          - id: openai:gpt-4.1-mini
            config:
              temperature: 0 # optional (default is 0 in eva-run, the de facto standard for better judging)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
        must_fail: true # optional (default false, eva-run specific) - Dark Teaming field
        answer_only: true # optional (default false, eva-run specific) - analyze only the LLM answer, without prompt involvement
```
#### llm-rubric

```yaml
test:
  - assert:
      - type: llm-rubric
        value: answer is polite # can be an array as well
        threshold: 0.5 # optional (default is 0.5 in eva-run)
        provider: # optional (default is the test provider)
          - id: openai:gpt-4.1-mini
            config:
              temperature: 0 # optional (default is 0 in eva-run, the de facto standard for better judging)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
        must_fail: true # optional (default false, eva-run specific) - Dark Teaming field
```
#### equals

```yaml
test:
  - assert:
      - type: equals
        value: Paris
        case_sensitive: false # optional (default true, eva-run specific)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
```
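The `case_sensitive` flag can be sketched as a plain string comparison (an assumption about the comparison semantics; `equalsAssert` is a hypothetical name):

```typescript
// Hypothetical sketch of an equals check honoring case_sensitive
// (assumption: case-insensitive mode simply lowercases both sides).
function equalsAssert(output: string, expected: string, caseSensitive = true): boolean {
  const a = caseSensitive ? output : output.toLowerCase();
  const b = caseSensitive ? expected : expected.toLowerCase();
  return a === b;
}

console.log(equalsAssert("paris", "Paris"));        // → false
console.log(equalsAssert("paris", "Paris", false)); // → true
```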
#### not-equals

```yaml
test:
  - assert:
      - type: not-equals
        value: Chicago
        case_sensitive: false # optional (default true, eva-run specific)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
```
#### contains

```yaml
test:
  - assert:
      - type: contains
        value: Paris
        case_sensitive: false # optional (default true, eva-run specific)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
```
#### not-contains

```yaml
test:
  - assert:
      - type: not-contains
        value: Chicago
        case_sensitive: false # optional (default true, eva-run specific)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
```
#### regex

```yaml
test:
  - assert:
      - type: regex
        value: /paris/i
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
```
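The value `/paris/i` is written like a JavaScript regex literal; one plausible interpretation is pattern-plus-flags, sketched below (an assumption, not the documented eva-run behavior; `parseRegexValue` is a hypothetical name):

```typescript
// Hypothetical sketch: split a "/pattern/flags" string into a RegExp,
// falling back to treating the whole value as the pattern (assumption).
function parseRegexValue(value: string): RegExp {
  const m = value.match(/^\/(.*)\/([a-z]*)$/);
  return m ? new RegExp(m[1], m[2]) : new RegExp(value);
}

console.log(parseRegexValue("/paris/i").test("Paris is lovely")); // → true
```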
## License

MIT
