@eva-llm/eva-parser
v1.0.2
# eva-parser

A converter from Promptfoo test formats into EVA-LLM ecosystem `eva-run` tasks.

NOTE: It supports a restricted subset of the Promptfoo format and extends it with its own features (see the examples below).
## Quick Start

```shell
npm i @eva-llm/eva-parser
```

```js
import { parsePromptfoo } from '@eva-llm/eva-parser';

const evaTests = parsePromptfoo(promptfooYamlContent);
```
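As a quick orientation, here is a hypothetical combined config assembled only from the fragments documented below (the assumption that `vars` and `assert` can sit on the same test item is illustrative, not confirmed by this README):

```yaml
# Hypothetical combined Promptfoo-style input for parsePromptfoo,
# assembled from the documented fragments below (illustrative only).
providers:
  - openai:gpt-4.1-mini

prompts:
  - What is the capital of {{country}}

test:
  - vars:
      country: France
    assert:
      - type: contains
        value: Paris
        case_sensitive: false
```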
## Supported Promptfoo Items

### Providers
```yaml
providers:
  - openai:gpt-5-mini
  - openai:gpt-4.1-mini
```

```yaml
providers:
  - id: openai:gpt-5.2
    config:
      temperature: 0
```
### Prompts

```yaml
prompts:
  - Hello, how are you?
  - What is the capital of France?
```

```yaml
prompts:
  - What is the capital of {{country}}
```
### Repeat

```yaml
test:
  times: 50 # optional (default 1), eva-run specific, used for AI Metrology statistics
```
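To make the statistics angle concrete, a minimal sketch of aggregating repeated runs into a pass rate (an assumption about what eva-run does with `times`; the function name is hypothetical):

```typescript
// Hypothetical sketch: with `times: 50`, eva-run presumably repeats the test
// and aggregates pass/fail outcomes into a pass rate for statistics.
function passRate(outcomes: boolean[]): number {
  const passed = outcomes.filter(Boolean).length;
  return passed / outcomes.length;
}

console.log(passRate([true, true, false, true])); // → 0.75
```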
### Variables

```yaml
test:
  - vars:
      country: France
```
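A minimal sketch of how `{{country}}`-style placeholders in prompts may be filled from a test's `vars` (assuming simple string substitution; `renderPrompt` is a hypothetical name, not part of this package's API):

```typescript
// Hypothetical sketch of {{var}} substitution: placeholders are replaced
// from the vars map; unknown placeholders are left untouched (assumption).
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in vars ? vars[name] : match
  );
}

console.log(renderPrompt("What is the capital of {{country}}", { country: "France" }));
// → What is the capital of France
```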
### Asserts

NOTE: All LLM asserts natively support Dark Teaming to measure Epistemic Honesty via Symmetry Deviation, and extend the Promptfoo format with the field `must_fail`.

#### b-eval (binary g-eval, eva-llm specific)
```yaml
test:
  - assert:
      - type: b-eval
        value: answer is coherent to question # can be an array as well
        threshold: 0.5 # optional (default is 0.5 in eva-run)
        provider: # optional (default is the test provider)
          - id: openai:gpt-4.1-mini
            config:
              temperature: 0 # optional (default is 0 in eva-run, the de facto standard for better judging)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
        must_fail: true # optional (default false, eva-run specific) - Dark Teaming field
        answer_only: true # optional (default false, eva-run specific) - analyze only the LLM answer, without prompt involvement
```
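One plausible reading of `threshold` and `must_fail`, sketched in TypeScript (an assumption about eva-run's verdict logic, not documented behavior; `verdict` is a hypothetical name):

```typescript
// Hypothetical sketch: a judged score passes when it reaches the threshold,
// and must_fail (Dark Teaming) inverts the verdict (assumption).
function verdict(score: number, threshold = 0.5, mustFail = false): boolean {
  const passed = score >= threshold;
  return mustFail ? !passed : passed;
}

console.log(verdict(0.8));            // → true
console.log(verdict(0.8, 0.5, true)); // → false
```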
#### g-eval

```yaml
test:
  - assert:
      - type: g-eval
        value: answer is coherent to question # can be an array as well
        threshold: 0.5 # optional (default is 0.5 in eva-run)
        provider: # optional (default is the test provider)
          - id: openai:gpt-4.1-mini
            config:
              temperature: 0 # optional (default is 0 in eva-run, the de facto standard for better judging)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
        must_fail: true # optional (default false, eva-run specific) - Dark Teaming field
        answer_only: true # optional (default false, eva-run specific) - analyze only the LLM answer, without prompt involvement
```
#### llm-rubric

```yaml
test:
  - assert:
      - type: llm-rubric
        value: answer is polite # can be an array as well
        threshold: 0.5 # optional (default is 0.5 in eva-run)
        provider: # optional (default is the test provider)
          - id: openai:gpt-4.1-mini
            config:
              temperature: 0 # optional (default is 0 in eva-run, the de facto standard for better judging)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
        must_fail: true # optional (default false, eva-run specific) - Dark Teaming field
```
#### equals

```yaml
test:
  - assert:
      - type: equals
        value: Paris
        case_sensitive: false # optional (default true, eva-run specific)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
```
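The `case_sensitive` flag can be sketched as a plain string comparison (an assumption about the comparison semantics; `equalsAssert` is a hypothetical name):

```typescript
// Hypothetical sketch of an equals check honoring case_sensitive
// (assumption: case-insensitive mode simply lowercases both sides).
function equalsAssert(output: string, expected: string, caseSensitive = true): boolean {
  const a = caseSensitive ? output : output.toLowerCase();
  const b = caseSensitive ? expected : expected.toLowerCase();
  return a === b;
}

console.log(equalsAssert("paris", "Paris"));        // → false
console.log(equalsAssert("paris", "Paris", false)); // → true
```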
#### not-equals

```yaml
test:
  - assert:
      - type: not-equals
        value: Chicago
        case_sensitive: false # optional (default true, eva-run specific)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
```
#### contains

```yaml
test:
  - assert:
      - type: contains
        value: Paris
        case_sensitive: false # optional (default true, eva-run specific)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
```
#### not-contains

```yaml
test:
  - assert:
      - type: not-contains
        value: Chicago
        case_sensitive: false # optional (default true, eva-run specific)
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
```
#### regex

```yaml
test:
  - assert:
      - type: regex
        value: /paris/i
        times: 5 # optional (default 1, eva-run specific) - repeat the assert N times
```
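The value `/paris/i` is written like a JavaScript regex literal; one plausible interpretation is pattern-plus-flags, sketched below (an assumption, not the documented eva-run behavior; `parseRegexValue` is a hypothetical name):

```typescript
// Hypothetical sketch: split a "/pattern/flags" string into a RegExp,
// falling back to treating the whole value as the pattern (assumption).
function parseRegexValue(value: string): RegExp {
  const m = value.match(/^\/(.*)\/([a-z]*)$/);
  return m ? new RegExp(m[1], m[2]) : new RegExp(value);
}

console.log(parseRegexValue("/paris/i").test("Paris is lovely")); // → true
```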
## License

MIT
