@younndai/yon-benchmarks
v2.0.3
Benchmark suite for YON™, the stream-first data format — structural reliability, cognitive economy, baseline-relative token cost, and streaming properties.
What is this?
Quantitative evidence for the YON™ format. Measures structural reliability, cognitive economy, streaming properties, fault isolation, and emitter faithfulness across 58 local suites and 12 LLM suites.
Install
npm install @younndai/yon-benchmarks
Quick Start
# Local suites only (no API keys needed)
npm run bench:local
# Full run (local + LLM suites if keys available)
npm run bench
# LLM suites only
npm run bench:llm
# Single provider
npm run bench -- --provider openai
# Multiple providers
npm run bench -- --provider openai,google
# Filter by suite name
npm run bench -- --filter "generation"
API Key Setup
Copy .env.example to .env.local in this package directory and fill in your provider keys:
# OpenAI — required for most LLM suites (default provider)
OPENAI_API_KEY=sk-proj-...
# Anthropic — used for multi-model comparison suites
ANTHROPIC_API_KEY=sk-ant-api03-...
# Google — used for multi-model comparison suites
GOOGLE_GENERATIVE_AI_API_KEY=AIza...
Missing keys are not errors. Suites that need a missing provider will skip with a message explaining which key to add. Local suites never require API keys.
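Before a run it can help to see which provider keys are actually filled in. The following is a minimal sketch, not part of this package: `check_provider_keys` is a hypothetical helper that looks for non-empty assignments of the three variables above in an env file (`.env.local` by default). How the benchmark runner itself loads that file is not covered here.

```shell
# Hypothetical helper: report which provider keys have a non-empty value
# in an env file. The variable names are the ones listed in this README.
check_provider_keys() {
  env_file=${1:-.env.local}
  for pair in openai:OPENAI_API_KEY anthropic:ANTHROPIC_API_KEY google:GOOGLE_GENERATIVE_AI_API_KEY; do
    provider=${pair%%:*}   # text before the colon
    var=${pair#*:}         # text after the colon
    # A key counts as set when the file has a line "VAR=<at least one character>".
    if [ -f "$env_file" ] && grep -Eq "^${var}=.+" "$env_file"; then
      echo "$provider: set"
    else
      echo "$provider: missing ($var)"
    fi
  done
}
```

Calling `check_provider_keys` with no argument reads `.env.local` in the current directory. A missing file is treated the same as a file with no keys.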
Which keys unlock which suites?
| Suite                    | OpenAI | Anthropic | Google |
| ------------------------ | ------ | --------- | ------ |
| Cognitive Load           | ✅     | -         | -      |
| Generation Quality       | ✅     | -         | -      |
| Shot Curve               | ✅     | -         | -      |
| Information Preservation | ✅     | -         | -      |
| Format Comprehension     | ✅     | ✅        | ✅     |
| Format Traps             | ✅     | ✅        | ✅     |
| Density Comparison       | ✅     | ✅        | ✅     |
| Prompt Compression       | ✅     | ✅        | ✅     |
| Multi-Model Generation   | ✅     | ✅        | ✅     |
| Report Enrichment        | ✅     | ✅        | ✅     |
Report Enrichment is a post-run analysis/synthesis step over the suite results, not one of the 9 counted LLM suites. Suites marked with a single ✅ default to OpenAI but will fall back to any available provider. Multi-model suites run across all available providers.
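The default-and-fallback rule for single-provider suites can be sketched as a small shell function. This is an illustration under assumptions, not the runner's code: `pick_provider` is a hypothetical name, and since the text above says only that such a suite falls back to "any available provider", taking the first one in the list is an assumed order.

```shell
# Hypothetical sketch: choose the provider for a single-provider suite.
# $1 is a comma-separated list of providers whose keys are available,
# e.g. "anthropic,google". Prefers openai; otherwise takes the first entry
# (assumed order). Returns 1 and prints nothing when the list is empty,
# which corresponds to the suite being skipped.
pick_provider() {
  available=$1
  case ",$available," in
    *,openai,*) echo openai; return 0 ;;
  esac
  first=${available%%,*}
  if [ -n "$first" ]; then
    echo "$first"
    return 0
  fi
  return 1
}
```

For example, `pick_provider "anthropic,openai"` prints `openai`, while `pick_provider "google,anthropic"` prints `google`.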
CLI Reference
npm run bench [flags]
Flags:
  --local             Run local suites only (no LLM)
  --llm               Run LLM suites only (skip local)
  --provider <name>   Restrict LLM to specific provider(s)
                      Values: openai, anthropic, google
                      Comma-separated for multiple: openai,google
  --filter <term>     Run only suites whose name contains <term>
  --report            Force report generation (default for full runs)
Examples
# Quick local check during development
npm run bench:local
# Test with just OpenAI
npm run bench -- --provider openai
# Multi-model comparison (OpenAI + Google)
npm run bench -- --provider openai,google
# Run a specific LLM suite
npm run bench -- --llm --filter "cognitive"
# Full run with all providers
npm run bench
What It Measures
Six Pillars
| Pillar               | What it validates                                                                | Example suites                                       |
| -------------------- | -------------------------------------------------------------------------------- | ---------------------------------------------------- |
| Streaming            | Line-oriented processing, first-record latency                                   | Streaming Properties, Streaming Latency              |
| Lossless             | Zero information loss through format conversions                                 | Format Fidelity, Payload Fidelity, Hedging           |
| Cognitive Economy    | Token efficiency at compressed densities (min/ultra), context window utilization | Token Efficiency, Context Utilization                |
| Cross-cutting        | Structural reliability, error recovery, throughput                               | Error Recovery, Comparative Throughput               |
| Emitter Faithfulness | LLMs generate valid YON without fine-tuning                                      | Generation Quality, Multi-Model Validity             |
| Sapir-Whorf          | Whether notation shapes model cognition: comprehension, salience, priming        | Notation Alignment, Profile Priming, Value Amplifier |
Suite Breakdown
- 58 local suites — deterministic, no API keys needed
- 12 LLM suites — require API keys, measure AI comprehension and generation
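To narrow a run to a few of these suites, the `--filter` flag keeps those whose name contains the given term. A rough sketch of that kind of substring match follows. `filter_suites` is a hypothetical helper, and case-insensitive matching is an assumption: the Quick Start passes lowercase terms such as "generation", but the runner's exact matching rule is not documented here.

```shell
# Hypothetical sketch: read suite names from stdin (one per line) and print
# the ones that contain the term. -F treats the term as a literal string,
# -i ignores case (an assumption about how the runner matches names).
filter_suites() {
  term=$1
  grep -i -F -- "$term"
}
```

With the names "Cognitive Load", "Generation Quality" and "Multi-Model Generation" on stdin, `filter_suites generation` keeps the last two.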
Report Output
Reports are written to reports/<timestamp>/:
reports/
  2026-02-14-13-17/
    summary.md             # Human-readable summary
    summary.json           # Machine-readable results
    enriched-summary.md    # AI-polished version (if LLM available)
    <suite-name>/
      result.json          # Per-suite detailed results
      result.md            # Per-suite human summary
Documentation
- HOW-TO-USE.md: task-oriented usage guide.
- TESTING.md: test strategy and coverage.
- CHANGELOG.md: release history.
The YON Project
YON is an open block format and toolchain.
- Specification: @younndai/yon-spec, the normative YON v2.0 standard.
- Toolchain: YounndAI/yon (parser, generator, runner, converter, examples, benchmarks, domains, ai-relay).
- Editor support: yon-vscode (VS Code Marketplace) · @younndai/yon-textmate (TextMate grammar).
Testing
npm test
Deterministic vitest suites run without API keys. The benchmark suites above (npm run bench) are separate from the unit tests.
About YounndAI
YounndAI™ — You and AI, unified. (pronounced "yoon-dye")
A philosophy of intelligence: building with intention, so humans and machines think together without losing what makes either whole.
License & Attribution
Apache-2.0. © 2026 MARLINK TRADING SRL (YounndAI). See LICENSE and NOTICE.
"YON" and "YounndAI" are trademarks of MARLINK TRADING SRL — see TRADEMARK.md.
Created by Alexandru Mareș.
Website: yon.younndai.com
|           |                                         |
| --------- | --------------------------------------- |
| Spec      | YON v2.0                                |
| Author    | Alexandru Mareș                         |
| Company   | MARLINK TRADING SRL · YounndAI™         |
| License   | Apache 2.0, © 2026 MARLINK TRADING SRL  |
| Trademark | YounndAI™ Trademark Guidelines          |
