# TOON — Token-Oriented Object Notation
TOON is a serialization format for LLM context windows. It replaces JSON with a compact, indentation-aware notation that cuts 30–57% of tokens from structured data — no information lost, no fine-tuning required.
## The Problem
You're building an LLM-powered feature. You serialize some data and stuff it into a prompt:
```js
const context = JSON.stringify(records) // 18,471 tokens
```

That's expensive. JSON was designed for machines to parse, not for transformer attention. Every `"key":` is repeated on every row. Every `{`, `}`, `[`, `]` is a token that carries almost no information. At scale — 200 log events, 50 API responses, a deep config object — this bloats your context window and your bill.
TOON fixes this at the serialization layer:
```js
import { stringify } from '@alxmss/toon'

const context = stringify(records) // 9,985 tokens — 45.9% less
```

Same data. Same LLM. No prompt engineering. Just fewer tokens.
## How It Works
TOON uses three core compression mechanisms:
**1. HRV (Header-Row-Value)** — for arrays of uniform objects, keys are declared once in a header row instead of being repeated on every item:

```
// JSON: 627 tokens
[{"method":"GET","path":"/users","auth":true},{"method":"POST","path":"/users","auth":true},...]

// TOON: 334 tokens (-46.7%)
endpoints[len:3]:
  # method | path | auth
  > GET | /users | true
  > POST | /users | true
  > GET | /health | false
```

**2. Dot-path compression** — single-child object chains collapse to a flat path:

```
// JSON: 88 tokens
{"db":{"primary":{"host":{"address":"10.0.0.1"}}}}

// TOON: 38 tokens (-56.8%)
db.primary.host.address: 10.0.0.1
```

**3. Quote elision** — strings that can't be ambiguous (IPs, semver, URL paths, bare words) are never quoted: `3.2.1`, `10.0.0.1`, and `/users/{id}` stay as-is.
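From the API side, the same HRV mechanism is a one-liner; a minimal sketch, with the expected output copied from the HRV example above:

```ts
import { stringify } from '@alxmss/toon'

const endpoints = [
  { method: 'GET',  path: '/users',  auth: true },
  { method: 'POST', path: '/users',  auth: true },
  { method: 'GET',  path: '/health', auth: false },
]

// Top-level values must be plain objects, so wrap the array in a key.
console.log(stringify({ endpoints }))
// endpoints[len:3]:
//   # method | path | auth
//   > GET | /users | true
//   > POST | /users | true
//   > GET | /health | false
```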
## TOON + RTK: Two Layers, Zero Overlap
If you use RTK (the token-saving command wrapper for Claude Code), the two tools work at different layers — one does not replace the other.
| Layer | Tool | What it compresses | When |
|-------|------|--------------------|------|
| Development | RTK | Shell command output entering Claude Code's context | `rtk vitest`, `rtk git diff`, `rtk tsc` |
| Runtime | TOON | Structured data sent to your LLM in production | `stringify(data)` in your app |
RTK keeps your dev loop lean. TOON keeps your users' API calls cheap. Stack both for a two-layer token budget.
```
Development                          Runtime
───────────────────────────────      ──────────────────────────────────────
rtk vitest   → Claude Code           stringify(payload) → your LLM API call
rtk git diff → Claude Code           parse(llmOutput)   ← LLM response
rtk tsc      → Claude Code
```

## Installation
```bash
npm i @alxmss/toon
```

One-time Claude Code setup (teaches Claude to use TOON automatically in this project):
```bash
npx @alxmss/toon init
```

This writes a TOON conventions block into your project's `CLAUDE.md`. From that point on, Claude Code uses `stringify()` and `TOON_SYSTEM_PROMPT` whenever it writes a feature that sends data to an LLM.
## Quick Start
### Serialize data for an LLM
```js
import Anthropic from '@anthropic-ai/sdk'
import { stringify, TOON_SYSTEM_PROMPT } from '@alxmss/toon'

const anthropic = new Anthropic() // reads ANTHROPIC_API_KEY from env

const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-5', // any Claude model id
  max_tokens: 1024,
  system: myInstructions + '\n\n' + TOON_SYSTEM_PROMPT,
  messages: [{
    role: 'user',
    content: stringify(data), // not JSON.stringify
  }],
})
```

`TOON_SYSTEM_PROMPT` (~280 tokens) teaches the model to read and emit TOON. Use `TOON_SYSTEM_PROMPT_COMPACT` (~90 tokens) for follow-up calls once the model is context-trained.
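A follow-up call might then look like this; a sketch continuing the snippet above (`priorTurns` and `nextQuestion` are placeholders, not package exports):

```ts
import { TOON_SYSTEM_PROMPT_COMPACT } from '@alxmss/toon'

// Once the model has seen the full prompt, the ~90-token compact
// variant is enough on subsequent turns.
const followUp = await anthropic.messages.create({
  model: 'claude-sonnet-4-5', // placeholder model id
  max_tokens: 1024,
  system: myInstructions + '\n\n' + TOON_SYSTEM_PROMPT_COMPACT,
  messages: [...priorTurns, { role: 'user', content: nextQuestion }],
})
```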
### Parse LLM output back to an object
```js
import { parse } from '@alxmss/toon'

const result = parse(llmOutput)
// → plain JS object, round-trip exact
```

### Validate before parsing
```js
import { lint } from '@alxmss/toon'

const issues = lint(toonString)
// [{ severity: 'error', line: 4, column: 1, message: '...' }]
// Empty array = structurally valid
```
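A common pattern is to gate `parse()` behind `lint()`; a sketch (the `safeParse` helper is illustrative, not a package export):

```ts
import { lint, parse } from '@alxmss/toon'

// Illustrative helper: reject structurally invalid TOON before parsing.
function safeParse(toon: string): Record<string, unknown> | null {
  const issues = lint(toon)
  if (issues.some((i) => i.severity === 'error')) {
    for (const i of issues) console.warn(`${i.line}:${i.column} ${i.message}`)
    return null
  }
  return parse(toon)
}
```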
### Measure savings on your own data

```bash
npx @alxmss/toon check data.json
```

```
TOON Density Report — data.json
────────────────────────────────────────────────────────────
Metric                       JSON       TOON       Delta
────────────────────────────────────────────────────────────
Bytes                        56792      27334      -51.9%
Tokens (cl100k_base)         18471      9985       -45.9%
────────────────────────────────────────────────────────────
Fits in window (128k)        10×        20×        ×2.00
────────────────────────────────────────────────────────────
Density Score                45.9% reduction
[█████████░░░░░░░░░░░] 45.9%

Context Expansion Factor: 2.00× — TOON fits 2.00× more data in the same window
```

## When to Use TOON
TOON's compression is structural — it eliminates repeated key names and punctuation, not values. The gain scales with schema uniformity.
### Maximum benefit (40–57%)
| Use Case | Typical Saving | Why |
|----------|----------------|-----|
| Observability pipelines (CloudWatch, Datadog, Loki) | ~46% | Log events are the most uniform data in existence → HRV |
| GitHub / REST API responses (repos, issues, PRs) | ~46% | Repeated field names across paginated records → HRV |
| Infrastructure config (k8s, Terraform) | ~57% | Long single-child chains → dot-path compression |
| RAG pipelines with structured records | 35–47% | DB rows, product catalogs, CRM contacts → HRV |
| Agentic tool schemas / endpoint inventories | 35–45% | Repeated schema fields across tool definitions → HRV |
### Good benefit (30–40%)
| Use Case | Typical Saving | Why |
|----------|----------------|-----|
| CI bots and PR review agents (mixed payloads) | ~35% | Flat KV metadata + tabular arrays → dot-path + HRV |
| LLM data transformation (validate / enrich / classify) | 30–45% | Uniform input records → savings on both request and response |
### Diminishing returns
- Prose documents (articles, emails, legal text) — no structural repetition to eliminate
- Tiny payloads (< 50 tokens) — `[len:N]` anchor overhead isn't amortized
- Highly irregular arrays — falls back to block format, still ~20–30% savings
### Not a fit
- Human-edited config files — YAML/TOML are more ergonomic to write
- Binary or streaming data
- Top-level arrays — wrap in `{ items: [...] }` first
## Why the Savings Compound
- Lower API cost — input tokens are priced per token; 46% fewer tokens = 46% less on that payload
- More data per window — a 200k Claude window holds 1.85× more complete records in TOON than JSON
- Faster time-to-first-token — smaller prompts start streaming sooner
- Fewer RAG round-trips — fitting more records per call reduces retrieval calls per session
For a pipeline processing 1M CloudWatch events/day, the measured 45.9% reduction translates to ~42M tokens saved — before latency improvements.
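The arithmetic behind that figure, using the stress-test numbers reported below:

```ts
// 200 CloudWatch events = 18,471 JSON tokens (see Measured Results)
const tokensPerEvent = 18_471 / 200                 // ≈ 92.4
const dailyJsonTokens = tokensPerEvent * 1_000_000  // ≈ 92.4M tokens/day
const dailySaved = dailyJsonTokens * 0.459          // ≈ 42.4M tokens/day saved
```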
## Measured Results
### Synthetic fixtures (validated on every `npm test`)
| Shape | JSON Tokens | TOON Tokens | Savings |
|-------|-------------|-------------|---------|
| 12-row uniform table | 627 | 334 | 46.7% |
| 6-row sparse table (1 optional col) | 294 | 196 | 33.3% |
| Mixed document (KV + HRV + dot-path) | 415 | 271 | 34.7% |
| Deeply nested config (3 levels) | 88 | 38 | 56.8% |
| Non-uniform block array | 111 | 77 | 30.6% |
### Real-world stress test (`scripts/stress-test.ts`)
| Fixture | JSON Tokens | TOON Tokens | Reduction | 200k window CEF |
|---------|-------------|-------------|-----------|-----------------|
| CloudWatch logs — 200 events × 9 fields | 18,471 | 9,985 | 45.9% | 2.00× |
| GitHub repos — 50 repos × 12 fields | 5,457 | 2,919 | 46.5% | 1.89× |
CEF = Context Expansion Factor: how many more complete documents fit in the same window.
Tokenizer: `cl100k_base` (same encoder as GPT-4). Run `node_modules/.bin/tsx scripts/stress-test.ts` to reproduce.
### LLM handshake verification
TOON_SYSTEM_PROMPT was verified against claude-sonnet-4-6 on a 12-row HRV CloudWatch log with a three-part reasoning task. Score: 5/5 — correct on highest latency value, timestamp, most-errored user, error count, and root-cause pattern. The TOON payload used ~180 tokens vs ~420 for equivalent JSON — 57% less on the reasoning task itself.
```bash
ANTHROPIC_API_KEY=sk-... node_modules/.bin/tsx scripts/handshake-test.ts
```

## API Reference
### `stringify(value, options?)`
```ts
stringify(value: Record<string, unknown>, options?: {
  indent?: 2 | 4          // default: 2
  dotPath?: boolean       // default: true — compress single-child chains
  sizeHints?: boolean     // default: true — emit [len:N] anchors
  hrvThreshold?: number   // default: 0.5 — max extra/base key ratio for sparse HRV
}): string
```

Throws `TypeError` if the top-level value is not a plain object. Throws `ToonSerializationError` on circular references.
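As a usage sketch of the options above (`config` is a placeholder object):

```ts
import { stringify } from '@alxmss/toon'

// Plain key-per-line output: no dot-path collapsing, no [len:N] anchors.
const out = stringify(config, { dotPath: false, sizeHints: false })
```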
### `parse(input, options?)`
```ts
parse(input: string, options?: {
  validateHints?: boolean // default: true — throw on [len:N] length mismatch
}): Record<string, unknown>
```

Throws `ToonParseError` (with `.line`, `.column`, `.suggestion`) on violations.
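A handling sketch, assuming `ToonParseError` is exported alongside `parse` (the docs name the class, but its export is an assumption here):

```ts
import { parse, ToonParseError } from '@alxmss/toon'

try {
  const data = parse(llmOutput)
} catch (err) {
  // Assumes ToonParseError is a public export.
  if (err instanceof ToonParseError) {
    console.error(`parse failed at ${err.line}:${err.column}: ${err.message}`)
    if (err.suggestion) console.error(`suggestion: ${err.suggestion}`)
  } else {
    throw err
  }
}
```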
### `lint(input)`
```ts
lint(input: string): Issue[]
// Issue: { severity: 'error' | 'warning', line: number, column: number, message: string }
```

Never throws. Returns all structural issues in one pass — use before `parse()` in CI or editor integrations.
### `TOON_SYSTEM_PROMPT` / `TOON_SYSTEM_PROMPT_COMPACT`
Pre-written system prompt snippets. Full (~280 tokens) for first-time integration; compact (~90 tokens) for follow-up calls.
## CLI
```bash
# One-time project setup
npx @alxmss/toon init              # writes TOON block to ./CLAUDE.md
npx @alxmss/toon init --global     # writes to ~/.claude/CLAUDE.md

# Measure token savings on any JSON file
toon check data.json
toon check data.json --window 32000   # custom context window
toon check data.json --toon           # also print the TOON output
```

## Format Reference
### Sigils
| Sigil | Role | Example |
|-------|------|---------|
| `key: value` | Key-value pair | `name: Alice` |
| `a.b.c: value` | Dot-path (single-child chain) | `db.host: localhost` |
| `[len:N]` | Size hint / structural anchor | `users[len:3]:` |
| `# col1 \| col2` | HRV header | `# id \| name \| role?` |
| `> v1 \| v2` | HRV data row | `> 1 \| Alice \| admin` |
| `- key: value` | Block array item | `- method: GET` |
| `~` | Null / absent | `latency: ~` |
| `col?` | Optional HRV column | `# id \| note?` |
| `//` | Line comment | `// deprecated` |
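Putting the sigils together, a small illustrative document (hypothetical data; constructs taken only from the table above):

```
// deployment snapshot (illustrative)
db.primary.host: 10.0.0.1
replicas[len:2]:
  # id | name | note?
  > 1 | alpha | ~
  > 2 | beta | standby
checks[len:1]:
  - method: GET
```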
### Type inference (first match wins)
```
~ or null        → null
true / false     → boolean
bare integer     → int
bare float       → float
"…"              → string
[…]              → inline array
{…}              → inline object
anything else    → bare string (IPs, semver, URL paths never need quotes)
```
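A sketch of how these rules land in parsed output; the input lines are borrowed from earlier examples, and the nested expansion of dot-paths is assumed from the round-trip guarantee:

```ts
import { parse } from '@alxmss/toon'

const toon = [
  'db.primary.host.address: 10.0.0.1', // bare string (IP, never quoted)
  'latency: ~',                        // ~ → null
  'retries: 3',                        // bare integer → number
].join('\n')

console.log(parse(toon))
// → { db: { primary: { host: { address: '10.0.0.1' } } }, latency: null, retries: 3 }
```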
### Array tier selection

```
All scalars                       → key[len:N]: a, b, c
All objects, no optional keys     → HRV-uniform (# / > rows)
Objects with ≤50% optional keys   → HRV-sparse (col? columns, ~ for absent)
Otherwise                         → Block array (- items)
```
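For instance, an all-scalar array should take the first tier's single-line form; a sketch:

```ts
import { stringify } from '@alxmss/toon'

// All-scalar arrays collapse to a single line per the first tier above.
console.log(stringify({ tags: ['alpha', 'beta', 'gamma'] }))
// tags[len:3]: alpha, beta, gamma
```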
## Spec

Formal EBNF grammar: `spec/GRAMMAR.md`
Known limitations:

- Block string `|` is parse-only (stringify uses quoted strings instead)
- `NaN`/`Infinity` serialize as quoted strings
- Top-level arrays are not supported — wrap in `{ items: [...] }`
