vague-lang

v3.3.0

Published

2 months ago

A declarative language for generating realistic test data


mcclowes

test-data mock-data fixture faker generator dsl openapi json-schema testing

Vague

A declarative language for describing and generating realistic data. Vague treats ambiguity as a first-class primitive — declare the shape of valid data and let the runtime figure out how to populate it.

Why Vague?

Vague is a data description model for APIs, not just a fake data tool.

Think of it as OpenAPI meets property-based testing: you describe what valid data looks like — its structure, constraints, distributions, and edge cases — and Vague handles generation. The same schema that generates test data can validate production data.

| What You Need | Traditional Tools | Vague |
|---------------|-------------------|-------|
| Intent: "80% of users are active" | Random selection | status: 0.8: "active" \| 0.2: "inactive" |
| Constraints: "due date ≥ issued date" | Manual validation | assume due_date >= issued_date |
| Relationships: "payment references an invoice" | Manual wiring | invoice: any of invoices where .status == "open" |
| Edge cases: "test with Unicode exploits" | Manual creation | name: issuer.homoglyph("admin") |
| Validation: "does this data match the schema?" | Separate tool | Same .vague file with --validate-data |

The question isn't "which fake data library?" — it's "how do we formally describe what valid data looks like for our APIs?"

For a detailed comparison, see COMPARISON.md.

Installation

npm install vague-lang

Or install globally for CLI usage:

npm install -g vague-lang

Quick Start

Create a .vague file:

schema Customer {
  name: string,
  status: 0.8: "active" | 0.2: "inactive"
}

schema Invoice {
  customer: any of customers,
  amount: decimal in 100..10000,
  status: "draft" | "sent" | "paid",

  assume amount > 0
}

dataset TestData {
  customers: 50 of Customer,
  invoices: 200 of Invoice
}

Generate JSON (node dist/cli.js assumes a built checkout of the repository; see Development):

node dist/cli.js your-file.vague

Syntax Cheat Sheet

For a quick reference of all syntax, see SYNTAX.md.

Language Features

Superposition (Random Choice)

// Equal probability
status: "draft" | "sent" | "paid"

// Weighted probability
status: 0.6: "paid" | 0.3: "pending" | 0.1: "draft"

// Mixed: unweighted options share remaining probability
status: 0.85: "Active" | "Archived"         // "Archived" gets 15%
category: 0.6: "main" | "side" | "dessert"  // "side" and "dessert" get 20% each
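The mixed-weight rule above can be sketched in TypeScript. This is a minimal illustration of the arithmetic only; `WeightedOption`, `resolveWeights`, and `pickWeighted` are hypothetical names, not part of Vague's API, and the sketch assumes the explicit weights sum to at most 1.

```typescript
// Hypothetical helpers, not Vague's implementation: they only illustrate
// how unweighted options can share the probability left over by weighted ones.
type WeightedOption<T> = { value: T; weight?: number };

function resolveWeights<T>(options: WeightedOption<T>[]): { value: T; weight: number }[] {
  // Total probability claimed by options that carry an explicit weight.
  const explicit = options.reduce((acc, o) => acc + (o.weight ?? 0), 0);
  const unweighted = options.filter((o) => o.weight === undefined).length;
  // Each unweighted option gets an equal share of what remains.
  // With no explicit weights at all, this reduces to 1 / n for every option.
  const share = unweighted > 0 ? (1 - explicit) / unweighted : 0;
  return options.map((o) => ({ value: o.value, weight: o.weight ?? share }));
}

// Pick an option given a uniform random number r in [0, 1).
function pickWeighted<T>(options: WeightedOption<T>[], r: number): T {
  if (options.length === 0) throw new Error("at least one option is required");
  const resolved = resolveWeights(options);
  let cumulative = 0;
  for (const o of resolved) {
    cumulative += o.weight;
    if (r < cumulative) return o.value;
  }
  // Guard against floating-point drift when the weights sum to just under 1.
  return resolved[resolved.length - 1].value;
}
```

For `0.6: "main" | "side" | "dessert"`, `resolveWeights` assigns 0.2 each to "side" and "dessert", matching the comment in the example above.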

Ranges

age: int in 18..65
price: decimal in 0.01..999.99
founded: date in 2000..2023

// Decimal with explicit precision
score: decimal(1) in 0..10       // 1 decimal place
amount: decimal(2) in 10..100    // 2 decimal places

Collections

line_items: 1..5 of LineItem    // 1-5 items
employees: 100 of Employee       // Exactly 100

Constraints

schema Invoice {
  issued_date: int in 1..28,
  due_date: int in 1..90,
  status: "draft" | "paid",
  amount: int in 0..10000,

  // Hard constraint
  assume due_date >= issued_date,

  // Conditional constraint
  assume if status == "paid" {
    amount == 0
  }
}

Logical operators: and, or, not
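This README does not say how the runtime satisfies `assume` clauses. One simple strategy that fits the description is rejection sampling: draw a candidate record, check every assumption, and redraw on failure. The TypeScript sketch below is an assumption about one possible approach, not Vague's actual implementation, and every name in it is hypothetical.

```typescript
// Illustrative only: the names below are hypothetical, not Vague's API.
interface DraftInvoice {
  issued_date: number;
  due_date: number;
  status: "draft" | "paid";
  amount: number;
}

// Uniform integer in [min, max], inclusive on both ends.
function randomIntInclusive(min: number, max: number): number {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Draw every field independently, ignoring constraints.
function drawInvoice(): DraftInvoice {
  return {
    issued_date: randomIntInclusive(1, 28),
    due_date: randomIntInclusive(1, 90),
    status: Math.random() < 0.5 ? "draft" : "paid",
    amount: randomIntInclusive(0, 10000),
  };
}

// The two assumptions from the Invoice schema above, as predicates.
const invoiceAssumptions: Array<(inv: DraftInvoice) => boolean> = [
  (inv) => inv.due_date >= inv.issued_date,
  // "if status == paid then amount == 0" is the implication (not paid) or (amount == 0).
  (inv) => inv.status !== "paid" || inv.amount === 0,
];

// Redraw until every assumption holds, or give up after maxAttempts.
function generateWithAssumptions<T>(
  draw: () => T,
  assumptions: Array<(record: T) => boolean>,
  maxAttempts = 1000,
): T {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const candidate = draw();
    if (assumptions.every((holds) => holds(candidate))) return candidate;
  }
  throw new Error(`No valid record after ${maxAttempts} attempts`);
}
```

Note the trade-off in this sketch: a "paid" invoice is accepted only when its amount happens to be 0, so plain rejection sampling would skew the output heavily toward "draft". A production generator would likely need something smarter for tight constraints.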

Cross-Record References

schema Invoice {
  // Reference any customer from the collection
  customer: any of customers,

  // Filtered reference
  active_customer: any of customers where .status == "active"
}
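Conceptually, a filtered reference narrows the collection with the `where` predicate and then picks one survivor at random. A minimal TypeScript sketch of that idea follows; `pickAnyOf` and `CustomerRow` are hypothetical names, and returning `null` when nothing matches is an assumption, since this README does not describe that case.

```typescript
// Hypothetical record shape used only for this illustration.
interface CustomerRow {
  name: string;
  status: string;
}

// Pick one element of `collection` that satisfies `where`.
// `random` is injectable so the choice can be made deterministic in tests.
function pickAnyOf<T>(
  collection: T[],
  where: (item: T) => boolean = () => true,
  random: () => number = Math.random,
): T | null {
  const candidates = collection.filter(where);
  // Assumption: an empty match yields null (Vague's real behavior may differ).
  if (candidates.length === 0) return null;
  return candidates[Math.floor(random() * candidates.length)];
}
```

Under these assumptions, `any of customers where .status == "active"` corresponds to `pickAnyOf(customers, (c) => c.status === "active")`.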

Parent References

schema LineItem {
  // Inherit currency from parent invoice
  currency: ^base_currency
}

schema Invoice {
  base_currency: "USD" | "GBP" | "EUR",
  line_items: 1..5 of LineItem
}

Computed Fields

schema Invoice {
  line_items: 1..10 of LineItem,

  total: sum(line_items.amount),
  item_count: count(line_items),
  avg_price: avg(line_items.unit_price),
  min_price: min(line_items.unit_price),
  max_price: max(line_items.unit_price),
  median_price: median(line_items.unit_price),
  first_item: first(line_items.unit_price),
  last_item: last(line_items.unit_price),
  price_product: product(line_items.unit_price)
}

Nullable Fields

nickname: string?           // Shorthand: sometimes null
notes: string | null        // Explicit

Ternary Expressions

status: amount_paid >= total ? "paid" : "pending"
grade: score >= 90 ? "A" : score >= 70 ? "B" : "C"

Match Expressions

// Pattern matching for multi-way branching
display: match status {
  "pending" => "Awaiting shipment",
  "shipped" => "On the way",
  "delivered" => "Complete"
}

// Returns null if no pattern matches

Conditional Fields

schema Account {
  type: "personal" | "business",
  companyNumber: string when type == "business"  // Only exists for business accounts
}

Dynamic Cardinality

schema Order {
  size: "small" | "large",
  items: (size == "large" ? 5..10 : 1..3) of LineItem
}

Side Effects (`then` blocks)

schema Payment {
  invoice: any of invoices,
  amount: int in 10..500
} then {
  invoice.amount_paid += amount,
  invoice.status = invoice.amount_paid >= invoice.total ? "paid" : "partial"
}
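The effect of this `then` block can be pictured as a mutation applied to the referenced invoice each time a payment is generated. The TypeScript sketch below mirrors the two assignments above; the type and function names are hypothetical, and it assumes the invoice already carries `total` and `amount_paid` fields.

```typescript
// Hypothetical shapes for illustration; field names follow the schema above.
interface PayableInvoice {
  total: number;
  amount_paid: number;
  status: string;
}

interface GeneratedPayment {
  invoice: PayableInvoice; // reference to an existing record, not a copy
  amount: number;
}

// Mirrors the `then` block: runs once after each payment is generated.
function applyPaymentEffects(payment: GeneratedPayment): void {
  const invoice = payment.invoice;
  invoice.amount_paid += payment.amount;
  invoice.status = invoice.amount_paid >= invoice.total ? "paid" : "partial";
}
```

Because the payment holds a reference rather than a copy, several payments against the same invoice accumulate: an invoice with a total of 300 is "partial" after a payment of 100 and "paid" once a further 250 arrives.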

Unique Values

id: unique int in 1000..9999    // No duplicates in collection

Private Fields

schema Person {
  age: private int in 0..105,                    // Generated but excluded from output
  age_bracket: age < 18 ? "minor" : "adult"    // Computed from private field
}
// Output: { "age_bracket": "adult" } -- no "age" field

Ordered Sequences

pitch: [48, 52, 55, 60]   // Cycles in order: 48, 52, 55, 60, 48...
color: ["red", "green", "blue"]
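The cycling behavior described in the comment above amounts to indexing the list modulo its length. A minimal TypeScript sketch, with `makeCycle` as a hypothetical name rather than anything exported by Vague:

```typescript
// Returns a function that yields the values in order and wraps around
// to the start once the list is exhausted.
function makeCycle<T>(values: readonly T[]): () => T {
  if (values.length === 0) throw new Error("cannot cycle over an empty list");
  let index = 0;
  return () => {
    const value = values[index % values.length];
    index += 1;
    return value;
  };
}

const nextPitch = makeCycle([48, 52, 55, 60]);
// Five successive calls yield 48, 52, 55, 60, 48.
```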

Statistical Distributions

age: gaussian(35, 10, 18, 65)     // mean, stddev, min, max
income: lognormal(10.5, 0.5)      // mu, sigma
wait_time: exponential(0.5)       // rate
daily_orders: poisson(5)          // lambda
conversion: beta(2, 5)            // alpha, beta
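To make the four-argument `gaussian(mean, stddev, min, max)` form concrete, here is a TypeScript sketch of a bounded normal sampler using the Box-Muller transform. Handling the bounds by resampling is an assumption made for this example; this README does not say whether Vague resamples, clamps, or does something else. `boundedGaussian` is a hypothetical name.

```typescript
// Sample from a normal distribution and redraw until the value lands in [min, max].
// `random` is injectable so a seeded generator can be supplied.
function boundedGaussian(
  mean: number,
  stddev: number,
  min: number,
  max: number,
  random: () => number = Math.random,
): number {
  for (let attempt = 0; attempt < 1000; attempt++) {
    const u1 = 1 - random(); // shift to (0, 1] so Math.log never sees 0
    const u2 = random();
    // Box-Muller transform: two uniforms to one standard normal deviate.
    const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
    const value = mean + stddev * z;
    if (value >= min && value <= max) return value;
  }
  // Fallback for extremely narrow bounds: the mean, clamped into range.
  return Math.min(max, Math.max(min, mean));
}
```

Resampling keeps the bell shape inside the bounds, whereas clamping would pile extra probability mass exactly at `min` and `max`.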

Date Functions

created_at: now()                 // Full ISO 8601 timestamp
today_date: today()               // Date only
past: daysAgo(30)                 // 30 days ago
future: daysFromNow(90)           // 90 days from now
random: datetime(2020, 2024)      // Random datetime in range
between: dateBetween("2023-01-01", "2023-12-31")

Sequential Generation

id: sequence("INV-", 1001)        // "INV-1001", "INV-1002", ...
order_num: sequenceInt("orders")  // 1, 2, 3, ...
prev_value: previous("amount")    // Reference previous record
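Both sequence helpers boil down to a counter that survives between records. The TypeScript sketch below shows one way to model that with closures; `makePrefixedSequence` and `makeNamedCounters` are hypothetical names, and starting named counters at 1 follows the `// 1, 2, 3, ...` comment above.

```typescript
// Models sequence("INV-", 1001): a prefix plus an incrementing number.
function makePrefixedSequence(prefix: string, start: number): () => string {
  let next = start;
  return () => `${prefix}${next++}`;
}

// Models sequenceInt("orders"): independent counters keyed by name.
function makeNamedCounters(): (name: string) => number {
  const counters = new Map<string, number>();
  return (name: string) => {
    const next = (counters.get(name) ?? 0) + 1;
    counters.set(name, next);
    return next;
  };
}

const nextInvoiceId = makePrefixedSequence("INV-", 1001);
// Successive calls yield "INV-1001", "INV-1002", ...
```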

String Transformations

// Case transformations
upper: uppercase(name)             // "HELLO WORLD"
lower: lowercase(name)             // "hello world"
capitalized: capitalize(name)      // "Hello World"

// Case style conversions
slug: kebabCase(title)             // "hello-world"
snake: snakeCase(title)            // "hello_world"
camel: camelCase(title)            // "helloWorld"

// String manipulation
trimmed: trim("  hello  ")         // "hello"
combined: concat(first, " ", last) // "John Doe"
part: substring(name, 0, 5)        // First 5 characters
replaced: replace(name, "foo", "bar")
len: length(name)                  // String length
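The case style conversions all follow the same pattern: split the input into lowercase words, then rejoin with a different separator. A TypeScript sketch of that pattern follows; the function names are hypothetical and the word-splitting rules (split on non-alphanumerics and on lower-to-upper boundaries) are an assumption, so edge cases may differ from Vague's built-ins.

```typescript
// Split "Hello World", "hello_world", or "helloWorld" into ["hello", "world"].
function splitIntoWords(input: string): string[] {
  return input
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // break camelCase boundaries
    .split(/[^A-Za-z0-9]+/)                 // break on spaces, dashes, underscores
    .filter((word) => word.length > 0)
    .map((word) => word.toLowerCase());
}

function toKebabCase(input: string): string {
  return splitIntoWords(input).join("-");
}

function toSnakeCase(input: string): string {
  return splitIntoWords(input).join("_");
}

function toCamelCase(input: string): string {
  return splitIntoWords(input)
    .map((word, i) => (i === 0 ? word : word.charAt(0).toUpperCase() + word.slice(1)))
    .join("");
}
```

With the input "Hello World", these produce "hello-world", "hello_world", and "helloWorld", in line with the comments above.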

Negative Testing

// Generate data that violates constraints (for testing error handling)
dataset Invalid violating {
  bad_invoices: 100 of Invoice
}

Examples

See the examples/ directory:

data-description-model/ - Start here: Intent encoding, constraint encoding, edge-case bias
basics/ - Core language features (schemas, constraints, computed fields, cross-refs)
openapi-importing/ - Import schemas from OpenAPI specs
openapi-examples-generation/ - Populate OpenAPI specs with generated examples
codat/, stripe/, github/, etc. - Real-world API examples

CLI Usage

# Generate JSON to stdout
node dist/cli.js file.vague

# Save to file
node dist/cli.js file.vague -o output.json

# Pretty print
node dist/cli.js file.vague -p

# Reproducible output (seeded random)
node dist/cli.js file.vague --seed 123

# Watch mode - regenerate on file change
node dist/cli.js file.vague -o output.json -w

# CSV output
node dist/cli.js file.vague -f csv -o output.csv

# CSV with options
node dist/cli.js file.vague -f csv --csv-delimiter ";" -o output.csv

# Validate against OpenAPI spec
node dist/cli.js file.vague -v openapi.json -m '{"invoices": "Invoice"}'

# Validate only (exit code 1 on failure, useful for CI)
node dist/cli.js file.vague -v openapi.json -m '{"invoices": "Invoice"}' --validate-only
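The `--seed` option above relies on a general property of seeded pseudo-random generators: the same seed always yields the same sequence, so the same input file produces the same data. The TypeScript sketch below demonstrates that property with mulberry32, a small public-domain PRNG. Which generator Vague actually uses is not stated in this README, so treat the choice as an assumption.

```typescript
// mulberry32: a tiny 32-bit seeded PRNG returning floats in [0, 1).
// Used here only to illustrate reproducibility; not necessarily what Vague uses.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0; // force to unsigned 32-bit
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators built from the same seed stay in lockstep.
const firstRun = mulberry32(123);
const secondRun = mulberry32(123);
const runsMatch = [1, 2, 3, 4, 5].every(() => firstRun() === secondRun());
```

Threading one seeded generator through every random decision, instead of calling `Math.random` directly, is what makes a whole dataset reproducible.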

OpenAPI Example Population

Generate realistic examples and embed them directly in your OpenAPI spec:

# Populate OpenAPI spec with inline examples
node dist/cli.js data.vague --oas-output api-with-examples.json --oas-source api.json

# Multiple examples per schema
node dist/cli.js data.vague --oas-output api.json --oas-source api.json --oas-example-count 3

# External file references instead of inline
node dist/cli.js data.vague --oas-output api.json --oas-source api.json --oas-external

Auto-detection maps collection names to schema names (e.g., invoices → Invoice).

CLI Options

| Option | Description |
|--------|-------------|
| -o, --output <file> | Write output to file |
| -f, --format <fmt> | Output format: json (default), csv |
| -p, --pretty | Pretty-print JSON |
| -s, --seed <number> | Seed for reproducible generation |
| -w, --watch | Watch input file and regenerate on changes |
| -v, --validate <spec> | Validate against OpenAPI spec |
| -m, --mapping <json> | Schema mapping {"collection": "SchemaName"} |
| --validate-only | Only validate, don't output data |
| --csv-delimiter <char> | CSV field delimiter (default: ,) |
| --csv-no-header | Omit CSV header row |
| --csv-arrays <mode> | Array handling: json, first, count |
| --csv-nested <mode> | Nested objects: flatten, json |
| --infer <file> | Infer Vague schema from JSON or CSV data |
| --collection-name <name> | Collection name for CSV inference |
| --infer-delimiter <char> | CSV delimiter for inference (default: ,) |
| --dataset-name <name> | Dataset name for inference |
| --oas-source <spec> | Source OpenAPI spec to populate with examples |
| --oas-output <file> | Output path for populated OpenAPI spec |
| --oas-example-count <n> | Number of examples per schema (default: 1) |
| --oas-external | Use external file references instead of inline |
| --plugins <dir> | Load plugins from directory (can be used multiple times) |
| --no-auto-plugins | Disable automatic plugin discovery |
| --verbose | Show verbose output (e.g., discovered plugins) |
| -h, --help | Show help |

Development

npm run build     # Compile TypeScript
npm test          # Run tests
npm run dev       # Watch mode

Project Structure

src/
├── lexer/       # Tokenizer
├── parser/      # Recursive descent parser
├── ast/         # AST node definitions
├── interpreter/ # JSON generator
├── validator/   # Schema validation (Ajv)
├── openapi/     # OpenAPI import support
├── infer/       # Schema inference from data
├── csv/         # CSV input/output formatting
├── config/      # Configuration file loading
├── logging/     # Debug logging utilities
├── plugins/     # Built-in plugins (faker, issuer, date, regex)
├── index.ts     # Library exports
└── cli.ts       # CLI entry point

Roadmap

See TODO.md for planned features:

Probabilistic constraints (assume X with probability 0.7)
Conditional schema variants
Constraint solving (SMT integration)

Working with Claude

This project includes Claude Code skills that help Claude assist you more effectively when working with Vague files and OpenAPI specifications.

Available Skills

| Skill | Description |
|-------|-------------|
| vague | Writing Vague (.vague) files: syntax, constraints, cross-references |
| openapi | Working with OpenAPI specs: validation, schemas, best practices |

Installation via OpenSkills

Install the skills using OpenSkills:

npm i -g openskills
openskills install mcclowes/vague

This installs the skills to your .claude/skills/ directory, making them available when you use Claude Code in this project.

Manual Installation

Alternatively, copy the skills directly:

git clone https://github.com/mcclowes/vague.git
cp -r vague/.claude/skills/* ~/.claude/skills/

Contributing

See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE