@moonye/schemaguardian

v0.4.0 · Published a month ago

Validate JSON-LD structured data on URLs, HTML files, or whole sites via sitemap. CI-friendly. Built for the AI search era. Now with a programmatic library API.

Maintainer: moonye

Keywords: schema json-ld structured-data seo validation cli ci schema.org geo aeo ai-search

schemaguardian

Validate JSON-LD structured data on any URL, HTML file, or whole site via sitemap. CI-friendly. Built for the AI search era.

```bash
# Validate one page
npx @moonye/schemaguardian check https://your-site.com

# Walk every URL in your sitemap.xml
npx @moonye/schemaguardian scan https://your-site.com

# Drop a ready-to-commit GitHub Actions workflow
npx @moonye/schemaguardian init --url https://your-site.com
```

Why this exists

Google scaled back FAQ and HowTo rich results in 2023 and cut them further in the March 2026 core update. But structured data is now a primary signal for citation in AI search engines (Perplexity, ChatGPT, Gemini, Google AI Overviews). schemaguardian validates your JSON-LD against schema.org rules, against documented Google rejection patterns, and against the 2026 list of schema types that still produce rich results.

It runs in CI. It exits non-zero on real problems. It tells you why.

Install

```bash
# one-off
npx @moonye/schemaguardian check https://example.com

# global
npm i -g @moonye/schemaguardian
schemaguardian check https://example.com

# project dev dependency
npm i -D @moonye/schemaguardian
```

Requires Node 18+.

Commands

```text
schemaguardian check <url|file>      Validate a single URL or local HTML file.
schemaguardian scan  <site-url>      Walk a site's sitemap.xml and validate every page.
schemaguardian generate [type]       Interactively generate schema markup.
schemaguardian init                  Generate .github/workflows/schemaguardian.yml.
schemaguardian help                  Show usage information.
schemaguardian version               Print the installed version.
```

`check` — single page

```bash
schemaguardian check https://faqjsonld.com/faq-schema-generator
schemaguardian check ./dist/index.html
schemaguardian check https://staging.example.com --ci
schemaguardian check https://example.com --json | jq '.blocks[].issues'
```

Options: --ci (exit non-zero on errors) · --json (machine-readable output) · --no-color (disable ANSI color).

`scan` — whole site via sitemap

Auto-discovers /sitemap-index.xml, /sitemap.xml, or /sitemap_index.xml. Recursively follows sitemap indices to their child sitemaps. Validates every URL in parallel.
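The discovery order and the recursion into sitemap indices described above can be sketched as follows. This is a minimal illustration under stated assumptions, not schemaguardian's source: `FetchText`, `findSitemap`, and `collectPageUrls` are hypothetical names, the fetcher is injected so the logic runs without network access, and `<loc>` values are pulled out with a regex where a real crawler would use an XML parser.

```typescript
// Hypothetical sketch of sitemap discovery; not schemaguardian's implementation.
// A fetcher returns the response body, or null when the request failed.
type FetchText = (url: string) => Promise<string | null>;

// Candidate locations, tried in the order the README lists them.
const SITEMAP_CANDIDATES = ["/sitemap-index.xml", "/sitemap.xml", "/sitemap_index.xml"];

// Collect every <loc> value. Regex parsing is a simplification: it ignores
// CDATA and does not decode XML entities such as &amp;.
function extractLocs(xml: string): string[] {
  const re = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  const locs: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) locs.push(m[1]);
  return locs;
}

// Return the first candidate sitemap that responds, or null if none do.
async function findSitemap(siteUrl: string, fetchText: FetchText): Promise<string | null> {
  for (const path of SITEMAP_CANDIDATES) {
    const url = new URL(path, siteUrl).toString();
    if ((await fetchText(url)) !== null) return url;
  }
  return null;
}

// Walk a sitemap. A <sitemapindex> lists child sitemaps, so recurse into each;
// a <urlset> lists pages, so return its locations directly.
async function collectPageUrls(
  sitemapUrl: string,
  fetchText: FetchText,
  visited: Set<string> = new Set<string>(),
): Promise<string[]> {
  if (visited.has(sitemapUrl)) return []; // guard against index cycles
  visited.add(sitemapUrl);
  const xml = await fetchText(sitemapUrl);
  if (xml === null) return [];
  const locs = extractLocs(xml);
  if (!/<sitemapindex[\s>]/.test(xml)) return locs;
  const pageUrls: string[] = [];
  for (const child of locs) {
    pageUrls.push(...(await collectPageUrls(child, fetchText, visited)));
  }
  return pageUrls;
}
```

In a real run, `fetchText` would wrap Node 18's global `fetch` and return null on a non-2xx response or network error.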

```bash
schemaguardian scan https://faqjsonld.com
schemaguardian scan https://example.com --limit 25 --concurrency 8 --ci
schemaguardian scan https://example.com --sitemap https://example.com/news-sitemap.xml
schemaguardian scan https://example.com --json | jq '.summary'
```

Options:

| Flag | Default | Meaning |
|---|---|---|
| `--sitemap <url>` | auto-discover | Use this sitemap URL instead of guessing. |
| `--limit <n>` | 100 | Max URLs to scan. |
| `--concurrency <n>` | 4 | Parallel requests (1-32). |
| `--ci` | off | Exit non-zero on any error or fetch failure. |
| `--json` | off | Machine-readable output. |
| `--no-color` | off | Disable ANSI color. |

Output includes per-page status, a per-type count of schemas found across the site, and a list of pages with no structured data at all.
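One common way to honor --limit and --concurrency is a small worker pool. The sketch below assumes the flags behave as the table describes; `scanWithPool` and `checkPage` are hypothetical names, and clamping to 1-32 mirrors the documented range rather than known source code.

```typescript
// Hypothetical sketch of a bounded worker pool; not schemaguardian's implementation.
// Validates at most `limit` URLs, with at most `concurrency` checks in flight.
async function scanWithPool<T>(
  urls: string[],
  checkPage: (url: string) => Promise<T>,
  limit = 100,
  concurrency = 4,
): Promise<T[]> {
  // Clamp to the documented 1-32 range.
  const workerCount = Math.min(32, Math.max(1, Math.floor(concurrency)));
  const queue = urls.slice(0, limit);
  const results: T[] = new Array<T>(queue.length);
  let next = 0;

  // Each worker claims the next unclaimed index until the queue is drained.
  // JavaScript runs this loop on one thread, so `next++` needs no lock.
  async function worker(): Promise<void> {
    while (next < queue.length) {
      const i = next++;
      results[i] = await checkPage(queue[i]); // results keep input order
    }
  }

  const pool = Array.from({ length: Math.min(workerCount, queue.length) }, () => worker());
  await Promise.all(pool);
  return results;
}
```

Workers pull from a shared index instead of splitting the list up front, so one slow page does not stall a whole batch.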

`init` — generate a CI workflow

```bash
# default: writes .github/workflows/schemaguardian.yml using `scan`
schemaguardian init --url https://my-site.com

# use single-page check instead of scan
schemaguardian init --url https://my-site.com --command check

# write somewhere else
schemaguardian init --url https://my-site.com --target .gitlab-ci.yml --force
```

Options: --url <url> (the site to validate) · --command check|scan (default scan) · --target <path> (output location) · --force (overwrite an existing file).
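The init options map naturally onto the workflow shown under CI integration. The following is a hypothetical sketch of that mapping, assuming the generated file resembles that example; `renderWorkflow` and `InitOptions` are illustrative names, and the real template and error text may differ. Writing the file, and checking whether it already exists, is left to the caller.

```typescript
// Hypothetical sketch of init's option handling; not the CLI's template.
interface InitOptions {
  url: string;
  command?: "check" | "scan"; // documented default: scan
  force?: boolean;
}

// Render a workflow modelled on the GitHub Actions example under "CI integration".
function renderWorkflow(opts: InitOptions, targetExists: boolean): string {
  // Refuse to overwrite an existing file unless --force was given.
  if (targetExists && !opts.force) {
    throw new Error("Target file already exists; pass --force to overwrite it.");
  }
  const command = opts.command ?? "scan";
  return [
    "name: schemaguardian",
    "on:",
    "  pull_request:",
    "    branches: [main]",
    "  push:",
    "    branches: [main]",
    "jobs:",
    "  validate:",
    "    runs-on: ubuntu-latest",
    "    steps:",
    "      - uses: actions/setup-node@v4",
    "        with: { node-version: '20' }",
    `      - run: npx --yes @moonye/schemaguardian@latest ${command} ${opts.url} --ci`,
    "",
  ].join("\n");
}
```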

`generate` — interactively generate schema markup

```bash
# Interactive mode: select schema type and fill in fields
schemaguardian generate

# Direct mode: specify schema type directly
schemaguardian generate faq

# Preview without saving
schemaguardian generate product --preview

# Save to file
schemaguardian generate article --output schema.json

# Combine options
schemaguardian generate recipe --output my-recipe.json --preview
```

Options: --output <path> (save to file) · --preview (show without saving) · --type <type> (specify schema type directly instead of interactive selection).

Supports these schema types: FAQPage, HowTo, Product, Recipe, Article, Review, LocalBusiness, Event, BreadcrumbList, Organization, Course, JobPosting, and Video.
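For reference, a FAQPage generator's output has the schema.org shape sketched below. The helper names `buildFaqPage` and `toScriptTag` are hypothetical and the interactive prompts are omitted; only the FAQPage, Question, and Answer structure comes from schema.org itself.

```typescript
// Hypothetical sketch of the JSON-LD a FAQ generator emits; not the CLI's code.
// The shape follows schema.org's FAQPage / Question / Answer types.
interface QA {
  question: string;
  answer: string;
}

function buildFaqPage(items: QA[]): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: items.map(({ question, answer }) => ({
      "@type": "Question",
      name: question,
      acceptedAnswer: { "@type": "Answer", text: answer },
    })),
  };
}

// Embed the schema in a script tag. Escaping "<" as \u003c keeps text such
// as "</script>" inside an answer from closing the tag early; JSON parsers
// decode it back to "<".
function toScriptTag(schema: Record<string, unknown>): string {
  const json = JSON.stringify(schema, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}
```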

CI integration

GitHub Actions

```yaml
# .github/workflows/schema.yml
name: schemaguardian
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npx --yes @moonye/schemaguardian@latest scan https://your-site.com --ci
```

Or run npx @moonye/schemaguardian init --url https://your-site.com once and commit the generated file.

GitLab CI

```yaml
schema-check:
  image: node:20
  script:
    # CI_ENVIRONMENT_URL is only set when the job defines environment:url
    - npx --yes @moonye/schemaguardian@latest scan "$CI_ENVIRONMENT_URL" --ci
```

package.json

```json
{
  "scripts": {
    "schema:check": "schemaguardian check https://faqjsonld.com --ci",
    "schema:scan":  "schemaguardian scan  https://faqjsonld.com --ci"
  }
}
```

What it validates

For every `<script type="application/ld+json">` block found on a page:

- Generic envelope: JSON parses, @context includes schema.org, @type is present.
- Per-type required fields for the 12 schema types in the registry: FAQPage, HowTo, Product, Recipe, Article (and BlogPosting, NewsArticle), Review, LocalBusiness, Event, BreadcrumbList, Organization, Course, JobPosting.
- 2026-specific Google rejection patterns, including:
  - FAQ rich result deprecation since 2023, further cut March 2026
  - HowTo rich result removal since 2023-2024
  - Product without offers OR aggregateRating (no rich result)
  - JobPosting without validThrough (Google for Jobs suppression)
  - JobPosting without baseSalary (lower placement, AI filter skip)
  - Article without publisher logo (Top Stories ineligible)
  - BreadcrumbList with non-sequential positions
  - Many more; see src/lib/validators.ts.
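To make the rule list concrete, here is a hypothetical sketch of three of those checks as plain functions. The function names, issue codes, and severities are illustrative assumptions; the actual rules and their levels are defined in src/lib/validators.ts.

```typescript
// Hypothetical sketch of three per-type rules; codes and severities are illustrative.
type Severity = "error" | "warning" | "info";
interface Issue {
  severity: Severity;
  code: string;
  message: string;
}
type JsonObject = Record<string, unknown>;

// Product: needs offers or aggregateRating to be rich-result eligible.
function checkProduct(product: JsonObject): Issue[] {
  if (product.offers === undefined && product.aggregateRating === undefined) {
    return [{ severity: "warning", code: "product-no-offers-or-rating",
      message: "Product has neither offers nor aggregateRating" }];
  }
  return [];
}

// JobPosting: validThrough and baseSalary affect Google for Jobs visibility.
function checkJobPosting(job: JsonObject): Issue[] {
  const issues: Issue[] = [];
  if (job.validThrough === undefined) {
    issues.push({ severity: "warning", code: "jobposting-no-valid-through",
      message: "JobPosting is missing validThrough" });
  }
  if (job.baseSalary === undefined) {
    issues.push({ severity: "warning", code: "jobposting-no-base-salary",
      message: "JobPosting is missing baseSalary" });
  }
  return issues;
}

// BreadcrumbList: item positions must run 1, 2, 3, ... in list order.
function checkBreadcrumbs(list: JsonObject): Issue[] {
  const items: unknown[] = Array.isArray(list.itemListElement) ? list.itemListElement : [];
  const issues: Issue[] = [];
  items.forEach((item, index) => {
    const position = (item as JsonObject | null)?.position;
    if (Number(position) !== index + 1) {
      issues.push({ severity: "error", code: "breadcrumb-position-sequence",
        message: `Item ${index + 1} has position ${String(position)}` });
    }
  });
  return issues;
}
```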

Other @type values pass envelope checks and emit an info-level note that type-specific validation was skipped.
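The extraction step and the generic envelope check, including the info note for unregistered types, might look roughly like this. It is a sketch under assumptions, not the package's code: `extractBlocks`, `checkEnvelope`, and the issue codes are hypothetical, `KNOWN_TYPES` is assumed to match the registry listed above, positions are assumed to be 1-based, HTML is scanned with a regex instead of a real parser, and top-level arrays are not inspected.

```typescript
// Hypothetical sketch of JSON-LD extraction and the generic envelope check.
type Severity = "error" | "warning" | "info";
interface Issue {
  severity: Severity;
  code: string;
  message: string;
}
interface JsonLdBlock {
  raw: string;
  parsed: unknown; // undefined when the block is not valid JSON
  position: number; // assumed 1-based, in document order
}

// Assumed to match the registry listed above (Article plus its subtypes).
const KNOWN_TYPES = new Set<string>([
  "FAQPage", "HowTo", "Product", "Recipe", "Article", "BlogPosting", "NewsArticle",
  "Review", "LocalBusiness", "Event", "BreadcrumbList", "Organization", "Course", "JobPosting",
]);

// Find every <script type="application/ld+json"> body.
function extractBlocks(html: string): JsonLdBlock[] {
  const re = /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: JsonLdBlock[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const raw = m[1].trim();
    let parsed: unknown = undefined;
    try {
      parsed = JSON.parse(raw);
    } catch {
      // leave parsed undefined; checkEnvelope reports it
    }
    blocks.push({ raw, parsed, position: blocks.length + 1 });
  }
  return blocks;
}

// Envelope: JSON parses, @context mentions schema.org, @type is present.
// Types outside the registry get an info note instead of type-specific checks.
function checkEnvelope(block: JsonLdBlock): Issue[] {
  const value = block.parsed;
  if (value === undefined) {
    return [{ severity: "error", code: "invalid-json", message: "Block is not valid JSON" }];
  }
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return [{ severity: "info", code: "not-inspected", message: "Only a single top-level object is checked" }];
  }
  const obj = value as Record<string, unknown>;
  const issues: Issue[] = [];
  const context = obj["@context"];
  const contextText = typeof context === "string" ? context : JSON.stringify(context ?? null);
  if (!contextText.includes("schema.org")) {
    issues.push({ severity: "error", code: "context-not-schema-org", message: "@context does not reference schema.org" });
  }
  const rawType = obj["@type"];
  const typeValues: unknown[] = Array.isArray(rawType) ? rawType : [rawType];
  const types = typeValues.filter((t): t is string => typeof t === "string");
  if (types.length === 0) {
    issues.push({ severity: "error", code: "missing-type", message: "@type is missing" });
  } else if (!types.some((t) => KNOWN_TYPES.has(t))) {
    issues.push({ severity: "info", code: "type-not-validated", message: `No type-specific rules for ${types.join(", ")}` });
  }
  return issues;
}
```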

What it does NOT do (yet)

- Microdata or RDFa parsing (only JSON-LD)
- Validating that visible page content matches schema text content (Google requires this; only a human or rendered diff can verify it)
- Full schema.org SHACL validation
- Multi-domain monitoring (planned for paid Pro tier)

Severity levels

| Level | Meaning | `--ci` exit code |
|---|---|---|
| ERR | Required field missing or wrong type. Will not produce rich results. | 1 |
| WARN | Best practice violation or 2026 deprecation note. Schema may still validate. | 0 |
| INFO | Type unsupported or other note. | 0 |

scan --ci also exits 1 on any fetch failure (HTTP 4xx/5xx, timeout, or DNS error).
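The exit-code rules above reduce to a small decision function. This hypothetical sketch (`ciExitCode` and `PageOutcome` are illustrative names) encodes the severity table plus the scan fetch-failure rule; how check treats a fetch failure is not stated here, so the sketch assumes that rule only applies to scan.

```typescript
// Hypothetical sketch of the documented --ci exit rules; not the CLI's source.
type Severity = "error" | "warning" | "info";

interface PageOutcome {
  fetchFailed: boolean; // HTTP 4xx/5xx, timeout, or DNS error
  severities: Severity[]; // one entry per issue reported on the page
}

// ERR-level issues exit 1; WARN and INFO alone exit 0.
// scan additionally exits 1 on any fetch failure (assumed not to apply to check).
function ciExitCode(command: "check" | "scan", pages: PageOutcome[]): number {
  const hasError = pages.some((p) => p.severities.includes("error"));
  const hasFetchFailure = command === "scan" && pages.some((p) => p.fetchFailed);
  return hasError || hasFetchFailure ? 1 : 0;
}
```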

JSON output schemas

`check --json`

```json
{
  "target": "https://example.com",
  "blocksFound": 2,
  "blocks": [
    {
      "block": { "raw": "...", "parsed": { ... }, "position": 1 },
      "schemaType": "FAQPage",
      "issues": [{ "severity": "warning", "code": "faq-rich-result-deprecated", "message": "...", "path": "..." }]
    }
  ]
}
```
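When consuming check --json from TypeScript, types along these lines match the sample above. They are inferred from that sample only: the severity strings "error" and "info" are assumed by analogy with "warning", and `countBySeverity` is a hypothetical helper.

```typescript
// Types inferred from the sample output; "error" and "info" are assumed spellings.
type Severity = "error" | "warning" | "info";

interface CheckIssue {
  severity: Severity;
  code: string;
  message: string;
  path: string;
}
interface CheckBlock {
  block: { raw: string; parsed: unknown; position: number };
  schemaType: string;
  issues: CheckIssue[];
}
interface CheckReport {
  target: string;
  blocksFound: number;
  blocks: CheckBlock[];
}

// Tally issues by severity across every block in a report.
function countBySeverity(report: CheckReport): Record<Severity, number> {
  const counts: Record<Severity, number> = { error: 0, warning: 0, info: 0 };
  for (const b of report.blocks) {
    for (const issue of b.issues) counts[issue.severity]++;
  }
  return counts;
}
```

For example, save the output with `schemaguardian check https://example.com --json > report.json`, then `JSON.parse` the file and pass the result to `countBySeverity`.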

`scan --json`

```json
{
  "sitemap": "https://example.com/sitemap-index.xml",
  "totalUrlsInSitemap": 14,
  "scanned": 14,
  "limited": false,
  "pages": [
    { "url": "...", "status": "ok", "blocksFound": 2, "schemaTypes": ["FAQPage", "BreadcrumbList"], "errors": 0, "warnings": 1 }
  ],
  "summary": {
    "ok": 1, "withErrors": 0, "withWarnings": 13, "fetchErrors": 0,
    "missingSchema": 0, "schemaTypeCounts": { "FAQPage": 13 },
    "totalErrors": 0, "totalWarnings": 13
  }
}
```
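The summary fields appear to be derivable from pages. The sketch below reconstructs them under stated assumptions: a status other than "ok" is treated as a fetch error (the value "fetch-error" in the test is made up), each fetched page is counted in exactly one of ok, withErrors, or withWarnings, a page with no blocks also counts toward missingSchema, and schemaTypeCounts counts pages per type. `summarize` is a hypothetical helper, not part of the CLI.

```typescript
// Hypothetical reconstruction of `summary` from `pages`; the CLI's rules may differ.
interface PageEntry {
  url: string;
  status: string; // assumed: "ok" when the fetch succeeded
  blocksFound: number;
  schemaTypes: string[];
  errors: number;
  warnings: number;
}

interface ScanSummary {
  ok: number;
  withErrors: number;
  withWarnings: number;
  fetchErrors: number;
  missingSchema: number;
  schemaTypeCounts: Record<string, number>;
  totalErrors: number;
  totalWarnings: number;
}

function summarize(pages: PageEntry[]): ScanSummary {
  const s: ScanSummary = {
    ok: 0, withErrors: 0, withWarnings: 0, fetchErrors: 0,
    missingSchema: 0, schemaTypeCounts: {}, totalErrors: 0, totalWarnings: 0,
  };
  for (const p of pages) {
    if (p.status !== "ok") {
      s.fetchErrors++;
      continue;
    }
    // Each fetched page lands in exactly one bucket, worst severity first.
    if (p.errors > 0) s.withErrors++;
    else if (p.warnings > 0) s.withWarnings++;
    else s.ok++;
    if (p.blocksFound === 0) s.missingSchema++;
    // Count each type once per page, even if it appears in several blocks.
    for (const t of Array.from(new Set(p.schemaTypes))) {
      s.schemaTypeCounts[t] = (s.schemaTypeCounts[t] ?? 0) + 1;
    }
    s.totalErrors += p.errors;
    s.totalWarnings += p.warnings;
  }
  return s;
}
```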

Roadmap

- v0.1: check command for a single URL or file
- v0.2: scan for whole sites via sitemap, init for one-shot CI setup
- v0.3: generate for interactive schema creation
- v0.4+ (paid Pro, planned): multi-domain monitoring, auto-PR fix via GitHub API, team workflows, GitHub Action wrapper

The free CLI will always validate any site. Paid tiers add multi-domain operations and automation.

Contributing

Source lives at https://github.com/moonye6/faq under cli/. The 12 free schema generators on https://faqjsonld.com use the same validators. Issues and PRs welcome.

License

MIT
