@moonye/schemaguardian
v0.4.0
Validate JSON-LD structured data on URLs, HTML files, or whole sites via sitemap. CI-friendly. Built for the AI search era. Now with a programmatic library API.
schemaguardian
Validate JSON-LD structured data on any URL, HTML file, or whole site via sitemap. CI-friendly. Built for the AI search era.
# Validate one page
npx @moonye/schemaguardian check https://your-site.com
# Walk every URL in your sitemap.xml
npx @moonye/schemaguardian scan https://your-site.com
# Drop a ready-to-commit GitHub Actions workflow
npx @moonye/schemaguardian init --url https://your-site.com
Why this exists
Google scaled back FAQ and HowTo rich results in 2023 and cut them
further in the March 2026 core update. But structured data is now a
primary signal for citation in AI search engines (Perplexity, ChatGPT,
Gemini, Google AI Overviews). schemaguardian validates your JSON-LD
against schema.org rules plus the documented Google rejection
patterns and the 2026 reality of which schema types still produce
rich results.
It runs in CI. It exits non-zero on real problems. It tells you why.
Install
# one-off
npx @moonye/schemaguardian check https://example.com
# global
npm i -g @moonye/schemaguardian
schemaguardian check https://example.com
# project dev dependency
npm i -D @moonye/schemaguardian
Requires Node 18+.
Commands
schemaguardian check <url|file> Validate a single URL or local HTML file.
schemaguardian scan <site-url> Walk a site's sitemap.xml and validate every page.
schemaguardian generate [type] Interactively generate schema markup.
schemaguardian init Generate .github/workflows/schemaguardian.yml.
schemaguardian help
schemaguardian version
check — single page
schemaguardian check https://faqjsonld.com/faq-schema-generator
schemaguardian check ./dist/index.html
schemaguardian check https://staging.example.com --ci
schemaguardian check https://example.com --json | jq '.blocks[].issues'
Options: --ci (exit non-zero on errors) · --json (machine output) · --no-color (disable ANSI color).
scan — whole site via sitemap
Auto-discovers /sitemap-index.xml, /sitemap.xml, or /sitemap_index.xml.
Recursively follows sitemap indices to their child sitemaps. Validates
every URL in parallel.
schemaguardian scan https://faqjsonld.com
schemaguardian scan https://example.com --limit 25 --concurrency 8 --ci
schemaguardian scan https://example.com --sitemap https://example.com/news-sitemap.xml
schemaguardian scan https://example.com --json | jq '.summary'
Options:
| Flag | Default | Meaning |
|---|---|---|
| --sitemap <url> | auto-discover | Use this sitemap URL instead of guessing. |
| --limit <n> | 100 | Max URLs to scan. |
| --concurrency <n> | 4 | Parallel requests (1-32). |
| --ci | off | Exit non-zero on any error or fetch failure. |
| --json | off | Machine-readable output. |
| --no-color | off | Disable ANSI color. |
Output includes per-page status, a per-type count of schemas found across the site, and a list of pages with no structured data at all.
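The discovery and parallel-scan behaviour described above can be sketched roughly as follows. This is an illustrative sketch, not schemaguardian's actual implementation: the helper names (candidateSitemaps, extractLocs, mapWithConcurrency) are hypothetical, the sitemap is parsed with a simple regex, and the site URL is assumed to be a bare origin.

```typescript
// Hypothetical helpers; names and structure are assumptions, not the CLI's internals.

// Candidate sitemap locations, in the order the README lists them.
// Assumes siteUrl is a bare origin such as "https://example.com".
function candidateSitemaps(siteUrl: string): string[] {
  const base = siteUrl.replace(/\/+$/, "");
  return ["/sitemap-index.xml", "/sitemap.xml", "/sitemap_index.xml"].map(
    (path) => base + path,
  );
}

// Pull every <loc> value out of a sitemap or sitemap index document.
// A sitemap index lists child sitemaps in <loc>, so the same helper
// serves both the recursive step and the final URL list.
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const re = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(xml)) !== null) {
    locs.push(match[1]);
  }
  return locs;
}

// Run `worker` over `items` with at most `concurrency` calls in flight,
// returning results in input order (the idea behind --concurrency).
async function mapWithConcurrency<T, R>(
  items: T[],
  concurrency: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }
  const laneCount = Math.max(1, Math.min(concurrency, items.length));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results;
}
```

In this sketch, --limit would correspond to slicing the URL list before handing it to mapWithConcurrency, and --concurrency to its second argument.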
init — generate a CI workflow
# default: writes .github/workflows/schemaguardian.yml using `scan`
schemaguardian init --url https://my-site.com
# use single-page check instead of scan
schemaguardian init --url https://my-site.com --command check
# write somewhere else
schemaguardian init --url https://my-site.com --target .gitlab-ci.yml --force
Options: --url <url> (the site to validate) · --command check|scan (default scan) · --target <path> (output location) · --force (overwrite an existing file).
generate — interactively generate schema markup
# Interactive mode: select schema type and fill in fields
schemaguardian generate
# Direct mode: specify schema type directly
schemaguardian generate faq
# Preview without saving
schemaguardian generate product --preview
# Save to file
schemaguardian generate article --output schema.json
# Combine options
schemaguardian generate recipe --output my-recipe.json --preview
Options: --output <path> (save to file) · --preview (show without saving) · --type <type> (specify schema type directly instead of interactive selection).
Supports the following schema types: FAQPage, HowTo, Product, Recipe, Article, Review, LocalBusiness, Event, BreadcrumbList, Organization, Course, JobPosting, and Video.
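To make the output of generate concrete, here is a minimal sketch of the kind of FAQPage markup such a generator would emit. The property names follow schema.org's FAQPage, Question and Answer types. The function names (buildFaqPage, toScriptTag) are hypothetical and the CLI's exact output format is not shown in this README.

```typescript
// Hypothetical helpers for illustration; not the CLI's generator code.

interface FaqEntry {
  question: string;
  answer: string;
}

// Build a schema.org FAQPage object from question/answer pairs.
function buildFaqPage(entries: FaqEntry[]): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: entries.map((entry) => ({
      "@type": "Question",
      name: entry.question,
      acceptedAnswer: { "@type": "Answer", text: entry.answer },
    })),
  };
}

// Wrap the object in a JSON-LD script tag ready to paste into a page.
// "<" is escaped so that answer text can never close the script element early.
function toScriptTag(schema: Record<string, unknown>): string {
  const json = JSON.stringify(schema, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}
```

Note that, per the validation notes in this README, FAQPage markup no longer reliably produces a Google rich result, so expect the validator to attach a deprecation warning to markup like this.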
CI integration
GitHub Actions
# .github/workflows/schema.yml
name: schemaguardian
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npx --yes @moonye/schemaguardian@latest scan https://your-site.com --ci
Or just run npx @moonye/schemaguardian init once and commit the file.
GitLab CI
schema-check:
  image: node:20
  script:
    - npx --yes @moonye/schemaguardian@latest scan $CI_ENVIRONMENT_URL --ci
package.json
{
  "scripts": {
    "schema:check": "schemaguardian check https://faqjsonld.com --ci",
    "schema:scan": "schemaguardian scan https://faqjsonld.com --ci"
  }
}
What it validates
For every <script type="application/ld+json"> block found on a page:
- Generic envelope: JSON parses, @context includes schema.org, @type is present.
- Per-type required fields for the 12 schema types in the registry: FAQPage, HowTo, Product, Recipe, Article (and BlogPosting, NewsArticle), Review, LocalBusiness, Event, BreadcrumbList, Organization, Course, JobPosting.
- 2026-specific Google rejection patterns, including:
  - FAQ rich result deprecation since 2023, further cut March 2026
  - HowTo rich result removal since 2023-2024
  - Product without offers OR aggregateRating (no rich result)
  - JobPosting without validThrough (Google for Jobs suppression)
  - JobPosting without baseSalary (lower placement, AI filter skip)
  - Article without publisher logo (Top Stories ineligible)
  - BreadcrumbList with non-sequential positions
  - Many more, see src/lib/validators.ts.
Other @type values pass envelope checks and emit an info-level note
that type-specific validation was skipped.
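The envelope check and the info-level fallback for unregistered types could look roughly like this. It is a sketch under stated assumptions, not the code in src/lib/validators.ts: the function names and issue codes are hypothetical, blocks are extracted with a regex rather than an HTML parser, and only a top-level object with a string @context is handled (arrays, @graph and object-valued contexts are ignored).

```typescript
// Hypothetical sketch; names and issue codes are illustrative assumptions.

type Severity = "error" | "warning" | "info";

interface Issue {
  severity: Severity;
  code: string;
  message: string;
}

// Collect the raw text of every <script type="application/ld+json"> block.
function extractJsonLdBlocks(html: string): string[] {
  const re =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(html)) !== null) {
    blocks.push(match[1].trim());
  }
  return blocks;
}

// Envelope check: JSON parses, @context mentions schema.org, @type is present.
// Types outside `knownTypes` pass the envelope and only get an info note.
function checkEnvelope(raw: string, knownTypes: string[]): Issue[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [{ severity: "error", code: "invalid-json", message: "Block is not valid JSON." }];
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return [{ severity: "error", code: "not-an-object", message: "Expected a JSON object." }];
  }
  const node = parsed as Record<string, unknown>;
  const issues: Issue[] = [];
  const context = node["@context"];
  if (typeof context !== "string" || context.indexOf("schema.org") === -1) {
    issues.push({ severity: "error", code: "missing-context", message: "@context must include schema.org." });
  }
  const type = node["@type"];
  if (typeof type !== "string" || type.length === 0) {
    issues.push({ severity: "error", code: "missing-type", message: "@type is required." });
  } else if (knownTypes.indexOf(type) === -1) {
    issues.push({ severity: "info", code: "type-not-validated", message: `No type-specific rules for ${type}.` });
  }
  return issues;
}
```

Per-type required-field checks and the rejection patterns listed above would then run only for blocks whose @type is in the registry.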
What it does NOT do (yet)
- Microdata or RDFa parsing (only JSON-LD)
- Validating that visible page content matches schema text content (Google requires this; only a human or rendered diff can verify it)
- Full schema.org SHACL validation
- Multi-domain monitoring (planned for paid Pro tier)
Severity levels
| Level | Meaning | --ci exit code |
|---|---|---|
| ERR | Required field missing or wrong type. Will not produce rich results. | 1 |
| WARN | Best practice violation or 2026 deprecation note. Schema may still validate. | 0 |
| INFO | Type unsupported or other note. | 0 |
scan --ci also exits 1 on any fetch failure (HTTP 4xx/5xx, timeout, DNS).
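The severity table maps onto an exit code in a straightforward way. The sketch below assumes a report shaped like the check --json output and assumes that error-level issues carry the severity string "error" (the README's sample output only shows "warning"). The function and type names are hypothetical.

```typescript
// Hypothetical sketch of the --ci exit-code rule; not the CLI's own code.

interface ReportIssue {
  severity: string; // "warning" appears in the README; "error" and "info" are assumed
  code: string;
  message: string;
}

interface CheckReport {
  blocks: { schemaType: string; issues: ReportIssue[] }[];
}

// Exit 1 if any block has an error-level issue or the page could not be
// fetched; warnings and info notes leave the exit code at 0.
function ciExitCode(report: CheckReport, fetchFailed: boolean = false): 0 | 1 {
  if (fetchFailed) {
    return 1;
  }
  const hasError = report.blocks.some((block) =>
    block.issues.some((issue) => issue.severity === "error"),
  );
  return hasError ? 1 : 0;
}
```

A wrapper script could apply the same rule to saved --json output when a pipeline needs a stricter policy, for example treating warnings as failures.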
JSON output schemas
check --json
{
  "target": "https://example.com",
  "blocksFound": 2,
  "blocks": [
    {
      "block": { "raw": "...", "parsed": { ... }, "position": 1 },
      "schemaType": "FAQPage",
      "issues": [{ "severity": "warning", "code": "faq-rich-result-deprecated", "message": "...", "path": "..." }]
    }
  ]
}
scan --json
{
  "sitemap": "https://example.com/sitemap-index.xml",
  "totalUrlsInSitemap": 14,
  "scanned": 14,
  "limited": false,
  "pages": [
    { "url": "...", "status": "ok", "blocksFound": 2, "schemaTypes": ["FAQPage", "BreadcrumbList"], "errors": 0, "warnings": 1 }
  ],
  "summary": {
    "ok": 1, "withErrors": 0, "withWarnings": 13, "fetchErrors": 0,
    "missingSchema": 0, "schemaTypeCounts": { "FAQPage": 13 },
    "totalErrors": 0, "totalWarnings": 13
  }
}
Roadmap
- v0.1: check command for a single URL or file
- v0.2: scan for whole sites via sitemap, init for one-shot CI setup
- v0.3: generate for interactive schema creation
- v0.4+ (paid Pro, planned): multi-domain monitoring, auto-PR fix via GitHub API, team workflows, GitHub Action wrapper
The free CLI will always validate any site. Paid tiers add multi-domain operations and automation.
Contributing
Source lives at https://github.com/moonye6/faq under cli/. The 12
free schema generators on https://faqjsonld.com use the same validators.
Issues and PRs welcome.
License
MIT
