html-json-extractor
v0.2.0
Fast, forgiving extraction of application/json and application/ld+json script blocks from HTML strings.
html-json-extractor
Fast, forgiving extraction of <script type="application/json"> and <script type="application/ld+json"> blocks from an HTML string.
- No DOM parser or runtime dependencies
- Returns one result per matching script block
- Malformed JSON blocks do not break the rest
Install
npm install html-json-extractor
Usage
import {
  extractJson,
  extractJsonStrings,
  getJsonLdItems,
  getJsonLdRecords
} from 'html-json-extractor';
const html = `
<script type="application/json">{"featureFlags":{"search":true}}</script>
<script type="application/ld+json">{"@type":"WebSite","name":"Example"}</script>
<script type="application/json">{"broken":</script>
<script type="application/ld+json">[{"@type":"Person","name":"Ada"}]</script>
`;
const raw = extractJsonStrings(html);
// ['{"featureFlags":{"search":true}}', '{"@type":"WebSite","name":"Example"}', '{"broken":', '[{"@type":"Person","name":"Ada"}]']
const parsed = extractJson(html);
// [{ featureFlags: { search: true } }, { '@type': 'WebSite', name: 'Example' }, [{ '@type': 'Person', name: 'Ada' }]]
const items = parsed.flatMap(getJsonLdItems);
// [{ featureFlags: { search: true } }, { '@type': 'WebSite', name: 'Example' }, { '@type': 'Person', name: 'Ada' }]
const records = parsed.flatMap(getJsonLdRecords);
// [{ featureFlags: { search: true } }, { '@type': 'WebSite', name: 'Example' }, { '@type': 'Person', name: 'Ada' }]
API
extractJsonStrings(html: string): string[]
Returns normalized application/json and application/ld+json script contents as strings.
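To make the behaviour concrete, here is a minimal, regex-based sketch of this kind of extraction. It is not the library's source: the name sketchExtractJsonStrings is hypothetical, and it assumes that "normalized" means at least trimming surrounding whitespace.

```typescript
// Hypothetical sketch, NOT the library's implementation.
// Assumption: "normalized" is approximated here by trimming whitespace.
function sketchExtractJsonStrings(html: string): string[] {
  // Capture the attribute list and the body of every <script> element.
  const scriptRe = /<script\b([^>]*)>([\s\S]*?)<\/script\s*>/gi;
  // Accept type=application/json or application/ld+json, quoted or unquoted.
  const typeRe = /(?:^|\s)type\s*=\s*(["']?)application\/(?:ld\+)?json\1(?=\s|$)/i;

  const results: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = scriptRe.exec(html)) !== null) {
    const attrs = match[1];
    const body = match[2];
    if (typeRe.test(attrs)) {
      // Keep the raw text even if it is not valid JSON; parsing is a later step.
      results.push(body.trim());
    }
  }
  return results;
}
```

Because no parsing happens at this stage, a malformed block such as {"broken": still appears in the output, matching the Usage example above.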
extractJson<T = JsonValue>(html: string): T[]
Parses the extracted strings with JSON.parse. Entries that fail to parse are skipped.
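The skip-on-failure behaviour can be pictured as a try/catch around each JSON.parse call. The sketch below is illustrative only; sketchParseAll is a hypothetical name, not part of the package.

```typescript
// Hypothetical sketch, NOT the library's implementation.
function sketchParseAll(strings: string[]): unknown[] {
  const parsed: unknown[] = [];
  for (const text of strings) {
    try {
      parsed.push(JSON.parse(text));
    } catch {
      // Malformed JSON: drop this entry and continue with the next one.
    }
  }
  return parsed;
}
```

One consequence of skipping is that the result can be shorter than the list of extracted strings, so positions in the two arrays do not necessarily line up.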
getJsonLdItems(value: unknown): unknown[]
Normalizes one parsed entry, whether a single value or an array, into a flat list of items.
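As a rough mental model, this normalization can be sketched as "arrays become their elements, anything else becomes a one-element list". The function sketchItems is hypothetical, and how the real helper treats nested arrays or null is an assumption not confirmed by this README.

```typescript
// Hypothetical sketch, NOT the library's implementation.
// Assumption: only the top-level array is unwrapped; other values are wrapped as-is.
function sketchItems(value: unknown): unknown[] {
  if (Array.isArray(value)) {
    // Copy so callers cannot mutate the parsed input through the result.
    return value.slice();
  }
  return [value];
}
```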
getJsonLdRecords(value: unknown): Record<string, unknown>[]
Returns only the record-shaped (object) items and follows nested @graph content.
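The following sketch shows one way "record-shaped items plus nested @graph content" could be collected. It is not the library's code: sketchRecords, isRecord, and JsonRecord are hypothetical names, and keeping the wrapper object that carries @graph alongside its children is an assumption.

```typescript
// Hypothetical sketch, NOT the library's implementation.
type JsonRecord = Record<string, unknown>;

// A record here means a non-null, non-array object.
function isRecord(value: unknown): value is JsonRecord {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function sketchRecords(value: unknown): JsonRecord[] {
  const out: JsonRecord[] = [];
  const visit = (node: unknown): void => {
    if (Array.isArray(node)) {
      // Arrays contribute each of their elements in order.
      node.forEach((child) => visit(child));
    } else if (isRecord(node)) {
      // Assumption: the wrapper record is kept, then its @graph is followed.
      out.push(node);
      if ("@graph" in node) {
        visit(node["@graph"]);
      }
    }
    // Strings, numbers, booleans, and null are not records and are ignored.
  };
  visit(value);
  return out;
}
```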
getJsonLdItems and getJsonLdRecords remain JSON-LD-oriented helpers for structured-data use cases on top of the generic extractor.
License
MIT
