ogpeek
v0.5.0
Peek into any page's Open Graph tags — parser, fetcher, and validator.
ogpeek
peek into any page's Open Graph tags — and the favicon / JSON-LD signals that travel with them
Korean: README.ko.md
A small engine that handles parsing, fetching, and validating Open Graph
tags in a single package. Open Graph stays the primary signal; alongside
it the engine also surfaces the auxiliary head metadata most pages ship
with — favicons, apple-touch-icons, mask-icons, msapplication tiles,
application-name / theme-color, and JSON-LD blocks. Single external
dependency: htmlparser2. Runs on Node 20+, Bun, Workers, and the browser.
Install
npm install ogpeek
# or
pnpm add ogpeek
# or
yarn add ogpeek

Two entry points
| entry | purpose | runtime | dependencies |
| --- | --- | --- | --- |
| ogpeek | parse, validate, types | Node · Bun · Workers · browser | htmlparser2 |
| ogpeek/fetch | fetch a remote URL (timeout / size cap / redirect tracing) | anywhere globalThis.fetch exists | none (no Node built-ins) |
The root entry is pure logic: as long as you do not import ogpeek/fetch,
nothing runtime-specific (not even globalThis.fetch) comes along for the
ride. The fetch subpath also avoids Node built-ins, so it loads as-is on
edge and browser runtimes. SSRF policy decisions have been pushed out of
the engine specifically to make this possible.
Quick start
import { parse } from "ogpeek";
import { fetchHtml } from "ogpeek/fetch";
const { html, finalUrl } = await fetchHtml("https://ogp.me");
const result = parse(html, { url: finalUrl });
console.log(result.ogp.title);
console.log(result.ogp.images);
for (const w of result.warnings) {
console.log(`[${w.severity}] ${w.code}: ${w.message}`);
}

API
parse(html: string, options?: ParseOptions): OgDebugResult
- html: the raw HTML string.
- options.url: the base used to resolve relative URLs to absolute URLs. If omitted, the og:url declared in the document is used as the base.
- options.jsonldScope: "head" | "document". Where to harvest <script type="application/ld+json"> blocks from. Default is "head" to keep the scan cost predictable; pass "document" to also walk <body> (JSON-LD is often placed there).
The return shape:
type OgDebugResult = {
ogp: OpenGraph; // normalized OG tree
typed: TypedObject | null; // article / book / profile / music.* / video.*
twitter: Record<string, string>; // twitter:* passthrough
raw: Array<{ property: string; content: string }>; // declaration order
warnings: Warning[];
// Auxiliary metadata travelling alongside OG:
icons: Icon[]; // <link rel="icon" | "apple-touch-icon" | ...>
jsonld: JsonLd[]; // <script type="application/ld+json"> blocks
meta: {
title: string | null;
canonical: string | null; // <link rel="canonical">
prefixDeclared: boolean; // <html prefix="og: https://ogp.me/ns#">
charset: string | null;
applicationName: string | null;// <meta name="application-name">
themeColor: string | null; // <meta name="theme-color">
msTileImage: string | null; // <meta name="msapplication-TileImage">
msTileColor: string | null; // <meta name="msapplication-TileColor">
};
};

Each structured property (og:image:width and friends) attaches to the
most recent parent (og:image). If one appears before any parent, it is
reported as an ORPHAN_STRUCTURED_PROPERTY warning.
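
The attachment rule above can be sketched as a single pass over the tags in declaration order. This is a minimal illustration limited to og:image, not ogpeek's actual implementation: the RawTag, ImageNode, and attachImageProperties names are hypothetical, and the og:image:url alias is deliberately left unhandled.

```typescript
// Hypothetical types for illustration only; not ogpeek's internal representation.
type RawTag = { property: string; content: string };
type ImageNode = { url: string; props: Record<string, string> };
type AttachResult = { images: ImageNode[]; orphans: string[] };

const PREFIX = "og:image:";

// Walks tags in declaration order. Each "og:image" opens a new parent;
// each "og:image:<key>" attaches to the most recently opened parent.
// A structured property seen before any parent is collected as an orphan.
// Simplification: "og:image:url" is stored like any other key.
function attachImageProperties(raw: RawTag[]): AttachResult {
  const images: ImageNode[] = [];
  const orphans: string[] = [];
  for (const { property, content } of raw) {
    if (property === "og:image") {
      images.push({ url: content, props: {} });
    } else if (property.startsWith(PREFIX)) {
      const parent = images[images.length - 1];
      if (parent === undefined) {
        orphans.push(property); // would map to ORPHAN_STRUCTURED_PROPERTY
      } else {
        parent.props[property.slice(PREFIX.length)] = content;
      }
    }
  }
  return { images, orphans };
}
```

Because attachment depends only on order, moving an og:image:width tag above its og:image changes which image it describes, or turns it into an orphan.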
Auxiliary metadata
Open Graph remains the primary signal. The auxiliary fields are surfaced so that "how does this page advertise itself elsewhere?" debugging stays in one place — they are intentionally kept thin (no schema.org rule checking, no manifest.json fetching).
type Icon = {
rel: string; // matched icon token, normalized to one of:
// "icon" | "apple-touch-icon"
// | "apple-touch-icon-precomposed" | "mask-icon"
// | "fluid-icon" (lower-cased)
//
// <link rel> is a space-separated token set, so a tag
// like `rel="shortcut icon"` (legacy IE) or
// `rel="icon apple-touch-icon"` (multi-role) is parsed
// per token. Multi-role declarations emit one Icon per
// matched token, sharing the same href.
href: string;
sizes?: string; // "32x32 16x16" or "any"
type?: string; // "image/png"
color?: string; // mask-icon color
};
type JsonLd = {
raw: string; // original script body
parsed: unknown | null; // JSON.parse result, or null on failure
types: string[]; // every @type seen (recurses into @graph)
error?: string; // populated when parsed === null
};

Severity is set on every warning (error / warn / info). Consumers
typically render all of them and let the user filter at display time;
the engine never decides what is "important enough to show".
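
Display-time filtering of this kind could look like the sketch below. WarningLike is a local stand-in that mirrors only the severity, code, and message fields used in the Quick start loop; the helper names and the info < warn < error ordering are assumptions of this example, not part of ogpeek.

```typescript
// Local stand-in for the engine's Warning type (assumed shape).
type Severity = "error" | "warn" | "info";
type WarningLike = { severity: Severity; code: string; message: string };

// Assumed ordering for filtering: info < warn < error.
const RANK: Record<Severity, number> = { info: 0, warn: 1, error: 2 };

// Keeps warnings whose severity is at or above the chosen threshold.
function atLeast(warnings: WarningLike[], min: Severity): WarningLike[] {
  return warnings.filter((w) => RANK[w.severity] >= RANK[min]);
}

// Counts warnings per severity, e.g. for a summary badge in a UI.
function countBySeverity(warnings: WarningLike[]): Record<Severity, number> {
  const counts: Record<Severity, number> = { error: 0, warn: 0, info: 0 };
  for (const w of warnings) counts[w.severity] += 1;
  return counts;
}
```

A UI might render countBySeverity(result.warnings) as a summary and let the user pick the threshold passed to atLeast.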
fetchHtml(url: string, options?: FetchOptions): Promise<FetchResult>
Fetches a remote URL and returns the HTML as a string. Timeout, response
size cap, and redirect tracing are built in. Redirects are received with
redirect: "manual" so options.guard runs again on every hop. The result
includes redirects: { from, to, status }[] containing every redirect hop
in occurrence order — the UI can replay the "URL entered → 302 → final"
flow exactly.
- options.userAgent: User-Agent for outbound requests. Default is a browser-like UA.
- options.timeoutMs: request timeout in milliseconds. Default 8000.
- options.maxBytes: response size cap. Default 5 MiB. The stream is cancelled when exceeded.
- options.guard: (url: URL) => Promise<void> | void. Called right before the initial request and before every redirect hop. Throw a FetchError to block, just return to allow. If unset, no checks are performed; ogpeek does not make SSRF policy decisions.
- options.fetch: (url: string, init: RequestInit) => Promise<Response>. A function that performs the HTTP transport for a single hop only. fetchHtml calls this for each redirect hop and reads back one response. Redirect tracing, timeout, maxBytes, content-type checks, and guard invocation stay owned by fetchHtml, so this injection point is a narrow slot for "transport policy only": custom dispatcher, DoH resolver, mTLS, etc. Default is globalThis.fetch.
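
As an illustration of the options.fetch slot, the sketch below wraps a per-hop transport and records what each hop returned without altering the request. HopFetch restates the signature given above; HopRecord and recordHops are hypothetical names, and a real transport override would more likely swap the dispatcher or resolver than log.

```typescript
// The per-hop transport signature, as documented for options.fetch.
type HopFetch = (url: string, init: RequestInit) => Promise<Response>;

// Hypothetical record type for this example.
type HopRecord = { url: string; status: number; ms: number };

// Returns a transport that delegates to `base` and appends one record per hop.
// The request and the response pass through unchanged.
function recordHops(base: HopFetch, sink: HopRecord[]): HopFetch {
  return async (url, init) => {
    const started = Date.now();
    const response = await base(url, init);
    sink.push({ url, status: response.status, ms: Date.now() - started });
    return response;
  };
}
```

Assuming the option accepts any function of this shape, it would be passed as fetchHtml(url, { fetch: recordHops(globalThis.fetch, hops) }), with hops being an array owned by the caller.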
On failure it throws a FetchError (fields: code, status, message).
The main codes: INVALID_URL, UNSUPPORTED_SCHEME, TIMEOUT, NETWORK,
UPSTREAM_STATUS, NOT_HTML, TOO_LARGE, REDIRECT_LOOP,
TOO_MANY_REDIRECTS, BAD_REDIRECT, GUARD_FAILED (when the guard threw
something other than a FetchError).
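
One way a caller might consume these codes is to bucket them before deciding what to show or retry. The grouping below is this example's own judgment, not something ogpeek defines; FetchFailure is a local stand-in carrying only the code and message fields, and classifyFailure is a hypothetical helper.

```typescript
// Local stand-in: only the field names come from the description above.
type FetchFailure = { code: string; message: string };
type FailureKind = "bad-input" | "transient" | "upstream" | "guard" | "other";

// Buckets the documented codes. Unlisted codes, including custom ones
// thrown by a guard, fall through to "other".
function classifyFailure(err: FetchFailure): FailureKind {
  switch (err.code) {
    case "INVALID_URL":
    case "UNSUPPORTED_SCHEME":
      return "bad-input"; // the caller supplied something unusable
    case "TIMEOUT":
    case "NETWORK":
      return "transient"; // possibly worth a retry
    case "UPSTREAM_STATUS":
    case "NOT_HTML":
    case "TOO_LARGE":
    case "REDIRECT_LOOP":
    case "TOO_MANY_REDIRECTS":
    case "BAD_REDIRECT":
      return "upstream"; // the remote site responded in an unusable way
    case "GUARD_FAILED":
      return "guard"; // the guard threw something other than a FetchError
    default:
      return "other";
  }
}
```

Since the list above is described as the main codes rather than the full set, the default branch matters: new or custom codes should degrade to a generic message instead of crashing the caller.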
SSRF is the caller's responsibility
The engine does not make SSRF policy decisions. The definitions of "private
range" and the behaviour of resolvers vary across cloud / on-prem / edge,
so making the library own this responsibility leads to a combinatorial
explosion. Instead a single guard hook lets the caller inject a guard
appropriate to its deployment environment.
import { fetchHtml, FetchError } from "ogpeek/fetch";
await fetchHtml(userInput, {
guard(url) {
if (url.hostname === "169.254.169.254") {
throw new FetchError("BLOCKED_METADATA", 400, "cloud metadata blocked");
}
},
});

A real-world guard layers hostname check → DNS resolve → IP-range
classification. Use ipaddr.js to classify ranges; on Node, the canonical
approach is to use undici's Agent({ connect: { lookup } }) to connect
directly to the validated IP, which also defends against DNS rebinding.
Edge runtimes (Cloudflare Workers and friends) do not let you open raw
TCP, so the practical ceiling there is DoH (cloudflare-dns.com/dns-query)
plus a hostname check. For the full threat model and reference
implementations, see the OWASP SSRF Prevention Cheat
Sheet.
This repo's website/lib/ssrf-guard.ts is a concrete example of a
Workers-compatible DoH guard.
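
To make the first layer concrete, here is a deliberately incomplete sketch of a guard that rejects URLs whose host is a non-public IPv4 literal. It does no DNS resolution, ignores IPv6, and covers only a handful of well-known ranges, so it is a starting point rather than an SSRF defence. All function names are hypothetical, and it throws a plain Error to stay self-contained; per the description above, a guard should throw a FetchError to block with a specific code.

```typescript
type Octets = [number, number, number, number];

// Parses a dotted-decimal IPv4 literal; returns null for hostnames and anything else.
function parseIPv4(host: string): Octets | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((n) => Number.isNaN(n) || n > 255)) return null;
  return octets as Octets;
}

// Partial list of non-public ranges; extend it for your environment.
function isNonPublicIPv4([a, b]: Octets): boolean {
  return (
    a === 0 || // 0.0.0.0/8 "this network"
    a === 10 || // 10.0.0.0/8 private
    a === 127 || // 127.0.0.0/8 loopback
    (a === 169 && b === 254) || // 169.254.0.0/16 link-local, incl. cloud metadata
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private
    (a === 192 && b === 168) // 192.168.0.0/16 private
  );
}

// Guard-shaped function: returns to allow, throws to block.
function literalIpGuard(url: URL): void {
  const octets = parseIPv4(url.hostname);
  if (octets !== null && isNonPublicIPv4(octets)) {
    throw new Error(`blocked non-public address: ${url.hostname}`);
  }
}
```

A hostname that merely resolves to one of these ranges passes this check untouched, which is why the DNS-resolve and connect-to-validated-IP layers described above are still needed.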
Warning codes
| code | severity | description |
| --- | --- | --- |
| OG_TITLE_MISSING | error | og:title is missing |
| OG_TITLE_TOO_LONG | warn | og:title exceeds 60 characters — truncated by KakaoTalk |
| OG_TYPE_MISSING | error | og:type is missing |
| OG_IMAGE_MISSING | error | og:image is missing |
| OG_URL_MISSING | error | og:url is missing |
| OG_URL_MISMATCH | warn | og:url host/path disagrees with the actual request URL |
| OG_TYPE_UNKNOWN | warn | og:type value is not in the OGP-spec whitelist |
| URL_NOT_ABSOLUTE | warn | a URL-typed property is not absolute |
| DUPLICATE_SINGLETON | warn | a single-valued property is declared more than once |
| ORPHAN_STRUCTURED_PROPERTY | warn | a structured property appears with no parent |
| INVALID_DIMENSION | warn | width/height failed integer parsing |
| MISSING_PREFIX_ATTR | info | <html prefix> is not declared |
| JSONLD_PARSE_ERROR | warn | a <script type="application/ld+json"> block did not parse as JSON |
Related projects
The web tool built on this engine: https://github.com/minjun0219/ogpeek
License
MIT.
