@jcottam/html-metadata
v3.1.2
This JavaScript library simplifies the extraction of HTML Meta and OpenGraph tags from HTML content or URLs.
HTML Metadata
@jcottam/html-metadata is a lightweight, TypeScript-first JavaScript library for extracting HTML meta tags, Open Graph tags, and other metadata from HTML content or URLs. Perfect for social media sharing, SEO analysis, and web scraping applications.
Compatibility: Works seamlessly with Node.js (CommonJS) and modern browsers (ES6+).
Features
- 🚀 Fast & Lightweight - Built on Cheerio for optimal performance
- 📱 Open Graph Support - Extract all Open Graph meta tags for social media
- 🎯 TypeScript Ready - Full type definitions and IntelliSense support
- 🌐 URL & HTML Support - Extract from URLs or HTML strings directly
- 🔧 Configurable - Customizable extraction with filtering and timeout options
- 🛡️ Error Resilient - Graceful handling of malformed HTML and network errors
- 📦 Minimal Dependencies - Only depends on Cheerio for HTML parsing
Installation
npm install @jcottam/html-metadata
Usage
ES6/ESM Import
import { extractFromUrl, extractFromHTML } from "@jcottam/html-metadata"
CommonJS Require
const { extractFromUrl, extractFromHTML } = require("@jcottam/html-metadata")
Examples
Extract metadata from a URL
import { extractFromUrl } from "@jcottam/html-metadata"
// Basic usage
const metadata = await extractFromUrl("https://www.retool.com")
console.log(metadata)
// Output: { lang: "en", title: "Retool", "og:title": "...", "og:description": "...", ... }
// With options
const options = {
  timeout: 5000, // 5 second timeout
  metaTags: ["og:title", "og:description", "og:image"], // Only extract specific tags
}
const filteredMetadata = await extractFromUrl("https://example.com", options)
Extract metadata from HTML string
import { extractFromHTML } from "@jcottam/html-metadata"
const html = `
<html lang="en">
<head>
<title>My Website</title>
<meta property="og:title" content="My Amazing Website" />
<meta property="og:description" content="This is a brief description" />
<meta property="og:image" content="https://example.com/image.jpg" />
<link rel="icon" href="/favicon.ico" />
</head>
</html>
`
const metadata = extractFromHTML(html)
console.log(metadata)
// Output: {
// lang: "en",
// title: "My Website",
// "og:title": "My Amazing Website",
// "og:description": "This is a brief description",
// "og:image": "https://example.com/image.jpg",
// favicon: "/favicon.ico"
// }
Resolve relative URLs with baseUrl
const html = '<html><head><link rel="icon" href="/favicon.ico" /></head></html>'
const options = { baseUrl: "https://example.com" }
const metadata = extractFromHTML(html, options)
console.log(metadata.favicon) // "https://example.com/favicon.ico"
API Reference
Methods
extractFromHTML(html: string, options?: Options): ExtractedData
Extracts metadata from an HTML string.
Parameters:
- html (string): The HTML content to parse
- options (Options, optional): Configuration options
Returns: ExtractedData - Object containing extracted metadata
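Because the result is a flat object keyed by tag name, picking fields with fallbacks is straightforward. The sketch below is illustrative only: ExtractedData is redeclared locally (mirroring the shape in the Types section) so the snippet stands alone, toPreview and Preview are hypothetical names that are not part of the library, and it assumes a standard description meta tag would be exposed under the key "description".

```typescript
// Local copy of the documented ExtractedData shape,
// redeclared so this sketch compiles on its own.
type ExtractedData = {
  lang?: string
  title?: string
  favicon?: string
  "apple-touch-icon"?: string
  [key: string]: string | undefined
}

// Hypothetical shape for a link-preview card (not part of the library).
type Preview = {
  title: string
  description: string
  image: string | undefined
}

// Hypothetical helper: prefer Open Graph values, fall back to plain tags.
function toPreview(data: ExtractedData): Preview {
  return {
    title: data["og:title"] ?? data.title ?? "",
    description: data["og:description"] ?? data["description"] ?? "",
    image: data["og:image"],
  }
}

const preview = toPreview({
  lang: "en",
  title: "My Website",
  "og:title": "My Amazing Website",
  "og:image": "https://example.com/image.jpg",
})
console.log(preview.title) // "My Amazing Website"
```

In real code the argument would be the object returned by extractFromHTML rather than a literal.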
extractFromUrl(url: string, options?: Options): Promise<ExtractedData | null>
Extracts metadata from a URL by fetching the HTML content.
Parameters:
- url (string): The URL to fetch and extract metadata from
- options (Options, optional): Configuration options
Returns: Promise<ExtractedData | null> - Promise that resolves to extracted metadata or null if extraction fails
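Since the promise resolves to null instead of rejecting when extraction fails, callers need an explicit null check. One possible pattern is a small retry wrapper, sketched below. extractWithRetry and the Extractor type are hypothetical and not part of the library; the extractor is passed in as a parameter so the snippet runs without network access.

```typescript
// Minimal local types so the sketch is self-contained.
type ExtractedData = { [key: string]: string | undefined }
type Extractor = (url: string) => Promise<ExtractedData | null>

// Hypothetical helper: call the extractor up to `attempts` times and
// return `fallback` if every attempt resolves to null.
async function extractWithRetry(
  extract: Extractor,
  url: string,
  attempts = 2,
  fallback: ExtractedData = {},
): Promise<ExtractedData> {
  for (let i = 0; i < attempts; i++) {
    const result = await extract(url)
    if (result !== null) return result
  }
  return fallback
}

// Stand-in extractor that fails once, then succeeds.
// In real code you would pass extractFromUrl here instead.
let calls = 0
const flaky: Extractor = async () =>
  ++calls < 2 ? null : { title: "Example" }

const retryDemo = extractWithRetry(flaky, "https://example.com").then((data) => {
  console.log(data.title) // "Example"
  return data
})
```

Returning a fallback object keeps downstream code free of null checks; whether retrying is appropriate depends on why the fetch failed, which the null result alone does not reveal.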
Types
Options
type Options = {
  /** Base URL for resolving relative links (e.g., favicon, apple-touch-icon) */
  baseUrl?: string
  /** Fetch timeout in milliseconds for URL extraction */
  timeout?: number
  /** Specific meta tags to extract. If not provided, all meta tags will be extracted */
  metaTags?: string[]
}
ExtractedData
type ExtractedData = {
  /** Language attribute from the HTML tag */
  lang?: string
  /** Page title from the title tag */
  title?: string
  /** Favicon URL */
  favicon?: string
  /** Apple touch icon URL */
  "apple-touch-icon"?: string
  /** Open Graph and other meta tag properties */
  [key: string]: string | undefined
}
Example Response
{
  "lang": "en",
  "title": "Retool | The fastest way to build internal software.",
  "og:type": "website",
  "og:url": "https://retool.com/",
  "og:title": "Retool | The fastest way to build internal software.",
  "og:description": "Retool is the fastest way to build internal software. Use Retool's building blocks to build apps and workflow automations that connect to your databases and APIs, instantly.",
  "og:image": "https://d3399nw8s4ngfo.cloudfront.net/og-image-default.webp",
  "favicon": "/favicon.png",
  "apple-touch-icon": "/apple-touch-icon.png"
}
Browser Usage & CORS
When using extractFromUrl in browsers, you may encounter CORS restrictions. To bypass CORS:
- Server-side usage: Run extractFromUrl on a server
- Proxy services: Use a CORS proxy like AllOrigins
- Browser extensions: Use CORS-disabling browser extensions for development
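For the proxy route, one option is to fetch the page's HTML through the proxy yourself and hand the result to extractFromHTML. The sketch below only builds the proxied request URL. The endpoint https://proxy.example.com/raw is a placeholder, buildProxyUrl is a hypothetical helper, and the `url` query parameter is an assumption, so check your proxy's documentation for its actual request format.

```typescript
// Hypothetical helper: wrap a target URL in a proxy request URL.
// Assumes the proxy accepts the target as a `url` query parameter.
function buildProxyUrl(proxyEndpoint: string, targetUrl: string): string {
  const proxied = new URL(proxyEndpoint)
  // searchParams.set percent-encodes the target URL.
  proxied.searchParams.set("url", targetUrl)
  return proxied.toString()
}

const proxiedUrl = buildProxyUrl(
  "https://proxy.example.com/raw", // placeholder, not a real service
  "https://example.com/page?id=1",
)
console.log(proxiedUrl)
// "https://proxy.example.com/raw?url=https%3A%2F%2Fexample.com%2Fpage%3Fid%3D1"
```

In the browser you could then fetch proxiedUrl, read the response body as text, and pass it to extractFromHTML with baseUrl set to the original page URL so that relative links such as the favicon resolve against the right origin.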
Error Handling
The library handles errors gracefully:
// Network errors return null
const result = await extractFromUrl("https://invalid-url.com")
if (result === null) {
  console.log("Failed to fetch or parse the URL")
}
// Malformed HTML is handled gracefully
const metadata = extractFromHTML(
  "<html><head><meta property='og:title' content='Test'"
)
console.log(metadata["og:title"]) // "Test"
Supported Meta Tags
The library extracts the following types of metadata:
- HTML attributes: lang from <html> tag
- Title: Content from <title> tag
- Favicon: href from <link rel="icon"> tags
- Apple Touch Icon: href from <link rel="apple-touch-icon"> tags
- Meta tags: All <meta> tags with name or property attributes
- Open Graph: All og:* properties
- Twitter Cards: All twitter:* properties
- Custom meta tags: Any custom meta tags you define
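Since all of these land in one flat object, it can be handy to split the result by key prefix before using it. The following is a minimal sketch, assuming Open Graph and Twitter Card tags are returned under their og:* and twitter:* names as listed above. groupByPrefix and Grouped are hypothetical names, not part of the library.

```typescript
// Minimal local type so the sketch is self-contained.
type ExtractedData = { [key: string]: string | undefined }

// Hypothetical grouping of a flat result by key prefix.
type Grouped = {
  og: Record<string, string>
  twitter: Record<string, string>
  other: Record<string, string>
}

function groupByPrefix(data: ExtractedData): Grouped {
  const groups: Grouped = { og: {}, twitter: {}, other: {} }
  for (const key of Object.keys(data)) {
    const value = data[key]
    if (value === undefined) continue // skip absent optional fields
    if (key.startsWith("og:")) groups.og[key] = value
    else if (key.startsWith("twitter:")) groups.twitter[key] = value
    else groups.other[key] = value
  }
  return groups
}

const grouped = groupByPrefix({
  lang: "en",
  title: "My Website",
  "og:title": "My Amazing Website",
  "twitter:card": "summary",
})
console.log(Object.keys(grouped.og)) // ["og:title"]
console.log(Object.keys(grouped.other)) // ["lang", "title"]
```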
Development
Prerequisites
- Node.js 18+
- npm
Setup
git clone https://github.com/jcottam/html-metadata.git
cd html-metadata
npm install
Scripts
npm run build # Build the library
npm test # Run tests
npm run release # Release new version (manual)
Automated Workflow
This project uses automated dependency management and releases:
- Renovate Bot: Automatically updates dependencies and creates pull requests
- GitHub Actions: Automatically releases new versions when changes are pushed to main
- Manual Release: Use npm run release for immediate releases or specific version bumps
Testing
The project uses Vitest for testing. Run tests with:
npm test
Dependencies
- Cheerio: Fast, flexible HTML parsing
- Vitest: Next-generation testing framework
- Rollup: Module bundler for multiple formats
Contributing
We welcome contributions! Please follow these guidelines:
- Fork the repository and create a feature branch
- Make changes and ensure tests pass (npm test)
- Add tests for new functionality
- Update documentation if needed
- Submit a pull request with a clear description
Development Guidelines
- Follow TypeScript best practices
- Add JSDoc comments for new functions
- Ensure all tests pass
- Update README for new features
- Use conventional commit messages
License
MIT License - see LICENSE.md for details.
