omnifetch-lib
v1.2.0
Published
Universal content extraction library with tiered fetching strategies
Maintainers
Readme
OmniFetch JavaScript Library
JavaScript/TypeScript implementation of OmniFetch - a universal content extraction library.
Features
- Universal Extraction: Fetches content from any URL, handling standard sites, SPAs, and paywalls.
- Tiered System:
- Light Fetch: Fast, standard HTTP request.
- Headless Browser: Handles dynamic JS-heavy sites (requires Netlify endpoint).
- Search Fallback: Finds alternative sources for paywalled or blocked content.
- Smart Parsing: Converts HTML to clean Markdown or JSON.
Installation
npm install omnifetchQuick Start
import { omniFetch } from 'omnifetch';
// Text extraction (Markdown)
const result = await omniFetch('https://example.com', { mode: 'TEXT' });
console.log(result.content);
// JSON extraction (Structured Data)
const jsonResult = await omniFetch('https://example.com', { mode: 'JSON' });
console.log(jsonResult.content.title);Configuration
The omniFetch function accepts a configuration object as the second argument:
interface OmniFetchConfig {
/** Output mode: 'JSON' for structured data, 'TEXT' for plain text (Markdown) */
mode: 'JSON' | 'TEXT';
/** Request timeout in milliseconds (default: 30000) */
timeout?: number;
/** Custom Netlify endpoint for headless browser fallback (Tier 2) */
netlifyEndpoint?: string;
/** Custom headers to include in requests */
headers?: Record<string, string>;
/** Whether to skip Tier 2 (headless) and go directly to search fallback */
skipHeadless?: boolean;
/** Whether to skip Tier 3 (search fallback) entirely */
skipSearch?: boolean;
/** Override title for search fallback (useful for opaque URLs like x.com/status/ID) */
forceTitle?: string;
}Advanced Usage
Handling Blocked Domains (e.g., X/Twitter)
Some domains block direct scraping. OmniFetch automatically handles this by falling back to search (Tier 3). For opaque URLs, you can provide a forceTitle to improve search results.
const result = await omniFetch('https://x.com/someuser/status/12345', {
mode: 'TEXT',
forceTitle: 'Specific Tweet Content Title' // Helps find the content via search
});Headless Browser Support
To enable Tier 2 (Headless Browser) for dynamic sites, you need to deploy the provided Netlify function and pass the endpoint.
const result = await omniFetch('https://dynamic-site.com', {
netlifyEndpoint: 'https://your-site.netlify.app/.netlify/functions/headless-fetch'
});Building
npm install
npm run buildDevelopment
npm run dev # Watch modeSee the main README.md for full documentation.
