universal-read-api

v1.0.0 · Published a month ago

A serverless infrastructure tool that turns any website URL into structured JSON data for AI agents.


Maintainer: rajeshdev

Keywords: web-scraping, ai, llm, json, api, cloudflare-workers, gemini, serverless, url-to-json, data-extraction, ai-agents

Universal Read API

Build AI agents that can read any website.

A serverless API that turns any URL into structured JSON data using Cloudflare Workers and Google Gemini 2.5 Flash Lite.

🚀 Features

Free to Host: Runs entirely on the Cloudflare Workers Free plan.
Fast Extraction: Fetches pages with a lightweight HTTP request and cleans the HTML with regular expressions, instead of driving a headless browser such as Puppeteer.
Smart Schema: You define the exact JSON structure you want back.
Universal: Works on an estimated ~80% of websites, mainly server-rendered pages such as blogs, news sites, and documentation (see Limitations).
Auto-Summarization: If no schema is provided, it returns a summary of the page instead.
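The schema and summary modes can be sketched as one prompt-building step. This is a hypothetical `buildPrompt` helper, not the project's actual prompt: the wording, and the `{title, summary}` shape used in summary mode, are assumptions.

```typescript
// Hypothetical: the real prompt text and the summary-mode output shape
// are not documented in this README.
type Schema = Record<string, unknown>;

// Build the instruction sent to the model for one page.
// With a schema: ask for JSON matching it. Without one: ask for a summary.
function buildPrompt(markdown: string, schema?: Schema): string {
  const task = schema
    ? "Extract data from the page below. Respond with JSON only, matching this schema " +
      "(keys are field names, values are type hints; use [] or null when a field is absent):\n" +
      JSON.stringify(schema, null, 2)
    : 'Summarize the page below. Respond with JSON only, in the form {"title": string, "summary": string}.';
  return `${task}\n\n--- PAGE CONTENT (Markdown) ---\n${markdown}`;
}
```

For example, `buildPrompt(markdown, { title: "string", dates: ["string"] })` asks for exactly the fields in the request example under API Usage.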

🛠️ Tech Stack

Runtime: Cloudflare Workers (Hono framework)
AI Model: Gemini 2.5 Flash Lite
Parsing: Zero-dependency, regex-based HTML-to-Markdown converter
Language: TypeScript
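A regex-based HTML-to-Markdown pass might look roughly like the sketch below. `htmlToMarkdown` is a hypothetical name and the rule set is an assumption, not the project's converter. Regexes cannot parse arbitrary HTML, so this is best-effort cleanup for reasonably well-formed pages.

```typescript
// Hypothetical sketch of a zero-dependency, regex-based HTML-to-Markdown pass.
function htmlToMarkdown(html: string): string {
  let md = html
    // Drop elements whose content is not useful to the model.
    .replace(/<(script|style|noscript|svg|nav|footer)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    .replace(/<!--[\s\S]*?-->/g, "")
    // Headings: <h1>..<h6> become "#" .. "######".
    .replace(/<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi, (_m, level: string, text: string) =>
      `\n\n${"#".repeat(Number(level))} ${text.trim()}\n\n`)
    // Links (double-quoted href only): keep the text and the target URL.
    .replace(/<a\b[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    // List items, block ends, and line breaks.
    .replace(/<li\b[^>]*>/gi, "\n- ")
    .replace(/<\/(p|div|section|article|ul|ol)>/gi, "\n\n")
    .replace(/<br\s*\/?>/gi, "\n")
    // Remove every remaining tag.
    .replace(/<[^>]+>/g, "");
  // Decode a few common entities; &amp; goes last so "&amp;lt;" is not double-decoded.
  md = md
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
  // Trim trailing spaces on lines and allow at most one blank line in a row.
  return md
    .replace(/[ \t]+\n/g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```

Converting to Markdown before calling the model keeps headings and links while dropping markup, which typically shrinks the prompt.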

⚡ Quick Start

1. Clone & Install

git clone https://github.com/RajeshKalidandi/universal-read-api.git
cd universal-read-api
npm install

2. Setup Gemini API Key

Get a free API key from Google AI Studio.

For local development, create a `.dev.vars` file in the project root:

GEMINI_API_KEY=your_actual_api_key_here

For production, store the key as a Worker secret:

npx wrangler secret put GEMINI_API_KEY
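Wrangler exposes values from `.dev.vars` (locally) and from `wrangler secret put` (in production) as properties of the Worker's `env` object, so code reads the key the same way in both environments. The sketch below shows one way a Worker might call Gemini's REST `generateContent` endpoint with that key. `Env`, `FetchLike`, and `callGemini` are hypothetical names, and the project may use an SDK or different request options instead.

```typescript
// Bindings Wrangler injects from .dev.vars (local) or `wrangler secret put` (production).
interface Env {
  GEMINI_API_KEY?: string;
}

// Minimal subset of fetch used here, so a stub can be passed in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

const MODEL = "gemini-2.5-flash-lite";

// Hypothetical helper: send one prompt to the Gemini REST API,
// return the generated text and the total token count.
async function callGemini(env: Env, prompt: string, fetchImpl: FetchLike) {
  const apiKey = env.GEMINI_API_KEY;
  if (!apiKey) throw new Error("GEMINI_API_KEY is not set");
  const res = await fetchImpl(
    `https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify({
        contents: [{ parts: [{ text: prompt }] }],
        // Ask for a JSON response instead of free text.
        generationConfig: { responseMimeType: "application/json" },
      }),
    },
  );
  if (!res.ok) throw new Error(`Gemini request failed with HTTP ${res.status}`);
  const data = await res.json();
  return {
    text: data.candidates?.[0]?.content?.parts?.[0]?.text ?? "",
    tokensUsed: data.usageMetadata?.totalTokenCount ?? 0,
  };
}
```

In a Hono route handler, the same bindings are available as `c.env`, so the call would be `callGemini(c.env, prompt, fetch)`.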

3. Run Locally

npm run dev

4. Deploy

npm run deploy

🔌 API Usage

Endpoint: `POST /extract`

Request:

curl -X POST https://universal-read-api.rajeshdev.workers.dev/extract \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "schema": {
      "title": "string",
      "summary": "string",
      "dates": ["string"]
    }
  }'
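The same request can be made from TypeScript. This is a minimal client sketch: `extract`, `ExtractRequest`, and `FetchLike` are hypothetical names, and throwing on a non-2xx status is an assumption, since this README does not document error responses. Passing `fetch` in explicitly keeps the helper testable with a stub.

```typescript
// Request body accepted by POST /extract, as in the curl example.
interface ExtractRequest {
  url: string;
  // Optional: field name -> type hint. Omit it to get a summary instead.
  schema?: Record<string, unknown>;
}

// Minimal subset of fetch used here.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical client helper; the default base URL is the deployment used in the curl example.
async function extract(
  body: ExtractRequest,
  fetchImpl: FetchLike,
  baseUrl = "https://universal-read-api.rajeshdev.workers.dev",
): Promise<unknown> {
  const res = await fetchImpl(`${baseUrl}/extract`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  // Assumption: treat any non-2xx status as a failure.
  if (!res.ok) throw new Error(`POST /extract failed with HTTP ${res.status}`);
  return res.json();
}
```

For example, `await extract({ url: "https://example.com", schema: { title: "string" } }, fetch)`. While `npm run dev` is running, pass `"http://localhost:8787"` (Wrangler's default local port) as the third argument.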

Response:

{
  "success": true,
  "data": {
    "title": "Example Domain",
    "summary": "This domain is for use in documentation examples...",
    "dates": []
  },
  "metadata": {
    "url": "https://example.com",
    "model": "gemini-2.5-flash-lite",
    "tokensUsed": 341,
    "processingTimeMs": 1205
  }
}
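In TypeScript, the response above can be described with an interface plus a runtime guard. Both are inferred from this single example, so real responses may carry more or fewer fields, and error responses (where `success` is presumably `false`) are not documented here. `ExtractResponse` and `isExtractResponse` are hypothetical names.

```typescript
// Response shape inferred from the example response; not an official type.
interface ExtractResponse<T = Record<string, unknown>> {
  success: boolean;
  data: T;
  metadata: {
    url: string;
    model: string;
    tokensUsed: number;
    processingTimeMs: number;
  };
}

// Runtime check of the response envelope before using it.
// It does not validate `data` against the schema you requested.
function isExtractResponse(value: unknown): value is ExtractResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, any>;
  const m = v.metadata;
  return (
    typeof v.success === "boolean" &&
    typeof v.data === "object" && v.data !== null &&
    typeof m === "object" && m !== null &&
    typeof m.url === "string" &&
    typeof m.model === "string" &&
    typeof m.tokensUsed === "number" &&
    typeof m.processingTimeMs === "number"
  );
}
```

Because `data` comes from a language model, it is worth checking individual fields as well (for example, that `dates` is an array) before relying on them.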

🤝 Contributing

Contributions are welcome, whether that means fixing bugs, improving documentation, or adding new features.

How to Contribute

1. Fork the repository.
2. Clone your fork: git clone https://github.com/YOUR_USERNAME/universal-read-api.git
3. Create a branch: git checkout -b feature/amazing-feature
4. Make your changes.
5. Commit: git commit -m 'Add some amazing feature'
6. Push: git push origin feature/amazing-feature
7. Open a Pull Request.

Ideas for Contributions

[ ] Add support for Puppeteer (Browser Rendering) as an optional mode
[ ] Add rate limiting using Cloudflare KV/Durable Objects
[ ] Add scraping fallback for different site structures
[ ] Improve prompt engineering for specific extraction types

⚠️ Limitations

This version uses standard HTTP requests (fetch), not a full browser.

Works well for: Static sites, blogs, news sites, wikis, and documentation.
Does not work for: Client-side-rendered apps (for example, some React SPAs) that need JavaScript to render any content.
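Before sending a URL, a caller could use a rough heuristic to spot pages that are likely client-side rendered. `looksClientRendered` is hypothetical and not part of the API; the 200-character threshold and the mount-point ids (`root`, `app`, `__next`) are assumptions that will misclassify some pages.

```typescript
// Hypothetical heuristic (not part of the API): flag HTML that probably
// needs a browser to render its content. Thresholds are arbitrary.
function looksClientRendered(html: string, minTextChars = 200): boolean {
  // Visible text after dropping scripts, styles, and all tags.
  const text = html
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  // Empty mount points commonly used by SPA frameworks.
  const emptyMount = /<div\s+id=["'](root|app|__next)["'][^>]*>\s*<\/div>/i.test(html);
  // Little text plus an empty mount point or any script suggests client-side rendering.
  return text.length < minTextChars && (emptyMount || /<script\b/i.test(html));
}
```

Pages flagged this way are candidates for the optional Browser Rendering mode listed under Ideas for Contributions.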
