@neuralmux/omp-firecrawl

v0.1.0

Published 18 days ago

Firecrawl extension for Oh My Pi — scrape, crawl, map, search, batch scrape, and extract from any URL.


ljuti

omp-extension omp oh-my-pi firecrawl scrape crawl map search extract batch-scrape web-scraping ai-agent

🔥 omp-firecrawl — Firecrawl extension for Oh My Pi

@neuralmux/omp-firecrawl is a Firecrawl extension for Oh My Pi. It gives your agent native tools for scraping, crawling, mapping, searching, batch scraping, and structured extraction — backed by the @mendable/firecrawl-js SDK.

✨ Features

🪄 Drop-in replacement for built-in fetch/browse tools. Returns clean LLM-ready markdown by default.
🌐 Web search (web, news, images) with optional inline scraping.
🗺️ Site mapping to discover URLs before crawling.
🕷️ Async crawl jobs with status, cancel, and pagination.
📚 Batch scrape across many URLs in a single job.
🧩 Structured extraction via JSON Schema.
🔒 Never logs or displays your API key.
⚡ Powered by the @mendable/firecrawl-js SDK, with support for JS rendering, anti-bot bypass, PDFs, screenshots, actions, and proxy modes.

📦 Install

omp install npm:@neuralmux/omp-firecrawl

Try without installing permanently:

FIRECRAWL_API_KEY=fc-... omp -e npm:@neuralmux/omp-firecrawl

Try locally from a checkout:

FIRECRAWL_API_KEY=fc-... omp -e ./

⚙️ Configuration

Set a Firecrawl API key before starting omp (get one at firecrawl.dev):

export FIRECRAWL_API_KEY=fc-your-key

Optional overrides:

| Variable | Default | Purpose |
| --- | --- | --- |
| FIRECRAWL_API_URL | https://api.firecrawl.dev | Custom API endpoint (self-hosted / proxy). FIRECRAWL_BASE_URL also accepted. |
| FIRECRAWL_TIMEOUT_MS | (SDK default) | Per-request timeout in milliseconds. |
| FIRECRAWL_MAX_RETRIES | (SDK default) | Automatic retry budget for transient failures. |
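
As a rough illustration of how these variables could be resolved, here is a minimal sketch. The names resolveConfig, FirecrawlConfig, and parseOptionalInt are hypothetical and not taken from the extension's source. The precedence of FIRECRAWL_API_URL over FIRECRAWL_BASE_URL and the handling of invalid numbers are assumptions as well.

```typescript
// Hypothetical config shape; the extension's real type may differ.
interface FirecrawlConfig {
  apiKey: string | undefined;
  apiUrl: string;
  timeoutMs: number | undefined; // undefined means "use the SDK default"
  maxRetries: number | undefined; // undefined means "use the SDK default"
}

type Env = Record<string, string | undefined>;

const DEFAULT_API_URL = "https://api.firecrawl.dev";

// Parse a non-negative integer; unset, blank, or invalid input yields undefined.
function parseOptionalInt(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const value = Number(raw);
  return Number.isInteger(value) && value >= 0 ? value : undefined;
}

function resolveConfig(env: Env): FirecrawlConfig {
  return {
    apiKey: env.FIRECRAWL_API_KEY || undefined,
    // Assumption: FIRECRAWL_API_URL wins when both URL variables are set.
    apiUrl: env.FIRECRAWL_API_URL || env.FIRECRAWL_BASE_URL || DEFAULT_API_URL,
    timeoutMs: parseOptionalInt(env.FIRECRAWL_TIMEOUT_MS),
    maxRetries: parseOptionalInt(env.FIRECRAWL_MAX_RETRIES),
  };
}
```

Taking the environment as a parameter keeps the function easy to test. In a Bun or Node process it would be called as resolveConfig(process.env).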

The extension never logs or displays the API key.

🛠️ Tools

| Tool | What it does |
| --- | --- |
| firecrawl_scrape | Scrape one URL into markdown, HTML, links, screenshots, or structured JSON. |
| firecrawl_search | Search the web (web, news, images) and optionally scrape each result. |
| firecrawl_map | Discover URLs for a site quickly (sitemap-aware). |
| firecrawl_crawl | Start an async site crawl job. |
| firecrawl_crawl_status | Poll a crawl job, with pagination. |
| firecrawl_crawl_cancel | Cancel a running crawl job. |
| firecrawl_batch_scrape | Start an async job scraping many URLs at once. |
| firecrawl_batch_scrape_status | Poll a batch scrape job. |
| firecrawl_extract | Multi-URL structured extraction with a shared schema or prompt. |

All tools fail with a clear configuration error when FIRECRAWL_API_KEY is missing.
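
A guard of this kind might look like the sketch below. FirecrawlConfigError and requireApiKey are hypothetical names, and the error wording is illustrative only. The extension's actual message may differ.

```typescript
// Hypothetical error type used to distinguish configuration problems.
class FirecrawlConfigError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "FirecrawlConfigError";
  }
}

// Return the API key, or throw a configuration error when it is unset or blank.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.FIRECRAWL_API_KEY?.trim();
  if (!key) {
    throw new FirecrawlConfigError(
      "FIRECRAWL_API_KEY is not set. Export it before starting omp, " +
        "for example: export FIRECRAWL_API_KEY=fc-your-key",
    );
  }
  return key;
}
```

Calling the guard at the start of every tool handler means a missing key surfaces as one consistent message instead of an opaque HTTP failure.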

💬 Command

/firecrawl

Reports whether the extension sees an API key and which API URL it will call.
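
One way to produce such a report without ever echoing the key is sketched below. describeStatus is a hypothetical helper, and the output wording is an assumption rather than the command's real text.

```typescript
// Build a status report that states whether a key is present,
// but never includes the key itself in the output.
function describeStatus(env: Record<string, string | undefined>): string {
  const hasKey = Boolean(env.FIRECRAWL_API_KEY?.trim());
  // Assumption: FIRECRAWL_API_URL takes precedence over FIRECRAWL_BASE_URL.
  const apiUrl =
    env.FIRECRAWL_API_URL || env.FIRECRAWL_BASE_URL || "https://api.firecrawl.dev";
  return [
    `API key: ${hasKey ? "configured" : "missing"}`,
    `API URL: ${apiUrl}`,
  ].join("\n");
}
```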

🚀 Examples

Scrape a page as markdown:

{
  "url": "https://www.firecrawl.dev/blog",
  "formats": ["markdown"]
}

Search the web with inline scraping:

{
  "query": "Oh My Pi extensions",
  "limit": 5,
  "scrapeOptions": { "formats": ["markdown"] }
}

Map a site:

{
  "url": "https://docs.firecrawl.dev",
  "limit": 50
}

Start a crawl with markdown extraction:

{
  "url": "https://docs.firecrawl.dev",
  "limit": 25,
  "scrapeOptions": { "formats": ["markdown"] }
}
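
Because a crawl runs asynchronously, the caller polls for status and then pages through the results. The sketch below shows that general pattern under stated assumptions. JobPage, collectJob, and the fetchPage callback are hypothetical. The status values and the next cursor field are assumed for illustration and are not taken from the extension's source.

```typescript
// Assumed shape of one status response page.
interface JobPage<T> {
  status: "scraping" | "completed" | "failed" | "cancelled";
  data: T[];
  next?: string; // cursor for the next page of results, if any
}

// Poll until the job finishes, then follow cursors to gather every page.
async function collectJob<T>(
  fetchPage: (cursor?: string) => Promise<JobPage<T>>,
  intervalMs = 2000,
  maxPolls = 30,
): Promise<T[]> {
  // Phase 1: poll the first page until the job leaves the "scraping" state.
  let page = await fetchPage();
  for (let polls = 1; page.status === "scraping"; polls++) {
    if (polls >= maxPolls) {
      throw new Error(`Job still running after ${maxPolls} polls`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    page = await fetchPage();
  }
  if (page.status !== "completed") {
    throw new Error(`Job ended with status: ${page.status}`);
  }

  // Phase 2: follow pagination cursors until none remain.
  const results = [...page.data];
  while (page.next) {
    page = await fetchPage(page.next);
    results.push(...page.data);
  }
  return results;
}
```

The maxPolls cap bounds how long an agent waits on a slow job. A real caller would likely cancel the job when the cap is hit.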

Structured extraction from a single page:

{
  "url": "https://news.ycombinator.com",
  "formats": [
    {
      "type": "json",
      "prompt": "Extract the top 5 stories",
      "schema": {
        "type": "object",
        "properties": {
          "stories": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "title": { "type": "string" },
                "url": { "type": "string" },
                "points": { "type": "number" }
              }
            }
          }
        }
      }
    }
  ]
}
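
On the consuming side, the extracted JSON can be checked before use. The sketch below mirrors the schema above as TypeScript types with a runtime guard. Story, StoriesResult, isStory, and isStoriesResult are hypothetical names. Every field is optional because the schema declares no required properties.

```typescript
// Types mirroring the JSON Schema in the example; all fields are optional
// because the schema does not list any "required" properties.
interface Story {
  title?: string;
  url?: string;
  points?: number;
}

interface StoriesResult {
  stories?: Story[];
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Check that each present field has the type the schema declares.
function isStory(value: unknown): value is Story {
  if (!isPlainObject(value)) return false;
  return (
    (value.title === undefined || typeof value.title === "string") &&
    (value.url === undefined || typeof value.url === "string") &&
    (value.points === undefined || typeof value.points === "number")
  );
}

function isStoriesResult(value: unknown): value is StoriesResult {
  if (!isPlainObject(value)) return false;
  const stories = value.stories;
  return stories === undefined || (Array.isArray(stories) && stories.every(isStory));
}
```

A guard like this is useful because model-driven extraction can return fields with unexpected types even when a schema is supplied.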

🧠 Use cases

Pull docs into your context for grounded code generation.
Audit competitor pricing or feature pages.
Crawl a site before migration or content review.
Search + scrape in a single agent loop for research tasks.
Structured extraction from product catalogs, listings, or filings.

🗂️ Package layout

omp-firecrawl/
├── src/
│   ├── index.ts              ← entry point + /firecrawl command
│   ├── helpers.ts            ← shared: client init, status, abort, result formatting
│   └── tools/
│       ├── scrape.ts
│       ├── search.ts
│       ├── map.ts
│       ├── crawl.ts          ← crawl + status + cancel
│       ├── batch-scrape.ts   ← batch + status
│       └── extract.ts
├── package.json
├── tsconfig.json
├── biome.json
├── README.md
└── LICENSE

The extension ships as TypeScript source. omp loads it via Bun's native import() — no build step required.

🔗 Related

Firecrawl — the web data API powering this extension.
@mendable/firecrawl-js — official JavaScript / TypeScript SDK.
Oh My Pi — the agent runtime that loads this extension.
Firecrawl on GitHub — open-source engine.

📄 License

MIT. See LICENSE.