@imiagkov/llms-gen

v1.0.8

Published 7 months ago

Automated llms.txt generator SDK + CLI

Vulnerabilities: 0 High, 0 Medium, 0 Low

Maintainer: imiagkov

Keywords: llms.txt, seo, crawler, metadata, sdk

llms-gen

Generate and monitor llms.txt files automatically from any website. Works as both a CLI tool and an SDK.

✨ Features

🔍 Crawl websites to generate llms.txt automatically
📂 Groups URLs into logical sections (Blog, Guides, Features, etc.)
⚡ Supports Playwright for JS-heavy or dynamic sites
🕵️ Smart change detection via HTTP headers (ETag, Last-Modified, Content-Length)
🔔 Automated monitoring with callbacks
🛠 Usable from CLI or programmatically via SDK
⏱ Concurrency support for faster metadata extraction
📝 Logs progress during BFS crawl and metadata extraction
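
To make the header-based change detection concrete, here is a minimal sketch of how such a check could work. It is not the package's actual implementation: the `HeaderFingerprint` type and the `fingerprintFrom` and `hasChanged` helpers are hypothetical names, and the comparison rule (any differing header means "changed") is an assumption.

```typescript
// Hypothetical snapshot of the three headers named in the feature list.
interface HeaderFingerprint {
  etag: string | null;
  lastModified: string | null;
  contentLength: string | null;
}

// Build a fingerprint from a plain header map with lowercase names,
// e.g. the result of Object.fromEntries(response.headers) after a HEAD request.
function fingerprintFrom(headers: Record<string, string | undefined>): HeaderFingerprint {
  return {
    etag: headers["etag"] ?? null,
    lastModified: headers["last-modified"] ?? null,
    contentLength: headers["content-length"] ?? null,
  };
}

// Assumed rule: report a change when there is no earlier snapshot, when the
// server sends none of the three headers, or when any header value differs.
function hasChanged(prev: HeaderFingerprint | null, next: HeaderFingerprint): boolean {
  if (prev === null) return true;
  const keys: (keyof HeaderFingerprint)[] = ["etag", "lastModified", "contentLength"];
  if (keys.every((k) => next[k] === null)) return true; // nothing to compare
  return keys.some((k) => prev[k] !== next[k]);
}
```

Falling back to "changed" when no headers are available errs on the side of regenerating, which costs a crawl but never leaves a stale file.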

📦 Installation

npm install -g @imiagkov/llms-gen

Or use with npx (no install required):

npx @imiagkov/llms-gen https://example.com

For SDK usage:

npm install @imiagkov/llms-gen

🚀 CLI Usage

llms-gen <rootUrl>

Example:

llms-gen https://tryprofound.com --concurrency 8 --max 500 --playwright

This will:

Crawl the website (via sitemap or BFS fallback)
Fetch metadata concurrently
Detect changes
Generate or update llms.txt in the specified output directory
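
The BFS fallback in the first step can be sketched roughly as follows. This is an illustration only, not the tool's crawler: `bfsCrawl` and the `LinkFetcher` callback are hypothetical, links are looked up synchronously to keep the example short, and "same site" is approximated by a URL prefix check.

```typescript
// Hypothetical link lookup: returns the URLs linked from a page.
// A real crawler would fetch and parse the page asynchronously.
type LinkFetcher = (url: string) => string[];

// Breadth-first traversal from rootUrl, capped at maxPages discovered URLs.
function bfsCrawl(rootUrl: string, getLinks: LinkFetcher, maxPages: number): string[] {
  const visited = new Set<string>([rootUrl]);
  const queue: string[] = [rootUrl];
  const discovered: string[] = [];

  while (queue.length > 0 && discovered.length < maxPages) {
    const current = queue.shift() as string;
    discovered.push(current);
    for (const link of getLinks(current)) {
      // Assumption: only follow links under the root URL, and never twice.
      if (!link.startsWith(rootUrl) || visited.has(link)) continue;
      visited.add(link);
      queue.push(link);
    }
  }
  return discovered;
}
```

Because the queue is first-in, first-out, pages closer to the root are discovered before deeper ones, so a page cap trims the deepest pages first.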

Sample output:

[llms-gen] Checking for changes at https://tryprofound.com...
[llms-gen] Discovered 10 URLs
[llms-gen] Extracted metadata for 10 URLs
[llms-gen] ✅ llms.txt updated at: ./llms.txt

🧑‍💻 SDK Usage

Generate `llms.txt`

import { generateLlmsTxt } from "@imiagkov/llms-gen";

await generateLlmsTxt({
  rootUrl: "https://tryprofound.com",
  outputDir: "./",
  concurrency: 8,
  usePlaywright: true
});
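
The `concurrency` option caps how many requests run in parallel. One common way to implement such a cap is a small worker pool, sketched here under the assumption that each page is processed by an async function. `mapWithConcurrency` is a hypothetical helper, not part of the SDK.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results keep the order of the input array.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so no two runners share an index
      results[index] = await worker(items[index]);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

With `limit` set to 8, at most eight workers are awaiting a response at any moment, which matches the intent of `concurrency: 8` in the example above.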

Generate only if changed

import { generateIfChanged } from "@imiagkov/llms-gen";

const res = await generateIfChanged({
  rootUrl: "https://tryprofound.com",
  outputDir: "./",
  concurrency: 8,
  usePlaywright: true
});

if (res) {
  console.log("✅ Updated:", res.shortPath);
} else {
  console.log("No changes detected.");
}

Continuous Monitoring

import { startMonitoring } from "@imiagkov/llms-gen";

const monitor = startMonitoring({
  rootUrl: "https://tryprofound.com",
  intervalMs: 1000 * 60 * 5, // every 5 minutes
  concurrency: 8,
  usePlaywright: true,
  onChange: (res) => console.log("Website changed, llms.txt regenerated:", res.shortPath),
  onError: (err) => console.error(err)
});

// Stop monitoring when needed
// monitor.stop();

⚡ Quick Example of `llms.txt` Output

# tryprofound.com

> Profound helps brands gain visibility in AI-generated answers, optimize their presence in LLM-based answer engines, and stay competitive in the zero-click world.

## Blog

- [Blog - AI Search Optimization News and Updates](https://www.tryprofound.com/blog): Stay informed about the latest developments in AI Search Optimization.
- [How AI Answer Engines Will Transform the Future](https://www.tryprofound.com/blog/how-ai-answer-engines-will-transform-the-future)

## Guides

- [What is Answer Engine Optimization (AEO)? Understanding AEO for the Future of Search](https://www.tryprofound.com/guides/what-is-answer-engine-optimization)

## Features

- [Answer Engine Insights](https://www.tryprofound.com/features/answer-engine-insights)
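
A file in the shape shown above could be produced by grouping pages by section and rendering each group as a Markdown list. The sketch below is a guess at that process, not the package's code: `PageMeta`, `sectionFor`, and `renderLlmsTxt` are hypothetical names, and deriving the section from the first path segment is an assumption.

```typescript
interface PageMeta {
  url: string;
  title: string;
  description?: string;
}

// Assumed grouping rule: capitalized first path segment, "Pages" for the root.
function sectionFor(url: string): string {
  const path = url.replace(/^https?:\/\/[^/]+/, "");
  const first = path.split("/").filter(Boolean)[0];
  return first ? first.charAt(0).toUpperCase() + first.slice(1) : "Pages";
}

// Render a title, a blockquote summary, and one "## Section" list per group.
function renderLlmsTxt(site: string, summary: string, pages: PageMeta[]): string {
  const sections = new Map<string, PageMeta[]>();
  for (const page of pages) {
    const name = sectionFor(page.url);
    sections.set(name, [...(sections.get(name) ?? []), page]);
  }

  const lines: string[] = [`# ${site}`, "", `> ${summary}`];
  sections.forEach((items, name) => {
    lines.push("", `## ${name}`, "");
    for (const p of items) {
      // The description is optional, as in the sample output.
      const suffix = p.description ? `: ${p.description}` : "";
      lines.push(`- [${p.title}](${p.url})${suffix}`);
    }
  });
  return lines.join("\n") + "\n";
}
```

Sections appear in the order their first page was encountered, because a `Map` preserves insertion order.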

⚙️ Options

Both the CLI and the SDK support the following options:

rootUrl (string, required) — website root
outputDir (string) — output folder for llms.txt
userAgent (string) — custom User-Agent
concurrency (number) — number of parallel requests
delayMs (number) — delay between requests (ms)
includePatterns / excludePatterns (RegExp[]) — URL filtering
maxPages (number) — max number of pages to crawl
usePlaywright (boolean) — enable JS rendering for dynamic pages
scrapeTimeoutMs (number) — page timeout (ms)
ignoreRobotsForSitemap (boolean) — bypass robots.txt rules
logRobots (boolean) — debug robots.txt parsing
logCrawler (boolean) — log sitemap/BFS crawl progress
logMetadata (boolean) — log metadata extraction progress
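
To illustrate how `includePatterns` and `excludePatterns` might interact, here is a small sketch. The precedence shown (exclude wins over include, and an empty include list allows everything) is an assumption rather than documented behavior, and `filterUrls` is a hypothetical helper.

```typescript
// Assumed semantics: keep a URL when it matches at least one include pattern
// (or no include patterns are given) and matches no exclude pattern.
// Patterns should not use the `g` flag, since RegExp.test is stateful with it.
function filterUrls(
  urls: string[],
  includePatterns: RegExp[] = [],
  excludePatterns: RegExp[] = []
): string[] {
  return urls.filter((url) => {
    const included =
      includePatterns.length === 0 || includePatterns.some((re) => re.test(url));
    const excluded = excludePatterns.some((re) => re.test(url));
    return included && !excluded;
  });
}

// Example: keep blog posts but drop tag listing pages.
const kept = filterUrls(
  [
    "https://example.com/blog/a",
    "https://example.com/blog/tag/x",
    "https://example.com/pricing",
  ],
  [/\/blog\//],
  [/\/tag\//]
);
console.log(kept);
```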
