@crawlith/core

v0.1.2

Published

4 months ago

Headless intelligence engine for Crawlith. Handles crawling, graph analysis, scoring, and data persistence.

0High
0Medium
0Low

saurabhsharma2u

@crawlith/core 🧠

The headless intelligence engine at the heart of Crawlith.

@crawlith/core provides the foundational logic for the Crawlith ecosystem. It is a high-performance, deterministic library for crawling websites, building weighted internal link graphs, and calculating advanced SEO metrics like PageRank and Hub/Authority scores.

✨ Features

🕸️ Graph-Based Crawler: Treats websites as mathematical graphs, not just a list of URLs.
🔢 Advanced Algorithms: Built-in support for PageRank, HITS, and Duplicate Detection.
🔍 SEO Analysis: Integrated on-page audits (Titles, Meta, H1s, Thin Content detection).
💾 Persistent Storage: Efficient data layer using better-sqlite3 for local intelligence.
🧩 Plugin Architecture: Highly extensible hook system for custom crawling and metrics.
⚡ Performance First: Multi-threaded with p-limit and optimized for large datasets.

🏗 Modular Architecture

The core is divided into independent specialized modules:

🛠 Usage (Library)

While most users interact with Crawlith through the CLI, you can use the core directly in your Node.js applications:

import { CrawlSitegraph } from '@crawlith/core';

const engine = new CrawlSitegraph();
const result = await engine.execute({
  url: 'https://example.com',
  limit: 100,
  depth: 5
});

console.log(`Crawl complete. Snapshot ID: ${result.snapshotId}`);
console.log(`Nodes in graph: ${result.graph.nodes.size}`);

🔌 Hook System

Extend the engine's behavior by tapping into the lifecycle:

onInit: Setup before crawling starts.
onCrawlStart: Triggered when first URL is enqueued.
onPageParsed: Intercept page data after parsing.
onGraphBuilt: Access the complete link graph.
onMetrics: Modify or augment calculated metrics.
onReport: Transform final output results.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@crawlith/core 🧠

✨ Features

🏗 Modular Architecture

🛠 Usage (Library)

🔌 Hook System

🛡 License