npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

bitcrawl

v0.0.1

Published

Universal web crawling SDK for developers

Downloads

16

Readme

BitCrawl SDK 🕷️

Universal Web Crawling SDK for Node.js Developers

BitCrawl is a comprehensive, developer-friendly web crawling library designed to make web data extraction simple, efficient, and powerful. Whether you're building web scrapers, data analysis tools, AI applications, or any project that needs web data, BitCrawl provides the tools you need.

Node.js 14+ TypeScript License: MIT npm version

🚀 Why Choose BitCrawl?

| Feature | BitCrawl | Other Crawlers | |---------|----------|----------------| | Easy to Use | ✅ Simple TypeScript/JavaScript API | ❌ Complex setup | | Smart Filtering | ✅ Contextual content filtering | ❌ Raw data only | | Multiple Formats | ✅ JSON, CSV, TXT output | ❌ Limited formats | | Flexible Crawling | ✅ 5 different crawling modes | ❌ Single approach | | Developer Friendly | ✅ Both CLI and programmatic API | ❌ API only | | Cost Efficient | ✅ Reduces data volume by 70-90% | ❌ Full data extraction |

🛠️ Key Features

  • 🎯 Smart Content Filtering: Reduce noise and extract only relevant content
  • 📊 Multiple Output Formats: JSON, CSV, TXT for different use cases
  • 🔄 Flexible Crawling Modes: Scrape, crawl, search, map, extract
  • 📦 Advanced Processing: Automatic chunking and text processing
  • ⚡ CLI & Node.js API: Use from command line or integrate into your code
  • 🤝 Respectful Crawling: Built-in rate limiting and respectful practices
  • 🔧 TypeScript Support: Full TypeScript support with type definitions

📦 Installation

npm install bitcrawl
# or
yarn add bitcrawl

⚡ Quick Start

Node.js/TypeScript API

import { BitCrawl } from 'bitcrawl';

// Initialize
const bc = new BitCrawl();

// Scrape a single page
const data = await bc.scrape("https://example.com");

// Crawl multiple pages with filtering
const crawlResult = await bc.crawl("https://example.com", {
  context: "pricing information",  // Filter for relevant content
  pageLimit: 10
});

// Search the web
const searchResults = await bc.search("machine learning tutorials", {
  pageLimit: 5
});

// Get structured chunks for processing
const chunks = bc.getChunks(crawlResult, 1000);

JavaScript (CommonJS)

const { BitCrawl } = require('bitcrawl');

const bc = new BitCrawl();

// Scrape a single page
bc.scrape("https://example.com")
  .then(data => console.log(data))
  .catch(err => console.error(err));

Command Line Interface

# Scrape a single page
bitcrawl -l https://example.com -m scrape -o json

# Crawl with filtering
bitcrawl -l https://example.com -m crawl -c "pricing" -p 10 -o csv

# Search and extract
bitcrawl -l "python tutorials" -m search -p 5 -o json

🔧 Crawling Modes

1. Scrape - Single Page Extraction

const data = await bc.scrape("https://example.com");

Perfect for extracting data from specific pages.

2. Crawl - Multi-Page Website Crawling

const data = await bc.crawl("https://example.com", { pageLimit: 20 });

Follows internal links to crawl entire websites.

3. Search - Web Search Integration

const results = await bc.search("machine learning", { pageLimit: 10 });

Search the web and extract content from results.

4. Map - Website Structure Mapping

const structure = await bc.map("https://example.com");

Create a map of website structure and navigation.

5. Extract - Advanced Data Extraction

const data = await bc.crawl("https://example.com", { 
  context: "product information",
  pageLimit: 15 
});

Extract structured data with enhanced processing.

🎯 Smart Content Filtering

BitCrawl's intelligent filtering reduces data volume while preserving relevant content:

// Without filtering - gets everything
const fullData = await bc.crawl("https://docs.example.com", { pageLimit: 10 });

// With filtering - gets only relevant content (70-90% reduction)
const filteredData = await bc.crawl("https://docs.example.com", {
  pageLimit: 10,
  context: "functions classes modules"  // Smart filtering
});

Benefits:

  • 📉 Reduce data volume by 70-90%
  • 💰 Lower processing costs
  • 🎯 Higher content relevance
  • ⚡ Faster data processing

📊 Output Formats

JSON (Default)

const data = await bc.scrape("https://example.com");
const jsonOutput = await bc.formatOutput(data, "json");

CSV for Analysis

const data = await bc.crawl("https://example.com");
const csvOutput = await bc.formatOutput(data, "csv");

Plain Text

const data = await bc.search("tutorials");
const txtOutput = await bc.formatOutput(data, "txt");

🔧 Advanced Features

Text Chunking

// Split content into manageable chunks
const chunks = bc.getChunks(crawlResult, 1000, 100);

// Each chunk includes content, metadata, and positioning
chunks.forEach(chunk => {
  console.log(`Chunk ID: ${chunk.metadata.chunkId}`);
  console.log(`Content: ${chunk.content.slice(0, 100)}...`);
});

Token Estimation

// Estimate processing costs
const estimatedTokens = bc.estimateTokens(text);
console.log(`Estimated tokens: ${estimatedTokens}`);

Configuration Options

const bc = new BitCrawl({
  delay: 2.0,          // Delay between requests (seconds)
  verbose: true,       // Enable detailed logging
  timeout: 30000,      // Request timeout (ms)
  userAgent: 'MyBot/1.0'  // Custom user agent
});

📋 Common Use Cases

🔍 Data Research & Analysis

// Competitive analysis
const competitorData = await bc.crawl("https://competitor.com", {
  context: "pricing features"
});

// Market research
const marketData = await bc.search("industry trends 2024", { pageLimit: 20 });

🤖 AI & Machine Learning

// Training data collection
const trainingData = await bc.crawl("https://docs.example.com", {
  context: "tutorials examples"
});

// Knowledge base building
const chunks = bc.getChunks(trainingData, 512);

📊 Content Aggregation

// News aggregation
const news = await bc.search("technology news", { pageLimit: 50 });

// Documentation aggregation
const docs = await bc.crawl("https://docs.framework.com", {
  context: "API reference"
});

💼 Business Intelligence

// Monitor competitors
const updates = await bc.crawl("https://competitor.com/blog", {
  context: "product updates"
});

// Track industry news
const industryNews = await bc.search("industry analysis", { pageLimit: 25 });

📱 CLI Reference

bitcrawl [options]

Required:
  -l, --link <url>            Target URL or search query
  -m, --mode <mode>           scrape, crawl, search, map, extract

Optional:
  -o, --output <format>       json, csv, txt (default: json)
  -p, --pagenumber <number>   Maximum pages (default: 10)
  -c, --context <text>        Filter content by context
  -r, --min-relevance <score> Relevance threshold (0.0-1.0)
  -d, --delay <seconds>       Request delay in seconds (default: 1.0)
  -v, --verbose               Enable detailed output

🧪 Examples

Web Scraping for Analysis

import { BitCrawl } from 'bitcrawl';
import fs from 'fs';

const bc = new BitCrawl();

// Scrape product information
const products = await bc.crawl("https://store.example.com", {
  context: "price product specifications",
  pageLimit: 50
});

// Export to CSV for analysis
const csvData = await bc.formatOutput(products, "csv");
await fs.promises.writeFile("products.csv", csvData);

Documentation Extraction

// Extract API documentation
const docs = await bc.crawl("https://api-docs.example.com", {
  context: "endpoints parameters examples",
  pageLimit: 100
});

// Get structured chunks
const chunks = bc.getChunks(docs, 1200);
console.log(`Extracted ${chunks.length} documentation sections`);

Market Research

// Research industry trends
const trends = await bc.search("artificial intelligence trends 2024", {
  pageLimit: 30
});

// Get high-relevance content only
const relevantTrends = trends.filter(item => 
  (item.relevanceScore || 0) > 0.7
);

🎯 Framework Integration

BitCrawl works seamlessly with popular Node.js frameworks:

Express.js API

import express from 'express';
import { BitCrawl } from 'bitcrawl';

const app = express();
const bc = new BitCrawl();

app.get('/scrape', async (req, res) => {
  try {
    const data = await bc.scrape(req.query.url as string);
    res.json(data);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

Next.js API Route

// pages/api/crawl.ts
import { NextApiRequest, NextApiResponse } from 'next';
import { BitCrawl } from 'bitcrawl';

export default async function handler(
  req: NextApiRequest, 
  res: NextApiResponse
) {
  const bc = new BitCrawl();
  const result = await bc.crawl(req.body.url, req.body.options);
  res.json(result);
}

📈 Performance & Efficiency

| Metric | Typical Results | |--------|----------------| | Data Reduction | 70-90% volume decrease | | Relevance Score | 85-95% content relevance | | Processing Speed | 2-5 seconds per page | | Memory Usage | Optimized for large datasets |

🤝 Contributing

We welcome contributions! See our Contributing Guide for details.

📄 License

MIT License - see LICENSE file for details.

🆘 Support


Made with ❤️ for Node.js developers who need web data 🕷️✨