

Eliza Scraper

A powerful scraping and RAG (Retrieval Augmented Generation) toolkit for collecting, processing, and serving blockchain ecosystem data. Built with Bun, TypeScript, and modern vector search capabilities.

Features

  • 🤖 Automated Twitter data scraping with engagement filtering
  • 📊 Real-time token price and market data tracking via CoinGecko
  • 📑 Documentation site scraping with caching
  • 🔄 Configurable cron jobs for automated data collection
  • 🔍 Vector search with time-decay relevance scoring
  • 🚀 REST API with Swagger documentation

Architecture

RAG Pipeline for Tweet Processing

graph TD
    A[Twitter] --> B[Apify Scraper]
    B --> C[Filters]
    C --> D[Text Processing]
    D --> E[OpenAI Embedding]
    E --> F[Qdrant Vector DB]
    F --> G[Search API]
    
    subgraph "filters"
    C --> H[Min Replies]
    C --> I[Min Retweets]
    C --> J[Search Tags]
    C --> K[Kaito Most Mindshare handles]
    end
    
    subgraph "Vector Search"
    F --> L[Semantic Search]
    end 
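
The combinedScore returned by the search API blends raw vector similarity with how recently a tweet was posted. Below is a minimal TypeScript sketch of one way such time-decay scoring can work, assuming an exponential half-life; decayedScore and halfLifeDays are illustrative names, not part of this package's API:

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Weight a vector-similarity score by an exponential time decay,
// so older tweets rank lower even when semantically similar.
function decayedScore(
  originalScore: number, // cosine similarity from the vector DB
  createdAt: string,     // tweet timestamp, ISO 8601
  halfLifeDays = 7,      // assumed: score halves every 7 days
): number {
  const ageDays = (Date.now() - new Date(createdAt).getTime()) / MS_PER_DAY;
  return originalScore * Math.pow(0.5, ageDays / halfLifeDays);
}

// A 14-day-old tweet with similarity 0.9 decays to 0.9 * 0.25 = 0.225.
console.log(decayedScore(0.9, new Date(Date.now() - 14 * MS_PER_DAY).toISOString()));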

System Architecture

graph TB
    CLI[CLI Interface] --> Services
    API[REST API] --> Services
    
    subgraph "Services"
    Tweets[Tweet Service]
    Tokens[Token Service]
    Docs[Site Service]
    end
    
    subgraph "Storage"
    Qdrant[(Qdrant Vector DB)]
    Postgres[(PostgreSQL)]
    end
    
    Services --> Storage

Scraper Quick Start

  1. Install globally:

npm install -g eliza-scraper

  2. Create a .env file:

# Required APIs
APIFY_API_KEY=your_apify_key
COINGECKO_API_KEY=your_coingecko_key
OPENAI_API_KEY=your_openai_key

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/token_scraper

# Vector DB
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION_NAME=twitter_embeddings

# Optional
TWITTER_SEARCH_TAGS=berachain,bera
APP_TOPICS=berachain launch,berachain token

  3. Start the required services (Qdrant and PostgreSQL):

docker-compose up -d
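
A minimal docker-compose.yml sketch that matches the .env defaults above (stock Qdrant and Postgres images; adjust the credentials to your DATABASE_URL):

services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: token_scraper
    ports:
      - "5432:5432"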

CLI Commands

Start API Server

elizascraper server --port <number>

Tweet Scraping Service

elizascraper tweets-cron [options]

Options:
--port         Port number (default: 3000)
--tags         Comma-separated search tags; Twitter advanced-search strings also work (see https://github.com/igorbrigadir/twitter-advanced-search)
--start        Start date (YYYY-MM-DD)
--end          End date (YYYY-MM-DD)
--min-replies  Minimum replies threshold (default: 4)
--retweets     Minimum retweets threshold (default: 2)
--max-items    Maximum tweets to fetch (default: 200)
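
For example, a week-long scrape of the default tags with the default thresholds (the dates are illustrative):

elizascraper tweets-cron --tags berachain,bera --start 2024-01-01 --end 2024-01-08 --min-replies 4 --retweets 2 --max-items 200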

Disclaimer: don't run the cron job over too narrow a window. Runs with start and end dates less than 3 days apart that fetch fewer than 50 tweets will quickly get your Apify key banned.

To help prevent this, the same search query run multiple times within an hour is skipped.

Tracking Latest Token Data

elizascraper token-cron [options]

Options:
-p, --port <number> Port to listen on (default: 3008)
-i, --interval <number> Interval in minutes (default: 60)
--currency <string> Currency to track (default: 'usd')
--cg_category <string> CoinGecko category (default: 'berachain-ecosystem')
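
For example, refreshing berachain-ecosystem prices in USD every hour (the documented defaults, spelled out):

elizascraper token-cron -p 3008 -i 60 --currency usd --cg_category berachain-ecosystem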

Documentation Scraping Service

elizascraper site-cron [options]

Options:
-p, --port <number> Port to listen on (default: 3010)
-i, --interval <number> Interval in minutes (default: 60)
-l, --link <string> URL to scrape (default: 'https://docs.berachain.com')
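
For example, re-scraping the Berachain docs every hour:

elizascraper site-cron -p 3010 -i 60 -l https://docs.berachain.com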

API Documentation

Tweets API

GET /tweets/search

Search for tweets based on semantic similarity.

Query Parameters:

  • searchString (required): Text to search for semantically similar tweets

Response Schema:

{
  status: "success" | "error",
  results: {
    tweet: {
      id: string,
      text: string,
      fullText: string,
      author: {
        name: string,
        userName: string,
        profilePicture: string
      },
      createdAt: string,
      retweetCount: number,
      replyCount: number,
      likeCount: number
    },
    combinedScore: number,
    originalScore: number,
    date: string
  }[]
}
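
A minimal TypeScript client sketch; the base URL assumes the API server was started locally with elizascraper server --port 3000 and is not fixed by the package:

// Query /tweets/search and print the highest-scoring matches.
const params = new URLSearchParams({ searchString: "berachain token launch" });
const res = await fetch(`http://localhost:3000/tweets/search?${params}`);
const body = await res.json();

if (body.status === "success") {
  for (const { tweet, combinedScore } of body.results) {
    console.log(combinedScore.toFixed(3), tweet.author.userName, tweet.text);
  }
}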

GET /tweets/random

Get random tweets for the predefined topics (see APP_TOPICS in .env).

Response Schema:

{
  topic: string,
  tweet: {
    tweet: {
      id: string,
      text: string,
      fullText: string,
      author: {
        name: string,
        userName: string,
        profilePicture: string
      },
      createdAt: string,
      retweetCount: number,
      replyCount: number,
      likeCount: number
    },
    combinedScore: number,
    originalScore: number,
    date: string
  }[]
}
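
The scored-tweet entries have the same shape as /tweets/search results, so the same client pattern applies (same local base URL assumption):

const random = await (await fetch("http://localhost:3000/tweets/random")).json();
console.log(random.topic, random.tweet.length);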

Tokens API

GET /tokens

Get token market data.

Query Parameters:

  • symbol (optional): Token symbol to filter results

Response Schema:

{
  status: "success" | "error",
  results: {
    id: string,
    symbol: string,
    name: string,
    image: string,
    currentPrice: number,
    marketCap: number,
    marketCapRank: number,
    fullyDilutedValuation: number,
    totalVolume: number,
    high24h: number,
    low24h: number,
    priceChange24h: number,
    priceChangePercentage24h: number,
    marketCapChange24h: number,
    marketCapChangePercentage24h: number,
    circulatingSupply: number,
    totalSupply: number,
    maxSupply: number,
    ath: number,
    athChangePercentage: number,
    athDate: string,
    atl: number,
    atlChangePercentage: number,
    atlDate: string,
    lastUpdated: string
  }
}
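
For example, fetching market data for one symbol (same local base URL assumption; note that per the schema above, results is a single object):

const res = await fetch("http://localhost:3000/tokens?symbol=bera");
const body = await res.json();
if (body.status === "success") {
  console.log(body.results.name, body.results.currentPrice);
}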

Documentation API

GET /docs/scrape

Scrape and cache documentation from a given URL.

Query Parameters:

  • url (required): URL of the documentation site to scrape

Response Schema:

{
  status: "success" | "error",
  data: {
    title: string,
    last_updated: string,
    total_sections: number,
    sections: {
      topic: string,
      source_url: string,
      overview: string,
      subsections: {
        title: string,
        content: string
      }[]
    }[]
  },
  source: "cache" | "fresh"
} | {
  status: "error",
  message: string
}

Cache Behavior:

  • Documentation is cached for 7 days
  • Returns cached version if available and not expired
  • Automatically refreshes cache if data is older than 7 days
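
A quick way to observe the cache behavior from TypeScript (local base URL assumed; url is the documented query parameter):

const qs = new URLSearchParams({ url: "https://docs.berachain.com" });
const res = await fetch(`http://localhost:3000/docs/scrape?${qs}`);
const body = await res.json();
if (body.status === "success") {
  // "fresh" on the first call, "cache" for the next 7 days
  console.log(body.source, body.data.total_sections);
}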

Error Handling

All routes use a standardized error response format:

{
  status: "error",
  code: string,
  message: string
}
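
Because every route shares this shape, one TypeScript type guard can gate all of them (an illustrative helper, not part of the package):

interface ApiError {
  status: "error";
  code: string;
  message: string;
}

function isApiError(body: unknown): body is ApiError {
  return (
    typeof body === "object" &&
    body !== null &&
    (body as { status?: unknown }).status === "error"
  );
}

// Usage: bail out before touching success-only fields.
// if (isApiError(body)) throw new Error(`${body.code}: ${body.message}`);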

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.

License

MIT