
greptor v0.8.0
Greptor

Grep + Raptor: Transform messy, unstructured text into clean, grep-friendly data for agentic search workflows.

License: MIT

Claude Code has proven that agentic search (ripgrep + filesystem traversal + iterative investigation) is powerful enough for complex code navigation tasks. But what about textual data like documents, transcripts, posts, articles, notes, and reports?

Greptor is a library that helps you with this. It ingests and indexes unstructured text into a format that agents can easily search using simple tools like ripgrep.

Why Agentic Search (and Why Not Classic RAG)?

RAG worked around small context windows by chunking documents and retrieving "relevant" fragments. That approach has recurring pain points:

  • Chunking breaks structure: Tables, section hierarchies, and cross-references get lost.
  • Embeddings are fuzzy: They struggle with exact terms, numbers, and identifiers.
  • Complexity overhead: Hybrid search + rerankers add latency, cost, and moving parts.
  • Error cascade: If retrieval misses the right chunk, the answer can't be correct.

Agentic search flips the approach: with larger context windows and better tool use, agents can search, open files, follow references, and refine queries — more like a human analyst.

Greptor's job is to clean, chunk, and add structure to your documents, making them easily searchable with text tools like ripgrep. No complex indices, no retrievers, no vector databases. Just minimal initial processing + maximal grep-ability.

How It Works

Step 1: Install

npm install greptor
# or
bun add greptor

Step 2: Initialize

Create a Greptor instance with your base path, topic, and model config.

import { createGreptor } from 'greptor';

// Create Greptor instance
const greptor = await createGreptor({
  basePath: './projects/investing/content',
  topic: 'Investing, stock market, financial, and macroeconomics',
  tagSchema: YOUR_TAG_SCHEMA, // Required. See "Tag Schemas" below.
  model: {
    provider: "@ai-sdk/openai",
    model: "gpt-5-mini",
  },
});

// Start background processing workers
await greptor.start();

  • basePath: Base directory where data will be stored.
  • topic: Helps Greptor understand your data better and generate a relevant tag schema.
  • tagSchema: Required. Define your tag fields (or generate them with greptor generate tags).
  • model: A config object with provider, model, and optional options for the Vercel AI SDK.

Greptor will automatically create and manage the following structure in your basePath:

  • raw/ - immediate raw content writes
  • processed/ - enriched/processed content from background workers

Model Config

Greptor uses an LLM (via the Vercel AI SDK) to process content. You'll need to:

  1. Choose a provider from the AI SDK ecosystem:

    • @ai-sdk/openai - OpenAI (GPT-4, GPT-4o, etc.)
    • @ai-sdk/anthropic - Anthropic (Claude)
    • @ai-sdk/groq - Groq (fast inference)
    • @ai-sdk/openai-compatible - OpenAI-compatible endpoints (NVIDIA NIM, OpenRouter, etc.)
    • And many more...
  2. Get an API key from your provider and set it as an environment variable:

    export OPENAI_API_KEY="sk-..."
    # or add to ~/.bashrc, ~/.zshrc, etc.
  3. Provide it in the model config when creating Greptor.

     const greptor = await createGreptor({
       basePath: './projects/investing/content',
       topic: 'Investing, stock market, financial, and macroeconomics',
       tagSchema: YOUR_TAG_SCHEMA,
       model: {
         provider: "@ai-sdk/openai-compatible",
         model: "z-ai/glm4.7",
         name: "nvidia",
         options: {
           baseURL: "https://integrate.api.nvidia.com/v1",
           apiKey: process.env.NVIDIA_API_KEY,
         },
       },
     });

     await greptor.start();

Step 3: Start Feeding Documents

await greptor.eat({
  id: 'QwwVJfvfqN8',
  source: 'youtube',
  publisher: '@JosephCarlsonShow',
  format: 'text',
  label: 'Top Five AI Stocks I\'m Buying Now',
  content: '{fetch and populate video transcript here}',
  creationDate: new Date('2025-11-15'),
  tags: {
    // Optional custom tags specific to the source or document
    channelTitle: 'Joseph Carlson',
    channelSubscribers: 496000
  },
});

await greptor.eat({
  id: 'tesla_reports_418227_deliveries_for_the_fourth',
  source: 'reddit',
  publisher: 'investing',  // For Reddit, publisher is the subreddit name
  format: 'text',
  label: 'Tesla reports 418,227 deliveries for the fourth quarter, down 16%',
  content: '{fetch and populate Reddit post with comments here}',
  creationDate: new Date('2025-12-03'),
  tags: {
    // Optional custom tags
    upvotes: 1400
  },
});

Step 4: Wait for Background Processing

Greptor writes your input to a raw Markdown file immediately. After you call await greptor.start(), background workers run enrichment (LLM cleaning + chunking + tagging) and write a processed Markdown file. You can grep the raw files right away, and the processed files will appear shortly after.
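The README doesn't show a blocking "wait" call, so a simple way to tell when enrichment has finished is to watch for files to appear under processed/ (the path below assumes the example basePath from Step 2):

```shell
# List processed files if any exist yet; otherwise report that workers are still running.
ls -R ./projects/investing/content/processed/ 2>/dev/null || echo "still processing"
```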

Step 5: Generate a Skill (CLI)

Navigate to your workspace directory and run:

greptor generate skills

The CLI will prompt you to pick an agent type (Claude Code, Codex, or OpenCode), then write the appropriate skill file for your chosen agent.

The skill is customized for the sources you provide and includes search tips based on the tag schema. You can always customize it further by hand for better results.

Step 6: Run the Agent

By this point, you should have the following structure in your basePath:

./projects/investing/content/
  .claude/
    skills/
      search-youtube-reddit/
        SKILL.md
  raw/
    youtube/
      JosephCarlsonShow/
        2025-11/
          2025-11-15-Top-Five-AI-Stocks-Im-Buying-Now.md
    reddit/
      investing/
        2025-12/
          2025-12-03-Tesla-reports-418227-deliveries-for-the-fourth-quarter-down-16.md
  processed/
    youtube/
      JosephCarlsonShow/
        2025-11/
          2025-11-15-Top-Five-AI-Stocks-Im-Buying-Now.md
    reddit/
      investing/
        2025-12/
          2025-12-03-Tesla-reports-418227-deliveries-for-the-fourth-quarter-down-16.md

If you chose Codex or OpenCode, the skill file will be written to:

  • .codex/skills/search-*.md (Codex)
  • .opencode/skills/search-*.md (OpenCode)

Now run your chosen agent in this folder and ask questions about your data or perform research tasks!

For better results:

  1. Connect relevant MCP servers (e.g., Yahoo Finance or other financial/stock market data servers) for up-to-date information.
  2. Add personal financial information, such as your portfolio holdings, watchlists, and risk profile.
  3. Create custom skills, slash commands, or subagents for researching specific tickers, sectors, topics, or managing your portfolio.

Now you have a personal investment research assistant with access to your portfolio, sentiment data (YouTube, Reddit), news, and market data! You don't have to manually watch dozens of YouTube channels or spend hours scrolling Reddit and other sources.

Under the Hood

1) Raw Write (Immediate)

eat() writes the input to a raw Markdown file with YAML frontmatter. You can grep it right away.
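As an illustration, a raw file for the YouTube example from Step 3 might look like the following (the frontmatter field names are assumed from the eat() arguments; the exact layout may differ):

```markdown
---
id: "QwwVJfvfqN8"
source: "youtube"
publisher: "@JosephCarlsonShow"
label: "Top Five AI Stocks I'm Buying Now"
creationDate: 2025-11-15
channelTitle: "Joseph Carlson"
channelSubscribers: 496000
---

{video transcript as provided to eat()}
```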

2) Background Processing (Asynchronous)

Workers pick up new documents and run a one-time pipeline:

  1. LLM clean + chunk + tag (single prompt): Remove boilerplate, split into semantic chunks, and inline grep-friendly per-chunk tags.

Here's an example of a processed file:

---
title: "NVIDIA Q4 2024 Earnings: AI Boom Continues"
source: "youtube"
publisher: "Wall Street Millennial"
date: 2025-11-15
ticker: "NVDA"
videoId: "dQw4w9WgXcQ"
url: "https://youtube.com/watch?v=dQw4w9WgXcQ"
---

## 01 Revenue Growth Analysis
topics=earnings,revenue,data_center
sentiment=positive
ticker=NVDA

NVIDIA reported Q4 revenue of $35.1 billion, beating estimates...

## 02 AI Chip Demand Outlook
topics=ai,competition,market_share
sentiment=bullish
ticker=NVDA,AMD,INTC
timeframe=next_quarter

The demand for AI accelerators continues to outpace supply...

3) Navigate with grep/glob

Your "index" is the YAML frontmatter (document-level) plus the per-chunk tag lines. Agents can search it deterministically.

Basic search examples:

# Simple tag search with context
rg -n -C 6 "ticker=NVDA" content/processed/

# Search for any value in a tag field
rg -n -C 6 "sentiment=" content/processed/

# Case-insensitive full-text search
rg -i -n -C 3 "artificial intelligence" content/processed/

# Search within a specific source
rg -n -C 6 "sector=technology" content/processed/youtube/

Date-filtered searches:

# Content from December 2025
rg -n -C 6 "ticker=TSLA" content/processed/ --glob "**/2025-12/*.md"

# Q4 2025 content
rg -n -C 6 "sentiment=bullish" content/processed/ --glob "**/2025-1[0-2]/*.md"

# Specific month and source
rg -n -C 6 "asset_type=etf" content/processed/reddit/ --glob "**/2025-11/*.md"

Combined tag filters:

# Match chunks with two specific tags (using file list)
rg -l "sector=technology" content/processed/ | xargs rg -n -C 6 "sentiment=bullish"

# Pipeline filter for complex queries
rg -n -C 6 "ticker=AAPL" content/processed/ | rg "recommendation=.*buy"

# Three-way filter: tech stocks with bullish sentiment and buy recommendation
rg -l "sector=technology" content/processed/ | xargs rg -l "sentiment=bullish" | xargs rg -n -C 6 "recommendation=buy"

# Find AI narrative discussions with specific tickers
rg -n -C 6 "narrative=.*ai" content/processed/ | rg "ticker=NVDA|ticker=.*,NVDA"

Discovery and exploration:

# List all unique tickers mentioned
rg -o "ticker=[^\n]+" content/processed/ | cut -d= -f2 | tr ',' '\n' | sort -u

# Count occurrences of each sentiment
rg -o "sentiment=[^\n]+" content/processed/ | cut -d= -f2 | sort | uniq -c | sort -rn

# Top 20 most discussed companies
rg -o "company=[^\n]+" content/processed/ | cut -d= -f2 | tr ',' '\n' | sort | uniq -c | sort -rn | head -20

# Find all files discussing dividend investing
rg -l "investment_style=dividend" content/processed/

# See what narratives exist in the data
rg -o "narrative=[^\n]+" content/processed/ | cut -d= -f2 | tr ',' '\n' | sort -u

Analysis patterns:

# Sentiment distribution for a specific ticker
rg -n -C 6 "ticker=TSLA" content/processed/ | rg -o "sentiment=[^\n]+" | cut -d= -f2 | sort | uniq -c

# Most discussed sectors
rg -o "sector=[^\n]+" content/processed/ | cut -d= -f2 | tr ',' '\n' | sort | uniq -c | sort -rn

# Track narrative evolution over time
for month in 2025-{10..12}; do
  echo "=== $month ==="
  rg -o "narrative=[^\n]+" content/processed/ --glob "**/$month/*.md" | cut -d= -f2 | tr ',' '\n' | sort | uniq -c | sort -rn | head -5
done

# Compare sentiment across sources for a stock
for source in youtube reddit; do
  echo "=== $source ==="
  rg -n -C 6 "ticker=AAPL" content/processed/$source/ | rg -o "sentiment=[^\n]+" | cut -d= -f2 | tr ',' '\n' | sort | uniq -c
done

# Find all strong buy recommendations by sector
for sector in technology healthcare financials; do
  echo "=== $sector ==="
  rg -l "sector=$sector" content/processed/ | xargs rg -n -C 3 "recommendation=strong_buy" | head -5
done

Advanced multi-criteria searches:

# Large-cap tech stocks with bullish sentiment
rg -l "market_cap=large_cap" content/processed/ | xargs rg -l "sector=technology" | xargs rg -n -C 6 "sentiment=bullish"

# Growth investing discussions about mega-cap stocks
rg -n -C 6 "investment_style=growth" content/processed/ | rg "market_cap=mega_cap"

# ETF recommendations from specific time period
rg -n -C 6 "asset_type=etf" content/processed/ --glob "**/2025-12/*.md" | rg "recommendation=buy|recommendation=strong_buy"

# Bearish sentiment on specific narrative
rg -n -C 6 "narrative=ev_transition" content/processed/ | rg "sentiment=bearish"

Configuration

Custom Processing Prompts

You can override the default processing prompt for specific sources to tailor how content is processed:

const greptor = await createGreptor({
  basePath: './projects/investing/content',
  topic: 'Investing, stock market, financial, and macroeconomics',
  tagSchema: YOUR_TAG_SCHEMA,
  model: {
    provider: "@ai-sdk/openai",
    model: "gpt-5-mini",
  },
  customProcessingPrompts: {
    // Custom prompt for Twitter/X content
    'twitter': `
# INSTRUCTIONS
Process this Twitter/X content for investment research. Focus on:
- Investment signals, predictions, or analysis
- Key metrics and numbers mentioned
- Influencer sentiment and conviction level

# CONTENT TO PROCESS:
{CONTENT}
    `,
    
    // Custom prompt for SEC filings
    'sec_filing': `
# INSTRUCTIONS
Process this SEC filing with extreme precision:
- Preserve all financial figures, dates, and legal language exactly
- Extract key financial metrics and risk factors
- Maintain formal, factual tone throughout

# CONTENT TO PROCESS:
{CONTENT}
    `,
    
    // Custom prompt for earnings transcripts
    'earnings': `
# INSTRUCTIONS
Process this earnings call transcript:
- Extract forward-looking statements and guidance
- Preserve exact numbers, percentages, and ranges
- Capture management sentiment and key Q&A points

# CONTENT TO PROCESS:
{CONTENT}
    `,
  },
});

await greptor.start();

Usage notes:

  • Use {CONTENT} as a placeholder where the raw content will be inserted
  • Each custom prompt should include the placeholder exactly once
  • If no custom prompt is defined for a source, Greptor falls back to the default processing prompt
  • Custom prompts are matched against the document's source field (e.g., youtube, reddit, twitter)

Event Hooks

Greptor provides optional hooks to monitor document processing. These are useful for logging, metrics, progress tracking, or building custom UIs.

const greptor = await createGreptor({
  basePath: './projects/investing/content',
  topic: 'Investing, stock market, financial, and macroeconomics',
  tagSchema: YOUR_TAG_SCHEMA,
  model: {
    provider: "@ai-sdk/openai",
    model: "gpt-5-mini",
  },
  hooks: {
    onDocumentProcessingStarted: ({ source, publisher, label, documentsCount }) => {
      const count = documentsCount[source] || { fetched: 0, processed: 0 };
      console.log(`Processing: ${source}/${publisher}/${label} (${count.fetched} fetched, ${count.processed} processed)`);
    },

    onDocumentProcessingCompleted: (event) => {
      if (event.success) {
        const { source, publisher, label, documentsCount, elapsedMs, totalTokens } = event;
        const count = documentsCount[source] || { fetched: 0, processed: 0 };
        console.log(`✓ Completed: ${source}/${publisher}/${label} (${elapsedMs}ms, ${totalTokens} tokens, ${count.processed}/${count.fetched} processed)`);
      } else {
        const { source, publisher, label, error } = event;
        console.error(`✗ Failed: ${source}/${publisher}/${label} - ${error}`);
      }
    },
  },
});

await greptor.start();

Available Hooks

| Hook | When Called | Event Data |
|------|-------------|------------|
| onDocumentProcessingStarted | Before processing each document | source, publisher?, label, documentsCount: SourceCounts |
| onDocumentProcessingCompleted | After processing succeeds or fails | Union type. Success: success: true, source, publisher?, label, documentsCount, elapsedMs, inputTokens, outputTokens, totalTokens. Failure: success: false, error: string, source, publisher?, label |

Tag Schemas

Greptor requires a tag schema. For best results, provide a custom tag schema (or generate one with greptor generate tags).

Here's a comprehensive example for investment research:

const greptor = await createGreptor({
  basePath: './projects/investing/content',
  topic: 'Investing, stock market, financial, and macroeconomics',
  model: {
    provider: "@ai-sdk/openai",
    model: "gpt-5-mini",
  },
  tagSchema: [
    {
      name: 'company',
      type: 'string[]',
      description: 'Canonical company names in snake_case (e.g. apple, tesla, microsoft)',
    },
    {
      name: 'ticker',
      type: 'string[]',
      description: 'Canonical stock tickers, UPPERCASE only (e.g. AAPL, TSLA, MSFT, SPY)',
    },
    {
      name: 'sector',
      type: 'enum[]',
      description: 'GICS sector classification for stocks/companies discussed',
      enumValues: [
        'technology', 'healthcare', 'financials', 'consumer_discretionary',
        'consumer_staples', 'energy', 'utilities', 'industrials',
        'materials', 'real_estate', 'communication_services',
        'etf', 'index', 'commodity', 'bond', 'mixed'
      ],
    },
    {
      name: 'industry',
      type: 'string[]',
      description: 'Specific industry/sub-sector in snake_case (e.g. semiconductors, biotech, banking)',
    },
    {
      name: 'market_cap',
      type: 'enum[]',
      description: 'Market capitalization category of the company',
      enumValues: ['mega_cap', 'large_cap', 'mid_cap', 'small_cap', 'micro_cap'],
    },
    {
      name: 'investment_style',
      type: 'enum[]',
      description: 'Investment approach or style discussed',
      enumValues: [
        'value', 'growth', 'dividend', 'momentum', 'index',
        'passive', 'active', 'day_trading', 'swing_trading', 'long_term_hold'
      ],
    },
    {
      name: 'asset_type',
      type: 'enum[]',
      description: 'Type of financial instrument discussed',
      enumValues: [
        'stock', 'etf', 'mutual_fund', 'option', 'bond',
        'reit', 'commodity', 'crypto', 'cash'
      ],
    },
    {
      name: 'narrative',
      type: 'string[]',
      description: 'Investment or market narratives in snake_case (e.g. ai_boom, ev_transition, rate_cuts)',
    },
    {
      name: 'sentiment',
      type: 'enum[]',
      description: 'Directional stance on the stock/market',
      enumValues: ['bullish', 'bearish', 'neutral', 'mixed', 'cautious'],
    },
    {
      name: 'recommendation',
      type: 'enum[]',
      description: 'Analyst or influencer recommendation type',
      enumValues: ['strong_buy', 'buy', 'hold', 'sell', 'strong_sell'],
    },
  ],
});

await greptor.start();

License

MIT © Sergii Vashchyshchuk