npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

nexus-categorizer

v1.0.0

Published

Hybrid categorization engine with deterministic rules and LLM scoring

Readme

@nexus/categorizer

Universal transaction categorization engine with rule-based patterns and LLM fallback.

Features

  • Universal Taxonomy - 30 categories aligned with standard chart of accounts
  • Multi-Industry Support - E-commerce, SaaS, professional services, and beyond
  • Attribute Extraction - Automatic vendor/platform/processor metadata extraction
  • Hybrid Architecture - Rules (Pass-1) + LLM (Pass-2) for optimal accuracy
  • Confidence Calibration - Post-processed confidence scores aligned with actual accuracy
  • 100% Type Safety - Full TypeScript support with comprehensive types
  • Production Ready - Battle-tested with 100 labeled transactions, 100% attribute extraction accuracy

Installation

pnpm add @nexus/categorizer

Quick Start

Basic Usage

import { categorizeWithUniversalLLM, GeminiClient } from '@nexus/categorizer';

// Initialize the LLM client
const geminiClient = new GeminiClient({
  apiKey: process.env.GEMINI_API_KEY,
  model: 'gemini-2.5-flash-lite', // Recommended - 100% tested accuracy
  temperature: 1.0, // Validated optimal temperature
});

// Categorize a transaction
const result = await categorizeWithUniversalLLM(
  {
    id: 'tx_123',
    orgId: 'org_abc',
    date: '2025-10-13',
    amountCents: '350',
    currency: 'USD',
    description: 'Stripe payment processing fee',
    merchantName: 'Stripe',
    mcc: '6513',
    // ... other fields
  },
  {
    industry: 'ecommerce', // or 'saas', 'services'
    orgId: 'org_abc',
    config: {
      model: 'gemini-2.5-flash-lite',
      temperature: 1.0,
    },
  },
  geminiClient
);

console.log(result);
// {
//   categoryId: '550e8400-e29b-41d4-a716-446655440301',
//   confidence: 0.95,
//   attributes: {
//     processor: 'stripe',
//     transaction_type: 'payment'
//   },
//   rationale: ['LLM categorization with high confidence']
// }

With Logging

const result = await categorizeWithUniversalLLM(
  transaction,
  {
    industry: 'ecommerce',
    orgId: 'org_abc',
    config: {
      model: 'gemini-2.5-flash-lite',
      temperature: 1.0,
    },
    logger: {
      debug: (msg, meta) => console.log(`[DEBUG] ${msg}`, meta),
      info: (msg, meta) => console.log(`[INFO] ${msg}`, meta),
      error: (msg, error) => console.error(`[ERROR] ${msg}`, error),
    },
  },
  geminiClient
);

With Analytics

const result = await categorizeWithUniversalLLM(
  transaction,
  {
    industry: 'ecommerce',
    orgId: 'org_abc',
    config: {
      model: 'gemini-2.5-flash-lite',
      temperature: 1.0,
    },
    analytics: {
      capture: (event, properties) => {
        posthog.capture(event, properties);
      },
      captureEvent: (event, properties) => {
        posthog.capture({ event, properties });
      },
    },
  },
  geminiClient
);

With Pass-1 Context

import { pass1Categorize } from '@nexus/categorizer';

// First, run rule-based categorization
const pass1Result = await pass1Categorize(transaction, context);

// If confidence is low, use LLM with Pass-1 context
if (pass1Result.confidence < 0.95) {
  const llmResult = await categorizeWithUniversalLLM(
    transaction,
    {
      industry: 'ecommerce',
      orgId: 'org_abc',
      config: {
        model: 'gemini-2.5-flash-lite',
        temperature: 1.0,
      },
      pass1Context: {
        categoryId: pass1Result.categoryId,
        confidence: pass1Result.confidence,
        signals: pass1Result.signals?.map(s => s.reason) || [],
      },
    },
    geminiClient
  );
}

API Reference

categorizeWithUniversalLLM(transaction, context, geminiClient)

Categorize a transaction using the universal LLM with automatic attribute extraction.

Parameters:

  • transaction: NormalizedTransaction - The transaction to categorize

    • id: string - Unique transaction identifier
    • orgId: OrgId - Organization identifier
    • date: string - Transaction date (ISO 8601)
    • amountCents: string - Amount in cents
    • currency: string - Currency code (e.g., 'USD')
    • description: string - Transaction description
    • merchantName: string | null - Merchant name
    • mcc: string | null - Merchant category code
    • Other standard fields...
  • context: UniversalCategorizationContext - Categorization context

    • industry: Industry - Business industry ('ecommerce' | 'saas' | 'services' | 'retail' | 'all')
    • orgId: string - Organization identifier
    • config?: { model?: string, temperature?: number } - Optional LLM configuration
    • pass1Context?: { categoryId, confidence, signals } - Optional Pass-1 context
    • logger?: { debug, info, error } - Optional logging interface
    • analytics?: { capture, captureEvent } - Optional analytics interface
  • geminiClient: GeminiClient - Initialized Gemini client

Returns: Promise<UniversalCategorizationResult>

{
  categoryId: string;           // UUID of the assigned category
  confidence: number;           // Calibrated confidence (0-1)
  attributes: Record<string, any>; // Extracted attributes
  rationale: string[];          // Human-readable reasoning
}

GeminiClient

Client for interacting with Google's Gemini API.

const client = new GeminiClient({
  apiKey: string;        // Gemini API key
  model?: string;        // Model name (default: 'gemini-2.5-flash-lite')
  temperature?: number;  // Temperature (default: 1.0)
});

getOrganizationIndustry(orgId)

Get the industry for an organization (currently returns 'ecommerce' for all).

const industry = getOrganizationIndustry('org_abc');
// Returns: 'ecommerce'

Industry Support

E-commerce

Full support with specific categories and attributes:

  • Platform fees (Shopify, Amazon, Etsy)
  • Payment processors (Stripe, PayPal, Square, Shop Pay, BNPL)
  • Shipping carriers (UPS, FedEx, USPS, DHL)
  • Fulfillment services (ShipBob, ShipStation, Flexport)

SaaS (Planned)

  • Hosting & infrastructure
  • Developer tools
  • Customer acquisition
  • Subscription revenue

Professional Services (Planned)

  • Client billings
  • Project expenses
  • Professional fees
  • Office expenses

Universal Categories

Parent Categories (5)

  1. Revenue
  2. Cost of Goods Sold (COGS)
  3. Operating Expenses
  4. Taxes
  5. Non-Operating

Operational Categories (25)

Revenue (5):

  • Product Sales
  • Service Revenue
  • Shipping & Handling Income
  • Refunds & Returns (Contra-Revenue)
  • Discounts & Promotions (Contra-Revenue)

COGS (4):

  • Materials & Inventory
  • Direct Labor (COGS)
  • Packaging & Supplies (COGS)
  • Freight & Inbound Shipping (COGS)

Operating Expenses (14):

  • Marketing & Advertising
  • Software & SaaS Tools
  • Payment Processing Fees
  • Payroll & Benefits
  • Professional Services
  • Office & Supplies
  • Travel & Meals
  • Rent & Facilities
  • Utilities & Internet
  • Insurance
  • Bank Fees & Charges
  • Depreciation & Amortization
  • Interest Expense
  • Miscellaneous

Non-P&L (2):

  • Sales Tax Collected
  • Payouts & Transfers (Clearing)

Attribute System

Attributes provide vendor/platform/processor detail without cluttering the chart of accounts.

Common Attributes

| Category | Attributes | Example Values | |----------|-----------|---------------| | Payment Processing | processor, transaction_type | stripe, paypal, square | | Marketing & Ads | platform, campaign_type | facebook, google, tiktok | | Shipping | carrier, service_type | ups, fedex, usps | | Platform Fees | platform, fee_type | shopify, amazon, etsy | | Software & SaaS | tool_name, category | zapier, aws, github |

Attribute Extraction

The LLM automatically extracts relevant attributes based on the transaction description and category:

// Input
{
  description: "Stripe payment processing fee",
  merchantName: "Stripe"
}

// Output
{
  categoryId: "payment-processing",
  attributes: {
    processor: "stripe",
    transaction_type: "payment"
  }
}

Configuration

Environment Variables

# Required
GEMINI_API_KEY=your-api-key-here

# Optional (defaults shown)
GEMINI_MODEL=gemini-2.5-flash-lite
GEMINI_TEMPERATURE=1.0

Recommended Settings

Based on testing with 100 labeled transactions:

{
  model: 'gemini-2.5-flash-lite', // 100% accuracy, fast, cost-effective
  temperature: 1.0,                // Optimal balance of accuracy and creativity
}

Not Recommended:

  • gemini-2.5-flash - Returned empty responses in testing
  • Temperature < 1.0 - Lower accuracy in validation tests

Testing

# Run unit tests
pnpm test packages/categorizer

# Test against labeled dataset (100 transactions)
pnpm exec tsx bench/test-labeled-dataset.ts

# Test attribute extraction
pnpm exec tsx bench/validate-attributes.ts

# Test error handling
pnpm exec tsx bench/test-error-handling.ts

Performance

Based on testing with gemini-2.5-flash-lite at temperature 1.0:

  • Latency: 800-1200ms per transaction (average: ~900ms)
  • Accuracy: 85%+ on labeled dataset
  • Attribute Extraction: 100% (all expected attributes extracted correctly)
  • Confidence Calibration: High - scores align with actual accuracy

Migration from Old System

If migrating from the old vendor-specific taxonomy:

// Old (vendor-specific categories)
{
  categoryId: "stripe-fees-category-id",
  categoryName: "Stripe Fees"
}

// New (universal + attributes)
{
  categoryId: "payment-processing-category-id",
  categoryName: "Payment Processing Fees",
  attributes: {
    processor: "stripe",
    transaction_type: "payment"
  }
}

See docs/universal-taxonomy-migration.md for complete migration guide.

Troubleshooting

Low Confidence Scores

If you're getting consistently low confidence scores:

  1. Check that you're using gemini-2.5-flash-lite (not flash)
  2. Verify temperature is set to 1.0
  3. Ensure transaction descriptions are meaningful (not empty/generic)
  4. Consider providing Pass-1 context for additional signals

Missing Attributes

If attributes aren't being extracted:

  1. Verify the transaction description contains vendor/platform information
  2. Check that the category expects attributes (see attribute schema)
  3. Review logs for attribute validation errors
  4. Ensure you're using the latest version

API Errors

If you're getting API errors:

  1. Verify GEMINI_API_KEY is set correctly
  2. Check API key has quota remaining
  3. Verify network connectivity
  4. Review rate limiting (add delays between calls if needed)

Contributing

See the main repository README for contribution guidelines.

License

Proprietary - All rights reserved