
@gorgom123/structify


AI-powered data extraction library — Normalize messy text into structured, predictable JSON.

What is Structify?

Structify is a production-ready developer library that helps you turn unreliable, messy text into clean, structured data using AI. Think of it as a semantic normalization layer for your applications.

This is NOT:

  • ❌ A prompt playground
  • ❌ A raw AI wrapper
  • ❌ An OCR engine

This IS:

  • ✅ A developer utility
  • ✅ A semantic normalization layer
  • ✅ Infrastructure-grade tooling
  • ✅ Schema-first data extraction

Installation

npm install @gorgom123/structify

Quick Start

import { extract, init } from '@gorgom123/structify';

// Initialize with your OpenRouter API key
init({
  openRouterApiKey: process.env.OPENROUTER_API_KEY
});

// Messy input (OCR, legacy API, logs, etc.)
const messyInput = `
Inv No: 88921
Total Rp. 1.250.000
Date 03/12/24
PT Maju Jaya
`;

// Define what you want
const result = await extract(messyInput, {
  invoice_number: 'string',
  invoice_date: 'date',
  total_amount: 'number',
  vendor_name: 'string'
});

console.log(result);
/* Output:
{
  "invoice_number": "88921",
  "invoice_date": "2024-12-03",
  "total_amount": 1250000,
  "vendor_name": "PT Maju Jaya"
}
*/

Core Features

🎯 Schema-First

Define the structure you want. Structify handles the rest.

🔒 Type-Safe

Full TypeScript support with strict type definitions.

🛡️ Production-Ready

  • Input validation
  • Retry logic with exponential backoff
  • Comprehensive error handling
  • Configurable limits

🌍 Smart Normalization

  • Dates: Multiple formats → ISO-8601
  • Numbers: Currency symbols, separators → Clean numeric values
  • Booleans: "yes"/"no"/"1"/"0" → true/false
  • Missing data → null (never makes up data)
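
A small sketch of these rules in action (the input and field names are invented; it assumes init() has already been called as in Quick Start):

import { extract } from '@gorgom123/structify';

const raw = 'Order A-17 | paid: yes | shipped Nov 5, 2024 | total $1,299.50 | tracking: n/a';

const order = await extract(raw, {
  order_id: 'string',
  paid: 'boolean',
  shipped_date: 'date',
  total: 'number',
  tracking_number: 'string'
});
// Expected shape (exact values depend on the model):
// { order_id: 'A-17', paid: true, shipped_date: '2024-11-05',
//   total: 1299.5, tracking_number: null }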

🔧 Developer-Focused

  • No prompt engineering required
  • Deterministic behavior (temperature=0)
  • Debug mode available
  • Predictable errors

API Reference

extract(text, schema, options?)

Extract structured data from text.

Parameters:

  • text (string): The messy input text
  • schema (Schema): Expected output structure
  • options (ExtractOptions, optional):
    • model?: string - Override AI model
    • timeout?: number - Request timeout (ms)
    • maxRetries?: number - Retry attempts
    • debug?: boolean - Enable debug logging

Returns: Promise<T> - Normalized data matching schema
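
A minimal sketch of passing options (the input, schema, and chosen model are illustrative; the model is one of those listed under Configuration):

import { extract } from '@gorgom123/structify';

const data = await extract(
  'SKU 4417 qty 3 @ $9.99',
  { sku: 'string', quantity: 'number', unit_price: 'number' },
  {
    model: 'openai/gpt-4o-mini', // per-call model override
    timeout: 10_000,             // request timeout in ms
    maxRetries: 2,               // retry attempts on transient failures
    debug: true                  // enable debug logging
  }
);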

init(config)

Initialize the library with configuration.

Parameters:

  • config (StructifyConfig):
    • openRouterApiKey: string - Required. Your OpenRouter API key
    • defaultModel?: string - Default: nvidia/nemotron-nano-12b-v2-vl:free (free, reliable)
    • maxInputSize?: number - Default: 50000 characters
    • maxSchemaDepth?: number - Default: 5 levels
    • maxFieldCount?: number - Default: 100 fields
    • timeout?: number - Default: 30000ms
    • maxRetries?: number - Default: 3 attempts

generateMessyText(options)

Generate messy text for testing and demos.

Parameters:

  • options (MessyTextOptions):
    • domain: 'invoice' | 'receipt' | 'shipping' | 'log'
    • language?: 'en' | 'id' - Default: 'en'
    • chaosLevel?: 'low' | 'medium' | 'high' - Default: 'medium'

Returns: string - Generated messy text
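
A short sketch of using it to exercise extract (the schema fields are illustrative, and init() is assumed to have been called as in Quick Start):

import { extract, generateMessyText } from '@gorgom123/structify';

// Produce a deliberately noisy Indonesian receipt for testing
const sample = generateMessyText({
  domain: 'receipt',
  language: 'id',
  chaosLevel: 'high'
});

const parsed = await extract(sample, {
  store_name: 'string',
  purchase_date: 'date',
  total_amount: 'number'
});
console.log(parsed);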

Supported Schema Types

| Type | Description | Example Output |
|---------|----------------|------------------|
| string | Text value | "PT Maju Jaya" |
| number | Numeric value | 1250000 |
| boolean | True/false | true |
| date | ISO-8601 date | "2024-12-03" |
| array | Array of items | [{...}, {...}] |
| object | Nested object | { key: value } |

Nested Schema Example

const result = await extract(orderText, {
  order_id: 'string',
  customer: {
    name: 'string',
    email: 'string'
  },
  items: [
    {
      name: 'string',
      price: 'number'
    }
  ],
  total: 'number'
});

Error Handling

All errors are typed with specific error codes:

import { StructifyError, ErrorCode } from '@gorgom123/structify';

try {
  await extract(text, schema);
} catch (error) {
  if (error instanceof StructifyError) {
    switch (error.code) {
      case ErrorCode.INVALID_SCHEMA:
        // Schema validation failed
        break;
      case ErrorCode.INVALID_INPUT:
        // Input text invalid or too large
        break;
      case ErrorCode.AI_ERROR:
        // OpenRouter request failed
        break;
      case ErrorCode.PARSE_ERROR:
        // JSON parsing failed
        break;
      case ErrorCode.CONFIG_ERROR:
        // Missing or invalid configuration
        break;
    }
  }
}
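
One pattern the typed errors make possible (purely illustrative, combining them with the per-call model override from the API reference): retry on a different model when the default request fails.

import { extract, StructifyError, ErrorCode } from '@gorgom123/structify';

const text = 'Inv No: 88921  Total Rp. 1.250.000';
const schema = { invoice_number: 'string', total_amount: 'number' };

let result;
try {
  result = await extract(text, schema);
} catch (error) {
  if (error instanceof StructifyError && error.code === ErrorCode.AI_ERROR) {
    // The OpenRouter request failed; retry once on another model.
    result = await extract(text, schema, { model: 'openai/gpt-4o-mini' });
  } else {
    throw error;
  }
}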

Configuration

Environment Variables

# Required
OPENROUTER_API_KEY=your_key_here

# Optional - specify model (default: nvidia/nemotron-nano-12b-v2-vl:free)
OPENROUTER_MODEL=openai/gpt-4o-mini

Get your API key at OpenRouter

Available Models:

  • nvidia/nemotron-nano-12b-v2-vl:free (default, free, reliable)
  • openai/gpt-4o-mini (paid, more accurate)
  • anthropic/claude-3-haiku (paid)
  • google/gemini-2.0-flash-exp:free (free alternative)
  • meta-llama/llama-3.2-3b-instruct:free (free alternative)
  • See more at OpenRouter Models

Programmatic Configuration

init({
  openRouterApiKey: 'your_key',
  defaultModel: 'openai/gpt-4o-mini', // Override free model
  maxRetries: 5,
  timeout: 60000
});

Use Cases

  • 📄 Invoice/Receipt Processing: Extract structured data from scanned documents
  • 🔍 OCR Normalization: Clean up OCR output into structured format
  • 📊 Legacy API Response Cleaning: Normalize inconsistent API responses
  • 📝 Log Parsing: Extract structured data from application logs
  • 🌐 Multi-format Data Integration: Unify data from various sources
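
For instance, a rough sketch of the log-parsing case (the log line and field names are invented; init() is assumed to have been called):

import { extract } from '@gorgom123/structify';

const logLine =
  '2024-12-03T10:14:07Z WARN payment-svc user=4411 amount=$15.00 declined: insufficient funds';

const event = await extract(logLine, {
  timestamp: 'date',
  level: 'string',
  service: 'string',
  user_id: 'string',
  amount: 'number',
  reason: 'string'
});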

Limitations

Important: Structify is AI-based, so results depend on the model's performance. Always validate critical data.

  • Maximum input size: 50,000 characters (configurable)
  • Maximum schema depth: 5 levels (configurable)
  • Maximum field count: 100 fields (configurable)
  • Requires OpenRouter API key
  • Not 100% accurate - validate critical extractions
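
The three size limits can be raised or lowered through init(), for example:

import { init } from '@gorgom123/structify';

init({
  openRouterApiKey: process.env.OPENROUTER_API_KEY,
  maxInputSize: 100_000, // allow larger documents (default 50,000 characters)
  maxSchemaDepth: 8,     // default 5 levels
  maxFieldCount: 200     // default 100 fields
});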

Examples

See the examples/ directory for more.

Development

# Install dependencies
npm install

# Build the library
npm run build

# Type check
npm run typecheck

# Development mode (watch)
npm run dev

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT © Bobi Gunardi

Author

Bobi Gunardi

Positioning

Structify is designed to feel like Zod + AI — a serious, infrastructure-grade developer tool for semantic data normalization. It's perfect for:

  • Backend services processing unstructured input
  • Frontend applications handling messy API responses
  • Serverless functions normalizing data
  • ETL pipelines cleaning data sources

Built for developers who need predictable, structured data from unpredictable sources.