
@docamz/json-tokenizer

🚀 Advanced JSON tokenizer with multiple encoding strategies for optimal compression and performance.

Lightweight and symmetric JSON tokenizer for compression and optimization. Generates consistent dictionaries and supports alphabetic, numeric, base64, UUID-based, and custom tokenization methods with symmetric encoding/decoding. Perfect for data compression, API optimization, and storage efficiency. Can be used standalone or with MessagePack or Gzip for enhanced compression.

Features

  • Multiple Tokenization Methods: Alphabetic, numeric, base64, UUID-short, and custom
  • Symmetric Encoding: Perfect reconstruction of original data
  • 🔒 Security First: Built-in prototype pollution protection
  • High Performance: Optimized algorithms with minimal overhead
  • TypeScript Support: Full type safety
  • ⚛️ React Hook API: First-class React support with useJsonTokenizer hook

Installation

npm install @docamz/json-tokenizer

Quick Start

import { generateDictionary, tokenize, detokenize, TokenizationMethod } from "@docamz/json-tokenizer";

const data = { name: "Alice", age: 30, city: "Paris" };
const keys = ["name", "age", "city"];

// Generate dictionary and tokenize
const dict = generateDictionary(keys);
const encoded = tokenize(data, dict.forward);
const decoded = detokenize(encoded, dict.reverse);

console.log(encoded); // { a: "Alice", b: 30, c: "Paris" }
console.log(decoded); // { name: "Alice", age: 30, city: "Paris" }

React Hook API

⚛️ React Hook for seamless integration with React applications

Installation (Hook)

The React hook requires React 16.8.0 or higher (for hooks support):

npm install @docamz/json-tokenizer react

Basic Usage

import { useJsonTokenizer, TokenizationMethod } from "@docamz/json-tokenizer/react";

function MyComponent() {
  const data = { name: "Alice", age: 30, city: "Paris" };

  const { tokenized, dictionary, isLoading, error } = useJsonTokenizer(data, {
    keys: ["name", "age", "city"],
    method: TokenizationMethod.ALPHABETIC
  });

  if (isLoading) return <div>Loading...</div>;
  if (error) return <div>Error: {error.message}</div>;

  return (
    <div>
      <h3>Original:</h3>
      <pre>{JSON.stringify(data, null, 2)}</pre>

      <h3>Tokenized:</h3>
      <pre>{JSON.stringify(tokenized, null, 2)}</pre>
    </div>
  );
}

Manual Control

Disable auto-tokenization and control when tokenization happens:

import { useState } from "react";

function ManualComponent() {
  const [data, setData] = useState({ name: "Alice", age: 30 });

  const { tokenized, tokenize, detokenize, reset } = useJsonTokenizer(data, {
    keys: ["name", "age"],
    autoTokenize: false  // Don't tokenize automatically
  });

  return (
    <div>
      <button onClick={tokenize}>Tokenize</button>
      <button onClick={() => detokenize(tokenized)}>Detokenize</button>
      <button onClick={reset}>Reset</button>
      <pre>{JSON.stringify(tokenized, null, 2)}</pre>
    </div>
  );
}

With Different Tokenization Methods

function TokenizationMethodsExample() {
  const data = { name: "Alice", age: 30, city: "Paris" };

  // Numeric tokenization
  const numeric = useJsonTokenizer(data, {
    keys: ["name", "age", "city"],
    method: TokenizationMethod.NUMERIC
  });

  // Base64 tokenization
  const base64 = useJsonTokenizer(data, {
    keys: ["name", "age", "city"],
    method: TokenizationMethod.BASE64
  });

  return (
    <div>
      <h3>Numeric: {JSON.stringify(numeric.tokenized)}</h3>
      <h3>Base64: {JSON.stringify(base64.tokenized)}</h3>
    </div>
  );
}

Custom Tokenization

function CustomTokenization() {
  const data = { name: "Alice", age: 30, city: "Paris" };

  const { tokenized } = useJsonTokenizer(data, {
    keys: ["name", "age", "city"],
    method: TokenizationMethod.CUSTOM,
    customGenerator: (index) => `field_${index}`
  });

  return <pre>{JSON.stringify(tokenized, null, 2)}</pre>;
}

Using Pre-generated Dictionary

For consistent tokenization across multiple components:

import { generateDictionary, TokenizationMethod } from "@docamz/json-tokenizer";
import { useJsonTokenizer } from "@docamz/json-tokenizer/react";

// Generate dictionary once (e.g., in a context or constant)
const SHARED_DICTIONARY = generateDictionary(
  ["name", "age", "city"],
  { method: TokenizationMethod.ALPHABETIC }
);

function ComponentA() {
  const { tokenized } = useJsonTokenizer(
    { name: "Alice", age: 30, city: "Paris" },
    { dictionary: SHARED_DICTIONARY }
  );
  return <pre>{JSON.stringify(tokenized)}</pre>;
}

function ComponentB() {
  const { tokenized } = useJsonTokenizer(
    { name: "Bob", age: 25, city: "London" },
    { dictionary: SHARED_DICTIONARY }
  );
  return <pre>{JSON.stringify(tokenized)}</pre>;
}

React Hook API Reference

useJsonTokenizer(input, options)

Parameters:

  • input: any - The JSON data to tokenize
  • options: UseJsonTokenizerOptions - Configuration options

Options:

interface UseJsonTokenizerOptions {
  keys?: string[];              // Keys to include in dictionary generation
  dictionary?: Dictionary;       // Pre-generated dictionary (overrides keys)
  autoTokenize?: boolean;        // Auto-tokenize on input change (default: true)
  method?: TokenizationMethod;   // Tokenization method (default: ALPHABETIC)
  customGenerator?: (index: number) => string;  // For custom method
  paddingLength?: number;        // For padded numeric method
  prefix?: string;               // Prefix for tokens
}

Returns:

interface UseJsonTokenizerResult {
  tokenized: any;               // The tokenized data
  detokenized: any;             // The original/detokenized data
  dictionary: Dictionary | null; // The dictionary used
  isLoading: boolean;           // Loading state
  error: Error | null;          // Any error that occurred
  tokenize: () => void;         // Manually trigger tokenization
  detokenize: (data: any) => any; // Manually detokenize data
  reset: () => void;            // Reset state
}

SSR Considerations

The useJsonTokenizer hook is safe for Server-Side Rendering (SSR):

  • No browser-specific APIs are used
  • Works in Next.js, Remix, and other SSR frameworks
  • Dictionary generation happens synchronously
  • No side effects during initial render (when autoTokenize is false)

Example with Next.js:

// pages/tokenize.tsx
import { useJsonTokenizer, TokenizationMethod } from "@docamz/json-tokenizer/react";

export default function TokenizePage() {
  const data = { name: "Alice", age: 30 };

  const { tokenized, isLoading } = useJsonTokenizer(data, {
    keys: ["name", "age"],
    method: TokenizationMethod.ALPHABETIC
  });

  return <pre>{JSON.stringify(tokenized, null, 2)}</pre>;
}
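If you want to guarantee that no tokenization work runs during server rendering at all, the autoTokenize: false option documented above can be paired with a client-side effect. This is a sketch of that pattern rather than a separate API; the component name is illustrative, and it assumes the tokenize function returned by the hook can safely be called once after hydration:

import { useEffect } from "react";
import { useJsonTokenizer } from "@docamz/json-tokenizer/react";

export default function ClientOnlyTokenizePage() {
  const data = { name: "Alice", age: 30 };

  // autoTokenize: false keeps the server render free of tokenization work
  const { tokenized, tokenize } = useJsonTokenizer(data, {
    keys: ["name", "age"],
    autoTokenize: false
  });

  // Tokenize once on the client, after hydration
  useEffect(() => {
    tokenize();
  }, []);

  return <pre>{JSON.stringify(tokenized, null, 2)}</pre>;
}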

TypeScript Support

The React hook exports are fully typed:

import type {
  UseJsonTokenizerOptions,
  UseJsonTokenizerResult,
  Dictionary,
  TokenizationMethod
} from "@docamz/json-tokenizer/react";

Tokenization Methods

1. Alphabetic (Default)

Perfect for maximum compression with readable tokens.

const dict = generateDictionary(keys, { method: TokenizationMethod.ALPHABETIC });
// Result: { name: "a", age: "b", city: "c" }

2. Numeric

Simple numeric tokens for databases and APIs.

const dict = generateDictionary(keys, { method: TokenizationMethod.NUMERIC });
// Result: { name: "0", age: "1", city: "2" }

3. Padded Numeric

Fixed-width numeric tokens for consistent formatting.

const dict = generateDictionary(keys, {
  method: TokenizationMethod.PADDED_NUMERIC,
  paddingLength: 3
});
// Result: { name: "000", age: "001", city: "002" }

4. Base64 Style

High-density encoding using alphanumeric + symbols.

const dict = generateDictionary(keys, { method: TokenizationMethod.BASE64 });
// Supports 64 characters: a-z, A-Z, 0-9, _, $
// Result: { name: "a", age: "b", city: "c", ... key63: "$", key64: "ba" }

5. UUID Short

Distributed-system friendly with timestamp + counter.

const dict = generateDictionary(keys, { method: TokenizationMethod.UUID_SHORT });
// Result: { name: "1a2b00", age: "1a2b01", city: "1a2b02" }
// Format: 4-char timestamp + 2-char counter (6 chars total)

6. Custom Generator

Define your own tokenization logic.

const dict = generateDictionary(keys, {
  method: TokenizationMethod.CUSTOM,
  customGenerator: (index) => `custom_${index}`
});
// Result: { name: "custom_0", age: "custom_1", city: "custom_2" }

7. Prefixed Tokens

Add prefixes to any tokenization method.

const dict = generateDictionary(keys, {
  method: TokenizationMethod.NUMERIC,
  prefix: "api_"
});
// Result: { name: "api_0", age: "api_1", city: "api_2" }

Advanced Usage

Complex Nested Objects

const complexData = {
  user: {
    profile: { firstName: "John", lastName: "Doe", email: "[email protected]" },
    settings: { theme: "dark", language: "en", notifications: true }
  },
  metadata: { version: "2.0", createdAt: "2023-01-01T00:00:00Z" }
};

const keys = [
  "user", "profile", "firstName", "lastName", "email",
  "settings", "theme", "language", "notifications",
  "metadata", "version", "createdAt"
];

const dict = generateDictionary(keys, { method: TokenizationMethod.ALPHABETIC });
const encoded = tokenize(complexData, dict.forward);
const decoded = detokenize(encoded, dict.reverse);

// Perfect reconstruction (deep equality; the decoded object is a new reference)
console.log(JSON.stringify(decoded) === JSON.stringify(complexData)); // true

Arrays of Objects

const arrayData = {
  users: [
    { name: "Alice", age: 30, role: "admin" },
    { name: "Bob", age: 25, role: "user" },
    { name: "Charlie", age: 35, role: "moderator" }
  ]
};

const keys = ["users", "name", "age", "role"];
const dict = generateDictionary(keys, { method: TokenizationMethod.BASE64 });
const encoded = tokenize(arrayData, dict.forward);
// Result: { a: [{ b: "Alice", c: 30, d: "admin" }, ...] }

Dictionary Serialization

import fs from "node:fs";

// Save dictionary for later use
const dict = generateDictionary(keys, { method: TokenizationMethod.ALPHABETIC });
const serialized = JSON.stringify(dict);
fs.writeFileSync('dictionary.json', serialized);

// Load and use dictionary
const loaded = JSON.parse(fs.readFileSync('dictionary.json', 'utf-8'));
const decoded = detokenize(encodedData, loaded.reverse);

🔒 Security Features

Built-in protection against prototype pollution and security vulnerabilities:

import { tokenize, sanitizeObject, isSafeKey } from "@docamz/json-tokenizer";

// Automatic protection against dangerous keys
// JSON.parse keeps "__proto__" as an own key (an object literal would set the prototype instead of creating a key)
const maliciousData = JSON.parse('{"name": "Alice", "__proto__": {"isAdmin": true}}');
tokenize(maliciousData, dict.forward); // Throws: "Dangerous key detected"

// Sanitize untrusted input
const cleanData = sanitizeObject(untrustedInput, { throwOnUnsafeKeys: true });

// Validate keys manually
if (isSafeKey(keyName)) {
  // Safe to use
}

Protected against:

  • __proto__ pollution
  • constructor manipulation
  • Dangerous property access
  • Control character injection

📖 See SECURITY.md for the complete security guide.

API Reference

Core Functions

| Function | Parameters | Description |
|----------|------------|-------------|
| generateDictionary(keys, options?) | keys: string[], options?: TokenizationOptions | Generate tokenization dictionary |
| tokenize(obj, dict) | obj: any, dict: Record<string, string> | Replace keys with tokens |
| detokenize(obj, reverse) | obj: any, reverse: Record<string, string> | Restore original keys |

Tokenization Methods Reference

| Method | Description | Use Case |
|--------|-------------|----------|
| ALPHABETIC | a, b, c, ..., z, aa, ab | Maximum compression, readable |
| NUMERIC | 0, 1, 2, 3, ... | Simple, database-friendly |
| PADDED_NUMERIC | 000, 001, 002, ... | Fixed-width, sortable |
| BASE64 | a-z, A-Z, 0-9, _, $ | High-density encoding |
| UUID_SHORT | timestamp + counter | Distributed systems |
| CUSTOM | User-defined function | Custom requirements |

TokenizationOptions

interface TokenizationOptions {
  method?: TokenizationMethod;           // Default: ALPHABETIC
  customGenerator?: (index: number) => string; // For CUSTOM method
  paddingLength?: number;                // Default: 4 (for PADDED_NUMERIC)
  prefix?: string;                       // Default: "" (empty)
}

Sequence Generators

Access individual generators directly:

import {
  generateAlphabeticSequence,
  generateNumericSequence,
  generatePaddedNumericSequence,
  generateBase64Sequence,
  generateUuidShortSequence
} from "@docamz/json-tokenizer";

// Use specific generators
const token1 = generateAlphabeticSequence(0); // "a"
const token2 = generateBase64Sequence(63);    // "$"
const token3 = generateUuidShortSequence(0);  // "1a2b00"

Benchmarks

  • model1.json (83.8 KB): 2,679 rows, 216 unique keys in the dictionary
  • model2.json (134.4 KB): 4,069 rows, 216 unique keys in the dictionary
  • model3.json (148.7 KB): 4,424 rows, 216 unique keys in the dictionary
  • model4.json (33.1 KB): 1,056 rows, 216 unique keys in the dictionary

These files contain complex nested structures and arrays with mixed value types (booleans, URLs, text, numbers, ...) to simulate real-world JSON data.
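The benchmark harness itself is not part of the package. The sketch below shows one way figures like the ones that follow could be measured locally; the collectKeys helper and the local model3.json path are illustrative assumptions, and only the imported functions come from the documented API:

import { readFileSync } from "node:fs";
import { performance } from "node:perf_hooks";
import { generateDictionary, tokenize, TokenizationMethod } from "@docamz/json-tokenizer";

// Recursively collect every unique object key in a JSON document (illustrative helper)
function collectKeys(value, keys = new Set()) {
  if (Array.isArray(value)) {
    value.forEach((item) => collectKeys(item, keys));
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      keys.add(key);
      collectKeys(value[key], keys);
    }
  }
  return keys;
}

const data = JSON.parse(readFileSync("model3.json", "utf-8"));
const keys = [...collectKeys(data)];

const start = performance.now();
const dict = generateDictionary(keys, { method: TokenizationMethod.BASE64 });
const encoded = tokenize(data, dict.forward);
const elapsedMs = performance.now() - start;

const originalKb = Buffer.byteLength(JSON.stringify(data)) / 1024;
const tokenizedKb = Buffer.byteLength(JSON.stringify(encoded)) / 1024;
const savedPercent = 100 * (1 - tokenizedKb / originalKb);

console.log(`${keys.length} unique keys | ${elapsedMs.toFixed(2)} ms | ${savedPercent.toFixed(2)}% saved`);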

Compression Ratios

Compression benchmarks for each tokenization method on model3.json (148.7 KB, 4,424 rows, 216 unique keys):

| Method | Dict Gen | Tokenize | Total | Original | Tokenized | Compression | Saved |
|--------|----------|----------|-------|----------|-----------|-------------|-------|
| alphabetic | 0.00 ms | 112.28 ms | 112.28 ms | 72.14 KB | 49.26 KB | 31.71% | 22.87 KB |
| base64 | 0.00 ms | 111.24 ms | 111.24 ms | 72.14 KB | 48.70 KB | 32.49% | 23.44 KB |
| numeric | 0.00 ms | 113.88 ms | 113.88 ms | 72.14 KB | 51.52 KB | 28.58% | 20.62 KB |
| padded_numeric | 0.00 ms | 127.31 ms | 127.31 ms | 72.14 KB | 56.87 KB | 21.17% | 15.27 KB |
| uuid_short | 0.00 ms | 113.00 ms | 113.00 ms | 72.14 KB | 63.82 KB | 11.53% | 8.31 KB |

FASTEST TOKENIZATION:

  1. base64: 111.24 ms
  2. alphabetic: 112.28 ms
  3. uuid_short: 113.00 ms
  4. numeric: 113.88 ms
  5. padded_numeric: 127.31 ms

BEST COMPRESSION:

  1. base64: 32.49% (23.44 KB saved)
  2. alphabetic: 31.71% (22.87 KB saved)
  3. numeric: 28.58% (20.62 KB saved)
  4. padded_numeric: 21.17% (15.27 KB saved)
  5. uuid_short: 11.53% (8.31 KB saved)

MOST SPACE SAVED:

  1. base64: 23.44 KB
  2. alphabetic: 22.87 KB
  3. numeric: 20.62 KB
  4. padded_numeric: 15.27 KB
  5. uuid_short: 8.31 KB

EFFICIENCY SCORE (Compression/Time):

  1. base64: 0.2921 (32.49% in 111.24 ms)
  2. alphabetic: 0.2824 (31.71% in 112.28 ms)
  3. numeric: 0.2510 (28.58% in 113.88 ms)
  4. padded_numeric: 0.1663 (21.17% in 127.31 ms)
  5. uuid_short: 0.1020 (11.53% in 113.00 ms)
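As the heading indicates, the efficiency score is simply the compression percentage divided by the tokenization time in milliseconds. A quick sanity check against the rows above:

// Efficiency score = compression (%) / tokenization time (ms)
const efficiencyScore = (compressionPercent, timeMs) => compressionPercent / timeMs;

console.log(efficiencyScore(32.49, 111.24).toFixed(4)); // "0.2921" (base64)
console.log(efficiencyScore(31.71, 112.28).toFixed(4)); // "0.2824" (alphabetic)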

Benchmark Results

| Model | Raw Size | Raw→Tok | Tok+Gzip | MsgPack | Tok+Msg | Tok+Msg+Gzip | Tok Enc/Dec | Msg Enc/Dec | Tok+Msg Enc/Dec |
|-------|----------|---------|----------|---------|---------|--------------|-------------|-------------|-----------------|
| model1.json | 83.8 KB | 64.6% | 55.5% | 60.3% | 76.3% | 55.5% | 86.5/74.6 ms | 1.3/0.8 ms | 87.1/72.3 ms |
| model2.json | 134.4 KB | 65.7% | 51.8% | 61.3% | 77.2% | 55.7% | 103.9/105.9 ms | 0.3/0.4 ms | 104.2/106.5 ms |
| model3.json | 148.7 KB | 66.9% | 56.5% | 62.7% | 78.0% | 57.4% | 113.4/116.5 ms | 0.3/0.3 ms | 113.7/115.0 ms |
| model4.json | 33.1 KB | 69.9% | 45.6% | 64.1% | 82.2% | 46.2% | 28.1/28.6 ms | 0.2/0.1 ms | 28.1/27.5 ms |
| Average | - | 66.8% | 52.4% | 62.1% | 78.4% | 53.7% | 82.9/81.4 ms | 0.53/0.40 ms | 83.28/80.3 ms |

Key:

  • Raw→Tok: Tokenization compression ratio
  • Tok+Gzip: Tokenized with Gzip compression
  • MsgPack: MessagePack compression ratio
  • Tok+Msg: Combined tokenization + MessagePack
  • Tok+Msg+Gzip: Best compression (tokenization + MessagePack + Gzip)
  • Enc/Dec: Encoding/Decoding performance in milliseconds
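The combined pipelines in the table are not bundled helpers. The sketch below shows one way to build the Tok+Msg+Gzip path yourself; it assumes the @msgpack/msgpack package for MessagePack and Node's built-in zlib for Gzip, alongside the core exports documented above:

import { gzipSync, gunzipSync } from "node:zlib";
import { encode, decode } from "@msgpack/msgpack";
import { generateDictionary, tokenize, detokenize } from "@docamz/json-tokenizer";

const data = { name: "Alice", age: 30, city: "Paris" };
const dict = generateDictionary(["name", "age", "city"]);

// Tokenize -> MessagePack -> Gzip
const packed = gzipSync(encode(tokenize(data, dict.forward)));

// Gzip -> MessagePack -> Detokenize restores the original object
const restored = detokenize(decode(gunzipSync(packed)), dict.reverse);

console.log(Buffer.byteLength(JSON.stringify(data)), packed.byteLength); // raw vs compressed size in bytes
console.log(restored); // { name: "Alice", age: 30, city: "Paris" }

Note that on a payload this small the Gzip header overhead usually outweighs the savings; the gains reported in the table come from the much larger model files.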

License

MIT License © 2025 DocAmz