
deduplino

v0.0.9

CLI tool for deduplicating lino format

Deduplino

A CLI tool for deduplicating lino format files by identifying patterns in repeated link references and replacing them with numbered references for improved readability and reduced file size.

Installation

Using Bun (Recommended)

# Install globally with bun
bun install -g deduplino

# Or from source
git clone <repository-url>
cd deduplino
bun install
bun run build

Using NPM (Fallback)

npm install -g deduplino

Quick Start

# Basic usage
deduplino -i input.lino -o output.lino

# From stdin to stdout
printf '(test link)\n(test link)\n' | deduplino --piped-input

# Process with different threshold
deduplino --deduplication-threshold 0.5 -i input.lino

How It Works

Deduplino analyzes lino files to find patterns in link references and creates optimized representations using three pattern types.

Auto-Escape Feature

The --auto-escape option automatically converts non-lino text (like logs) into valid lino format:

  1. First attempt: Escape only references containing colons (timestamps, URLs, field names)
  2. Second attempt: Escape references with special characters (!@#$%^&*+=|\\:;?/<>.,)
  3. Final fallback: Escape all references except simple punctuation and quoted strings

Example log processing:

Input:  2025-07-25T21:32:46Z updateReferences id: a43fad436d79
Output: '2025-07-25T21:32:46Z' updateReferences 'id:' a43fad436d79
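The first attempt described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the tool's actual code: the helper name escapeColonTokens is hypothetical, references are assumed to be whitespace-separated, and tokens that already contain quotes are not handled.

```typescript
// Hypothetical helper (not part of deduplino's API): wrap every
// whitespace-separated token that contains a colon in single quotes.
function escapeColonTokens(line: string): string {
  return line
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => (token.indexOf(":") !== -1 ? `'${token}'` : token))
    .join(" ");
}

const escapedLogLine: string = escapeColonTokens(
  "2025-07-25T21:32:46Z updateReferences id: a43fad436d79"
);
// escapedLogLine is "'2025-07-25T21:32:46Z' updateReferences 'id:' a43fad436d79"
```

The later attempts would presumably widen the set of tokens that get quoted; the sketch covers only the colon rule.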

Pattern Types

1. Exact Duplicates

Links that appear identically multiple times.

Input:

(first second)
(first second)
(first second)

Output:

1: first second
1
1
1
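A minimal sketch of how exact duplicates could be collapsed, assuming links are already parsed into their inner text. The function name dedupeExact and the choice to print non-repeated links back in parentheses are assumptions made for illustration, not the library's behavior.

```typescript
// Hypothetical sketch: replace links that occur more than once with a
// numbered reference, defining each reference once at the top.
function dedupeExact(links: string[]): string[] {
  // Count how often each link content appears.
  const counts = new Map<string, number>();
  for (const link of links) {
    counts.set(link, (counts.get(link) ?? 0) + 1);
  }

  // Number repeated links in order of first appearance.
  const ids = new Map<string, number>();
  const definitions: string[] = [];
  for (const link of links) {
    if ((counts.get(link) ?? 0) > 1 && !ids.has(link)) {
      ids.set(link, ids.size + 1);
      definitions.push(`${ids.size}: ${link}`);
    }
  }

  // Emit definitions first, then each link as its number or unchanged.
  const body = links.map((link) => {
    const id = ids.get(link);
    return id === undefined ? `(${link})` : String(id);
  });
  return definitions.concat(body);
}

const exactResult: string[] = dedupeExact([
  "first second",
  "first second",
  "first second",
]);
// exactResult is ["1: first second", "1", "1", "1"]
```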

2. Prefix Patterns

Links that share common beginnings.

Input:

(this is a link of cat)
(this is a link of tree)

Output:

1: this is a link of
1 cat
1 tree
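The prefix case can be illustrated by comparing two links reference by reference from the left. The helper sharedPrefix is a hypothetical name, and splitting on single spaces is a simplifying assumption.

```typescript
// Hypothetical helper: the leading references that two links have in common.
function sharedPrefix(a: string[], b: string[]): string[] {
  let length = 0;
  while (length < a.length && length < b.length && a[length] === b[length]) {
    length++;
  }
  return a.slice(0, length);
}

const catLink: string[] = "this is a link of cat".split(" ");
const treeLink: string[] = "this is a link of tree".split(" ");
const prefix: string[] = sharedPrefix(catLink, treeLink);

// Define the prefix once, then keep only each link's differing tail.
const prefixOutput: string[] = [
  `1: ${prefix.join(" ")}`,
  `1 ${catLink.slice(prefix.length).join(" ")}`,
  `1 ${treeLink.slice(prefix.length).join(" ")}`,
];
// prefixOutput is ["1: this is a link of", "1 cat", "1 tree"]
```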

3. Suffix Patterns

Links that share common endings.

Input:

(foo ends here)
(bar ends here)

Output:

1: ends here
foo 1
bar 1
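The suffix case mirrors the prefix case, walking from the right end of each link instead. As before, sharedSuffix is a hypothetical helper and the space-based split is an assumption.

```typescript
// Hypothetical helper: the trailing references that two links have in common.
function sharedSuffix(a: string[], b: string[]): string[] {
  let length = 0;
  while (
    length < a.length &&
    length < b.length &&
    a[a.length - 1 - length] === b[b.length - 1 - length]
  ) {
    length++;
  }
  return a.slice(a.length - length);
}

const fooLink: string[] = "foo ends here".split(" ");
const barLink: string[] = "bar ends here".split(" ");
const suffix: string[] = sharedSuffix(fooLink, barLink);

// Define the suffix once, then keep only each link's differing head.
const suffixOutput: string[] = [
  `1: ${suffix.join(" ")}`,
  `${fooLink.slice(0, fooLink.length - suffix.length).join(" ")} 1`,
  `${barLink.slice(0, barLink.length - suffix.length).join(" ")} 1`,
];
// suffixOutput is ["1: ends here", "foo 1", "bar 1"]
```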

Advanced Pattern Detection

The tool handles complex nested structures and can identify patterns in structured links:

Input:

(this is) a link
(this is) a link

Output:

1: this is
1 a link
1 a link

Algorithm

  1. Parse input using the Protocols.Lino parser
  2. Filter links with 2+ references (deduplicatable content)
  3. Identify Patterns:
    • Exact duplicates
    • Common prefixes between link pairs
    • Common suffixes between link pairs
    • Special handling for structured links
  4. Score & Select patterns by (frequency × pattern_length)
  5. Apply top patterns based on threshold
  6. Format output using the library's formatLinks function

CLI Options

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| [input-file] | | Input file as positional argument | - |
| --input | -i | Input file path (alternative to positional argument) | - |
| --output | -o | Output file path (smart naming if not provided) | - |
| --deduplication-threshold | -p | Percentage of patterns to apply (0-1) | 0.2 |
| --auto-escape | | Automatically escape input to make it valid lino format | false |
| --piped-input | | Read from stdin (use when piping data) | false |
| --fail-on-parse-error | | Exit with code 1 if input cannot be parsed as lino format | false |
| --detect-auto-escape-edge-cases | | Analyze log file line-by-line to find cases that auto-escape cannot fix | false |
| --help | -h | Show help information | - |

Examples

Basic File Processing

# Deduplicate a file (smart output naming)
deduplino document.lino
# Creates document.deduped.lino

# Deduplicate with custom output
deduplino document.lino -o compressed.lino

# Traditional flag syntax
deduplino -i document.lino -o compressed.lino

# Process from pipeline
cat document.lino | deduplino --piped-input > compressed.lino

# Quick stdin processing
printf '(test)\n(test)\n' | deduplino --piped-input

Smart Output Naming

When you don't specify an output file, deduplino automatically generates one:

# File with .lino extension
deduplino input.lino           # → input.deduped.lino

# File without .lino extension  
deduplino server.log          # → server.log.deduped.lino
deduplino data.txt            # → data.txt.deduped.lino
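The naming rule shown above fits in one small function. This is a sketch of the documented behavior only; the name defaultOutputName is hypothetical and the real CLI may handle paths differently.

```typescript
// Hypothetical helper reproducing the documented naming rule:
// "x.lino" becomes "x.deduped.lino"; any other name gets ".deduped.lino" appended.
function defaultOutputName(inputPath: string): string {
  const linoExtension = /\.lino$/;
  return linoExtension.test(inputPath)
    ? inputPath.replace(linoExtension, ".deduped.lino")
    : `${inputPath}.deduped.lino`;
}

const namedFromLino: string = defaultOutputName("input.lino"); // "input.deduped.lino"
const namedFromLog: string = defaultOutputName("server.log"); // "server.log.deduped.lino"
const namedFromTxt: string = defaultOutputName("data.txt"); // "data.txt.deduped.lino"
```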

Threshold Control

# Conservative (default) - top 20% of patterns
deduplino document.lino

# More aggressive - top 50% of patterns
deduplino --deduplication-threshold 0.5 -i document.lino

# Maximum deduplication - all patterns
deduplino --deduplication-threshold 1.0 -i document.lino

Auto-Escape for Logs

# Process log files that aren't valid lino format
deduplino --auto-escape -i server.log -o processed.lino

# Handle timestamps and special characters
echo "2025-07-25T21:32:46Z error: connection failed" | deduplino --auto-escape --piped-input
# Output: '2025-07-25T21:32:46Z' 'error:' connection failed

Pipeline Usage

# Chain with other tools
some-tool | deduplino --piped-input | other-tool

# Multiple processing steps
cat input.lino | deduplino --piped-input -p 0.3 | tee intermediate.lino | final-processor

Error Handling and Validation

# Validate lino format - exit with code 1 if invalid
deduplino --fail-on-parse-error -i document.lino

# Auto-escape with validation - useful for CI/CD pipelines
deduplino --auto-escape --fail-on-parse-error -i log.txt
# This will attempt auto-escape, but fail if it still can't parse the result

# Check if auto-escape worked properly
echo "problematic: input" | deduplino --piped-input --auto-escape --fail-on-parse-error

Edge Case Detection and Analysis

# Analyze a log file to find problematic lines
deduplino --detect-auto-escape-edge-cases -i server.log

# Find edge cases in piped input
cat application.log | deduplino --piped-input --detect-auto-escape-edge-cases

# Example output:
# 🔍 Found 3 edge case(s) that auto-escape cannot fix:
# 
# 📂 Unbalanced Parentheses (2 cases):
#    Line 42: "))((("
#    Line 156: "))((()))(("
# 
# 📂 Only Punctuation (1 cases):  
#    Line 89: "( ( ( ) )"
#
# 📊 Statistics:
#    Total lines processed: 1000
#    Failed lines: 3
#    Success rate: 99.7%
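To make the report above more concrete, here is a rough sketch of one check that could sit behind the "Unbalanced Parentheses" category, plus the success-rate arithmetic. Both function names are hypothetical, and the check ignores quoting, so it is an approximation rather than the tool's actual classifier.

```typescript
// Hypothetical check: true when every ")" closes an earlier "(" and
// no "(" is left open at the end of the line. Quoted text is not handled.
function hasBalancedParentheses(line: string): boolean {
  let depth = 0;
  for (const char of line.split("")) {
    if (char === "(") depth++;
    if (char === ")") depth--;
    if (depth < 0) return false;
  }
  return depth === 0;
}

// Share of lines that parsed, formatted with one decimal place.
function successRate(totalLines: number, failedLines: number): string {
  if (totalLines === 0) return "100.0%"; // assumption: an empty input counts as success
  return `${(((totalLines - failedLines) / totalLines) * 100).toFixed(1)}%`;
}

const line42Balanced: boolean = hasBalancedParentheses("))((("); // false
const reportedRate: string = successRate(1000, 3); // "99.7%"
```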

Pattern Selection Strategy

The --deduplication-threshold parameter controls which patterns are applied:

  • 0.2 (default): Apply top 20% of patterns for optimal readability/compression balance
  • 0.5: More aggressive deduplication, may impact readability
  • 1.0: Maximum deduplication, applies all found patterns

Patterns are ranked by: frequency × pattern_length
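The effect of the threshold can be sketched as "sort by score, keep the top fraction". The ScoredPattern shape and the function name are hypothetical, and rounding the cut-off up with Math.ceil is an assumption; the source does not say how fractional counts are rounded.

```typescript
// Hypothetical shape: a pattern together with its precomputed rank score.
interface ScoredPattern {
  text: string;
  score: number;
}

// Keep the highest-scoring fraction of patterns.
// Assumption: the cut-off is rounded up, so any positive threshold keeps at least one pattern.
function selectTopPatterns(patterns: ScoredPattern[], threshold: number): ScoredPattern[] {
  const ranked = patterns.slice().sort((a, b) => b.score - a.score);
  const keep = Math.ceil(ranked.length * threshold);
  return ranked.slice(0, keep);
}

const samplePatterns: ScoredPattern[] = [
  { text: "ends here", score: 4 },
  { text: "this is a link of", score: 10 },
  { text: "first second", score: 6 },
  { text: "a b", score: 2 },
  { text: "x y z", score: 9 },
];

const conservative: ScoredPattern[] = selectTopPatterns(samplePatterns, 0.2); // 1 pattern
const aggressive: ScoredPattern[] = selectTopPatterns(samplePatterns, 0.5); // 3 patterns
const everything: ScoredPattern[] = selectTopPatterns(samplePatterns, 1.0); // 5 patterns
```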

Development

Setup

bun install

Testing

# Run all tests
bun test

# Watch mode
bun test --watch

Building

# Build for production
bun run build

# Development mode with file watching
bun run dev

Project Structure

src/
├── index.ts          # CLI interface and argument parsing
└── deduplicator.ts   # Core deduplication algorithm
tests/
└── deduplicator.test.ts  # Comprehensive test suite (27 tests)

Algorithm Details

Pattern Finding

  • Exact: Map-based counting of identical content
  • Prefix/Suffix: Pairwise comparison with reference-level matching
  • Structured: Special handling for nested link structures like (this is) a link

Pattern Scoring

Patterns are scored by count × pattern.split(' ').length to favor:

  • High-frequency patterns (appear many times)
  • Longer patterns (more compression benefit)
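Written out, the scoring formula looks like this. The function name patternScore is hypothetical; the arithmetic follows the formula quoted above.

```typescript
// Score = occurrences × number of space-separated references in the pattern.
function patternScore(pattern: string, count: number): number {
  return count * pattern.split(" ").length;
}

const prefixScore: number = patternScore("this is a link of", 2); // 2 × 5 = 10
const exactScore: number = patternScore("first second", 3); // 3 × 2 = 6
const suffixScore: number = patternScore("ends here", 2); // 2 × 2 = 4
```

With these numbers the five-reference prefix outranks the exact duplicate even though it occurs less often, which is the length bias the list above describes.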

Overlap Prevention

Selected patterns are filtered to prevent overlap - each link content can only be part of one pattern.
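One way to enforce this rule is a greedy pass over the ranked patterns that tracks which links are already claimed. The CandidatePattern shape is hypothetical, and skipping a pattern entirely when any of its links is taken is an assumption; the real implementation may resolve conflicts differently.

```typescript
// Hypothetical shape: a pattern and the indexes of the links it would rewrite.
interface CandidatePattern {
  text: string;
  links: number[];
}

// Walk patterns in rank order and accept one only if none of its links
// has already been claimed by a previously accepted pattern.
function withoutOverlap(ranked: CandidatePattern[]): CandidatePattern[] {
  const claimed = new Set<number>();
  const accepted: CandidatePattern[] = [];
  for (const pattern of ranked) {
    const overlaps = pattern.links.some((index) => claimed.has(index));
    if (overlaps) continue;
    pattern.links.forEach((index) => claimed.add(index));
    accepted.push(pattern);
  }
  return accepted;
}

const acceptedPatterns: CandidatePattern[] = withoutOverlap([
  { text: "this is a link of", links: [0, 1] },
  { text: "of cat", links: [1, 2] }, // skipped: link 1 is already claimed
  { text: "ends here", links: [3, 4] },
]);
// acceptedPatterns keeps "this is a link of" and "ends here"
```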

License

This is free and unencumbered software released into the public domain.

See LICENSE for full details or visit https://unlicense.org

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass: bun test
  5. Submit a pull request
