
subtexty

v0.1.0

Published

Extract clean plain-text from subtitle files

Downloads

7

Readme

Subtexty

Extract clean plain text from subtitle files, with intelligent deduplication and support for multiple formats.

License: MIT

Overview

Subtexty is a lightweight, open-source CLI tool and TypeScript library that extracts clean, deduplicated plain text from subtitle files. It strips styling tags and timing metadata and removes redundant lines while preserving the original text flow.

Features

  • 🎯 Smart Text Extraction: Removes timing, positioning, and style tags while preserving content
  • 🔄 Intelligent Deduplication: Eliminates redundant lines and prefix duplicates
  • 🌐 Multi-Format Support: WebVTT (.vtt), SRT (.srt), TTML (.ttml/.xml), SBV (.sbv), JSON3 (.json/.json3)
  • 🔤 Encoding Handling: UTF-8 by default with fallback encoding detection and manual override support
  • 📝 Dual Interface: Both CLI tool and programmatic library
  • Performance: Stream processing for memory efficiency
  • 🧪 Well Tested: 80%+ test coverage with a comprehensive test suite
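The feature list doesn't say how timing lines are recognized. Below is a rough sketch for SRT and WebVTT only, assuming a simple line-by-line filter; subtexty's own format-specific parsers may work differently, and `stripCueMetadata` is a hypothetical name, not part of subtexty's API.

```typescript
// Hypothetical helper illustrating timing/metadata removal for SRT and WebVTT.
// Not subtexty's implementation; WebVTT NOTE/STYLE blocks are not handled.

// Matches cue timing lines such as "00:00:01,000 --> 00:00:03,500" (SRT)
// or "00:01.000 --> 00:03.500 align:start" (WebVTT: hours optional, settings allowed).
const TIMING_LINE =
  /^\s*(?:\d+:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}[.,]\d{3}/;

// SRT cue counters are bare integers on a line of their own.
const CUE_INDEX = /^\s*\d+\s*$/;

export function stripCueMetadata(raw: string): string[] {
  const textLines: string[] = [];
  for (const line of raw.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip blank cue separators and the WebVTT file header.
    if (trimmed === "" || trimmed.startsWith("WEBVTT")) continue;
    // Skip timing lines and numeric cue counters.
    if (TIMING_LINE.test(trimmed) || CUE_INDEX.test(trimmed)) continue;
    textLines.push(trimmed);
  }
  return textLines;
}
```

One trade-off of this shortcut: a line of dialogue that is only a number would also be dropped by `CUE_INDEX`; a parser that tracks cue structure avoids that.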

Installation

NPM (Global CLI)

npm install -g subtexty

NPM (Project Dependency)

npm install subtexty

Quick Start

CLI Usage

# Extract text to stdout
subtexty input.vtt

# Save to file
subtexty input.srt -o clean-text.txt

# Specify encoding
subtexty input.vtt --encoding utf-8

Library Usage

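import { extractText } from 'subtexty';

// Top-level await requires an ES module context ("type": "module" or .mjs)

// Basic extraction
const cleanText = await extractText('subtitles.vtt');
console.log(cleanText);

// With options (a second binding name, since const cannot be redeclared)
const cleanSrtText = await extractText('subtitles.srt', {
  encoding: 'utf-8'
});
console.log(cleanSrtText);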

CLI Reference

Basic Usage

subtexty [options] <input-file>

Arguments

  • input-file - Subtitle file to process (required)

Options

  • -v, --version - Display version number
  • -o, --output <file> - Output file (default: stdout)
  • --encoding <encoding> - File encoding (default: utf-8)
  • -h, --help - Display help for command

Examples

# Basic text extraction
subtexty movie-subtitles.vtt

# Process multiple files (one invocation per file)
subtexty episode1.srt -o episode1-text.txt
subtexty episode2.srt -o episode2-text.txt

# Handle different encodings
subtexty foreign-film.srt --encoding latin1

# Pipe to other tools
subtexty subtitles.vtt | wc -w  # Word count
subtexty subtitles.vtt | grep "keyword"  # Search

Exit Codes

  • 0 - Success
  • 1 - File error (not found, permissions, etc.)
  • 2 - Parsing error (invalid format, corrupted data)
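When subtexty is driven from a Node.js script, the exit codes above can be translated back into their meanings. A minimal sketch using `child_process.spawnSync`; `describeExit` and `runSubtexty` are hypothetical helper names, and the wrapper assumes the `subtexty` binary is on `PATH`.

```typescript
import { spawnSync } from "node:child_process";

// Meanings copied from the Exit Codes list in this README.
const EXIT_MEANINGS: Record<number, string> = {
  0: "success",
  1: "file error (not found, permissions, etc.)",
  2: "parsing error (invalid format, corrupted data)",
};

// Translate a process exit status into the documented meaning.
export function describeExit(status: number | null): string {
  if (status === null) return "no exit code (terminated by a signal)";
  return EXIT_MEANINGS[status] ?? `unexpected exit code ${status}`;
}

// Hypothetical wrapper: runs an installed `subtexty` binary on one file.
export function runSubtexty(input: string, output: string): string {
  const result = spawnSync("subtexty", [input, "-o", output], { encoding: "utf8" });
  // result.error is set when the binary could not be started at all (e.g. ENOENT).
  if (result.error) return `could not start subtexty: ${result.error.message}`;
  return describeExit(result.status);
}
```

For example, `console.log(runSubtexty("episode1.srt", "episode1-text.txt"))`. On Windows, npm installs global binaries as `.cmd` shims, so `spawnSync` may need `shell: true` there.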

Library API

extractText(filePath, options?)

Extracts clean text from a subtitle file.

Parameters:

  • filePath (string) - Path to the subtitle file
  • options (object, optional) - Extraction options
    • encoding (string) - File encoding (default: utf-8)

Returns:

  • Promise<string> - Clean extracted text

Example:

import { extractText } from 'subtexty';

try {
  const text = await extractText('./subtitles.vtt');
  console.log(text);
} catch (error) {
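  // `error` is typed `unknown` under strict TypeScript, so narrow it first
  console.error('Extraction failed:', error instanceof Error ? error.message : error);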
}

Error Handling

import { extractText, isSubtextyError } from 'subtexty';

try {
  const text = await extractText('file.vtt', { encoding: 'utf-8' });
  // Process text...
} catch (error) {
  if (isSubtextyError(error)) {
    // Handle specific subtexty errors
    switch (error.code) {
      case 'FILE_NOT_FOUND':
        console.error('Subtitle file does not exist');
        break;
      case 'UNSUPPORTED_FORMAT':
        console.error('File format not supported');
        break;
      case 'FILE_NOT_READABLE':
        console.error('Cannot read the file');
        break;
      default:
        console.error('Extraction error:', error.message);
    }
  } else {
    console.error('Unexpected error:', error instanceof Error ? error.message : error);
  }
}

Supported Formats

| Format | Extensions    | Description                |
|--------|---------------|----------------------------|
| WebVTT | .vtt          | Web Video Text Tracks      |
| SRT    | .srt          | SubRip Subtitle            |
| TTML   | .ttml, .xml   | Timed Text Markup Language |
| SBV    | .sbv          | YouTube SBV format         |
| JSON3  | .json, .json3 | JSON-based subtitle format |
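The table maps file extensions to formats. A minimal sketch of that lookup, assuming detection is purely extension-based (whether subtexty also inspects file contents isn't documented here); `detectFormat` is a hypothetical helper, not a subtexty export.

```typescript
import { extname } from "node:path";

// Extension-to-format table copied from "Supported Formats" above.
const FORMAT_BY_EXTENSION: Record<string, string> = {
  ".vtt": "WebVTT",
  ".srt": "SRT",
  ".ttml": "TTML",
  ".xml": "TTML",
  ".sbv": "SBV",
  ".json": "JSON3",
  ".json3": "JSON3",
};

// Returns the format name, or null for unsupported extensions such as ".txt".
export function detectFormat(filePath: string): string | null {
  // extname keeps the leading dot; lower-case it so "EPISODE.SRT" also matches.
  const ext = extname(filePath).toLowerCase();
  return FORMAT_BY_EXTENSION[ext] ?? null;
}
```

Because `.xml` maps to TTML here, any XML file would be handed to a TTML parser; that is a consequence of the lookup, not a content check.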

Text Processing Features

Tag Removal

Removes HTML, XML, and styling tags:

Input:  <b>Bold text</b> and <i>italic</i>
Output: Bold text and italic
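A minimal sketch of tag removal with a single regular expression, assuming any `<...>` span is markup; `stripTags` is a hypothetical name and subtexty's actual cleaning rules may differ.

```typescript
// Hypothetical tag stripper: deletes anything shaped like <...>, which covers
// HTML-style (<b>, <i>), WebVTT (<v Speaker>, <c.yellow>) and XML tags.
// A lone "<" with no closing ">" (e.g. "3 < 5") is left untouched.
export function stripTags(text: string): string {
  return text.replace(/<[^>]*>/g, "");
}
```

Running this before entity decoding matters: an escaped `&lt;b&gt;` in the source is meant as visible text and should not be stripped as a tag.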

Entity Conversion

Converts HTML entities:

Input:  Tom &amp; Jerry say &quot;Hello&quot;
Output: Tom & Jerry say "Hello"
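A sketch of entity decoding in a single pass, assuming a small named-entity table plus numeric references; `decodeEntities` is a hypothetical helper, and a complete decoder would cover every HTML5 named entity.

```typescript
// Small named-entity table; a full decoder would cover all HTML5 named entities.
const NAMED_ENTITIES: Record<string, string> = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
  nbsp: "\u00a0",
};

export function decodeEntities(text: string): string {
  // One pass over the string, so "&amp;lt;" becomes "&lt;" rather than "<".
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body: string) => {
    if (body[0] === "#") {
      // Numeric reference: hex (&#x41;) or decimal (&#65;).
      const codePoint =
        body[1] === "x" || body[1] === "X"
          ? parseInt(body.slice(2), 16)
          : parseInt(body.slice(1), 10);
      // Leave out-of-range references untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    // Unknown names (e.g. "&foo;") are kept verbatim.
    return NAMED_ENTITIES[body.toLowerCase()] ?? match;
  });
}
```

Decoding in one pass avoids the classic double-decoding bug where `&amp;lt;` turns into `<` after two sequential `replace` calls.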

Smart Deduplication

Removes two kinds of redundant lines:

Exact Duplicates:

Input:  Same line
        Same line
        Different line
Output: Same line
        Different line
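A sketch of exact-duplicate removal, assuming only consecutive repeats count as duplicates (the example above doesn't show whether non-adjacent repeats are also removed); `dropConsecutiveDuplicates` is a hypothetical name.

```typescript
// Hypothetical sketch: drop a line when it is identical to the line kept just
// before it. Non-adjacent repeats are preserved.
export function dropConsecutiveDuplicates(lines: string[]): string[] {
  const kept: string[] = [];
  for (const line of lines) {
    if (kept.length === 0 || kept[kept.length - 1] !== line) {
      kept.push(line);
    }
  }
  return kept;
}
```

A file-wide variant using a `Set` would also catch non-adjacent repeats, at the cost of deleting dialogue that is legitimately spoken twice.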

Prefix Removal:

Input:  I love coding
        I love coding with TypeScript
        Amazing results
Output: I love coding with TypeScript
        Amazing results
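A sketch of prefix removal, assuming a line is dropped when the very next line repeats it and continues at a word boundary; `dropPrefixLines` is a hypothetical helper and subtexty's exact matching rule isn't documented here.

```typescript
// Hypothetical sketch of prefix deduplication: a line is dropped when the next
// line is identical or extends it ("I love coding" -> "I love coding with ...").
// Requiring a following space avoids treating "I" as a prefix of "It is".
export function dropPrefixLines(lines: string[]): string[] {
  return lines.filter((line, i) => {
    const next: string | undefined = lines[i + 1];
    if (next === undefined) return true; // always keep the last line
    return !(next === line || next.startsWith(line + " "));
  });
}
```

Chains collapse naturally: for `["A", "A B", "A B C"]` only `"A B C"` survives, since each shorter line is extended by its successor.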

Whitespace Normalization

Collapses runs of spaces and tabs into a single space:

Input:  Multiple   spaces    and	tabs
Output: Multiple spaces and tabs
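A one-line sketch of the same normalization, applied per line so line breaks are preserved; `normalizeWhitespace` is a hypothetical name.

```typescript
// Hypothetical sketch: collapse runs of spaces/tabs inside a line to a single
// space and trim both ends. Newlines are not touched, so call it per line.
export function normalizeWhitespace(line: string): string {
  return line.replace(/[ \t]+/g, " ").trim();
}
```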

Development

Prerequisites

  • Node.js ≥14.0.0
  • pnpm (recommended) or npm

Installation

git clone https://github.com/bytesnack114/subtexty.git
cd subtexty
pnpm install

Development Scripts

# Development
pnpm dev input.vtt              # Run CLI in development mode
pnpm build                      # Build TypeScript
pnpm clean                      # Clean build artifacts

# Testing
pnpm test                       # Run test suite
pnpm test:watch                 # Watch mode testing
pnpm test:coverage              # Coverage report

# Code Quality
pnpm lint                       # Run ESLint
pnpm lint:fix                   # Fix linting issues

Project Structure

subtexty/
├── src/
│   ├── cli.ts              # CLI interface
│   ├── constants.ts        # Application constants
│   ├── errors.ts           # Custom error classes
│   ├── index.ts            # Library entry point
│   ├── validation.ts       # Input validation
│   ├── cli/                # CLI-specific modules
│   ├── parsers/            # Format-specific parsers
│   ├── types/              # TypeScript definitions
│   ├── utils/              # Text cleaning utilities
│   └── __tests__/          # Test suite
├── coverage/               # Coverage report (after running `pnpm test:coverage`)
├── dist/                   # Built files (after running `pnpm build`)
└── example/                # Example input files

Contributing

Quick Contribution Steps

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make changes and add tests
  4. Run tests with coverage: pnpm test:coverage
  5. Commit changes: git commit -m 'Add amazing feature'
  6. Push to branch: git push origin feature/amazing-feature
  7. Open a Pull Request

Testing

Subtexty aims for 80%+ test coverage:

# Run all tests
pnpm test

# Generate coverage report
pnpm test:coverage

# View coverage report (macOS `open`; use `xdg-open` on Linux)
open coverage/lcov-report/index.html

Test Categories

  • Unit Tests: Individual component testing
  • Integration Tests: End-to-end workflow testing
  • Parser Tests: Format-specific parsing validation
  • CLI Tests: Command-line interface testing

Performance

  • Memory Efficient: Stream processing for large files
  • Fast Processing: Optimized text cleaning pipeline
  • Minimal Dependencies: Only essential packages included
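How subtexty streams its input isn't described here. As an illustration only, if deduplication only ever compares a line with its immediate neighbour (as in the Smart Deduplication examples), it can run lazily while holding a single pending line in memory. `dedupeStream` is a hypothetical generator, not part of subtexty.

```typescript
// Hypothetical sketch: exact-duplicate and prefix removal applied lazily,
// keeping only one pending line in memory regardless of input size.
export function* dedupeStream(lines: Iterable<string>): Generator<string> {
  let pending: string | undefined;
  for (const line of lines) {
    // Emit the pending line only if this line neither repeats nor extends it.
    if (pending !== undefined && line !== pending && !line.startsWith(pending + " ")) {
      yield pending;
    }
    pending = line;
  }
  // Flush whatever is left at end of input.
  if (pending !== undefined) yield pending;
}
```

For file input, the same loop body works in an `async function*` that iterates `readline.createInterface({ input: createReadStream(path) })` with `for await`, since readline interfaces are async-iterable in Node.js.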

Troubleshooting

Common Issues

File Not Found Error

Error: Input file not found: subtitle.vtt

Solution: Check the file path (relative paths are resolved from the current working directory) and the file's read permissions.

Unsupported Format

Error: Unsupported file format: .txt

Solution: Use a supported subtitle format (.vtt, .srt, .ttml, .xml, .sbv, .json, .json3)

Encoding Issues
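
If accented or non-Latin characters appear garbled in the output, the file is probably not UTF-8 encoded. Pass its encoding explicitly:
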
# Specify encoding manually
subtexty file.srt --encoding latin1
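The feature list mentions "fallback encoding detection" without describing the heuristic. One common approach, sketched here as an assumption rather than subtexty's documented behavior: decode strictly as UTF-8 and fall back to Latin-1 when the bytes are invalid. `decodeWithFallback` is a hypothetical name.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical sketch of "UTF-8 by default with fallback": decode strictly as
// UTF-8; if the bytes are not valid UTF-8, fall back to Latin-1, which maps
// every byte to a character and therefore never fails.
export function decodeWithFallback(bytes: Uint8Array): { text: string; encoding: string } {
  try {
    // fatal: true makes invalid sequences throw instead of becoming U+FFFD.
    // A leading UTF-8 BOM is stripped by default.
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return { text, encoding: "utf-8" };
  } catch {
    return { text: Buffer.from(bytes).toString("latin1"), encoding: "latin1" };
  }
}
```

Because the Latin-1 fallback always "succeeds", a Windows-1252 file with curly quotes (byte 0x93, for example) decodes to control characters instead of quotes. That is why an explicit `--encoding` flag is still useful.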

Permission Errors

# Check file permissions, then add read permission if it is missing
ls -la subtitle-file.vtt
chmod +r subtitle-file.vtt
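The same check can be made from Node.js before calling `extractText`. A minimal sketch using `fs.accessSync` with `constants.R_OK`; `isReadable` is a hypothetical helper name.

```typescript
import { accessSync, constants } from "node:fs";

// Returns true when the file exists and the current process may read it;
// a programmatic version of the ls/chmod check above.
export function isReadable(path: string): boolean {
  try {
    accessSync(path, constants.R_OK);
    return true;
  } catch {
    return false;
  }
}
```

The Node.js documentation warns against check-then-open patterns because the file can change between the two calls. Treat this as a diagnostic aid, and still handle errors thrown by `extractText` itself.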

License

MIT License - see LICENSE.md file for details.

Support