@veldica/prose-analyzer

v1.3.0

Deterministic prose style metrics for measuring variety, density, repetition, and narrative texture. Designed for AI editorial pipelines and writing tools.

Deterministic prose style metrics for measuring variety, density, repetition, and narrative texture. While readability (via @veldica/readability) measures accessibility, prose analysis measures the sophistication and vividness of writing.

Designed for high-precision editorial pipelines, creative writing tools, and content auditing systems where deterministic, explainable signals are required.

Why Prose Analysis?

In an era of AI-generated content, understanding the "texture" of prose is critical:

  • Measuring "AI-ness": Detect the flat lexical fingerprints of LLMs by measuring moving-average diversity and repetition ratios.
  • "Show, Don't Tell" Auditing: Use Scene Density and Sensory Language proxies to mathematically evaluate the vividness of narrative sections.
  • Deterministic Consistency: Unlike LLM-based critiques, these scores are 100% repeatable, local-first, and require zero network calls.
  • Length-Independent Metrics: Implements an $O(N)$ MATTR (Moving Average TTR) algorithm to accurately and efficiently measure vocabulary richness in texts of any length, from single paragraphs to full-length novels.

Features

  • High-Performance Lexical Analysis: Optimized $O(N)$ MATTR algorithm and Map-based frequency tracking for massive datasets.
  • Narrative Texture Heuristics: Deterministic proxies for "active" vs. "passive" writing styles.
  • Linguistic Signal Detection: Categorization of sensory and abstract language usage with "Sanitize Once" performance.
  • Resilient Logic: Hardened against regex backtracking and synchronous blocking on large inputs.
  • Tree-Shakable: ESM-first design with modular exports for lexical, narrative, and categorization tools.

Core Capabilities

1. Lexical Variety & Density

Measures how "rich" the vocabulary is and the ratio of content-carrying words vs. structural filler (stopwords).

  • TTR (Type-Token Ratio): Basic unique word ratio.
  • MATTR (Moving-Average TTR): A more robust measure of diversity that isn't biased by text length.
  • Lexical Density: The proportion of content words (nouns, verbs, adjectives) in the text, identified by filtering out stopwords.
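
The three measures above can be sketched in a few lines. This is a minimal illustration, not the package's source: the function names, the tiny stopword list, and the fallback to plain TTR for texts shorter than the window are all assumptions made for the example.

```typescript
// Illustrative sketch only; names and fallbacks are assumptions, not the package API.

/** Type-Token Ratio: unique words divided by total words. */
function typeTokenRatio(words: string[]): number {
  if (words.length === 0) return 0;
  return new Set(words).size / words.length;
}

/**
 * Moving-Average TTR in O(N): slide a fixed window across the text and
 * update a frequency map incrementally instead of recounting each window.
 */
function movingAverageTtr(words: string[], windowSize = 50): number {
  if (words.length === 0) return 0;
  // Assumption: texts no longer than the window fall back to plain TTR.
  if (words.length <= windowSize) return typeTokenRatio(words);

  const counts = new Map<string, number>();
  let ttrSum = 0;
  let windows = 0;

  words.forEach((entering, i) => {
    // Add the word entering the window.
    counts.set(entering, (counts.get(entering) ?? 0) + 1);

    // Drop the word leaving the window.
    if (i >= windowSize) {
      const leaving = words[i - windowSize] as string;
      const remaining = (counts.get(leaving) ?? 0) - 1;
      if (remaining <= 0) counts.delete(leaving);
      else counts.set(leaving, remaining);
    }

    // Once the window is full, record its TTR.
    if (i >= windowSize - 1) {
      ttrSum += counts.size / windowSize;
      windows++;
    }
  });

  return ttrSum / windows;
}

// Hypothetical stopword sample; a real list would be much longer.
const SAMPLE_STOPWORDS = new Set(["the", "a", "an", "of", "and", "to", "was", "is", "in", "it"]);

/** Lexical density: share of words that are not stopwords. */
function lexicalDensity(
  words: string[],
  isStopword: (word: string) => boolean = (word) => SAMPLE_STOPWORDS.has(word),
): number {
  if (words.length === 0) return 0;
  return words.filter((word) => !isStopword(word)).length / words.length;
}
```

The length bias is easy to see on a toy input: for the sequence a, b, a, b with a window of 2, plain TTR is 0.5, while every window contains two distinct words, so the moving average is 1.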

2. Repetition Identification

Automatically detects overused content words while ignoring common stopwords. Provides both raw counts and frequency ratios.
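
As a rough sketch of the idea, repetition detection can be built on a single Map of content-word counts. The function name, the result shape, the sample stopword set, and the choice to compute the ratio against the total word count are assumptions for illustration; the package may define these differently.

```typescript
// Illustrative sketch only; not the package's implementation.

interface RepeatedWord {
  word: string;
  count: number; // raw occurrences
  ratio: number; // occurrences divided by total words (assumed definition)
}

// Hypothetical stopword sample used as the default filter.
const FILLER_WORDS = new Set(["the", "a", "an", "of", "and", "to", "was", "is", "in", "it"]);

function topRepeatedWords(
  words: string[],
  topCount = 10,
  isStopword: (word: string) => boolean = (word) => FILLER_WORDS.has(word),
): RepeatedWord[] {
  // Count each content word once in a single pass.
  const counts = new Map<string, number>();
  for (const word of words) {
    if (isStopword(word)) continue;
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }

  return Array.from(counts.entries())
    .filter(([, count]) => count > 1) // keep only words that actually repeat
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // most frequent first, ties alphabetical
    .slice(0, topCount)
    .map(([word, count]) => ({ word, count, ratio: count / words.length }));
}
```

Sorting ties alphabetically keeps the output stable across runs, which matters if the result is compared between versions of a text.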

3. Narrative Texture

Heuristic-based signals for analyzing the "pacing" of prose:

  • Dialogue Ratio: The balance between spoken word and narration.
  • Scene Density: A proxy for "active" writing, calculated by correlating dialogue frequency, high sentence length variation, and short sentence clusters.
  • Exposition Density: Detects "wall of text" explanation patterns based on paragraph length and sentence complexity.
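
Two of the inputs named above, dialogue share and sentence length variation, can be approximated as follows. This is a sketch under stated assumptions: the function names are hypothetical, dialogue is detected only through double quotation marks, and variation is expressed as a coefficient of variation. How the package weights these signals into Scene Density is not documented here, so no combined score is attempted.

```typescript
// Illustrative sketch only; heuristics and names are assumptions.

/** Share of words that sit inside double-quoted spans (straight or curly). */
function dialogueRatio(text: string): number {
  const countWords = (s: string): number => s.split(/\s+/).filter(Boolean).length;
  const total = countWords(text);
  if (total === 0) return 0;

  // Negated character classes keep matching linear; there are no nested quantifiers.
  const quotedSpans = text.match(/"[^"]*"|“[^”]*”/g) ?? [];
  let spoken = 0;
  for (const span of quotedSpans) spoken += countWords(span);
  return spoken / total;
}

/** Coefficient of variation of sentence lengths: standard deviation divided by mean. */
function sentenceLengthVariation(sentenceWordCounts: number[]): number {
  const n = sentenceWordCounts.length;
  if (n === 0) return 0;

  let sum = 0;
  for (const count of sentenceWordCounts) sum += count;
  const mean = sum / n;
  if (mean === 0) return 0;

  let squaredDeviations = 0;
  for (const count of sentenceWordCounts) squaredDeviations += (count - mean) ** 2;
  return Math.sqrt(squaredDeviations / n) / mean;
}
```

For the line `"It is time," he said.`, three of the five whitespace-separated words fall inside quotation marks, giving a dialogue ratio of 0.6.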

4. Linguistic Signal Detection

  • Sensory Language: Detection of words related to sight, sound, smell, taste, and touch.
  • Abstract Concepts: Identification of non-concrete terminology (e.g., "policy", "concept", "theory").
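
A category check of this kind can be as simple as a Set lookup on a normalized word. The sketch below is an assumption about the general approach, not the package's word lists or code: the sample sets, `normalizeWord`, and `signalDensities` are hypothetical names. Each word is normalized once and then reused for both lookups, in the spirit of the "Sanitize Once" note in the feature list.

```typescript
// Illustrative sketch only; the word lists are tiny hypothetical samples.

const SAMPLE_SENSORY = new Set(["bright", "loud", "bitter", "rough", "fragrant"]);
const SAMPLE_ABSTRACT = new Set(["policy", "concept", "theory"]);

/** Lowercase the word and strip every non-alphabetic character. */
function normalizeWord(word: string): string {
  return word.toLowerCase().replace(/[^a-z]/g, "");
}

/** Fraction of words in each category, computed in a single pass. */
function signalDensities(words: string[]): { sensory: number; abstractTerms: number } {
  if (words.length === 0) return { sensory: 0, abstractTerms: 0 };

  let sensoryHits = 0;
  let abstractHits = 0;
  for (const raw of words) {
    const word = normalizeWord(raw); // sanitize once, reuse for both lookups
    if (SAMPLE_SENSORY.has(word)) sensoryHits++;
    if (SAMPLE_ABSTRACT.has(word)) abstractHits++;
  }

  return {
    sensory: sensoryHits / words.length,
    abstractTerms: abstractHits / words.length,
  };
}
```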

Installation

npm install @veldica/prose-analyzer

Quick Start

Comprehensive Analysis (Recommended)

The easiest way to get a full stylistic profile is using the analyzeDocument bridge with the Veldica tokenizer.

import { tokenize } from "@veldica/prose-tokenizer";
import { analyzeDocument } from "@veldica/prose-analyzer";

const text = `The sun was a bright, heavy disk. "It is time," he said. 
The sky turned from blue to black as the shadows grew long.`;

// 1. Prepare text with the Veldica tokenizer
const doc = tokenize(text);

// 2. Run combined analysis using the bridge
const results = analyzeDocument(doc);

console.log(results.lexical.lexical_diversity_mattr); // 0.85 (High variety)
console.log(results.narrative.scene_density_proxy);   // 0.45 (Active pacing)
console.log(results.narrative.sensory_term_density); // 0.23 (Vivid imagery)

Advanced Usage (Manual Mapping)

If you are using a custom tokenizer, you can call analyzeProse directly by providing the required counts.

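import { analyzeProse } from "@veldica/prose-analyzer";

// sentences, words, sentenceWordCounts, and paragraphWordCounts must be
// produced by your own tokenizer before this call.
const results = analyzeProse(
  sentences,
  words,
  sentenceWordCounts,
  paragraphWordCounts,
  {
    lexical: { windowSize: 100 } // MATTR sliding window (default: 50)
  }
);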
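
If you do not have a tokenizer yet, the four inputs can be derived with a deliberately naive splitter like the one below. Treat it as a sketch: `ProseInputs` and `toProseInputs` are hypothetical names, the argument shapes (sentence strings, lowercased word strings, and per-sentence and per-paragraph word counts) are assumptions to check against the package's type definitions, and the splitting rules will mishandle abbreviations and unusual punctuation.

```typescript
// Illustrative sketch only; a naive stand-in for a real tokenizer.

interface ProseInputs {
  sentences: string[];
  words: string[];
  sentenceWordCounts: number[];
  paragraphWordCounts: number[];
}

function toProseInputs(text: string): ProseInputs {
  // Lowercased alphabetic tokens, keeping internal apostrophes (e.g. "don't").
  const toWords = (s: string): string[] => s.toLowerCase().match(/[a-z]+(?:'[a-z]+)*/g) ?? [];

  const inputs: ProseInputs = {
    sentences: [],
    words: [],
    sentenceWordCounts: [],
    paragraphWordCounts: [],
  };

  // Paragraphs are separated by one or more blank lines.
  for (const paragraph of text.split(/\n\s*\n/)) {
    let paragraphWords = 0;

    // A sentence ends at ., !, or ?, optionally followed by closing quotes.
    const rawSentences = paragraph.match(/[^.!?]+[.!?]+["”']*|[^.!?]+$/g) ?? [];
    for (const raw of rawSentences) {
      const sentence = raw.trim();
      const sentenceWords = toWords(sentence);
      if (sentenceWords.length === 0) continue; // skip punctuation-only fragments

      inputs.sentences.push(sentence);
      inputs.words.push(...sentenceWords);
      inputs.sentenceWordCounts.push(sentenceWords.length);
      paragraphWords += sentenceWords.length;
    }

    if (paragraphWords > 0) inputs.paragraphWordCounts.push(paragraphWords);
  }

  return inputs;
}

// The result would then be passed along positionally (argument types assumed):
// analyzeProse(inputs.sentences, inputs.words, inputs.sentenceWordCounts, inputs.paragraphWordCounts)
```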

Advanced Lexical Options

You can tune the analysis (e.g., changing the MATTR sliding window) to suit specific content types.

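import { analyzeLexical } from "@veldica/prose-analyzer";

// Word tokens from your tokenizer (sample data for illustration)
const words = ["the", "sun", "was", "a", "bright", "heavy", "disk"];

const lexical = analyzeLexical(words, {
  windowSize: 100,      // Larger window for long-form prose
  topRepeatedCount: 5,  // Return fewer, more significant repetitions
});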

Signal Detection

Use the underlying categorization tools for custom filtering logic.

import { isSensory, isAbstract } from "@veldica/prose-analyzer";

isSensory("bright"); // true
isAbstract("policy"); // true

API Reference

analyzeDocument(doc, options?): ProseAnalysisResults

The recommended entry point for users of @veldica/prose-tokenizer. Automatically maps nested counts and runs a full analysis.

  • doc: A TokenizedDocument from the Veldica tokenizer.
  • options: (Optional) Object containing lexical and narrative configurations.

analyzeProse(sentences, words, sentenceWordCounts, paragraphWordCounts, options?): ProseAnalysisResults

Low-level combined entry point. Useful if you are using a custom tokenizer.

  • options: (Optional) Object containing lexical and narrative configurations.

analyzeLexical(words: string[], options?: LexicalAnalysisOptions): LexicalBase

Performs frequency and diversity analysis.

  • windowSize: (Default: 50) The sliding window for MATTR calculation.
  • topRepeatedCount: (Default: 10) Number of repeated words to return.
  • isStopword: (Optional) Custom function to override the default stopword detector.

analyzeNarrative(sentences, words, sentenceWordCounts, paragraphWordCounts, options?): FictionMetrics

Calculates narrative pacing and texture.

  • options.isSensory: (Optional) Custom function for sensory signal detection.
  • options.isAbstract: (Optional) Custom function for abstract concept detection.

isSensory(word: string, isNormalized?: boolean): boolean

Checks if a word belongs to the sensory language category. Set isNormalized to true if the word is already lowercased and stripped of non-alpha characters.

isAbstract(word: string, isNormalized?: boolean): boolean

Checks if a word belongs to the abstract concepts category. Set isNormalized to true if the word is already lowercased and stripped of non-alpha characters.

Practical Use Cases

  • AI Editorial Pipelines: Score AI-generated prose for "human-like" lexical variety.
  • Creative Writing Tools: Provide real-time feedback on "show vs. tell" using scene density proxies.
  • Content Quality Assurance: Identify repetitive or exposition-heavy sections in technical documentation.
  • Style Consistency: Ensure multiple authors maintain a consistent narrative texture across a large project.

Limitations

  • Language Support: Heuristics and categorizations (sensory/abstract) are currently optimized for English prose.
  • Dialogue Detection: Relies on common quotation mark heuristics. Complex formatting (e.g., non-standard punctuation in experimental fiction) may require custom options.
  • Contextual Nuance: Like all deterministic metrics, these scores measure "signals" rather than "meaning." They should be used to support, not replace, human editorial judgment.

Philosophy

We believe prose analysis should be deterministic and explainable. Unlike LLM-based scores, which are opaque and probabilistic, @veldica/prose-analyzer provides raw linguistic signals that are:

  1. Fast: Executed in milliseconds without network calls.
  2. Private: No data ever leaves your machine.
  3. Consistent: The same text always yields the same result.
  4. Signal-Focused: We measure the mechanics of the prose (how it is built) rather than attempting to interpret its meaning.

The Veldica Suite

This package is part of the Veldica ecosystem of modular, high-performance linguistic tools.

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/NewMetric)
  3. Ensure all changes pass the unit test suite (npm test)
  4. Push to the Branch (git push origin feature/NewMetric)
  5. Open a Pull Request

Ownership & Authority

This package is maintained by Veldica as a core part of our writing analysis platform. It is built for production environments that demand mathematical precision and deterministic reliability.

License

MIT © Veldica