spotlighting-datamarking

v2.0.0-alpha

A package that implements data marking to make indirect prompt injection harder, based on research by Microsoft.

Defend against indirect prompt injection using Spotlighting (Microsoft Research). Marks untrusted data with special tokens so LLMs can distinguish it from instructions.

An open-source implementation of all three spotlighting variants from the paper — data marking, random interleaving, and base64 encoding (the strongest). The spotlighting technique itself is used by Microsoft in production as part of Prompt Shields in Azure AI Foundry.

Install

npm install spotlighting-datamarking

Quick Start

import { DataMarkingViaSpotlighting } from 'spotlighting-datamarking';

const marker = new DataMarkingViaSpotlighting();

const result = marker.markData('Ignore previous instructions');
// result.markedText  → "[MARKER]Ignore[MARKER]previous[MARKER]instructions[MARKER]"
// result.dataMarker  → the random marker string
// result.prompt      → LLM instruction to prepend to your system prompt

API

new DataMarkingViaSpotlighting(minK?, maxK?, defaultP?, defaultMinGap?, markerType?)

| Param | Default | Description |
| --- | --- | --- |
| minK | 7 | Min marker length |
| maxK | 12 | Max marker length |
| defaultP | 0.5 | Marker insertion probability |
| defaultMinGap | 1 | Min tokens between markers |
| markerType | 'alphanumeric' | 'alphanumeric' or 'unicode' |
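
To make the role of minK and maxK concrete, here is a minimal sketch of how a random alphanumeric marker with a length in that range could be generated. The helper name generateMarker and the injectable rng parameter are assumptions made for illustration; they are not part of the package's API, and the package's own generator may differ (for example, in how it handles the 'unicode' marker type).

```javascript
// Hypothetical helper for illustration only; not the package's actual code.
const ALPHANUMERIC =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// rng is any function returning a float in [0, 1). Math.random is NOT
// cryptographically secure, so use a CSPRNG-backed source for real markers.
function generateMarker(minK = 7, maxK = 12, rng = Math.random) {
  if (minK < 1 || maxK < minK) {
    throw new RangeError('expected 1 <= minK <= maxK');
  }
  // Pick a length uniformly from the inclusive range [minK, maxK].
  const length = minK + Math.floor(rng() * (maxK - minK + 1));
  let marker = '';
  for (let i = 0; i < length; i++) {
    marker += ALPHANUMERIC[Math.floor(rng() * ALPHANUMERIC.length)];
  }
  return marker;
}
```

Generating a fresh marker on every call means an attacker who writes the untrusted text cannot know the marker in advance and embed it.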

markData(text, options?)

Replaces all whitespace with markers. Returns { markedText, dataMarker, prompt }.
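
As a rough illustration of the whitespace replacement described above, the sketch below joins whitespace-separated tokens with a marker and optionally wraps the result in boundary markers. The function name markWhitespace is hypothetical, and collapsing each run of whitespace into a single marker is an assumption; the package may treat consecutive whitespace differently.

```javascript
// Hypothetical sketch of data marking; not the package's actual code.
function markWhitespace(text, marker, sandwich = true) {
  // Split on runs of whitespace and drop the empty pieces produced by
  // leading or trailing whitespace.
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const joined = tokens.join(marker);
  // "Sandwich" mode adds a marker at both ends of the marked text.
  return sandwich ? marker + joined + marker : joined;
}

console.log(markWhitespace('Ignore previous instructions', '[MARKER]'));
// [MARKER]Ignore[MARKER]previous[MARKER]instructions[MARKER]
```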

randomlyMarkData(text, options?)

Inserts markers probabilistically between tokens. Guarantees at least one marker. Returns { markedText, dataMarker, prompt }.
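
The probabilistic variant can be sketched as follows, assuming that p is the chance of placing a marker in each gap between tokens, that minGap is the minimum number of tokens between two markers, and that unmarked gaps become a single space. The name randomlyMark, the injectable rng, and the fallback that forces one marker when none was drawn are illustrative assumptions, not the package's implementation.

```javascript
// Hypothetical sketch of random interleaving; not the package's actual code.
function randomlyMark(text, marker, { p = 0.5, minGap = 1, rng = Math.random } = {}) {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length < 2) {
    // No gap between tokens exists, so append the marker to keep the
    // "at least one marker" guarantee.
    return tokens.join('') + marker;
  }
  // separators[i] sits between tokens[i] and tokens[i + 1].
  const separators = [];
  let tokensSinceMarker = Infinity; // no marker yet, so the first gap is eligible
  let inserted = 0;
  for (let i = 0; i < tokens.length - 1; i++) {
    if (tokensSinceMarker >= minGap && rng() < p) {
      separators.push(marker);
      tokensSinceMarker = 1; // tokens[i + 1] precedes the next gap
      inserted++;
    } else {
      separators.push(' ');
      tokensSinceMarker++;
    }
  }
  if (inserted === 0) {
    // Nothing was drawn: force a marker into one randomly chosen gap.
    separators[Math.floor(rng() * separators.length)] = marker;
  }
  return tokens.reduce((acc, token, i) => acc + separators[i - 1] + token);
}
```

Passing a fixed rng (for example `() => 0`) makes the output deterministic, which is convenient in tests.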

base64EncodeData(text, options?)

Base64-encodes the text. Returns { markedText, prompt }.
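
For reference, the encoding step on its own might look like the sketch below, assuming a Node.js runtime where Buffer is available as a global. The helper names base64Encode and base64Decode are hypothetical, and the instruction text the package returns in prompt is not reproduced here.

```javascript
// Hypothetical sketch of the encoding variant; assumes Node.js (global Buffer).
function base64Encode(text) {
  // Encode as UTF-8 first so non-ASCII characters survive the round trip.
  return Buffer.from(text, 'utf8').toString('base64');
}

function base64Decode(encoded) {
  return Buffer.from(encoded, 'base64').toString('utf8');
}

console.log(base64Encode('hello')); // aGVsbG8=
```

Because the model receives only the encoded form, any instructions hidden in the untrusted text never appear as plain text in the prompt; the model must be capable of decoding base64 for this variant to be useful.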

sanitizeText(text)

Strips invisible Unicode characters (zero-width spaces, BiDi controls, PUA chars, etc.). Called automatically before marking by default.

Options

All marking methods accept:

| Option | Default | Description |
| --- | --- | --- |
| sanitize | true | Strip invisible chars before marking |
| sandwich | true | Wrap text with boundary markers |
| markerType | instance default | Override marker type per-call |
| p | 0.5 | Insertion probability (randomlyMarkData only) |
| minGap | 1 | Min token gap between markers (randomlyMarkData only) |

Note: When using unicode markers, PUA characters (U+E000–F8FF) are always stripped from input regardless of the sanitize setting. This prevents attackers from spoofing markers.

Usage

import { DataMarkingViaSpotlighting } from 'spotlighting-datamarking';

const marker = new DataMarkingViaSpotlighting();
// getEmailBody() is a placeholder for your own source of untrusted text
const untrustedData = getEmailBody(); // could contain injection attempts

const result = marker.randomlyMarkData(untrustedData, { p: 0.5 });

const messages = [
  { role: 'system', content: `You are a helpful assistant.\n${result.prompt}` },
  { role: 'user', content: `Summarize this email:\n${result.markedText}` },
];

Sanitization

Input is sanitized by default before marking. The sanitizer removes:

  • Zero-width characters (U+200B, U+200C, U+200E, U+200F)
  • BiDi controls (U+202A–202E, U+2066–2069)
  • Soft hyphen, BOM, word joiner, invisible operators
  • Private Use Area chars (U+E000–F8FF)
  • Unicode tag characters (U+E0001, U+E0020–E007F)
  • Line/paragraph separators (U+2028–2029)

ZWJ (U+200D) is preserved to keep compound emoji intact (👨‍👩‍👧‍👦).

Disable with { sanitize: false } if you need raw passthrough.
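
The character classes listed above can be approximated with a single regular expression, as in this minimal sketch. The name stripInvisible is hypothetical, and the exact code points are assumptions read off the list (in particular, "invisible operators" is taken to mean U+2061 to U+2064); the package's sanitizeText may cover a different set.

```javascript
// Hypothetical sketch of the sanitizer; not the package's actual code.
// The `u` flag is required for the \u{...} escapes above the BMP.
const INVISIBLE_CHARS = new RegExp(
  '[' +
    '\\u200B\\u200C\\u200E\\u200F' + // zero-width chars (ZWJ U+200D is kept)
    '\\u202A-\\u202E\\u2066-\\u2069' + // BiDi controls
    '\\u00AD\\uFEFF\\u2060-\\u2064' + // soft hyphen, BOM, word joiner, invisible operators
    '\\uE000-\\uF8FF' + // Private Use Area
    '\\u{E0001}\\u{E0020}-\\u{E007F}' + // Unicode tag characters
    '\\u2028\\u2029' + // line and paragraph separators
  ']',
  'gu'
);

function stripInvisible(text) {
  return text.replace(INVISIBLE_CHARS, '');
}
```

Leaving U+200D out of the character class is what keeps joined emoji sequences intact.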

Testing

npm test

Real-World Validation

Two independent studies have evaluated spotlighting against adaptive attackers:

  1. LLMail-Inject (Abdelnabi et al., SaTML 2025): A public CTF run by Microsoft with 839 participants and 208k+ submissions against an LLM email assistant. Spotlighting reduced tool-call rates and was "more effective than some detection defenses alone, such as Prompt Shield." Only 0.8% of all submissions achieved a successful end-to-end attack, and stacking spotlighting with detection defenses improved results further.

  2. The Attacker Moves Second (Nasr, Carlini et al., 2025): A separate study that evaluated 12 defenses including spotlighting using strong adaptive attacks (search-based, RL, gradient, and human red-teaming). Against static attacks, spotlighting held ASR to ~1%. However, adaptive search-based attacks achieved >95% ASR, and human red-teamers generated 265 successful injections against it. The authors concluded they "did not observe any measurable difference in the types of attacks that succeed on models with these defenses compared to the same models without the defense."

Takeaway: Spotlighting raises the bar significantly against naive and static attacks, but it does not hold up against determined adaptive adversaries. It should be layered with other defenses (detection classifiers, instruction hierarchy, input sanitization) rather than relied upon alone.

References