@konemono/nostr-content-parser

v0.7.0

Published

4 months ago

Parse Nostr content into tokens

konemono

nostr parser nip19 content token

Nostr Content Parser

Parse Nostr content into structured tokens with full TypeScript support.

Installation

npm install @konemono/nostr-content-parser

Usage

import {
  parseContent,
  parseContentAsync,
  TokenType,
  NIP19SubType,
} from "@konemono/nostr-content-parser";

const content = "Hello npub1xyz... Check :custom_emoji: #nostr";
const tags = [["emoji", "custom_emoji", "https://example.com/emoji.png"]];

// Synchronous parsing (recommended for most cases)
const tokens = parseContent(content, tags);
console.log(tokens);

// Asynchronous parsing with URL type detection
const tokensWithUrlTypes = await parseContentAsync(content, tags);
console.log(tokensWithUrlTypes);

Token Types

Main Types

TokenType.TEXT - Plain text
TokenType.NIP19 - NIP-19 entities (npub, nprofile, note, nevent, naddr, nsec)
TokenType.URL - URLs
TokenType.CUSTOM_EMOJI - Custom emojis
TokenType.HASHTAG - Hashtags
TokenType.LN_ADDRESS - Lightning addresses
TokenType.LN_URL - Lightning URLs
TokenType.LNBC - Lightning invoices
TokenType.EMAIL - Email addresses
TokenType.BITCOIN_ADDRESS - Bitcoin addresses
TokenType.CASHU_TOKEN - Cashu tokens
TokenType.NIP_IDENTIFIER - NIP identifiers
TokenType.LEGACY_REFERENCE - Legacy references (#[0])
TokenType.RELAY - Relay URLs (wss://)

NIP19 Sub Types

NIP19SubType.NPUB - Public key
NIP19SubType.NPROFILE - Profile
NIP19SubType.NOTE - Note
NIP19SubType.NEVENT - Event
NIP19SubType.NADDR - Address
NIP19SubType.NSEC - Secret key

API

Core Functions

`parseContent(content, tags, options?)`

Synchronous parsing - Recommended for most use cases.

Parameters:

content: string – Input content to parse.
tags: string[][] – Optional tag array (used for custom emoji, legacy references, etc.).
options: object – Optional settings:
- includeNostrPrefixOnly?: boolean – If true (default), only NIP-19 entities prefixed with nostr: are tokenized as NIP19. If false, bare NIP-19 strings (without the prefix) are parsed as well.
- hashtagsFromTagsOnly?: boolean – If true (default), only hashtags that match a t tag are tokenized as hashtags. If false, every #-prefixed word is treated as a hashtag.

Returns: Token[]

URL type detection is performed based on file extensions only (fast and lightweight).
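
To make "based on file extensions only" concrete, here is a minimal sketch of what extension-based detection could look like. It is not the library's implementation: the function name guessTypeFromExtension and the extension lists are assumptions made for illustration, and the real parser may recognize a different set.

```typescript
// Hypothetical sketch, not the library's actual code.
type UrlMediaType = "image" | "video" | "audio";

// Assumed extension lists; the real parser may use different ones.
const EXTENSIONS: Record<UrlMediaType, readonly string[]> = {
  image: ["png", "jpg", "jpeg", "gif", "webp"],
  video: ["mp4", "webm", "mov"],
  audio: ["mp3", "ogg", "wav"],
};

function guessTypeFromExtension(rawUrl: string): UrlMediaType | undefined {
  let pathname: string;
  try {
    // Using the pathname ignores the query string and fragment.
    pathname = new URL(rawUrl).pathname;
  } catch {
    return undefined; // not a parseable absolute URL
  }
  const lastSegment = pathname.split("/").pop() ?? "";
  const dot = lastSegment.lastIndexOf(".");
  if (dot === -1) return undefined; // no extension, nothing to infer
  const ext = lastSegment.slice(dot + 1).toLowerCase();
  for (const type of Object.keys(EXTENSIONS) as UrlMediaType[]) {
    if (EXTENSIONS[type].includes(ext)) return type;
  }
  return undefined;
}
```

A URL such as https://example.com/video has no extension, so a purely extension-based check leaves its type undefined.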

`parseContentAsync(content, tags, options?)`

Asynchronous parsing - Use when you need comprehensive URL type detection.

Parameters:

content: string – Input content to parse.
tags: string[][] – Optional tag array (used for custom emoji, legacy references, etc.).
options: object – Optional settings:
- includeNostrPrefixOnly?: boolean (same as sync version)
- hashtagsFromTagsOnly?: boolean (same as sync version)

Returns: Promise<Token[]>

For URLs without a recognizable file extension, URL type detection falls back to an HTTP HEAD request and uses the response's content type. Parsing with this function can therefore trigger network requests.
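
The HEAD-request fallback can be pictured with the sketch below. It is an illustration under assumptions, not the library's code: typeFromContentType, detectTypeWithHead, and the HeadFetcher type are hypothetical names, and the fetch function is injected so the logic can be exercised without touching the network.

```typescript
// Hypothetical sketch, not the library's actual code.
type UrlMediaType = "image" | "video" | "audio";

// Map a Content-Type header value to a coarse media type.
function typeFromContentType(header: string | null): UrlMediaType | undefined {
  if (!header) return undefined;
  // Drop parameters such as "; charset=utf-8".
  const mime = (header.split(";")[0] ?? "").trim().toLowerCase();
  if (mime.startsWith("image/")) return "image";
  if (mime.startsWith("video/")) return "video";
  if (mime.startsWith("audio/")) return "audio";
  return undefined;
}

// Minimal shape of the response and fetch function this sketch relies on.
interface HeadResponse {
  ok: boolean;
  headers: { get(name: string): string | null };
}
type HeadFetcher = (
  url: string,
  init: { method: "HEAD" }
) => Promise<HeadResponse>;

async function detectTypeWithHead(
  url: string,
  fetcher: HeadFetcher
): Promise<UrlMediaType | undefined> {
  try {
    const res = await fetcher(url, { method: "HEAD" });
    return res.ok
      ? typeFromContentType(res.headers.get("content-type"))
      : undefined;
  } catch {
    return undefined; // network failures leave the type undetected
  }
}
```

In environments that provide a global fetch, it should be possible to pass it as the fetcher argument; a stub works equally well in tests.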

Filter Functions

filterTokens(tokens, types) - Filter tokens by type
filterTokensBy(tokens, predicate) - Filter tokens by custom predicate
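
As a rough picture of how a type-based filter can also narrow the TypeScript type of its result, consider the sketch below. The Token types and the filterByType function are simplified stand-ins written for this example; they are not the library's definitions, and the real filterTokens may be typed differently.

```typescript
// Simplified, hypothetical token types for illustration only.
interface TextToken {
  type: "text";
  content: string;
  start: number;
  end: number;
}
interface HashtagToken {
  type: "hashtag";
  content: string;
  start: number;
  end: number;
  metadata: { tag: string; validated: boolean };
}
type Token = TextToken | HashtagToken;

// Keep only tokens whose type is listed, and narrow the element type so
// callers can access type-specific metadata without further checks.
function filterByType<T extends Token["type"]>(
  tokens: readonly Token[],
  types: readonly T[]
): Extract<Token, { type: T }>[] {
  return tokens.filter((t): t is Extract<Token, { type: T }> =>
    (types as readonly string[]).includes(t.type)
  );
}

const sample: Token[] = [
  { type: "text", content: "Hello ", start: 0, end: 6 },
  {
    type: "hashtag",
    content: "#nostr",
    start: 6,
    end: 12,
    metadata: { tag: "nostr", validated: true },
  },
];

// The result is typed as HashtagToken[], so metadata.tag is accessible.
const hashtagsOnly = filterByType(sample, ["hashtag"]);
console.log(hashtagsOnly.map((t) => t.metadata.tag));
```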

NIP19 Functions

getNip19Entities(tokens) - Get all NIP-19 entities
filterNip19BySubType(tokens, subType) - Filter NIP-19 by sub type
getNpubs(tokens) - Get npub tokens
getNprofiles(tokens) - Get nprofile tokens
getNotes(tokens) - Get note tokens
getNevents(tokens) - Get nevent tokens
getNaddrs(tokens) - Get naddr tokens
getNsecs(tokens) - Get nsec tokens

Other Entity Functions

getUrls(tokens) - Get URLs
getCustomEmojis(tokens) - Get custom emojis
getHashtags(tokens) - Get hashtags
getValidatedHashtags(tokens) - Get hashtags validated by t tags
getLightningAddresses(tokens) - Get Lightning addresses
getLightningUrls(tokens) - Get Lightning URLs
getLightningInvoices(tokens) - Get Lightning invoices
getBitcoinAddresses(tokens) - Get Bitcoin addresses
getCashuTokens(tokens) - Get Cashu tokens
getEmails(tokens) - Get email addresses
getNipIdentifiers(tokens) - Get NIP identifiers
getLegacyReferences(tokens) - Get legacy references

Utility Functions

resetPatterns() - Reset regex patterns (call if needed)

Examples

Basic Parsing (Synchronous)

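// Fast, synchronous parsing
const tokens = parseContent("Check out nostr:npub1xyz... and #nostr!", [
  ["t", "nostr"],
]);
// With the default options, the nostr: prefix and the matching t tag
// let these be returned as NIP19 and HASHTAG tokens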

URL Type Detection (Asynchronous)

// Comprehensive URL type detection
const content =
  "Check this image: https://example.com/photo.png and https://example.com/video";
const tokens = await parseContentAsync(content);

const urls = getUrls(tokens);
urls.forEach((url) => {
  console.log(`URL: ${url.content}`);
  if (url.metadata?.scheme) {
    console.log(`Scheme: ${url.metadata.scheme}`); // "https", "http"
  }
  if (url.metadata?.type) {
    console.log(`Type: ${url.metadata.type}`); // "image", "video", "audio"
  }
});

Working with NIP19 Tokens

const tokens = parseContent("Check nostr:npub1xyz... and note1abc...", [], {
  includeNostrPrefixOnly: false, // Include plain NIP19 tokens
});

const nip19Tokens = getNip19Entities(tokens);
nip19Tokens.forEach((token) => {
  console.log(`Type: ${token.metadata.subType}`); // "npub", "note", etc.
  console.log(`Has nostr: prefix: ${token.metadata.hasNostrPrefix}`);
  console.log(`Plain NIP19: ${token.metadata.plainNip19}`);
});

// Filter specific NIP19 types
const npubs = getNpubs(tokens);
const notes = filterNip19BySubType(tokens, NIP19SubType.NOTE);
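
The three NIP-19 metadata fields used above relate to the raw text in a simple way, which the following sketch tries to show. describeNip19 is a hypothetical helper, not part of the package, and it only inspects prefixes: it performs no bech32 validation, which a real parser would likely need.

```typescript
// Hypothetical sketch: derives the documented metadata fields from a raw string.
const SUB_TYPES = ["npub", "nprofile", "note", "nevent", "naddr", "nsec"] as const;
type SubType = (typeof SUB_TYPES)[number];

interface Nip19Metadata {
  subType: SubType;
  hasNostrPrefix: boolean;
  plainNip19: string;
}

const NOSTR_PREFIX = "nostr:";

function describeNip19(raw: string): Nip19Metadata | undefined {
  const hasNostrPrefix = raw.startsWith(NOSTR_PREFIX);
  const plainNip19 = hasNostrPrefix ? raw.slice(NOSTR_PREFIX.length) : raw;
  // A bech32 string is "<human-readable part>1<data>", so match on "<subType>1".
  const subType = SUB_TYPES.find((p) => plainNip19.startsWith(`${p}1`));
  if (subType === undefined) return undefined; // not a recognized NIP-19 prefix
  return { subType, hasNostrPrefix, plainNip19 };
}
```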

Hashtag Filtering

const content = "#nostr #dev";
const tags = [["t", "nostr"]];
const tokens = parseContent(content, tags, {
  hashtagsFromTagsOnly: true,
});

const hashtags = getHashtags(tokens);
hashtags.forEach((tag) => {
  console.log(`Tag: ${tag.metadata.tag}`);
  console.log(`Validated: ${tag.metadata.validated}`);
});

// Only "#nostr" will be returned as a validated HASHTAG token
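
The matching of hashtags against t tags can be sketched as follows. This is an assumption about the general approach, not the library's code: isValidatedHashtag and tTagValues are hypothetical names, and the case-insensitive comparison is a guess that the real parser may or may not share.

```typescript
// Hypothetical sketch of validating a hashtag against "t" tags.
type Tags = readonly (readonly string[])[];

// Collect the values of all "t" tags, lowercased for comparison.
function tTagValues(tags: Tags): Set<string> {
  const values = new Set<string>();
  for (const tag of tags) {
    const [name, value] = tag;
    if (name === "t" && value !== undefined) values.add(value.toLowerCase());
  }
  return values;
}

// Assumption: comparison ignores case and the leading "#".
function isValidatedHashtag(hashtag: string, tags: Tags): boolean {
  const bare = hashtag.startsWith("#") ? hashtag.slice(1) : hashtag;
  return tTagValues(tags).has(bare.toLowerCase());
}
```

With tags set to [["t", "nostr"]], "#nostr" passes this check and "#dev" does not, which mirrors the outcome described in the example above.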

Custom Emoji Handling

const content = "Hello :custom_emoji: and :unknown:!";
const tags = [["emoji", "custom_emoji", "https://example.com/emoji.png"]];
const tokens = parseContent(content, tags);

const emojis = getCustomEmojis(tokens);
emojis.forEach((emoji) => {
  console.log(`Name: ${emoji.metadata.name}`);
  if (emoji.metadata.hasMetadata) {
    // TypeScript knows url exists here
    console.log(`URL: ${emoji.metadata.url}`);
  } else {
    console.log("No metadata found in tags");
  }
});
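
The two metadata shapes for custom emoji can be derived from the tag array roughly as in this sketch. emojiMetadataFor is a hypothetical helper written for illustration; it assumes emoji tags have the form ["emoji", name, url] as in the example above, and it is not taken from the library.

```typescript
// Hypothetical sketch: builds emoji metadata from ["emoji", name, url] tags.
type EmojiMetadata =
  | { name: string; hasMetadata: true; url: string }
  | { name: string; hasMetadata: false };

function emojiMetadataFor(
  name: string,
  tags: readonly (readonly string[])[]
): EmojiMetadata {
  for (const tag of tags) {
    const [kind, shortcode, url] = tag;
    if (kind === "emoji" && shortcode === name && url !== undefined) {
      return { name, hasMetadata: true, url };
    }
  }
  // No matching tag: the emoji is still reported, but without a URL.
  return { name, hasMetadata: false };
}
```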

Legacy References

const content = "See #[0] and #[1] for details";
const tags = [
  ["p", "npub1xyz..."],
  ["e", "note1abc..."],
];
const tokens = parseContent(content, tags);

const references = getLegacyReferences(tokens);
references.forEach((ref) => {
  console.log(`Index: ${ref.metadata.tagIndex}`);
  console.log(`Type: ${ref.metadata.tagType}`); // "p", "e", etc.
  console.log(`ID: ${ref.metadata.referenceId}`);
  console.log(`Reference Type: ${ref.metadata.referenceType}`); // "npub", "note", etc.
});
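
How #[n] references map onto the tag array, including the out-of-bounds case mentioned in the changelog, can be sketched like this. resolveLegacyRefs is a hypothetical function, not the library's API, and it omits the referenceType field because how that value is derived is not described here.

```typescript
// Hypothetical sketch of resolving "#[n]" references against a tag array.
interface ResolvedRef {
  tagIndex: number;
  tagType: string;
  referenceId: string;
}

const LEGACY_REF = /#\[(\d+)\]/g;

function resolveLegacyRefs(
  content: string,
  tags: readonly (readonly string[])[]
): ResolvedRef[] {
  const resolved: ResolvedRef[] = [];
  for (const match of content.matchAll(LEGACY_REF)) {
    const tagIndex = Number(match[1]);
    const tag = tags[tagIndex];
    if (!tag) continue; // out-of-bounds index: leave the text as-is
    const [tagType, referenceId] = tag;
    if (tagType === undefined || referenceId === undefined) continue;
    resolved.push({ tagIndex, tagType, referenceId });
  }
  return resolved;
}
```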

Performance Comparison

// Fast: Extension-based URL type detection only
const fastTokens = parseContent(content);

// Comprehensive: Includes HTTP requests for unknown URLs
const detailedTokens = await parseContentAsync(content);

Token Structure

Each token has the following base structure:

interface TokenBase {
  type: TokenType;
  content: string;
  start: number;
  end: number;
}

Tokens form a discriminated union: the type field determines which metadata fields are available, which gives full TypeScript type safety.
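
For readers new to the pattern, the sketch below shows a small discriminated union and how a switch on type narrows the metadata. The interfaces are simplified stand-ins covering only three token kinds; they are assumptions for illustration and not the package's actual type definitions.

```typescript
// Simplified, hypothetical types; the package's real definitions may differ.
interface BaseFields {
  content: string;
  start: number;
  end: number;
}
interface TextToken extends BaseFields {
  type: "text";
}
interface UrlToken extends BaseFields {
  type: "url";
  metadata: { scheme: string; type?: "image" | "video" | "audio" };
}
interface CustomEmojiToken extends BaseFields {
  type: "custom_emoji";
  metadata:
    | { name: string; hasMetadata: true; url: string }
    | { name: string; hasMetadata: false };
}
type Token = TextToken | UrlToken | CustomEmojiToken;

function describe(token: Token): string {
  // Each case sees only the metadata that belongs to that token type.
  switch (token.type) {
    case "text":
      return `text(${token.content.length})`;
    case "url":
      return `url:${token.metadata.type ?? "unknown"}`;
    case "custom_emoji":
      return token.metadata.hasMetadata
        ? `emoji:${token.metadata.name} -> ${token.metadata.url}`
        : `emoji:${token.metadata.name} (untagged)`;
  }
}
```

Because hasMetadata is itself a literal true or false, checking it narrows the emoji metadata a second time, which is why url is only accessible in the tagged branch.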

Metadata Examples

NIP19 Token:

{
  type: "nip19",
  content: "nostr:npub1xyz...",
  start: 0,
  end: 69,
  metadata: {
    subType: "npub",
    hasNostrPrefix: true,
    plainNip19: "npub1xyz..."
  }
}

Custom Emoji Token (with metadata):

{
  type: "custom_emoji",
  content: ":pepe:",
  start: 0,
  end: 6,
  metadata: {
    name: "pepe",
    url: "https://example.com/pepe.png",
    hasMetadata: true
  }
}

Custom Emoji Token (without metadata):

{
  type: "custom_emoji",
  content: ":unknown:",
  start: 0,
  end: 9,
  metadata: {
    name: "unknown",
    hasMetadata: false
  }
}

URL Token:

{
  type: "url",
  content: "https://example.com/photo.png",
  start: 0,
  end: 29,
  metadata: {
    scheme: "https",
    type: "image"  // Optional: only if detected
  }
}

Hashtag Token:

{
  type: "hashtag",
  content: "#nostr",
  start: 0,
  end: 6,
  metadata: {
    tag: "nostr",
    validated: true  // true if matched with t tag
  }
}

Legacy Reference Token:

{
  type: "legacy_reference",
  content: "#[0]",
  start: 0,
  end: 4,
  metadata: {
    tagIndex: 0,
    tagType: "p",
    referenceId: "npub1xyz...",
    referenceType: "npub"  // "npub", "note", "naddr", or "unknown"
  }
}
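
In each example above, end minus start equals the length of content, which suggests that start is inclusive and end is exclusive. Assuming that reading, and assuming the offsets are string indices as used by String.prototype.slice, a token can be checked against its source text with a small helper. matchesSource is a hypothetical name, not part of the package.

```typescript
// Hypothetical helper; assumes start is inclusive and end is exclusive.
interface TokenBase {
  type: string;
  content: string;
  start: number;
  end: number;
}

function matchesSource(source: string, token: TokenBase): boolean {
  return source.slice(token.start, token.end) === token.content;
}

// Hand-written token for illustration, not actual parser output.
const source = "Hi :pepe:!";
const emojiToken: TokenBase = {
  type: "custom_emoji",
  content: ":pepe:",
  start: 3,
  end: 9,
};
console.log(matchesSource(source, emojiToken));
```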

TypeScript Support

Version 0.7.0 introduces full TypeScript type safety with discriminated unions:

import { parseContent, TokenType } from "@konemono/nostr-content-parser";

const tokens = parseContent("Hello :pepe:");

tokens.forEach((token) => {
  if (token.type === TokenType.CUSTOM_EMOJI) {
    // TypeScript knows metadata structure here
    console.log(token.metadata.name); // Always available

    if (token.metadata.hasMetadata) {
      // TypeScript knows url exists here
      console.log(token.metadata.url);
    }
  }

  if (token.type === TokenType.NIP19) {
    // TypeScript knows these fields exist
    console.log(token.metadata.plainNip19);
    console.log(token.metadata.subType);
    console.log(token.metadata.hasNostrPrefix);
  }
});

Commands

# Run tests
npm test

# Run tests once
npm run test:run

# Build
npm run build

# Pre-publish check
npm run prepublishOnly

# Publish to npm
npm publish

Changelog

0.7.0 (Breaking Changes)

Type System Overhaul:

Complete TypeScript rewrite with discriminated unions for type-safe token handling
Token metadata is now strongly typed based on token type
Breaking: Custom emoji tokens are now always created (with hasMetadata flag to distinguish between tagged and untagged emojis)
Breaking: Legacy reference tokens are only created when the referenced tag exists in the tags array (out-of-bounds references are treated as text)

New Features:

LEGACY_REFERENCE token type for parsing #[0] style references
RELAY token type for WebSocket relay URLs
Full metadata type safety - TypeScript now knows exactly which fields exist for each token type

Improvements:

Custom emoji handling: Both tagged and untagged emojis are now recognized
URL type detection restored with proper typing ("image" | "video" | "audio")
Hashtag metadata now includes validated field to distinguish t-tag validated hashtags
Better discriminated union support for token filtering functions

Migration Guide:

// Before (0.6.x)
const emoji = tokens.find((t) => t.type === "custom_emoji");
if (emoji.metadata.url) {
  /* ... */
}

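// After (0.7.0)
const emoji = tokens.find((t) => t.type === "custom_emoji");
// find() may return undefined, so check the token before reading metadata
if (emoji?.type === "custom_emoji" && emoji.metadata.hasMetadata) {
  // TypeScript knows url exists here
  console.log(emoji.metadata.url);
}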