
find-urls


An intelligent, zero-dependency utility to find and normalize URLs in text, with advanced punctuation and context handling.

Finding URLs in text is easy, but finding them in messy, human-written text can be hard. find-urls goes beyond simple regex matching by using a two-stage process: it first finds URL-like candidates, then uses a set of smart, context-aware heuristics to clean, validate, and normalize them into a ready-to-use list.

This means it can correctly handle URLs surrounded by punctuation (like (example.com) or even ...https://example.com/path-with-parentheses-(at-the-end-of-a-sentence).), filter out email addresses and common filenames (like notes.txt), add a default protocol to bare domains and protocol-relative URLs (e.g., example.com becomes https://example.com/), and more.
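The two-stage idea can be sketched in a few lines. The following is an illustrative simplification, not the package's actual implementation (the real heuristics are far more involved):

```typescript
// Sketch of the two-stage approach (NOT the package's code):
// stage 1 finds URL-like candidates; stage 2 trims unbalanced trailing punctuation.
function sketchFindUrls(text: string): string[] {
  // Stage 1: a deliberately loose candidate pattern.
  const candidate = /(?:https?:\/\/|www\.)[^\s]+|[a-z0-9.-]+\.[a-z]{2,}[^\s]*/gi;
  const matches = text.match(candidate) ?? [];

  // Stage 2: strip trailing punctuation, but keep a ')' when it has a
  // matching '(' inside the candidate (balanced-bracket heuristic).
  return matches.map((m) => {
    while (/[.,!?;:)]$/.test(m)) {
      if (m.endsWith(")")) {
        const opens = (m.match(/\(/g) ?? []).length;
        const closes = (m.match(/\)/g) ?? []).length;
        if (closes <= opens) break; // parentheses balanced: part of the URL
      }
      m = m.slice(0, -1);
    }
    return m;
  });
}
```

The balanced-bracket check is what lets a Wikipedia-style URL keep its final parenthesis while a sentence-wrapping parenthesis is dropped.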

Features

  • Intelligent Punctuation Handling: Correctly handles URLs wrapped in matching brackets (e.g., (en.wikipedia.org/wiki/Stack_(data_structure))) and strips trailing punctuation.
  • Zero Dependencies: Lightweight, secure, and easy to audit.
  • Context-Aware Filtering: Excludes email addresses and ignores common URL-like strings like file names if they don't have a protocol or path.
  • URL Normalization & Validation: Adds a default protocol to bare domains and uses the native URL constructor for validation.
  • Fully Typed: Written in TypeScript for type safety.
  • Isomorphic/Universal: Works in Node.js and the browser.
  • Configurable: Tailor the extraction logic with a set of options (see below).
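The normalization and validation the list above attributes to the native URL constructor can be reproduced directly in Node.js or the browser; this snippet only demonstrates platform behavior, not the library's internals:

```typescript
// The WHATWG URL constructor both validates and normalizes:
// it lowercases the host, adds the root path, and throws on invalid input.
const normalized = new URL("https://EXAMPLE.com").href;
// normalized === "https://example.com/"

// Prepending a default protocol to a bare domain before construction
// (the approach the README describes) makes bare domains parseable too:
const bare = "example.com";
const withProtocol = new URL(`https://${bare}`).href; // "https://example.com/"
```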

Installation

npm install find-urls

Usage

The default export is a single function that takes a string of text and returns an array of UrlMatch objects.

import findUrls from 'find-urls';

const text = `
  Check out my site (example.com) and this cool link:
  https://en.wikipedia.org/wiki/Stack_(data_structure)!
  My email is user@example.com.
`;

const urls = findUrls(text);

console.log(urls);

Output:

[
  {
    "raw": "example.com",
    "normalized": "https://example.com/",
    "index": 29
  },
  {
    "raw": "https://en.wikipedia.org/wiki/Stack_(data_structure)",
    "normalized": "https://en.wikipedia.org/wiki/Stack_(data_structure)",
    "index": 67
  }
]

Return Value: UrlMatch

Each item in the returned array is an object with the following properties:

  • raw: The original URL string as it was found in the text (after removing surrounding punctuation).
  • normalized: A fully qualified, normalized URL.
  • index: The starting index of the match within the input string.
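In TypeScript terms, the shape described above corresponds to an interface along these lines (an approximation based on the docs; the package ships its own type definitions):

```typescript
// Approximate shape of each match object, mirroring the README:
interface UrlMatch {
  raw: string;        // original text as found (surrounding punctuation removed)
  normalized: string; // fully qualified, normalized URL
  index: number;      // start offset of the match in the input string
}

// Example value, taken from the Usage section's output:
const match: UrlMatch = {
  raw: "example.com",
  normalized: "https://example.com/",
  index: 29,
};
```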

API

findUrls(text, [options])

options

An optional object to configure the extraction behavior.

  • requireProtocol?: boolean

    • If true, only URLs that start with a protocol (e.g., https://) or are protocol-relative (//) will be extracted.

    • @default false

      const text = 'https://a.com and b.com';
      findUrls(text, { requireProtocol: true }); // → Finds only 'https://a.com'
  • defaultProtocol?: string

    • The protocol to prepend to URLs without one.

    • @default "https"

      const urls = findUrls('example.com', { defaultProtocol: 'http' });
      // urls[0].normalized → 'http://example.com/'
  • allowedProtocols?: string[] | null

    • An array of allowed protocol schemes. If null, any protocol is allowed.

    • @default ["http", "https"]

      const text = 'http://a.com, ftp://b.com';
      findUrls(text, { allowedProtocols: ['http'] }); // → Finds only 'http://a.com'
  • allowedBareHostnames?: string[]

    • An array of bare hostnames to allow if a potential URL does not have a TLD (e.g., localhost).

    • @default ["localhost"]

      const text = 'Connect to dev-server:3000';
      findUrls(text, { allowedBareHostnames: ['dev-server'] }); // → Finds 'dev-server:3000'
  • extensionsRequiringProtocol?: string[]

    • An array of file extensions that are considered part of a valid URL only if a protocol or path is present. This prevents treating a filename like notes.txt as a domain. This option replaces the default list.

    • @default [ ... (a list of common file extensions)]

      const text = 'This is a test.js file.';
      findUrls(text); // → Finds nothing
      findUrls(text, { extensionsRequiringProtocol: [] }); // → Finds 'test.js'
  • deduplicate?: boolean

    • If true, URLs will be deduplicated based on their normalized form.

    • @default false

      const text = 'example.com and https://example.com/';
      findUrls(text, { deduplicate: true }); // → Finds only the first instance
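Collected in one place, the documented defaults amount to the following options object (illustrative only; extensionsRequiringProtocol's built-in list is elided in the docs, so it is omitted here):

```typescript
// Documented defaults for findUrls options (field names from the README):
const defaultOptions = {
  requireProtocol: false,              // also extract bare domains
  defaultProtocol: "https",            // prepended to URLs without a protocol
  allowedProtocols: ["http", "https"], // null would allow any protocol
  allowedBareHostnames: ["localhost"], // hostnames allowed without a TLD
  deduplicate: false,                  // keep repeated URLs
  // extensionsRequiringProtocol: built-in list of common file extensions (see above)
};
```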

See Also

  • get-urls: A high-level URL extractor that uses url-regex-safe for matching but adds normalization, extraction of URLs from query strings, and more.
  • url-regex-safe: A low-level regex generator whose primary focus is security (ReDoS protection via re2), accuracy (real TLDs), and configurability at the regex level. It returns a RegExp object, leaving post-processing to the user.

Feature & Philosophy Comparison

| Feature / Aspect     | find-urls (This Package)                     | get-urls                                      | url-regex-safe                    |
|----------------------|----------------------------------------------|-----------------------------------------------|-----------------------------------|
| Core Method          | 2-Stage: Regex + JS Post-Processing          | Wraps url-regex-safe + normalize-url          | Regex                             |
| Punctuation Handling | ✅ Sophisticated (context-aware, balanced)   | ❌ Basic (inherits from url-regex-safe)       | ❌ Basic (character exclusion)    |
| Context Filtering    | ✅ Built-in (emails, filenames)              | ❌ No                                         | ❌ No                             |
| Output               | ✅ Structured objects { raw, normalized, index } | ✅ Set of normalized URL strings          | ✅ A RegExp object                |
| Dependencies         | ✅ Zero-dependency                           | ❌ url-regex-safe, super-regex, normalize-url | ❌ tlds, ip-regex, optionally re2 |
| URL Normalization    | ✅ Built-in                                  | ✅ Via normalize-url dependency               | ❌ None                           |
| TLD Validation       | ❌ Generic pattern                           | ✅ TLD database                               | ✅ TLD database                   |
| IPv6 Support         | ❌ No                                        | ✅ Yes                                        | ✅ Yes                            |
| ReDoS Security       | ❌ Standard RegExp                           | ✅ Via super-regex dependency                 | ✅ Via optional re2 engine        |

When to Use Which

  • Use find-urls (this library) when:

    • You are extracting URLs from messy, human-written text and need context-aware handling of surrounding punctuation.
    • You need to filter out things that look like URLs but aren't (e.g., a filename like readme.md, or an email address).
    • You want a lightweight, zero-dependency solution.
  • Use get-urls or url-regex-safe when:

    • You are handling simpler input cases or want more control over the extraction logic.
    • You need a highly configurable, security-hardened regex pattern as a building block.
    • You are concerned about optimizing for speed and preventing ReDoS attacks when processing untrusted input.

Limitations

  • No IPv6 Support: Does not match IPv6 addresses.
  • No TLD Validation: Uses a generic pattern (\p{L}{2,}) to match TLDs—this is more flexible for new or internal TLDs, but it can result in false positives.
  • ReDoS Risk: Uses RegExp for matching URLs and may be vulnerable to ReDoS if you are processing untrusted, malicious input.

Contributing

Contributions are welcome! Please feel free to open an issue or submit a pull request.

License

MIT