
react-indexable

v0.5.0

Automatically extract semantic content from rendered UI and inject it back as crawlable, SEO-safe markup

Downloads

505

React Indexable

Serving the same content to humans, search engines, and AI crawlers the right way.

License: MIT


The Problem

When we build websites, we think about human users first: design, images, interactions, and layout. But today, the web is read by more than just humans:

  • Search engines crawl pages to index and rank them
  • AI crawlers read content to understand, summarize, and reuse it

The problem is that humans, search engines, and AI crawlers do not read the web the same way.

What Goes Wrong?

A typical web page contains images, buttons, layout containers, and JavaScript-driven interactions. This works great for humans, but machines care about content clarity, not UI.

  • Search engines must parse unnecessary UI elements before reaching core text
  • Important information may be hidden behind JavaScript interactions
  • AI crawlers struggle to extract clean, structured information
  • Educational content, tutorials, and Q&A become harder to interpret

The content exists, but machines don't read it efficiently.


The Solution

Indexable solves this by separating content from presentation. It automatically extracts semantic content from your rendered UI and injects it as hidden, crawlable markup.

Core Principle: Same Content, Different Presentation

  • Humans → Rich UI (images, buttons, interactions)
  • Search Engines → Clean, fast, text-focused HTML
  • AI Crawlers → Structured Markdown

As long as the meaning and data remain the same, this approach is SEO-safe and future-proof.

Key Features

  • Automatic Extraction: Identifies and preserves meaningful content (headings, paragraphs, lists, code blocks)
  • UI Noise Removal: Strips away interactive elements (buttons, forms, navigation)
  • Image Handling: Converts images to text representations using alt attributes
  • Deterministic: No AI, no guessing; pure, predictable content transformation
  • SEO Safe: Hidden with CSS only (display: none), no cloaking, no user-agent detection
  • Zero Configuration: Simple API with sensible defaults

Installation

npm install react-indexable

Quick Start

Basic Example

Let's take a simple math question page:

import { useState } from 'react';
import { Indexable } from 'react-indexable';

export default function MathQuestion() {
  const [showAnswer, setShowAnswer] = useState(false);

  return (
    <>
      <main id="content">
        <h1>Math Question</h1>
        <img src="math.png" alt="Math illustration" />
        <p><strong>Question:</strong> What is 8 + 4?</p>
        <button onClick={() => setShowAnswer(true)}>Show Answer</button>
        {/* Kept in the DOM (hidden until clicked) so extraction at mount can see it */}
        <p hidden={!showAnswer}>Answer: 12</p>
      </main>

      <Indexable source="#content" />
    </>
  );
}

What humans see:

  • Rich UI with images and interactive button

What gets extracted for crawlers:

# Math Question

[Image: Math illustration]

**Question:** What is 8 + 4?

Answer: 12

The button is automatically removed. Images are converted to text. Only semantic content remains.


Advanced Example

Component-Based Application

import { Indexable } from 'react-indexable';

function Hero() {
  // Placeholder handler for the call-to-action button
  const handleClick = () => {};

  return (
    <section className="hero-section">
      <h1>Product Name</h1>
      <p>Revolutionary solution for modern web development.</p>
      <button onClick={handleClick}>Get Started</button>
      <img src="hero.jpg" alt="Product dashboard screenshot" />
    </section>
  );
}

function Features() {
  return (
    <section className="features-grid">
      <h2>Key Features</h2>
      <div className="feature-cards">
        <div className="card">
          <img src="icon1.svg" alt="Performance icon" />
          <h3>Fast Performance</h3>
          <p>Optimized for speed</p>
        </div>
        <div className="card">
          <img src="icon2.svg" alt="Integration icon" />
          <h3>Easy Integration</h3>
          <p>Works with existing tools</p>
        </div>
      </div>
    </section>
  );
}

export default function LandingPage() {
  return (
    <>
      <main id="main-content">
        <Hero />
        <Features />
      </main>

      <Indexable source="#main-content" />
    </>
  );
}

Extracted Markdown (what crawlers see):

# Product Name

Revolutionary solution for modern web development.

[Image: Product dashboard screenshot]

## Key Features

[Image: Performance icon]

### Fast Performance

Optimized for speed

[Image: Integration icon]

### Easy Integration

Works with existing tools

All CSS classes, interactive elements, wrapper divs, and layout containers are removed. Only semantic content remains.


Why This Approach Works

No Content Mismatch

The same factual content is served to everyone. Only the presentation differs.

Faster Crawling

Search engines immediately see clean content without parsing UI elements.

Better AI Understanding

AI systems get structured Markdown without HTML noise.

SEO Safe

  • No user-agent detection
  • No cloaking (content hidden with CSS, not conditionally rendered)
  • No content generation or modification
  • Follows search engine guidelines

API Reference

<Indexable />

The primary component for content extraction.

Props

| Prop | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| source | string | Yes | - | CSS selector of the content container to extract from |
| enabled | boolean | No | true | Toggle extraction on/off |
| onExtract | (markdown: string) => void | No | - | Callback function that receives the extracted markdown |
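To make the optional props and the `enabled` default concrete, here is a small sketch of how they could be typed and applied. `IndexablePropsSketch` and `runIfEnabled` are hypothetical names; the package exports its own `IndexableProps` type, whose exact definition may differ, and the `extract` parameter is a stand-in for the real extraction pipeline.

```typescript
// Hypothetical mirror of the documented props (the package's own IndexableProps may differ).
interface IndexablePropsSketch {
  source: string; // CSS selector of the container to extract from
  enabled?: boolean; // optional; defaults to true
  onExtract?: (markdown: string) => void; // optional; receives the extracted Markdown
}

// Apply the documented default for `enabled`, run the stand-in extractor, and
// pass its result to onExtract. Returns null when extraction is switched off.
function runIfEnabled(
  props: IndexablePropsSketch,
  extract: (selector: string) => string,
): string | null {
  const enabled = props.enabled ?? true;
  if (!enabled) return null;
  const markdown = extract(props.source);
  props.onExtract?.(markdown);
  return markdown;
}
```

Because `enabled` and `onExtract` are optional, `<Indexable source="#content" />` alone is a complete configuration.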

Example with All Props

<Indexable
  source="#content"
  enabled={true}
  onExtract={(markdown) => {
    console.log('Extracted:', markdown);
  }}
/>

How It Works

Indexable follows a four-step process to transform UI-heavy content into clean, crawlable markup:

1. DOM Extraction

Clones the specified content container from the DOM without mutating the original. Your interactive UI remains untouched.

const cloned = sourceElement.cloneNode(true) as HTMLElement; // deep clone; the live DOM is not touched

2. Semantic Filtering

Identifies and preserves only meaningful content:

Keeps:

  • Headings: h1, h2, h3, h4, h5, h6
  • Text: p, strong, em, blockquote
  • Lists: ul, ol, li
  • Code: pre, code
  • Links: a
  • Tables: table, thead, tbody, tr, th, td

Removes:

  • Interactive: button, form, input, select, textarea
  • Navigation: nav, header, footer
  • Media: svg, canvas, video, audio
  • Scripts: script, style
  • Layout: wrapper div and span elements (the tags are dropped; the content inside them is kept)

Converts:

  • Images to [Image: alt text] format

Strips:

  • All HTML attributes (class, style, id, etc.)
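The filtering rules above can be sketched over a tiny stand-in for DOM nodes, so the logic runs without a browser. This is a minimal sketch, not the package's source: `TreeNode`, `ElementNode`, and `filterNode` are hypothetical names, and unwrapping every tag that appears on neither list (div, span, section, main, ...) is an assumption consistent with the extracted examples earlier in this README.

```typescript
// Minimal stand-in for DOM nodes: text is a string; elements carry tag, attributes, children.
type TreeNode = string | ElementNode;
interface ElementNode {
  tag: string;
  attrs: Record<string, string | undefined>;
  children: TreeNode[];
}

const KEEP = new Set([
  "h1", "h2", "h3", "h4", "h5", "h6",
  "p", "strong", "em", "blockquote",
  "ul", "ol", "li", "pre", "code", "a",
  "table", "thead", "tbody", "tr", "th", "td",
]);
const REMOVE = new Set([
  "button", "form", "input", "select", "textarea",
  "nav", "header", "footer",
  "svg", "canvas", "video", "audio",
  "script", "style",
]);

// Returns the cleaned replacement for one node: zero, one, or several nodes.
function filterNode(node: TreeNode): TreeNode[] {
  if (typeof node === "string") return [node];
  const tag = node.tag.toLowerCase();
  if (REMOVE.has(tag)) return []; // drop the element and everything inside it
  // Missing alt falls back to "Image", as documented; this sketch leaves an empty alt as-is.
  if (tag === "img") return [`[Image: ${node.attrs.alt ?? "Image"}]`];
  const children = node.children.flatMap(filterNode);
  if (KEEP.has(tag)) return [{ tag, attrs: {}, children }]; // keep the tag, strip all attributes
  return children; // wrappers (div, span, section, ...): unwrap, keeping only the content
}
```

Note that this sketch strips every attribute, including `href` on links, because that is what the list above states; whether the package keeps link targets is not documented here.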

3. Markdown Conversion

Converts the cleaned HTML to Markdown using Turndown with deterministic rules. No AI is involved: it is a pure, rule-based transformation.
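The package delegates this step to Turndown. As a rough illustration of what "deterministic rules" means, here is a hand-rolled stand-in (not Turndown, and far less complete) that maps a few element types to Markdown; the `MdNode` type and the `inline`, `block`, and `toMarkdown` functions are hypothetical. The same input tree always yields the same string.

```typescript
// Hypothetical cleaned-tree type: text is a string; elements carry a tag and children.
type MdNode = string | { tag: string; children: MdNode[] };

const FENCE = "`".repeat(3);

// Render inline content (text, strong, em, code) to a single string.
function inline(nodes: MdNode[]): string {
  return nodes
    .map((n) => {
      if (typeof n === "string") return n;
      const text = inline(n.children);
      switch (n.tag) {
        case "strong":
          return `**${text}**`;
        case "em":
          return `_${text}_`;
        case "code":
          return "`" + text + "`";
        default:
          return text; // a, span, etc.: keep the text only
      }
    })
    .join("");
}

// Render one block-level node.
function block(node: MdNode): string {
  if (typeof node === "string") return node.trim();
  const el = node; // narrowed element, safe to use inside callbacks
  const heading = /^h([1-6])$/.exec(el.tag);
  if (heading) return `${"#".repeat(Number(heading[1]))} ${inline(el.children)}`;
  switch (el.tag) {
    case "ul":
    case "ol":
      return el.children
        .map((li, i) => {
          const marker = el.tag === "ol" ? `${i + 1}.` : "-";
          const content = typeof li === "string" ? li : inline(li.children);
          return `${marker} ${content}`;
        })
        .join("\n");
    case "pre":
      return `${FENCE}\n${inline(el.children)}\n${FENCE}`;
    case "blockquote":
      return "> " + inline(el.children);
    default:
      return inline(el.children); // p and other blocks: plain inline text
  }
}

// Blocks are separated by a blank line, as in the extracted examples above.
function toMarkdown(nodes: MdNode[]): string {
  return nodes.map(block).filter((s) => s.length > 0).join("\n\n");
}
```

The `-` bullet and `_` emphasis markers are choices of this sketch; Turndown makes them configurable (for example through its `bulletListMarker` and `emDelimiter` options).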

4. Hidden Injection

Injects the markdown into a hidden container that's visible to crawlers but not to users:

<div data-indexable style="display:none" aria-hidden="true">
  <article>
    <pre><!-- Extracted markdown here --></pre>
  </article>
</div>
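In the browser, assigning the Markdown through `textContent` escapes it automatically. When the same container is built as a string (for example in tests, or in a future server-side version), the Markdown has to be HTML-escaped, or a `<` inside a code sample would be parsed as a tag. A minimal sketch; `escapeHtml` and `buildHiddenContainer` are hypothetical helper names, not package exports.

```typescript
// Escape the characters that would otherwise be parsed as markup inside <pre>.
// "&" must be replaced first so the entities added afterwards are not double-escaped.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build the hidden container shown above as an HTML string.
function buildHiddenContainer(markdown: string): string {
  return (
    '<div data-indexable style="display:none" aria-hidden="true">' +
    `<article><pre>${escapeHtml(markdown)}</pre></article>` +
    "</div>"
  );
}
```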

Image Handling

Images are automatically converted to text representations using their alt attributes, making visual content accessible to text-based crawlers:

Input:

<img src="dashboard.png" alt="Analytics dashboard showing user metrics" />

Output:

[Image: Analytics dashboard showing user metrics]

If no alt attribute is provided, it defaults to [Image: Image].


Verification

To verify that Indexable is working correctly:

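  1. Open your page in the browser, right-click, and select Inspect to open the developer tools
  2. In the Elements panel, press Ctrl+F (or Cmd+F on Mac) and search for data-indexable
  3. You should see the hidden container with your extracted content in clean Markdown format

Because extraction runs client-side in the current version, the container exists only in the rendered DOM. View Page Source shows the HTML the server originally sent, so it will not include it.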
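For automated checks (for example, an end-to-end test that captures `document.documentElement.outerHTML` after the page has rendered), a small helper can pull the Markdown back out. This sketch assumes the container markup from step 4 with `data-indexable` as the first attribute of the div; `readIndexable` is a hypothetical name, and the regular expression is a deliberately simple substitute for a real HTML parser.

```typescript
// Extract and unescape the Markdown from serialized HTML, or return null if absent.
function readIndexable(html: string): string | null {
  // Assumes <div data-indexable ...><article><pre>...</pre>, with data-indexable first.
  const match = /<div data-indexable[^>]*>\s*<article>\s*<pre>([\s\S]*?)<\/pre>/.exec(html);
  const body = match?.[1];
  if (body === undefined) return null;
  // Undo the HTML escaping; "&amp;" goes last so "&amp;lt;" correctly becomes "&lt;".
  return body.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&amp;/g, "&");
}
```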

Core Philosophy

Indexable operates under strict principles to ensure SEO safety and content integrity:

  • No user-agent detection: Same content served to all visitors
  • No cloaking: Content is hidden with CSS, not conditionally rendered based on who's viewing
  • No content generation: Your content is never modified, rewritten, or generated by AI
  • No AI involvement: Deterministic extraction only, predictable and transparent
  • Presentation layer only: Changes how content is presented, not what content exists

Indexable does not change what content is served. It changes how clearly that content can be understood.


Use Cases

Educational Content & Tutorials

Extract learning material while removing code editors, interactive widgets, and UI controls.

<>
  <article id="tutorial">
    <h1>Learn React Hooks</h1>
    <p>useState allows you to add state to functional components.</p>
    <CodeEditor /> {/* Removed only if it renders elements on the remove list, e.g. textarea */}
    <button>Run Code</button> {/* Removed from extraction */}
  </article>

  <Indexable source="#tutorial" />
</>

Product Pages

Index product descriptions and specifications without "Add to Cart" buttons, image galleries, and review widgets.

Documentation Sites

Make API references and guides crawlable without navigation menus, search bars, and interactive examples.

Blog Posts & Articles

Extract article content while removing social sharing buttons, comment forms, and related post widgets.

Q&A Platforms

Preserve questions and answers while removing voting buttons, user avatars, and interaction controls.


Achievements

By using Indexable, you achieve:

  • Faster crawling for search engines: no UI parsing overhead
  • Better content understanding by AI systems: clean, structured data
  • Improved SEO clarity without cloaking or manipulation
  • Future-ready content that works with emerging AI technologies
  • Same data, multiple formats: humans get UI, machines get structure

TypeScript Support

Indexable is written in TypeScript and includes full type definitions.

import { Indexable, IndexableProps } from 'react-indexable';

const props: IndexableProps = {
  source: '#content',
  enabled: true,
  onExtract: (markdown: string) => {
    // Type-safe callback
  }
};

Browser Compatibility

Indexable requires React 18+ and works in all modern browsers that support:

  • ES2020
  • DOM APIs (querySelector, cloneNode)

Performance Considerations

  • Extraction runs once after the component mounts, inside useEffect
  • Extraction is deferred with setTimeout so the rest of the page can finish its initial render first (this does not wait for data loaded later)
  • Clones DOM nodes to avoid mutating the original tree
  • Minimal runtime overhead: extraction happens client-side after the initial render
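The "once after mount, deferred with setTimeout" pattern can be sketched as a helper that schedules the work and returns a cancel function. `scheduleExtraction` is a hypothetical name, not the package's code.

```typescript
// Defer a one-off task (here: extraction) until after the current task finishes,
// and return a cancel function that can serve as a React effect cleanup.
function scheduleExtraction(run: () => void, delayMs = 0): () => void {
  const timer = setTimeout(run, delayMs);
  return () => clearTimeout(timer);
}
```

Inside a component this would be used as `useEffect(() => scheduleExtraction(() => extract(source)), [])`, where `extract` is a hypothetical extraction function. Returning the cancel function as the effect cleanup matters under React 18 StrictMode in development, which mounts, unmounts, and remounts components: the first timer is cleared before it fires, so only one extraction runs.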

Limitations

  • Requires JavaScript to be enabled for extraction (client-side only in current version)
  • Content must be in the DOM at extraction time
  • Does not handle dynamically loaded content after initial render
  • Server-side rendering support planned for future releases

Contributing

Contributions are welcome! Please follow these guidelines:

Getting Started

  1. Fork the repository
  2. Clone your fork: git clone https://github.com/<your-username>/indexable.git (the upstream repository is https://github.com/Amit00008/indexable.git)
  3. Install dependencies: npm install
  4. Create a branch: git checkout -b feature/your-feature-name

Development Workflow

# Build the package
npm run build

Testing Your Changes

  1. Make changes to the source code in src/
  2. Build the package: npm run build
  3. Test in the playground:
    cd playground
    npm install
    npm run dev
  4. Open http://localhost:3000 to see your changes

Contribution Guidelines

  • Code Quality: Follow existing code style and patterns
  • Type Safety: Maintain full TypeScript coverage
  • Documentation: Update README and inline comments for new features
  • Philosophy: Ensure changes align with core principles (no AI, no cloaking, deterministic)
  • Testing: Add tests for new functionality (when test suite is available)

Pull Request Process

  1. Ensure your code builds without errors
  2. Update documentation to reflect changes
  3. Write a clear PR description explaining the changes
  4. Reference any related issues

What We're Looking For

  • Bug fixes
  • Performance improvements
  • Better error handling
  • Additional semantic element support
  • Server-side rendering support (Next.js App Router)
  • Test coverage
  • Documentation improvements

What We're Not Looking For

  • AI-based content generation or modification
  • User-agent detection
  • Content manipulation features
  • SEO manipulation techniques
  • Cloaking implementations


Roadmap

Future enhancements under consideration:

  • Server-side rendering support for Next.js App Router
  • data-indexable-ignore attribute for manual exclusions
  • Handling of dynamically loaded content after initial render
  • Automated testing suite

Conclusion

The web is no longer read by humans alone. Search engines and AI crawlers consume content differently, and designing only for UI is no longer enough.

By separating content from presentation, Indexable helps you:

  • Keep users happy with rich interfaces
  • Help search engines index content faster
  • Make your content AI-readable and future-ready

This approach is simple, safe, and practical, and it fits perfectly into modern web development.

Remember: Indexable is an infrastructure primitive, not an SEO hack. Use it to make your content more accessible to search engines and AI systems while maintaining the same content for all users.