react-indexable
v0.5.0
Automatically extract semantic content from rendered UI and inject it back as crawlable, SEO-safe markup
React Indexable
Serving the same content to humans, search engines, and AI crawlers the right way.
The Problem
When we build websites, we think about human users first: design, images, interactions, and layout. But today, the web is read by more than just humans:
- Search engines crawl pages to index and rank them
- AI crawlers read content to understand, summarize, and reuse it
The problem is that humans, search engines, and AI crawlers do not read the web the same way.
What Goes Wrong?
A typical web page contains images, buttons, layout containers, and JavaScript-driven interactions. This works great for humans, but machines care about content clarity, not UI.
- Search engines must parse unnecessary UI elements before reaching core text
- Important information may be hidden behind JavaScript interactions
- AI crawlers struggle to extract clean, structured information
- Educational content, tutorials, and Q&A become harder to interpret
The content exists, but machines don't read it efficiently.
The Solution
Indexable solves this by separating content from presentation. It automatically extracts semantic content from your rendered UI and injects it as hidden, crawlable markup.
Core Principle: Same Content, Different Presentation
- Humans → Rich UI (images, buttons, interactions)
- Search Engines → Clean, fast, text-focused HTML
- AI Crawlers → Structured Markdown
As long as the meaning and data remain the same, this approach is SEO-safe and future-proof.
Key Features
- Automatic Extraction: Identifies and preserves meaningful content (headings, paragraphs, lists, code blocks)
- UI Noise Removal: Strips away interactive elements (buttons, forms, navigation)
- Image Handling: Converts images to text representations using alt attributes
- Deterministic: No AI, no guessing; pure, predictable content transformation
- SEO Safe: Hidden with CSS only (display: none), no cloaking, no user-agent detection
- Zero Configuration: Simple API with sensible defaults
Installation
npm install react-indexable
Quick Start
Basic Example
Let's take a simple math question page:
import { useState } from 'react';
import { Indexable } from 'react-indexable';
export default function MathQuestion() {
const [showAnswer, setShowAnswer] = useState(false);
return (
<>
<main id="content">
<h1>Math Question</h1>
<img src="math.png" alt="Math illustration" />
<p><strong>Question:</strong> What is 8 + 4?</p>
<button onClick={() => setShowAnswer(true)}>Show Answer</button>
{showAnswer && <p>Answer: 12</p>}
</main>
<Indexable source="#content" />
</>
);
}
What humans see:
- Rich UI with images and an interactive button
What gets extracted for crawlers:
# Math Question
[Image: Math illustration]
**Question:** What is 8 + 4?
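Answer: 12
The button is automatically removed. Images are converted to text. Only semantic content remains. Note that the answer line is included only if the answer has been rendered when extraction runs, since content must be in the DOM at extraction time.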
Advanced Example
Component-Based Application
import { Indexable } from 'react-indexable';
function Hero() {
// Placeholder click handler so the example is self-contained
const handleClick = () => {};
return (
<section className="hero-section">
<h1>Product Name</h1>
<p>Revolutionary solution for modern web development.</p>
<button onClick={handleClick}>Get Started</button>
<img src="hero.jpg" alt="Product dashboard screenshot" />
</section>
);
}
function Features() {
return (
<section className="features-grid">
<h2>Key Features</h2>
<div className="feature-cards">
<div className="card">
<img src="icon1.svg" alt="Performance icon" />
<h3>Fast Performance</h3>
<p>Optimized for speed</p>
</div>
<div className="card">
<img src="icon2.svg" alt="Integration icon" />
<h3>Easy Integration</h3>
<p>Works with existing tools</p>
</div>
</div>
</section>
);
}
export default function LandingPage() {
return (
<>
<main id="main-content">
<Hero />
<Features />
</main>
<Indexable source="#main-content" />
</>
);
}
Extracted Markdown (what crawlers see):
# Product Name
Revolutionary solution for modern web development.
[Image: Product dashboard screenshot]
## Key Features
[Image: Performance icon]
### Fast Performance
Optimized for speed
[Image: Integration icon]
### Easy Integration
Works with existing tools
All CSS classes, interactive elements, wrapper divs, and layout containers are removed. Only semantic content remains.
Why This Approach Works
No Content Mismatch
The same factual content is served to everyone. Only the presentation differs.
Faster Crawling
Search engines immediately see clean content without parsing UI elements.
Better AI Understanding
AI systems get structured Markdown without HTML noise.
SEO Safe
- No user-agent detection
- No cloaking (content hidden with CSS, not conditionally rendered)
- No content generation or modification
- Follows search engine guidelines
API Reference
<Indexable />
The primary component for content extraction.
Props
| Prop | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| source | string | Yes | - | CSS selector of the content container to extract from |
| enabled | boolean | No | true | Toggle extraction on/off |
| onExtract | (markdown: string) => void | No | - | Callback function that receives the extracted markdown |
Example with All Props
<Indexable
source="#content"
enabled={true}
onExtract={(markdown) => {
console.log('Extracted:', markdown);
}}
/>
How It Works
Indexable follows a four-step process to transform UI-heavy content into clean, crawlable markup:
1. DOM Extraction
Clones the specified content container from the DOM without mutating the original. Your interactive UI remains untouched.
const cloned = sourceElement.cloneNode(true);
2. Semantic Filtering
Identifies and preserves only meaningful content:
Keeps:
- Headings: h1, h2, h3, h4, h5, h6
- Text: p, strong, em, blockquote
- Lists: ul, ol, li
- Code: pre, code
- Links: a
- Tables: table, thead, tbody, tr, th, td
Removes:
- Interactive: button, form, input, select, textarea
- Navigation: nav, header, footer
- Media: svg, canvas, video, audio
- Scripts: script, style
- Layout: All wrapper div and span elements
Converts:
- Images to [Image: alt text] format
Strips:
- All HTML attributes (class, style, id, etc.)
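The keep/remove/convert/strip rules above can be sketched on a simplified node tree. This is a minimal illustration, not the library's implementation: `SketchNode` and `filterNode` are hypothetical names, the tree is a stand-in for the cloned DOM, and unwrapping every tag that is on neither list (not only div and span) is an assumption.

```typescript
// Simplified stand-in for a DOM node (hypothetical; the library works on cloned DOM elements).
interface SketchNode {
  tag: string;                      // lowercase tag name, or "#text" for text nodes
  text?: string;                    // only used by "#text" nodes
  attrs?: Record<string, string>;
  children?: SketchNode[];
}

// Tag lists taken from the Keeps / Removes lists above.
const KEEP = ["h1", "h2", "h3", "h4", "h5", "h6", "p", "strong", "em", "blockquote",
  "ul", "ol", "li", "pre", "code", "a", "table", "thead", "tbody", "tr", "th", "td"];
const REMOVE = ["button", "form", "input", "select", "textarea", "nav", "header",
  "footer", "svg", "canvas", "video", "audio", "script", "style"];

function filterNode(node: SketchNode): SketchNode[] {
  if (node.tag === "#text") {
    return [{ tag: "#text", text: node.text ?? "" }];
  }
  if (REMOVE.indexOf(node.tag) >= 0) {
    return []; // drop the element and everything inside it
  }
  if (node.tag === "img") {
    // Convert the image to its text representation.
    const alt = node.attrs ? node.attrs["alt"] : undefined;
    return [{ tag: "#text", text: "[Image: " + (alt ? alt : "Image") + "]" }];
  }
  const children: SketchNode[] = [];
  for (const child of node.children ?? []) {
    children.push(...filterNode(child));
  }
  if (KEEP.indexOf(node.tag) >= 0) {
    return [{ tag: node.tag, children }]; // attributes are intentionally not copied
  }
  return children; // assumption: any other tag is unwrapped, keeping its children
}
```

Returning an array from each call lets a wrapper dissolve into its children while a removed element contributes nothing, so the original tree is never mutated.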
3. Markdown Conversion
Converts cleaned HTML to Markdown using Turndown with deterministic rules. No AI is involved, just pure transformation.
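To illustrate what "deterministic rules" means here, the sketch below maps a few node types to Markdown with fixed string rules, so the same input always yields the same output. It is a hand-rolled stand-in for illustration only; the library delegates this step to Turndown, and `MdNode`, `inlineText`, and `toMarkdown` are hypothetical names.

```typescript
// Minimal node shape for this sketch only (hypothetical, not the library's types).
interface MdNode {
  tag: string;            // "h1".."h6", "p", "li", "strong", "em", or "#text"
  text?: string;          // only used by "#text" nodes
  children?: MdNode[];
}

// Render inline content: plain text, bold, and emphasis.
function inlineText(nodes: MdNode[]): string {
  let out = "";
  for (const n of nodes) {
    if (n.tag === "#text") out += n.text ?? "";
    else if (n.tag === "strong") out += "**" + inlineText(n.children ?? []) + "**";
    else if (n.tag === "em") out += "_" + inlineText(n.children ?? []) + "_";
    else out += inlineText(n.children ?? []);
  }
  return out;
}

// Render block-level nodes, one Markdown block per node, separated by blank lines.
function toMarkdown(blocks: MdNode[]): string {
  const parts: string[] = [];
  for (const b of blocks) {
    const heading = /^h([1-6])$/.exec(b.tag);
    if (heading) {
      // "h2" becomes "## ...": one "#" per heading level.
      parts.push("######".slice(0, Number(heading[1])) + " " + inlineText(b.children ?? []));
    } else if (b.tag === "li") {
      parts.push("- " + inlineText(b.children ?? []));
    } else {
      parts.push(inlineText([b]));
    }
  }
  return parts.join("\n\n");
}
```

Because every rule is a plain string mapping with no randomness or external calls, running the conversion twice on the same tree produces identical Markdown.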
4. Hidden Injection
Injects the markdown into a hidden container that's visible to crawlers but not to users:
<div data-indexable style="display:none" aria-hidden="true">
<article>
<pre><!-- Extracted markdown here --></pre>
</article>
</div>
Image Handling
Images are automatically converted to text representations using their alt attributes, making visual content accessible to text-based crawlers:
Input:
<img src="dashboard.png" alt="Analytics dashboard showing user metrics" />
Output:
[Image: Analytics dashboard showing user metrics]
If no alt attribute is provided, it defaults to [Image: Image].
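The alt-to-text rule can be expressed as a small pure function. This is a sketch under stated assumptions: `imageToText` is a hypothetical name rather than a library export, and treating an empty or whitespace-only alt the same as a missing one is an assumption the text above does not confirm.

```typescript
// Hypothetical helper illustrating the documented rule; not exported by react-indexable.
function imageToText(alt?: string | null): string {
  // Assumption: blank alt text is handled like a missing alt attribute.
  const label = alt && alt.trim() !== "" ? alt.trim() : "Image";
  return "[Image: " + label + "]";
}
```

Calling `imageToText("Analytics dashboard showing user metrics")` yields the output shown above, and calling it with no argument yields the `[Image: Image]` fallback.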
Verification
To verify that Indexable is working correctly:
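- Right-click on your page and select View Page Source. Because extraction runs client-side, the container may be absent from the raw source; if so, inspect the live DOM in your browser's developer tools instead
- Press Ctrl+F (or Cmd+F on Mac) and search for data-indexable
- You should see the hidden container with your extracted content in clean Markdown format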
Core Philosophy
Indexable operates under strict principles to ensure SEO safety and content integrity:
- No user-agent detection: Same content served to all visitors
- No cloaking: Content is hidden with CSS, not conditionally rendered based on who's viewing
- No content generation: Your content is never modified, rewritten, or generated by AI
- No AI involvement: Deterministic extraction only, predictable and transparent
- Presentation layer only: Changes how content is presented, not what content exists
Indexable does not change what content is served. It changes how clearly that content can be understood.
Use Cases
Educational Content & Tutorials
Extract learning material while removing code editors, interactive widgets, and UI controls.
<article id="tutorial">
<h1>Learn React Hooks</h1>
<p>useState allows you to add state to functional components.</p>
<CodeEditor /> {/* Removed from extraction */}
<button>Run Code</button> {/* Removed from extraction */}
</article>
<Indexable source="#tutorial" />
Product Pages
Index product descriptions and specifications without "Add to Cart" buttons, image galleries, and review widgets.
Documentation Sites
Make API references and guides crawlable without navigation menus, search bars, and interactive examples.
Blog Posts & Articles
Extract article content while removing social sharing buttons, comment forms, and related post widgets.
Q&A Platforms
Preserve questions and answers while removing voting buttons, user avatars, and interaction controls.
Achievements
By using Indexable, you achieve:
- Faster crawling for search engines: no UI parsing overhead
- Better content understanding by AI systems: clean, structured data
- Improved SEO clarity without cloaking or manipulation
- Future-ready content that works with emerging AI technologies
- Same data, multiple formats: humans get UI, machines get structure
TypeScript Support
Indexable is written in TypeScript and includes full type definitions.
import { Indexable, IndexableProps } from 'react-indexable';
const props: IndexableProps = {
source: '#content',
enabled: true,
onExtract: (markdown: string) => {
// Type-safe callback
}
};
Browser Compatibility
Indexable works in all modern browsers that support:
- ES2020
- React 18+
- DOM APIs (querySelector, cloneNode)
Performance Considerations
- Extraction runs once after component mount using useEffect
- Uses setTimeout to ensure DOM is fully rendered
- Clones DOM nodes to avoid mutating the original tree
- Minimal runtime overhead: extraction happens client-side after initial render
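The deferred, run-once timing described above can be sketched without React. The helper below is hypothetical (`scheduleOnce` is not a library export) and only models the pattern: defer a task with setTimeout and return a cancel function, much as a useEffect cleanup would. How the library wires this up internally, including the delay it uses, is an assumption.

```typescript
// Hypothetical helper modelling "run once, after a short delay, cancel on cleanup".
type Cancel = () => void;

function scheduleOnce(task: () => void, delayMs: number = 0): Cancel {
  let finished = false;
  const handle = setTimeout(() => {
    finished = true;
    task(); // runs after the current render work has had a chance to complete
  }, delayMs);
  // The returned function plays the role of a useEffect cleanup:
  // it cancels the pending run if the component unmounts first.
  return () => {
    if (!finished) clearTimeout(handle);
  };
}
```

Inside a component, the equivalent would be to call such a helper from useEffect with an extraction callback and return the cancel function as the effect's cleanup.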
Limitations
- Requires JavaScript to be enabled for extraction (client-side only in current version)
- Content must be in the DOM at extraction time
- Does not handle dynamically loaded content after initial render
- Server-side rendering support planned for future releases
Contributing
Contributions are welcome! Please follow these guidelines:
Getting Started
- Fork the repository
- Clone your fork: git clone https://github.com/Amit00008/indexable.git
- Install dependencies: npm install
- Create a branch: git checkout -b feature/your-feature-name
Development Workflow
# Build the package
npm run build
Testing Your Changes
- Make changes to the source code in src/
- Build the package: npm run build
- Test in the playground: cd playground && npm install && npm run dev
- Open http://localhost:3000 to see your changes
Contribution Guidelines
- Code Quality: Follow existing code style and patterns
- Type Safety: Maintain full TypeScript coverage
- Documentation: Update README and inline comments for new features
- Philosophy: Ensure changes align with core principles (no AI, no cloaking, deterministic)
- Testing: Add tests for new functionality (when test suite is available)
Pull Request Process
- Ensure your code builds without errors
- Update documentation to reflect changes
- Write a clear PR description explaining the changes
- Reference any related issues
What We're Looking For
- Bug fixes
- Performance improvements
- Better error handling
- Additional semantic element support
- Server-side rendering support (Next.js App Router)
- Test coverage
- Documentation improvements
What We're Not Looking For
- AI-based content generation or modification
- User-agent detection
- Content manipulation features
- SEO manipulation techniques
- Cloaking implementations
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- NPM: npmjs.com/package/react-indexable
Acknowledgments
Built with:
Roadmap
Future enhancements under consideration:
- Server-side rendering support for Next.js App Router
- data-indexable-ignore attribute for manual exclusions
- Handle dynamically loaded content after initial render
- Automated testing suite
Conclusion
The web is no longer read by humans alone. Search engines and AI crawlers consume content differently, and designing only for UI is no longer enough.
By separating content from presentation, Indexable helps you:
- Keep users happy with rich interfaces
- Help search engines index content faster
- Make your content AI-readable and future-ready
This approach is simple, safe, and practical, and it fits naturally into modern web development.
Remember: Indexable is an infrastructure primitive, not an SEO hack. Use it to make your content more accessible to search engines and AI systems while maintaining the same content for all users.
