© 2026 – Pkg Stats / Ryan Hefner

@tfw.in/structura-lib

v0.2.12

Structura Library Components

@tfw.in/structura-lib

A React component library for PDF document viewing with structured data extraction and rendering.

Features

  • PDF & JSON Side-by-Side Viewing - View original PDF alongside extracted structured content
  • Edit Mode - Inline editing of extracted content
  • Math Rendering - LaTeX math expressions rendered via KaTeX ($...$ inline, $$...$$ display)
  • Semantic Tags - Visual highlighting for corrections, additions, and deletions
  • Header/Footer Detection - Automatic badges for header and footer content
  • Table Support - Rich table rendering with cell-level editing

Installation

npm install @tfw.in/structura-lib

Usage

import { Structura } from '@tfw.in/structura-lib';
import '@tfw.in/structura-lib/styles.css';

function App() {
  return (
    <Structura
      apiKey="your-api-key"
      baseUrl="https://api.example.com"
    />
  );
}

Props

Core Props

| Prop | Type | Default | Description |
|------|------|---------|-------------|
| apiKey | string | required | API key for authentication |
| baseUrl | string | undefined | Optional API base URL |
| initialPdfPath | string \| null | null | Initial PDF file path to load |
| initialJsonData | any | null | Initial JSON data to display |

Feature Flags

| Prop | Type | Default | Description |
|------|------|---------|-------------|
| editMode | boolean | true | Enable/disable edit mode toggle |
| jsonMode | boolean | true | Enable/disable JSON view mode toggle |
| mathRendering | boolean | true | Enable LaTeX math rendering |
| semanticTags | boolean | true | Enable/disable semantic tags toggle |
| headerFooterBadges | boolean | true | Show header/footer badges |
| postProcessors | PostProcessor[] | defaultPostProcessors | Ordered array of HTML → HTML plugins. Runs before math rendering. Pass [] to disable all cleanup. See Post-Processing Plugins. |
| dedupePostTableText | boolean | true | Deprecated. Flip the default preset on/off. Prefer postProcessors. |
| htmlPostProcessor | (html: string) => string | undefined | Deprecated. Appended to the plugin chain. Prefer postProcessors. |
| defaultViewMode | 'read' \| 'edit' \| 'json' | 'read' | Initial view mode |

Callbacks

| Prop | Type | Default | Description |
|------|------|---------|-------------|
| onContentChange | function | undefined | Callback when content is edited: (blockId, oldContent, newContent) => void |
| onExport | function | undefined | Callback when data is exported: (data) => void |

Styling Props

| Prop | Type | Default | Description |
|------|------|---------|-------------|
| className | string | undefined | Custom class for the container |
| style | React.CSSProperties | undefined | Inline styles for the container |
| pdfPanelClassName | string | undefined | Custom class for the PDF panel |
| htmlPanelClassName | string | undefined | Custom class for the HTML panel |
| theme | object | undefined | Theme customization object (see below) |

Theme Object

theme={{
  primaryColor: '#3b82f6',      // Primary accent color
  backgroundColor: '#ffffff',   // Background color
  textColor: '#1f2937',         // Text color
  borderColor: '#e5e7eb',       // Border color
  fontFamily: 'Inter, sans-serif'  // Font family
}}

Full Example

import { Structura } from '@tfw.in/structura-lib';
import '@tfw.in/structura-lib/styles.css';

function App() {
  const handleContentChange = (blockId, oldContent, newContent) => {
    console.log(`Block ${blockId} changed`);
    console.log('Old:', oldContent);
    console.log('New:', newContent);
  };

  const handleExport = (data) => {
    console.log('Exported data:', data);
    // Save to your backend, etc.
  };

  return (
    <Structura
      apiKey="your-api-key"
      baseUrl="https://api.example.com"
      editMode={true}
      jsonMode={true}
      mathRendering={true}
      semanticTags={true}
      headerFooterBadges={true}
      defaultViewMode="read"
      onContentChange={handleContentChange}
      onExport={handleExport}
    />
  );
}

Math Rendering

The Structura component automatically renders LaTeX math expressions ($m^2$, $L/m^2$, etc.) when mathRendering={true} (the default).

If you extract text from the response structure (e.g. Gemini corrected output) and render it in your own components, the raw LaTeX delimiters will appear as plain text. Use renderMathInHtml to convert them to rendered math:

import { renderMathInHtml } from '@tfw.in/structura-lib';

// Convert LaTeX delimiters to rendered KaTeX HTML
const rawText = "Total membrane area required 0.02$m^2$";
const rendered = renderMathInHtml(rawText);

// Inject the resulting HTML string with dangerouslySetInnerHTML
function MembraneArea() {
  return <div dangerouslySetInnerHTML={{ __html: rendered }} />;
}

Utilities

| Export | Type | Description |
|--------|------|-------------|
| renderMathInHtml(html) | function | Converts $...$ (inline) and $$...$$ (display) math to KaTeX HTML. Use this when rendering extracted text in your own components. |
| containsMath(html) | function | Returns true if the string contains math delimiters. |
| MathContent | React component | Renders HTML with math expressions. Props: html, className, as (element type). |
| useMathHtml(html) | React hook | Returns rendered math HTML string via useMemo. |

import { MathContent, renderMathInHtml, containsMath, useMathHtml } from '@tfw.in/structura-lib';

// As a React component
function Formula() {
  return <MathContent html="The formula is $E = mc^2$" />;
}

// As a hook
function MyComponent({ text }) {
  const rendered = useMathHtml(text);
  return <span dangerouslySetInnerHTML={{ __html: rendered }} />;
}

// Check before processing: only run KaTeX when math delimiters are present
function toDisplayHtml(text) {
  return containsMath(text) ? renderMathInHtml(text) : text;
}

Note: Ensure you import the library styles (@tfw.in/structura-lib/styles.css) — this loads the KaTeX CSS required for proper math rendering.
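The following minimal TypeScript sketch makes the $...$ / $$...$$ convention concrete. It splits a string into text, inline-math and display-math segments. The names splitMath, hasMath and MathSegment are hypothetical and used only for illustration. This is not how renderMathInHtml or containsMath are implemented, and the regex is an assumption.

```typescript
// Hypothetical segment type for illustration; not an export of @tfw.in/structura-lib.
type MathSegment =
  | { kind: "text"; value: string }
  | { kind: "inline"; value: string }
  | { kind: "display"; value: string };

// Split a string on $$...$$ (display) and $...$ (inline) delimiters.
// The $$ alternative comes first so display math is not read as two inline spans.
function splitMath(input: string): MathSegment[] {
  // Fresh regex per call: a global regex carries lastIndex state between exec() calls.
  const pattern = /\$\$([\s\S]+?)\$\$|\$([^$\n]+?)\$/g;
  const segments: MathSegment[] = [];
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(input)) !== null) {
    if (match.index > last) {
      segments.push({ kind: "text", value: input.slice(last, match.index) });
    }
    if (match[1] !== undefined) {
      segments.push({ kind: "display", value: match[1] });
    } else {
      segments.push({ kind: "inline", value: match[2] ?? "" });
    }
    last = pattern.lastIndex;
  }
  if (last < input.length) {
    segments.push({ kind: "text", value: input.slice(last) });
  }
  return segments;
}

// Rough equivalent of a containsMath-style check, built on the splitter.
function hasMath(input: string): boolean {
  return splitMath(input).some((segment) => segment.kind !== "text");
}
```

A plain dollar amount such as "costs $5 and $10" would be misread as inline math by this sketch. A production detector needs extra rules for that case.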

Post-Processing Plugins

HTML cleanup is plugin-based. A PostProcessor is any pure function (html: string) => string. Plugins run in order, before math rendering. You can use the defaults, pick a subset, extend them, or replace them entirely.

import type { PostProcessor } from '@tfw.in/structura-lib';

// The imported type is equivalent to:
// type PostProcessor = (html: string) => string;

Built-in plugins

| Plugin | What it does |
|--------|-------------|
| dedupeTableText | Strips floating <td>/<th>/<tr> elements or prose blocks that repeat the preceding <table>'s content (a frequent Gemini artifact). |
| fixTocTable | For TOC-like tables where Gemini leaves section numbers stuck at the start of the title cell ("10 In Process Data" in one <td>), moves the digits into the leading cell. |
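The fixTocTable behaviour described above can be pictured with a minimal regex sketch. It assumes the leading cell is empty and the title cell starts with a dotted section number. moveLeadingNumber is a hypothetical name, and this is not the library's implementation.

```typescript
// Hypothetical sketch of the fixTocTable idea; not the library's code.
// ASSUMPTION: row shape is <tr><td></td><td>NUMBER TITLE</td>...; attributes on
// <tr>/<td> are preserved, and rows that do not match are left unchanged.
const TOC_ROW =
  /<tr([^>]*)>\s*<td([^>]*)>\s*<\/td>\s*<td([^>]*)>\s*(\d+(?:\.\d+)*)\s+([^<]*)<\/td>/g;

function moveLeadingNumber(html: string): string {
  // $4 is the section number, $5 the rest of the title.
  return html.replace(TOC_ROW, "<tr$1><td$2>$4</td><td$3>$5</td>");
}
```

Because the match requires an empty first cell, rows that are already well-formed pass through untouched.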

The default preset is exported as defaultPostProcessors:

import { defaultPostProcessors } from '@tfw.in/structura-lib';
// === [dedupeTableText, fixTocTable]

Using the plugin system

import {
  Structura,
  defaultPostProcessors,
  dedupeTableText,
  fixTocTable,
} from '@tfw.in/structura-lib';

// 1. Default — all built-ins applied:
<Structura initialJsonData={data} />

// 2. Subset — only dedupe, skip TOC fix:
<Structura initialJsonData={data} postProcessors={[dedupeTableText]} />

// 3. Extend with your own plugin:
const stripLatexNoise = (html) =>
  html.replace(/\\bigcirc/g, '○').replace(/\\overline\{([^}]*)\}/g, '$1');

<Structura
  initialJsonData={data}
  postProcessors={[...defaultPostProcessors, stripLatexNoise]}
/>

// 4. Turn off all cleanup:
<Structura initialJsonData={data} postProcessors={[]} />

Writing your own plugin

A plugin is any function matching (html: string) => string. Keep it pure:

import type { PostProcessor } from '@tfw.in/structura-lib';

const removeEmptyParagraphs: PostProcessor = (html) =>
  html.replace(/<p>\s*<\/p>/g, '');

Plugins that throw are isolated — the rest of the chain continues on the last good output.
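That isolation rule can be sketched as a fold that wraps each plugin in try/catch. composeIsolated and its onError hook are hypothetical names for illustration. The library's composePostProcessors may differ, for example in whether and how it reports failures.

```typescript
type PostProcessor = (html: string) => string;

// Hypothetical sketch of fault-isolated composition; not the library's
// composePostProcessors, which may handle or report errors differently.
function composeIsolated(
  processors: readonly PostProcessor[],
  onError: (error: unknown, index: number) => void = () => {},
): PostProcessor {
  return (html) => {
    let lastGood = html;
    processors.forEach((processor, index) => {
      try {
        lastGood = processor(lastGood);
      } catch (error) {
        // Skip the failing plugin; the next one receives the last good output.
        onError(error, index);
      }
    });
    return lastGood;
  };
}
```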

Composing outside of Structura

If you process HTML outside the component (custom sidebar, PDF export, etc.), use composePostProcessors:

import {
  composePostProcessors,
  defaultPostProcessors,
  renderMathInHtml,
} from '@tfw.in/structura-lib';

const pipeline = composePostProcessors(defaultPostProcessors);
const cleaned = renderMathInHtml(pipeline(rawHtml));

Math rendering is separate

renderMathInHtml (controlled by the mathRendering prop) is not a plugin: it runs after all post-processors. It turns $...$ and $$...$$ into KaTeX HTML, which changes what the output means rather than just cleaning it up.

Cleaning the whole JSON tree

If you extract content from the parsed JSON for downstream processing (not just rendering), use cleanJsonData. It walks the tree and runs your plugin chain on every /GeminiCorrected block's html.

import { cleanJsonData, dedupeTableText, defaultPostProcessors } from '@tfw.in/structura-lib';

// Default — uses defaultPostProcessors:
const cleaned = cleanJsonData(responseJson);

// Only dedupe, skip TOC fix:
const cleaned2 = cleanJsonData(responseJson, { postProcessors: [dedupeTableText] });

// Extend defaults with your own plugin:
const cleaned3 = cleanJsonData(responseJson, {
  postProcessors: [...defaultPostProcessors, myPlugin],
});

// Raw — no cleanup:
const raw = cleanJsonData(responseJson, { postProcessors: [] });
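Here is a minimal sketch of what a cleanJsonData-style traversal involves. The node shape (block_type, html, children) is an assumption made for illustration, as is matching "/GeminiCorrected" against a block_type field. Check your actual response JSON before relying on these field names.

```typescript
type PostProcessor = (html: string) => string;

// ASSUMPTION: simplified node shape for illustration. The real response JSON
// may use different field names; only the "/GeminiCorrected" label comes from
// the docs above.
interface BlockNode {
  block_type?: string;
  html?: string;
  children?: BlockNode[];
}

// Hypothetical cleanJsonData-style walk: builds a new tree, never mutates the
// input, and runs the plugin chain only on /GeminiCorrected blocks' html.
function cleanTree(node: BlockNode, processors: readonly PostProcessor[]): BlockNode {
  const copy: BlockNode = { ...node };
  if (node.block_type === "/GeminiCorrected" && typeof node.html === "string") {
    copy.html = processors.reduce((html, fn) => fn(html), node.html);
  }
  if (node.children) {
    copy.children = node.children.map((child) => cleanTree(child, processors));
  }
  return copy;
}
```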

Inside Structura this is applied automatically with your chosen postProcessors, so onContentChange / onExport already hand you clean HTML.

Processor contract (immutability)

Every processor exported by this library follows the same contract:

  • Pure: the input value is never mutated
  • Immutable output: strings are already immutable in JavaScript; object outputs are deep-frozen so downstream code cannot accidentally mutate them

| Processor | Input | Output |
|-----------|-------|--------|
| dedupeTableText(html) | string | string (new) |
| fixTocTable(html) | string | string (new) |
| renderMathInHtml(html) | string | string (new) |
| cleanJsonData(data, opts?) | object tree | new tree, deep-frozen |

If you need a mutable tree (e.g. to hand to another library that edits in place), pass { disableFreeze: true }:

const mutable = cleanJsonData(data, { disableFreeze: true });
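Object.freeze on its own is shallow, so "deep-frozen" means freezing every nested object and array as well. Below is a minimal sketch of that technique that also tolerates cycles. deepFreeze is a hypothetical helper, not a library export, and the library's own freezing step may differ.

```typescript
// Hypothetical helper, not a library export: Object.freeze is shallow, so
// deep-freezing means freezing every nested object and array as well.
function deepFreeze<T>(value: T, seen: WeakSet<object> = new WeakSet<object>()): T {
  // Primitives are already immutable.
  if (value === null || typeof value !== "object") {
    return value;
  }
  const obj = value as unknown as object;
  // `seen` guards against cycles (an object that references itself).
  if (seen.has(obj)) {
    return value;
  }
  seen.add(obj);
  // Freeze children first, then the container itself.
  for (const key of Object.keys(obj)) {
    deepFreeze((obj as Record<string, unknown>)[key], seen);
  }
  Object.freeze(obj);
  return value;
}
```

Object.isFrozen reports the result. In strict-mode code, assigning to a property of a frozen object throws a TypeError; in sloppy-mode code the write fails silently.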

Recipes for common Gemini quirks

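htmlPostProcessor (deprecated; prefer postProcessors) is appended to the plugin chain, so it runs after the built-in cleanup and before math rendering. Use it to patch issues you see in your own data without waiting on an SDK release. Each recipe below is a plain (html: string) => string function, so it can also be passed as an entry in postProcessors. These are tested recipes for issues seen in customer documents; copy what you need.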

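Tip: Wrap an inline processor in useCallback (or define it outside the component) so it keeps the same identity across renders. A new function on every parent update can cause the HTML to be re-processed. Compose multiple recipes by chaining .replace(...) or piping helper functions.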

1. Section header order swap ("Equipment 3" → "3 Equipment")

Gemini sometimes emits headings as name + number instead of number + name. Detect the pattern at the end of a heading's text and swap.

const fixSectionOrder = (html) =>
  html.replace(
    /<(h[1-6])([^>]*)>([^<]+?)\s+(\d+(?:\.\d+)*)\s*<\/\1>/g,
    '<$1$2>$4 $3</$1>'
  );
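One caveat: this pattern swaps any trailing number, so a heading such as "Results 2021" would become "2021 Results". If your documents have year-like headings, you could use a tighter variant that only accepts one- or two-digit levels. fixSectionOrderGuarded below is a hypothetical sketch, and the two-digit cut-off is an arbitrary assumption to tune for your data.

```typescript
// Hypothetical variant of fixSectionOrder: each level of the section number is
// limited to one or two digits (an assumed cut-off), so years are left alone.
const fixSectionOrderGuarded = (html: string): string =>
  html.replace(
    /<(h[1-6])([^>]*)>([^<]+?)\s+(\d{1,2}(?:\.\d{1,2})*)\s*<\/\1>/g,
    "<$1$2>$4 $3</$1>",
  );
```

The trade-off is that genuine three-digit sections, such as "Appendix 100", are no longer swapped.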

2. Strip stray LaTeX tokens that aren't real math

Gemini occasionally inserts LaTeX tokens such as \bullet, \bigcirc, \overline{...}, \mathbf{...} and \text{...} into plain text. Replace the symbols with their visual equivalents and unwrap the formatting commands to their contents.

const stripLatexNoise = (html) =>
  html
    .replace(/\\bullet/g, '•')
    .replace(/\\bigcirc/g, '○')
    .replace(/\\overline\{([^}]*)\}/g, '$1')
    .replace(/\\mathbf\{([^}]*)\}/g, '$1')
    .replace(/\\text\{([^}]*)\}/g, '$1');
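The [^}]* patterns stop at the first closing brace, so an argument containing its own group breaks. For example, \text{a {b} c} comes out as a {b c}. If that matters for your data, a brace-counting unwrap is one option. unwrapCommand and stripLatexNoiseNested are hypothetical helpers sketched here, not library exports.

```typescript
// Hypothetical helper, not a library export: removes \command{...} wrappers and
// keeps the argument, counting braces so nested groups such as {b} survive.
function unwrapCommand(input: string, command: string): string {
  const opener = `\\${command}{`; // e.g. "\text{"
  let out = "";
  let i = 0;
  while (i < input.length) {
    const start = input.indexOf(opener, i);
    if (start === -1) {
      out += input.slice(i);
      break;
    }
    out += input.slice(i, start);
    // Scan forward to the brace that closes the opener.
    let depth = 1;
    let j = start + opener.length;
    while (j < input.length && depth > 0) {
      if (input[j] === "{") depth++;
      else if (input[j] === "}") depth--;
      j++;
    }
    if (depth !== 0) {
      // Unbalanced braces: leave the remainder untouched rather than guess.
      out += input.slice(start);
      break;
    }
    // Unwrap the argument too, in case the same command is nested inside it.
    out += unwrapCommand(input.slice(start + opener.length, j - 1), command);
    i = j;
  }
  return out;
}

// Same symbol replacements as the recipe, with brace-aware unwrapping.
const stripLatexNoiseNested = (html: string): string =>
  ["overline", "mathbf", "text"].reduce(
    (acc, command) => unwrapCommand(acc, command),
    html.replace(/\\bullet/g, "•").replace(/\\bigcirc/g, "○"),
  );
```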

3. Drop block-type pseudo-attributes from prose

Gemini sometimes leaves debug-style attributes like <p block-type="Text">. Harmless but noisy.

const stripBlockType = (html) => html.replace(/\s+block-type="[^"]*"/g, '');

4. Compose multiple recipes

import { useCallback } from 'react';
import { Structura } from '@tfw.in/structura-lib';

function MyViewer({ jsonData }) {
  const postProcess = useCallback((html) => {
    return [fixSectionOrder, stripLatexNoise, stripBlockType]
      .reduce((acc, fn) => fn(acc), html);
  }, []);

  return (
    <Structura
      initialJsonData={jsonData}
      htmlPostProcessor={postProcess}
    />
  );
}

5. Apply the same transforms to the JSON before extracting data

If your downstream code reads from node.html directly (export, search, etc.), run the same transform on the JSON tree so consumers see the cleaned content:

import { cleanJsonData } from '@tfw.in/structura-lib';

const cleaned = cleanJsonData(jsonData, {
  transformHtml: (html) => stripLatexNoise(fixSectionOrder(html)),
});

// `cleaned` now has both built-in table dedupe AND your custom rules
// applied to every /GeminiCorrected block's `html` field.

Heads up: Some quirks (missing columns, deteriorated OCR values) are upstream data issues that no UI-side post-processor can fix — those need a pipeline change. The recipes above only address things visible in the HTML string.

Semantic Tags

Parse and render semantic tags for document corrections:

import { SemanticTagRenderer, parseSemanticTags } from '@tfw.in/structura-lib';

// Render with visual highlighting
<SemanticTagRenderer content="Text with <add>additions</add> and <del>deletions</del>" />

// Parse tags programmatically
const parsed = parseSemanticTags(content);
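If you need plain text rather than highlighted output, you can resolve the add/del markup in either direction. acceptAll and rejectAll are hypothetical helpers. They assume only the <add> and <del> tags shown above, without nesting, and they do not handle any other correction tags the library may support.

```typescript
// Hypothetical helpers, not exports of @tfw.in/structura-lib. They assume the
// <add>/<del> tags shown above, with no nesting of the same tag.
const ADD_TAG = /<add>([\s\S]*?)<\/add>/g;
const DEL_TAG = /<del>([\s\S]*?)<\/del>/g;

// Corrected reading: keep additions, drop deletions.
function acceptAll(content: string): string {
  return content.replace(DEL_TAG, "").replace(ADD_TAG, "$1");
}

// Original reading: drop additions, restore deleted text.
function rejectAll(content: string): string {
  return content.replace(ADD_TAG, "").replace(DEL_TAG, "$1");
}
```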

License

MIT