@uniformdev/siphon-explorer

v1.0.11

Published

18 days ago

Siphon Explorer

A local developer tool for browsing Sitecore content tree data exported to JSON files. Runs as a Node.js CLI that spins up a Next.js web app on localhost.

Purpose

When working with exported Sitecore databases, it's useful to be able to explore the content tree, inspect item fields, and follow references between items — without needing a running Sitecore instance. Siphon Explorer provides a read-only UI that mirrors the Sitecore Content Editor interface.

Features

Content tree (left panel)

Collapsible panel — the "CONTENT TREE" label in the top-left corner is a toggle button (◀ / ▶); click it to collapse the entire tree panel down to a thin vertical bar, reclaiming the space for the item detail view. Click again to expand back to the previous width. Collapsed state is persisted in localStorage (treePanelCollapsed). While collapsed, the resize handle is hidden.
Full Sitecore item hierarchy, expandable/collapsible, sorted by Sortorder
Virtual folder nodes — path segments missing from the index are shown as non-selectable grey 📁 folders so the tree is always structurally complete
"content" first under /sitecore — the content node is always sorted first among its siblings
"Data" child always first — under any page item, the "Data" child node is sorted to the top regardless of Sortorder (uses the IsPage check, or raw HasLayout in Legacy counting mode)
Icons — pages show 🌐; items with a file on disk show a yellow CSS folder icon; items named Data with a file show a green folder icon; virtual path-gap folders show 📁; items in the index but without a JSON file show no icon and are not selectable. The "is page" definition uses the IsPage check by default; Legacy counting downgrades it to raw HasLayout
Language selector in the panel header — dropdown listing every language present in the data directory (any subfolder of items/ that has a matching _index.json). Selection persists in the URL as ?lang=<code> (the default is omitted). Changing the language reloads the tree, item detail, and full-text index for the new language.
Child / descendant counts — nodes with children show gray counts on the right: {children} ({descendants} items {x}%, {descendantPages} pages {y}%) where item % is relative to all items in the dataset and page % is relative to total counted pages. By default (smart counting), only pages that are (a) descendants of a node named Home and (b) not inside a Data subfolder of another page are counted. The Legacy counting checkbox next to the sort control switches to the original behavior: all items with HasLayout under /sitecore/content are counted. State is persisted in the URL as ?legacy=1
Sort mode — dropdown with __SortOrder (default), Alphabet, or Pages Count. State is persisted in the URL as ?sort=alphabet or ?sort=pagesCount (default sortorder is omitted)
Search — Ctrl+E (or Cmd+E) opens a search box; results filter by item name or item ID with multi-word support (all words must match, e.g. cha eugene finds CHA Website Eugene's Copy); curly braces are stripped from search words so {guid} and guid match identically; each result displays the same icon as in the tree (🌐, 🔴, folder, paper); results are ranked exact-match-first — items whose name exactly equals the query come first, then prefix matches, then substring matches, then items matched only by ID (so searching hello returns hello before hello1 and hello-world); when a GUID is entered, an additional lookup checks whether it matches any item's Item ID, Composition ID, or Entry IDs — if found, the matched item appears at the top of results with a label showing which ID type matched (and is de-duplicated from the regular results); Escape closes
Full Text Search — click "Full Text Search" link in the panel header to open a modal; searches across all field values; supports glob-style patterns: abc*def matches text containing abc then def anywhere after it (ignoring line breaks); results are ranked exact-match-first on item name (exact → prefix → substring → matched only by path/field text); indexing runs in the background on startup and is resumable across restarts via a search-cache.json file in the data directory
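
The glob-style matching described for Full Text Search can be approximated by translating the pattern into a regular expression. This is a minimal sketch, not the tool's actual matcher: `globToRegExp` is a hypothetical helper, `*` is assumed to be the only wildcard, and case-insensitive matching is an assumption.

```javascript
// Hypothetical helper: turns a pattern such as "abc*def" into a RegExp.
function globToRegExp(pattern) {
  const source = pattern
    .split('*')
    // Escape regex metacharacters so literal text stays literal.
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    // "*" may span any characters, including line breaks.
    .join('[\\s\\S]*');
  return new RegExp(source, 'i'); // case-insensitivity is an assumption
}

const matcher = globToRegExp('abc*def');
console.log(matcher.test('xx abc\nyy def zz'));
console.log(matcher.test('def then abc'));
```

Using `[\s\S]*` rather than `.*` is what lets a match continue across line breaks.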

Item detail (right panel)

Header icon — the detail header shows the same icon as the tree (🌐, 🔴, folder, paper) instead of a static 🗃️
Quick Info section: Display name, Slug, Item ID, Item name, Item path, Composition id, Entry id, Template, Template ID, Language, Database. Composition id is shown only for pages (per the IsPage check); for non-page items it renders as empty
Item path breadcrumb — each path segment is a clickable link that navigates to that item; segments without a corresponding file on disk are shown as plain text (not clickable)
Fields grouped by section, sections sorted by Sortorder; field name and value separated by a fixed-width vertical divider
Referrers tab — shows all items whose field values contain the current item's GUID; uses the same background full-text index as Full Text Search; available on all items (pages and non-pages alike)
Presentation tab — shown for pages (per the IsPage check) that have a presentation file on disk; displays the parsed index.json as a tree of placeholders and renderings. The resolved absolute file path of the index.json is shown at the top of the tab so you can see exactly which folder (presentation/<language>/… or html/<language>/…) the data came from.
Html tab — shown for pages that have a presentation file available (i.e. alongside the Presentation tab); displays the raw source of the index.html file sitting next to the index.json file used by the Presentation tab (same folder resolution: presentation/<language>/<sub-path>/index.json then html/<language>/<sub-path>/index.json, where <sub-path> is the item's path segments after the first home ancestor, matched case-insensitively) as preformatted text inside a <pre><code> block — the HTML is not rendered or injected into the DOM. The resolved absolute file path of the index.html is shown at the top of the tab. If the HTML file does not exist, shows a "not found" state.
Analysis tab — shown alongside the Presentation tab (last in the tab row) for pages with a presentation file; fetches the same presentation data and displays a "Components" section header with a bulleted list of every unique component (rendering) referenced anywhere in the presentation tree. Each bullet shows the component name, a clickable GUID link to its rendering definition (when available), and the item path of the rendering definition if it is present in the index.
Template tab — shown only when viewing a Template item (TemplateID {AB86861A-6030-46C5-B394-E8F99E8B87DB}); lists all items based on this template; includes a search box to filter by name/path and pagination (50 items per page) for large result sets; results are ranked exact-match-first on item name
Hide empty fields checkbox — hides fields with no value; state persisted in localStorage
Hide system fields checkbox — hides fields whose name starts with __; state persisted in localStorage
GUID navigation — any GUID value (Item ID, Template ID, field values, pipe-separated multilist/treelist values) is rendered as a clickable blue link that navigates to that item
Resizable panels — drag the divider between the tree and detail panels to adjust the split; constrained between 200px and 480px
Copy buttons — each field value and Quick Info row shows a copy button (⎘) on hover that copies the raw value to the clipboard; turns green (✓) briefly on success
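
The folder resolution used by the Presentation, Html, and Analysis tabs can be sketched as follows. `presentationCandidates` is a hypothetical name, and the sketch only covers finding the first `home` segment case-insensitively and listing the two candidate files in probe order; matching the sub-path against on-disk casing is left out.

```javascript
const path = require('node:path');

// Hypothetical helper: lists candidate index.json files in probe order.
function presentationCandidates(dataDir, language, itemPath) {
  const segments = itemPath.split('/').filter(Boolean);
  // The sub-path starts after the first ancestor named "home" (any casing).
  const homeIndex = segments.findIndex((s) => s.toLowerCase() === 'home');
  if (homeIndex === -1) return [];
  const subPath = segments.slice(homeIndex + 1);
  // presentation/<language>/... is probed first, then html/<language>/...
  return ['presentation', 'html'].map((root) =>
    path.join(dataDir, root, language, ...subPath, 'index.json'));
}

console.log(presentationCandidates('./data', 'en', '/sitecore/content/Site/Home/About/Team'));
```

A caller would take the first candidate that exists on disk; the sibling `index.html` for the Html tab sits in the same folder.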

Navigation

URL persistence — selected item ID stored as ?id=<uuid>; language as ?lang=<code>, sort mode as ?sort=<mode>, legacy counting as ?legacy=1, and active detail tab as ?tab=<tab> (only non-default values are written so the URL stays clean). F5 restores the same item with ancestors expanded plus the chosen language/sort/counting/tab
Browser back/forward — each item selection is a router.push entry; Back/Forward work as expected; the tree panel automatically scrolls to keep the selected item visible
Browser history titles — document.title is updated to Siphon Explorer — /sitecore/content/… on each selection, so Chrome history shows the item path rather than a generic URL
Default selection — on first load (no URL param), /sitecore/content is pre-selected if present
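
The "only non-default values are written" rule can be illustrated with `URLSearchParams`. This is a sketch under assumptions: `buildQuery` and its option names are hypothetical, and the default language and default tab are passed in because the README does not name the default tab.

```javascript
// Hypothetical helper: serializes UI state, skipping default values.
function buildQuery({ id, lang, defaultLang, sort = 'sortorder', legacy = false, tab, defaultTab }) {
  const params = new URLSearchParams();
  if (id) params.set('id', id);
  if (lang && lang !== defaultLang) params.set('lang', lang);
  if (sort !== 'sortorder') params.set('sort', sort); // e.g. alphabet, pagesCount
  if (legacy) params.set('legacy', '1');
  if (tab && tab !== defaultTab) params.set('tab', tab);
  const query = params.toString();
  return query ? `?${query}` : '';
}

console.log(buildQuery({ id: 'abc', lang: 'en', defaultLang: 'en' }));
console.log(buildQuery({ id: 'abc', lang: 'de', defaultLang: 'en', sort: 'alphabet', legacy: true }));
```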

Usage

# Install dependencies (once)
npm install

# Start the browser (defaults to port 5000)
node bin/siphon-explorer.js ./data

# Override the port explicitly
node bin/siphon-explorer.js ./data --port 3000

# Or set it via the PORT env var (handy for process managers)
PORT=3000 node bin/siphon-explorer.js ./data

# Or if installed globally via npm link / npm install -g
siphon-explorer ./data

# Run unit tests
npm test

The language is selectable at runtime via the dropdown in the tree panel header — no restart required. The list of languages is auto-discovered from the data directory (every subfolder of items/ that has a matching _index.json). The initial selection on first load is picked in this order:

en if present
otherwise en-US, en-UK, en-CA, en-AU (first one that's present)
otherwise the first language alphabetically

After that, ?lang=<code> in the URL takes precedence — share/bookmark links to land on a specific language.
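
The selection order above can be sketched as a small function. `pickLanguage` is a hypothetical name, codes are compared exactly as written, and falling back to the default order when the URL names an unknown language is an assumption.

```javascript
// Preference order taken from the list above.
const PREFERRED_LANGUAGES = ['en', 'en-US', 'en-UK', 'en-CA', 'en-AU'];

// Hypothetical helper: resolves the language to load.
function pickLanguage(available, urlLang) {
  // ?lang=<code> wins when it names a discovered language (assumption:
  // unknown codes fall through to the default order).
  if (urlLang && available.includes(urlLang)) return urlLang;
  for (const code of PREFERRED_LANGUAGES) {
    if (available.includes(code)) return code;
  }
  // Otherwise the first language alphabetically, or null if none exist.
  return [...available].sort()[0] ?? null;
}

console.log(pickLanguage(['de', 'en-CA', 'en-US']));
console.log(pickLanguage(['fr', 'de']));
```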

The server bundles the React frontend with esbuild at startup (takes < 1 second), then serves everything from a single HTTP server — no Next.js, no build step required.

Then open http://localhost:<port> in your browser, where <port> is the port chosen above (5000 unless overridden).

<data-dir> must be the root of the exported Sitecore data, containing an items/ subdirectory (see Data Format below).

Data Format

The tool expects the following structure inside the data directory:

<data-dir>/
  items/
    <language>/
      _index.json                # item index (preferred location)
      [<x>/[<y>/[<z>/]]]<uuid>.json  # one file per item, optionally sharded by leading hex chars
      {uuid}.json                # legacy unsharded layout (still supported as a fallback)
    <language>_index.json        # alternative index location (checked first)

<language> is the language code chosen via the in-app selector (e.g. en, en-US). All paths below are resolved under that subfolder. The list of available languages is computed by scanning subfolders of items/ and keeping those that have either an inner _index.json or a sibling <language>_index.json.

Index file (`_index.json` or `<language>_index.json`)

A flat JSON object keyed by lowercase UUID without braces:

{
  "0de95ae4-41ab-4d01-9eb0-67441b7c2450": {
    "ItemID": "0de95ae4-41ab-4d01-9eb0-67441b7c2450",
    "Name": "content",
    "DisplayName": null,
    "Path": "/sitecore/content",
    "ParentId": "11111111-1111-1111-1111-111111111111",
    "Sortorder": null,
    "HasLayout": false,
    "TemplateName": "Main section",
    "TemplateID": "e3e2d58c-df95-4230-adc9-279924cece84",
    ...
  }
}

Special entries:

00000000-0000-0000-0000-000000000000 — fake/virtual root used as a sentinel; skipped entirely
11111111-1111-1111-1111-111111111111 — the real Sitecore root (/sitecore)

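The HasLayout boolean is one input to the IsPage check (see Page detection), which determines page icons (🌐) and "Data first" sort order. In Legacy counting mode, raw HasLayout is used directly.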

Item files

Items are stored in one of these layouts (the explorer accepts any of them, on a per-file basis — different items may live at different depths):

| Path under items/<language>/ | Notes |
|------------------------------|-------|
| <uuid>.json | Flat, no shard folders |
| <x>/<uuid>.json | Sharded by first hex char |
| <x>/<y>/<uuid>.json | Sharded by first two hex chars (current default) |
| <x>/<y>/<z>/<uuid>.json | Sharded by first three hex chars |
| {uuid}.json | Legacy unsharded layout (lowercase, with curly braces) |

<x>, <y>, <z> are the leading hex characters of the UUID (e.g. for dac24edd-… they are d, a, c). <uuid> is the UUID without braces. Filename and folder casing are matched as written on disk; lookups and directory scans are case-insensitive on Windows/macOS and try lowercase by default on Linux.

The legacy {uuid}.json layout only applies at the top of items/<language>/; sharded folders only contain the no-braces form.

{
  "ItemID": "{0DE95AE4-41AB-4D01-9EB0-67441B7C2450}",
  "Name": "content",
  "DisplayName": null,
  "Path": "/sitecore/content",
  "TemplateName": "Main section",
  "TemplateID": "{E3E2D58C-DF95-4230-ADC9-279924CECE84}",
  "Language": "en",
  "Database": "web",
  "Fields": [
    {
      "ID": "some-guid",
      "Name": "__Display name",
      "DisplayName": "Display name",
      "Type": "Single-Line Text",
      "Value": "Content",
      "Section": {
        "Name": "Appearance",
        "Sortorder": 100
      }
    }
  ]
}

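The data/media subdirectory is ignored. The presentation/ and html/ subdirectories are not read as item data; they are used only by the Presentation, Html, and Analysis tabs described under Item detail.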

Project Structure

siphon-explorer/
├── bin/
│   └── siphon-explorer.js        # CLI entry point — parses args, sets env vars, starts server
├── pages/
│   ├── _app.js             # Next.js app wrapper
│   ├── index.js            # Main UI: tree panel + detail panel
│   └── api/
│       ├── tree.js         # GET /api/tree — builds and returns the full tree
│       └── item/
│           └── [id].js     # GET /api/item/:id — returns a single item's JSON
├── styles/
│   └── globals.css         # All styles (no CSS modules, no Tailwind)
├── server.js               # Custom Next.js HTTP server
├── next.config.js
└── package.json

Architecture Notes

Server startup

bin/siphon-explorer.js sets DATA_DIR and PORT as environment variables before require('../server'). The language is per-request: every API endpoint reads ?lang=<code> from the query string and passes it to its file-reading helper. The server exposes /api/languages which lists available languages and the resolved default; the frontend calls this on load and stamps ?lang=<code> onto every subsequent request. The tree cache and indexer state are both keyed by language so switching back-and-forth doesn't trigger a rebuild for an already-loaded language.

Tree building (`pages/api/tree.js`)

The tree is built entirely from item paths, not from ParentId. This is more reliable because ParentId references can point to items not present in the index.

The built tree is cached by language: treeCache: Map<language, string>. The pre-warm task on startup builds the cache for the resolved default language only — other languages are built on first request.

Single-language retention. Only the currently-active language's caches are kept hot. When a request resolves to a different language than the last one served, the server evicts the previous language's treeCache entry and calls indexer.evict(prevLanguage) to drop its contentMap and FlexSearch index. This bounds memory at roughly one language's worth of state regardless of how many times the user switches — large datasets (200k+ items) would otherwise accumulate ~1–2 GB per visited language and exhaust Node's default heap. Switching back to a previously-visited language re-reads the on-disk search-cache-<language>.json and rebuilds the FlexSearch index in seconds rather than from scratch.
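
The eviction rule can be sketched as a small gate that every request passes through. `createLanguageGate` is a hypothetical name; `treeCache` and `indexer.evict` follow the names used above, but their real signatures are assumed.

```javascript
// Hypothetical sketch: keeps only one language's caches resident.
function createLanguageGate(treeCache, indexer) {
  let activeLanguage = null;
  return function activate(language) {
    if (activeLanguage !== null && activeLanguage !== language) {
      treeCache.delete(activeLanguage); // drop the serialized tree
      indexer.evict(activeLanguage);    // drop contentMap and the search index
    }
    activeLanguage = language;
  };
}

const treeCache = new Map([['en', '[]']]);
const activate = createLanguageGate(treeCache, {
  evict: (language) => console.log(`evicted ${language}`),
});
activate('en');
activate('de'); // evicts the "en" state
```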

Algorithm:

Read the item index (_index.json).
Walk the items directory once with fs.readdirSync, recursing into single-hex-char folders up to 3 levels deep, into a Set<string> of existing item UUIDs (lowercased, no braces). Both sharded (<x>/<y>/<uuid>.json etc.) and legacy ({uuid}.json) filenames populate the same set. Per-item hasFile is then an O(1) Set lookup rather than a fs.existsSync syscall per item — this is a ~200× speedup for large (100k+) datasets.
Build a byPath map (lowercase path → node) from the index. Each node includes hasLayout and hasFile.
For each real node, call ensureNode(parentPath) — this recursively creates virtual folder nodes for any missing path segments and links them upward all the way to the root.
Roots are detected as nodes not referenced as a child by any other node.
Children are sorted by Sortorder; under page nodes (hasLayout: true), a child named "Data" is always sorted first.
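
The path-based steps above can be sketched as follows. This is an illustration rather than the code in `pages/api/tree.js`: the node shape is simplified, `hasFile` is left out, and treating a null Sortorder as 0 is an assumption.

```javascript
// Sketch: builds a tree from index paths, creating virtual folders for gaps.
function buildTree(index) {
  const byPath = new Map();    // lowercase path -> node
  const hasParent = new Set(); // nodes referenced as a child

  for (const [id, item] of Object.entries(index)) {
    if (id === '00000000-0000-0000-0000-000000000000') continue; // sentinel
    byPath.set(item.Path.toLowerCase(), {
      id, name: item.Name, path: item.Path,
      sortorder: item.Sortorder ?? 0, // assumption: null sorts as 0
      hasLayout: item.HasLayout === true, children: [],
    });
  }

  const parentPath = (p) => p.slice(0, p.lastIndexOf('/'));

  function attach(node) {
    const parent = parentPath(node.path);
    if (!parent) return; // top-level path: stays a root
    ensureNode(parent).children.push(node);
    hasParent.add(node);
  }

  // Returns the node at path p, creating virtual ancestors as needed.
  function ensureNode(p) {
    let node = byPath.get(p.toLowerCase());
    if (!node) {
      node = { id: null, virtual: true, name: p.slice(p.lastIndexOf('/') + 1),
        path: p, sortorder: 0, hasLayout: false, children: [] };
      byPath.set(p.toLowerCase(), node);
      attach(node);
    }
    return node;
  }

  // Snapshot first: virtual nodes attach themselves when created.
  for (const node of [...byPath.values()]) attach(node);

  for (const node of byPath.values()) {
    const dataFirst = (n) => (node.hasLayout && n.name === 'Data' ? 0 : 1);
    node.children.sort((a, b) => dataFirst(a) - dataFirst(b) || a.sortorder - b.sortorder);
  }
  return [...byPath.values()].filter((n) => !hasParent.has(n));
}
```

Because parents are found by trimming the last path segment, an item whose ParentId points outside the index still lands in the right place.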

The built tree is cached in memory as a pre-serialized JSON string per language. After the HTTP server starts listening, setImmediate(() => buildTree(...)) warms the cache for the default language so the user's first /api/tree request returns immediately — on a 200k-item dataset this takes the first browser load from ~18s down to ~0.4s (just the raw transfer of the ~60 MB JSON payload over localhost). Subsequent requests hit the same cached string. The cache is invalidated when the active language changes (see Single-language retention above) or on server restart.

Virtual nodes have id: null and virtual: true. They are expandable but not selectable, displayed with a 📁 icon and grey text.

Nodes whose {id}.json file does not exist on disk have hasFile: false. They appear in the tree (with no icon) but are not selectable and do not trigger a detail panel fetch.
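
The hasFile flag comes from the single directory walk described in the algorithm. A minimal sketch of such a walk, assuming a hypothetical `collectItemIds` helper; it does not check that a shard folder matches the UUID's leading characters.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const HEX_DIR = /^[0-9a-f]$/i;
const ITEM_FILE =
  /^\{?([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\}?\.json$/i;

// Hypothetical helper: collects item UUIDs (lowercase, no braces) from disk.
function collectItemIds(dir, depth = 0, found = new Set()) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (entry.isDirectory()) {
      // Recurse only into single-hex-char shard folders, at most 3 deep.
      if (depth < 3 && HEX_DIR.test(entry.name)) {
        collectItemIds(path.join(dir, entry.name), depth + 1, found);
      }
    } else {
      const match = ITEM_FILE.exec(entry.name);
      if (match) found.add(match[1].toLowerCase());
    }
  }
  return found;
}

// Usage: hasFile becomes a Set lookup instead of a per-item syscall.
// const ids = collectItemIds('./data/items/en');
// const hasFile = ids.has('0de95ae4-41ab-4d01-9eb0-67441b7c2450');
```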

Icon mapping summary (evaluated top-to-bottom — first match wins):

| Condition | Icon |
|-----------|------|
| Virtual path-gap folder | 📁 emoji |
| isPage (smart-counted page) | 🌐 emoji |
| isPage (non-smart page in default mode) | 🔴 emoji |
| hasVersion: false | Yellow CSS folder |
| name === "Data" and hasFile | Green CSS folder |
| hasFile with children | Yellow CSS folder |
| hasFile leaf | Green CSS paper |
| hasFile: false (index-only) | (none) |
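
The first-match-wins evaluation can be written as a chain of early returns. In this sketch the returned labels and the `smartCounted` flag are hypothetical stand-ins; the real UI renders emoji or CSS shapes.

```javascript
// Hypothetical sketch: conditions are checked in table order.
function iconFor(node) {
  if (node.virtual) return 'folder-emoji';              // 📁
  if (node.isPage && node.smartCounted) return 'globe'; // 🌐
  if (node.isPage) return 'red-circle';                 // 🔴
  if (node.hasVersion === false) return 'yellow-folder';
  if (node.name === 'Data' && node.hasFile) return 'green-folder';
  if (node.hasFile && (node.children ?? []).length > 0) return 'yellow-folder';
  if (node.hasFile) return 'green-paper';
  return null; // index-only item: no icon
}

console.log(iconFor({ name: 'Data', hasFile: true, hasVersion: true, children: [{}] }));
```

Ordering matters: a "Data" node with children hits the green-folder rule before the generic yellow-folder rule.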

Page detection (IsPage)

A node/item is treated as a page when all of the following hold (a port of the C# IsPage extension used by upstream tooling on these files):

HasLayout === true
HasVersion === true (the item has a version in the current language)
TemplateID is not the Email Message Root template {3F12D78C-B7B7-4157-98FC-DA3322EE1A5B} (/sitecore/templates/System/Email/Messages/Inner Content/Message Root) — these have a layout but are content fragments, not pages.

IsPage (server-side) and node.isPage (tree node) drive page icons, "Data first" sorting, Composition ID display, and Presentation/Html/Analysis tab visibility.
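
The three conditions translate directly into a predicate. A minimal sketch, assuming the item object exposes HasLayout, HasVersion, and TemplateID as described; `bareGuid` is a hypothetical helper that makes the comparison independent of braces and casing.

```javascript
// Email Message Root template ID from the list above, lowercase, no braces.
const EMAIL_MESSAGE_ROOT = '3f12d78c-b7b7-4157-98fc-da3322ee1a5b';

// Hypothetical helper: strips braces and lowercases a GUID-like value.
const bareGuid = (guid) => String(guid ?? '').replace(/[{}]/g, '').toLowerCase();

function isPage(item) {
  return item.HasLayout === true
    && item.HasVersion === true
    && bareGuid(item.TemplateID) !== EMAIL_MESSAGE_ROOT;
}

console.log(isPage({ HasLayout: true, HasVersion: true, TemplateID: '{E3E2D58C-DF95-4230-ADC9-279924CECE84}' }));
```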

HasVersion: false items render as a yellow folder regardless of children, file presence, or HasLayout. They are excluded from descendant items and descendant page counts in tree statistics. The right panel renders normally for these items: Quick Info plus the Fields and Referrers tabs (the {id}.json file, when present, carries shared/system fields worth inspecting). The Presentation, Html, and Analysis tabs are hidden — they're gated by IsPage, which requires HasVersion === true.

The Presentation and Html tabs are further hidden when the corresponding index.json / index.html file is missing on disk (server probes for them via resolvePresentationFile and reports HasPresentationJson / HasPresentationHtml on each item).

Item loading (`pages/api/item/[id].js`)

Accepts any GUID format (with/without braces, any case). Normalizes the id and probes the on-disk file under <DATA_DIR>/items/<language>/ from deepest sharding to shallowest, then the legacy layout:

<x>/<y>/<z>/<uuid>.json — 3-level sharded
<x>/<y>/<uuid>.json — 2-level sharded
<x>/<uuid>.json — 1-level sharded
<uuid>.json — flat, no braces
{uuid}.json — legacy unsharded layout (lowercase, with curly braces)

<language> is the value of the ?lang=<code> query string parameter on the request (resolved by the frontend from the in-app language selector).
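
The probe order can be expressed as a list of candidate paths relative to `<DATA_DIR>/items/<language>/`. `candidatePaths` is a hypothetical helper; it only normalizes hyphenated GUIDs with or without braces.

```javascript
// Hypothetical helper: candidate file paths, deepest sharding first.
function candidatePaths(rawId) {
  const uuid = rawId.replace(/[{}]/g, '').toLowerCase();
  const [x, y, z] = uuid; // leading hex characters
  return [
    `${x}/${y}/${z}/${uuid}.json`, // 3-level sharded
    `${x}/${y}/${uuid}.json`,      // 2-level sharded
    `${x}/${uuid}.json`,           // 1-level sharded
    `${uuid}.json`,                // flat, no braces
    `{${uuid}}.json`,              // legacy unsharded layout
  ];
}

console.log(candidatePaths('{DAC24EDD-0000-4000-8000-000000000001}'));
```

A loader would join each candidate onto the language folder and read the first one that exists.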

UI state (`pages/index.js`)

State is managed with React hooks, no external state library.

Key state:

tree — full tree array from /api/tree
flatNodes — flat array of all selectable nodes (hasFile: true), used for search
selectedId — currently selected item ID
expandedIds — Set of node keys that are expanded; real nodes use their UUID, virtual nodes use their path string
treeLoaded — boolean, set once when the tree fetch completes
hideEmpty / hideSystem — field filter flags, persisted in localStorage

Initial expansion (important)

expandedIds is set only in the initial selection effect (not in the tree-fetch effect). This avoids a React strict-mode bug where the tree fetch runs twice — the second run would overwrite the correctly-expanded ancestors with only root IDs.

The initial selection effect runs once when both treeLoaded and router.isReady become true. It builds the full toExpand set from scratch:

Root-level node keys
All ancestor keys of the selected item (using node.id ?? node.path so virtual ancestors are included)
The selected item's own key
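
Building the set from scratch can be sketched as a depth-first walk that remembers the ancestor keys. `buildToExpand` is a hypothetical name, and the node shape (`id`, `path`, `children`) is assumed from the state description above.

```javascript
// Hypothetical sketch: root keys + ancestors of the selection + the selection.
function buildToExpand(tree, selectedId) {
  const keyOf = (node) => node.id ?? node.path; // virtual nodes use their path
  const toExpand = new Set(tree.map(keyOf));

  function walk(nodes, ancestors) {
    for (const node of nodes) {
      const trail = [...ancestors, keyOf(node)];
      if (selectedId != null && node.id === selectedId) {
        trail.forEach((key) => toExpand.add(key));
        return true;
      }
      if (walk(node.children ?? [], trail)) return true;
    }
    return false;
  }

  walk(tree, []);
  return toExpand;
}
```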

Selection flow

Page load — initial selection effect reads ?id= from URL and expands ancestors, or falls back to /sitecore/content; sets document.title to the item path
User click / GUID link / path breadcrumb — selectAndExpand updates state immediately and calls router.push (adds to browser history); document.title is updated in the routeChangeComplete event (fired after history.pushState completes) so each history entry is stamped with the correct path
Browser back/forward — a useEffect on router.query.id detects when the URL changes externally and calls applySelection to sync state

Note on title timing: document.title is managed imperatively (no <title> in JSX) to prevent Next.js from resetting it to the default on every render, which would corrupt history entry titles.

Tree search

flatNodes is built once when the tree loads. The useMemo-derived searchResults splits the query on whitespace and requires every word to appear in the item name or item ID (case-insensitive substring match, with curly braces stripped). Results are capped at 60.

Results are ranked by rankNameMatch(name, query):

| Rank | Condition |
|------|-----------|
| 0 | Name equals the query (case-insensitive, braces stripped) |
| 1 | Name starts with the query |
| 2 | Name contains the query |
| 3 | Matched only via Item ID (no name match) |

Results are sorted by rank ascending, then alphabetically by name. The same ranking is applied to the field filter (Filter fields by name, ID or group…), the Template tab items list, and the full-text search results (server-side via indexer.js#rankNameMatch).
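
The ranking rules can be sketched as below. This is an illustration of the rank table, not the code in `indexer.js`; `stripQuery` and `sortResults` are hypothetical helpers, and rank 3 is simply the fallback when the name does not contain the query.

```javascript
// Hypothetical helper: removes braces, trims, and lowercases.
const stripQuery = (s) => s.replace(/[{}]/g, '').trim().toLowerCase();

function rankNameMatch(name, query) {
  const n = stripQuery(name);
  const q = stripQuery(query);
  if (n === q) return 0;         // exact match
  if (n.startsWith(q)) return 1; // prefix match
  if (n.includes(q)) return 2;   // substring match
  return 3;                      // matched some other way, e.g. by Item ID
}

// Sort by rank ascending, then alphabetically by name.
function sortResults(items, query) {
  return [...items].sort((a, b) =>
    rankNameMatch(a.name, query) - rankNameMatch(b.name, query)
    || a.name.localeCompare(b.name));
}

const ranked = sortResults(
  [{ name: 'say hello' }, { name: 'hello1' }, { name: 'hello' }], 'hello');
console.log(ranked.map((item) => item.name));
```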

GUID detection

FieldValue handles three cases in order:

Single standalone GUID — entire value is a GUID; rendered as one GuidLink
Pipe-separated GUID list — value split by | where every non-empty token is a GUID (Multilist, Treelist fields); each token rendered as a GuidLink
Text with embedded GUIDs — any other value (short or long, including rich text HTML and Sitecore XML) is scanned with GUID_SCAN_RE via renderInlineGuids, which splits the string at every GUID and interleaves plain text with GuidLink elements

In all three cases, a Links bulleted list is rendered below the field value for every GUID found. Each bullet shows:

• {GUID}  /item/path
• {GUID}  — Item Not Found

The GUID is a clickable link. Items present in the index show their path; items not in the index show "Item Not Found" in red.

Two GUID formats are recognised during inline scanning:

Hyphenated — {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx} with or without braces
Compact 32-hex — F8508D17FFF349AD81D91E38B71E94D6 with or without braces; negative lookarounds prevent matching a substring of a longer hex string; automatically expanded to hyphenated form for navigation

normalizeGuid handles both formats and always returns a lowercase hyphenated UUID or null.
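
A minimal sketch of the two-format handling follows. The regular expressions are assumptions written to match the description; the actual `GUID_SCAN_RE` pattern may differ, and this version does not require braces to be balanced.

```javascript
const HYPHENATED_GUID =
  /^\{?([0-9a-f]{8})-([0-9a-f]{4})-([0-9a-f]{4})-([0-9a-f]{4})-([0-9a-f]{12})\}?$/i;
const COMPACT_GUID =
  /^\{?([0-9a-f]{8})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{12})\}?$/i;

// Returns a lowercase hyphenated UUID, or null if the value is not a GUID.
function normalizeGuid(value) {
  if (typeof value !== 'string') return null;
  const text = value.trim();
  const match = HYPHENATED_GUID.exec(text) ?? COMPACT_GUID.exec(text);
  return match ? match.slice(1).join('-').toLowerCase() : null;
}

// Inline scan: lookarounds reject a GUID-shaped run inside longer hex text.
const GUID_SCAN_RE =
  /\{?(?<![0-9a-f])(?:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}|[0-9a-f]{32})(?![0-9a-f])\}?/gi;

console.log(normalizeGuid('{F8508D17FFF349AD81D91E38B71E94D6}'));
console.log('id={0DE95AE4-41AB-4D01-9EB0-67441B7C2450};'.match(GUID_SCAN_RE));
```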

nodeById is a useMemo-derived map (id → node) built from flatNodes in Home and passed down to ItemDetail → FieldValue to resolve paths for the bullet list without any extra API calls.

QuickInfoRow renders Item ID and Template ID as GUID links via the same normalizeGuid check.

Resizable panel splitter

A 5px .panel-resizer div sits between .tree-panel and .detail-panel. On mousedown, global mousemove/mouseup listeners track the cursor's clientX and update panelWidth state (clamped to 200–480px), which is applied as an inline width style on .tree-panel. The CSS min/max-width constraints are removed from .tree-panel; clamping is done entirely in the JS handler.
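
The clamping step in the mousemove handler amounts to one expression. `clampPanelWidth` is a hypothetical helper, and the `panelLeft` offset is an assumption for layouts where the tree panel does not start at the left edge of the viewport.

```javascript
const MIN_PANEL_WIDTH = 200;
const MAX_PANEL_WIDTH = 480;

// Hypothetical helper: converts a cursor position into a legal panel width.
function clampPanelWidth(clientX, panelLeft = 0) {
  const requested = clientX - panelLeft;
  return Math.min(MAX_PANEL_WIDTH, Math.max(MIN_PANEL_WIDTH, requested));
}

console.log(clampPanelWidth(120)); // 200
console.log(clampPanelWidth(350)); // 350
console.log(clampPanelWidth(900)); // 480
```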

Tree panel collapse

A treeCollapsed state (persisted to localStorage under treePanelCollapsed) toggles a compact mode where .tree-panel is fixed at ~32px wide via the .tree-panel--collapsed class, all body content (search box, tree scroll, action buttons) is hidden, and the resize handle is hidden. The toggle button lives where the "CONTENT TREE" label normally sits and displays ◀ when expanded or ▶ when collapsed.

Path breadcrumb

Each segment of item.Path is rendered as a link only if its full cumulative path exists in navigablePaths — a Set<string> of lowercase paths derived from flatNodes. Segments without a matching file on disk are rendered as plain text.
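
The cumulative-path check can be sketched as follows. `breadcrumbSegments` is a hypothetical helper, and `navigablePaths` is assumed to be a Set of lowercase paths as described above.

```javascript
// Hypothetical helper: one entry per path segment, with a clickable flag.
function breadcrumbSegments(itemPath, navigablePaths) {
  let cumulative = '';
  return itemPath.split('/').filter(Boolean).map((name) => {
    cumulative += `/${name}`;
    return {
      name,
      path: cumulative,
      clickable: navigablePaths.has(cumulative.toLowerCase()),
    };
  });
}

const navigable = new Set(['/sitecore', '/sitecore/content/home']);
console.log(breadcrumbSegments('/sitecore/content/Home', navigable));
```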

CI/CD

GitHub Actions workflow (.github/workflows/ci.yml) runs on every push/PR to main:

build job — installs dependencies (npm ci) and runs npm run build (esbuild bundle verification)
release-and-publish job — on successful push to main (skips chore: bump version commits):
- Publishes the current version to npm (npm publish --access public)
- Tags the commit with v{version}
- Bumps the patch version in package.json for the next release
- Commits chore: bump version to v{next} and pushes with the tag

To release a major/minor version, manually edit the version in package.json before merging (e.g. 1.0.0 → 2.0.0); CI will publish that version and then bump to 2.0.1.

Requires an NPM_TOKEN repository secret for publishing.

Known Limitations / Future Work

Read-only — no editing capability
~~No field-value search~~ — full text search across all field values is now supported via the Full Text Search modal
No icons by template — page items use 🌐, others use a CSS document icon; no per-template icon mapping
Large field values — values over 300 characters are shown as a pre-formatted block but not truncated/collapsed
Items outside the tree — if a GUID link points to an item not in the tree (e.g. from another database), the detail panel loads correctly via the API but the tree does not scroll to or highlight it
No media preview — data/media is ignored
