@uniformdev/siphon-explorer
v1.0.11
Siphon Explorer
A local developer tool for browsing Sitecore content tree data exported to JSON files. Runs as a Node.js CLI that spins up a Next.js web app on localhost.
Purpose
When working with exported Sitecore databases, it's useful to be able to explore the content tree, inspect item fields, and follow references between items — without needing a running Sitecore instance. Siphon Explorer provides a read-only UI that mirrors the Sitecore Content Editor interface.
Features
Content tree (left panel)
- Collapsible panel — the "CONTENT TREE" label in the top-left corner is a toggle button (◀ / ▶); click it to collapse the entire tree panel down to a thin vertical bar, reclaiming the space for the item detail view. Click again to expand back to the previous width. Collapsed state is persisted in `localStorage` (`treePanelCollapsed`). While collapsed, the resize handle is hidden.
- Full Sitecore item hierarchy, expandable/collapsible, sorted by `Sortorder`
- Virtual folder nodes — path segments missing from the index are shown as non-selectable grey 📁 folders so the tree is always structurally complete
- "content" first under `/sitecore` — the `content` node is always sorted first among its siblings
- "Data" child always first — under any page item, the "Data" child node is sorted to the top regardless of `Sortorder` (uses the IsPage check, or raw `HasLayout` in Legacy counting mode)
- Icons — pages show 🌐; items with a file on disk show a yellow CSS folder icon; items named `Data` with a file show a green folder icon; virtual path-gap folders show 📁; items in the index but without a JSON file show no icon and are not selectable. The "is page" definition uses the IsPage check by default; Legacy counting downgrades it to raw `HasLayout`
- Language selector in the panel header — dropdown listing every language present in the data directory (any subfolder of `items/` that has a matching `_index.json`). Selection persists in the URL as `?lang=<code>` (the default is omitted). Changing the language reloads the tree, item detail, and full-text index for the new language.
- Child / descendant counts — nodes with children show gray counts on the right: `{children} ({descendants} items {x}%, {descendantPages} pages {y}%)` where item % is relative to all items in the dataset and page % is relative to total counted pages. By default (smart counting), only pages that are (a) descendants of a node named `Home` and (b) not inside a `Data` subfolder of another page are counted. The Legacy counting checkbox next to the sort control switches to the original behavior: all items with `HasLayout` under `/sitecore/content` are counted. State is persisted in the URL as `?legacy=1`
- Sort mode — dropdown with `__SortOrder` (default), `Alphabet`, or `Pages Count`. State is persisted in the URL as `?sort=alphabet` or `?sort=pagesCount` (default `sortorder` is omitted)
- Search — `Ctrl+E` (or `Cmd+E`) opens a search box; results filter by item name or item ID with multi-word support (all words must match, e.g. `cha eugene` finds `CHA Website Eugene's Copy`); curly braces are stripped from search words so `{guid}` and `guid` match identically; each result displays the same icon as in the tree (🌐, 🔴, folder, paper); results are ranked exact-match-first — items whose name exactly equals the query come first, then prefix matches, then substring matches, then items matched only by ID (so searching `hello` returns `hello` before `hello1` and `hello-world`); when a GUID is entered, an additional lookup checks whether it matches any item's Item ID, Composition ID, or Entry IDs — if found, the matched item appears at the top of results with a label showing which ID type matched (and is de-duplicated from the regular results); `Escape` closes
- Full Text Search — click the "Full Text Search" link in the panel header to open a modal; searches across all field values; supports glob-style patterns: `abc*def` matches text containing `abc` then `def` anywhere after it (ignoring line breaks); results are ranked exact-match-first on item name (exact → prefix → substring → matched only by path/field text); indexing runs in the background on startup and is resumable across restarts via a `search-cache.json` file in the data directory
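The glob-style matching described for Full Text Search can be pictured as a translation into a regular expression. The sketch below is an illustration only: `globToRegExp` is a hypothetical helper name, and case-insensitive matching is an assumption, since the README does not say how the real indexer compiles patterns.

```javascript
// Hypothetical helper (not taken from the project source): turn a glob-style
// query such as "abc*def" into a RegExp that tolerates line breaks.
function globToRegExp(pattern) {
  // Escape regex metacharacters in the literal parts of the pattern.
  const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Each "*" becomes "any run of characters, including newlines".
  const body = pattern.split('*').map(escapeRe).join('[\\s\\S]*');
  // Case-insensitive matching is an assumption of this sketch.
  return new RegExp(body, 'i');
}

const re = globToRegExp('abc*def');
console.log(re.test('xx abc\n\nzz def')); // "abc" followed later by "def"
console.log(re.test('def abc'));          // wrong order, so no match
```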
Item detail (right panel)
- Header icon — the detail header shows the same icon as the tree (🌐, 🔴, folder, paper) instead of a static 🗃️
- Quick Info section: Display name, Slug, Item ID, Item name, Item path, Composition id, Entry id, Template, Template ID, Language, Database. Composition id is shown only for pages (per the IsPage check); for non-page items it renders as empty
- Item path breadcrumb — each path segment is a clickable link that navigates to that item; segments without a corresponding file on disk are shown as plain text (not clickable)
- Fields grouped by section, sections sorted by `Sortorder`; field name and value separated by a fixed-width vertical divider
- Referrers tab — shows all items whose field values contain the current item's GUID; uses the same background full-text index as Full Text Search; available on all items (pages and non-pages alike)
- Presentation tab — shown for pages (per the IsPage check) that have a presentation file on disk; displays the parsed `index.json` as a tree of placeholders and renderings. The resolved absolute file path of the `index.json` is shown at the top of the tab so you can see exactly which folder (`presentation/<language>/…` or `html/<language>/…`) the data came from.
- Html tab — shown for pages that have a presentation file available (i.e. alongside the Presentation tab); displays the raw source of the `index.html` file sitting next to the `index.json` file used by the Presentation tab (same folder resolution: `presentation/<language>/<sub-path>/index.json` then `html/<language>/<sub-path>/index.json`, where `<sub-path>` is the item's path segments after the first `home` ancestor, matched case-insensitively) as preformatted text inside a `<pre><code>` block — the HTML is not rendered or injected into the DOM. The resolved absolute file path of the `index.html` is shown at the top of the tab. If the HTML file does not exist, shows a "not found" state.
- Analysis tab — shown alongside the Presentation tab (last in the tab row) for pages with a presentation file; fetches the same presentation data and displays a "Components" section header with a bulleted list of every unique component (rendering) referenced anywhere in the presentation tree. Each bullet shows the component name, a clickable GUID link to its rendering definition (when available), and the item path of the rendering definition if it is present in the index.
- Template tab — shown only when viewing a Template item (TemplateID `{AB86861A-6030-46C5-B394-E8F99E8B87DB}`); lists all items based on this template; includes a search box to filter by name/path and pagination (50 items per page) for large result sets; results are ranked exact-match-first on item name
- Hide empty fields checkbox — hides fields with no value; state persisted in `localStorage`
- Hide system fields checkbox — hides fields whose name starts with `__`; state persisted in `localStorage`
- GUID navigation — any GUID value (Item ID, Template ID, field values, pipe-separated multilist/treelist values) is rendered as a clickable blue link that navigates to that item
- Resizable panels — drag the divider between the tree and detail panels to adjust the split; constrained between 200px and 480px
- Copy buttons — each field value and Quick Info row shows a copy button (⎘) on hover that copies the raw value to the clipboard; turns green (✓) briefly on success
Navigation
- URL persistence — selected item ID stored as `?id=<uuid>`; language as `?lang=<code>`, sort mode as `?sort=<mode>`, legacy counting as `?legacy=1`, and active detail tab as `?tab=<tab>` (only non-default values are written so the URL stays clean). F5 restores the same item with ancestors expanded plus the chosen language/sort/counting/tab
- Browser back/forward — each item selection is a `router.push` entry; Back/Forward work as expected; the tree panel automatically scrolls to keep the selected item visible
- Browser history titles — `document.title` is updated to `Siphon Explorer — /sitecore/content/…` on each selection, so Chrome history shows the item path rather than a generic URL
- Default selection — on first load (no URL param), `/sitecore/content` is pre-selected if present
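To make the "only non-default values are written" rule concrete, here is a minimal sketch of how such a query string could be assembled. `buildQuery` and the shape of its `state` and `defaults` arguments are hypothetical; in particular, the default tab name used in the example is an assumption, not something the README states.

```javascript
// Hypothetical helper: serialize UI state to a query string, omitting any
// value that equals its default so the URL stays short.
function buildQuery(state, defaults) {
  const params = new URLSearchParams();
  if (state.id) params.set('id', state.id);
  for (const key of ['lang', 'sort', 'tab']) {
    if (state[key] && state[key] !== defaults[key]) params.set(key, state[key]);
  }
  if (state.legacy) params.set('legacy', '1');
  const qs = params.toString();
  return qs ? `?${qs}` : '';
}

// Assumed defaults for illustration only.
const defaults = { lang: 'en', sort: 'sortorder', tab: 'fields' };
console.log(buildQuery({ id: 'abc', lang: 'en', sort: 'alphabet', legacy: true }, defaults));
// → ?id=abc&sort=alphabet&legacy=1
```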
Usage
# Install dependencies (once)
npm install
# Start the browser (defaults to port 5000)
node bin/siphon-explorer.js ./data
# Override the port explicitly
node bin/siphon-explorer.js ./data --port 3000
# Or set it via the PORT env var (handy for process managers)
PORT=3000 node bin/siphon-explorer.js ./data
# Or if installed globally via npm link / npm install -g
siphon-explorer ./data
# Run unit tests
npm test
The language is selectable at runtime via the dropdown in the tree panel header — no restart required. The list of languages is auto-discovered from the data directory (every subfolder of `items/` that has a matching `_index.json`). The initial selection on first load is picked in this order:
- `en` if present
- otherwise `en-US`, `en-UK`, `en-CA`, `en-AU` (first one that's present)
- otherwise the first language alphabetically
After that, ?lang=<code> in the URL takes precedence — share/bookmark links to land on a specific language.
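The default-language order above can be sketched as a small function. `pickDefaultLanguage` is a hypothetical name, and both the exact-case comparison and the plain string sort used for "alphabetically" are assumptions of this sketch.

```javascript
// Preference order as listed in the README.
const PREFERRED_LANGUAGES = ['en', 'en-US', 'en-UK', 'en-CA', 'en-AU'];

// Hypothetical helper: pick the initial language from the discovered list.
function pickDefaultLanguage(available) {
  for (const code of PREFERRED_LANGUAGES) {
    if (available.includes(code)) return code;
  }
  // Fallback: first language in plain string order (assumption), or null
  // when no languages were discovered at all.
  return [...available].sort()[0] ?? null;
}

console.log(pickDefaultLanguage(['fr', 'en-CA', 'en-US'])); // → en-US
console.log(pickDefaultLanguage(['fr', 'de']));             // → de
```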
The server bundles the React frontend with esbuild at startup (takes < 1 second), then serves everything from a single HTTP server — no Next.js, no build step required.
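Then open http://localhost:<port> in your browser, using the port the server was started on (5000 unless overridden with --port or PORT, per the commands above).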
<data-dir> must be the root of the exported Sitecore data, containing an items/ subdirectory (see Data Format below).
Data Format
The tool expects the following structure inside the data directory:
<data-dir>/
  items/
    <language>/
      _index.json                       # item index (preferred location)
      [<x>/[<y>/[<z>/]]]<uuid>.json     # one file per item, optionally sharded by leading hex chars
      {uuid}.json                       # legacy unsharded layout (still supported as a fallback)
    <language>_index.json               # alternative index location (checked first)

`<language>` is the language code chosen via the in-app selector (e.g. `en`, `en-US`). All paths below are resolved under that subfolder. The list of available languages is computed by scanning subfolders of `items/` and keeping those that have either an inner `_index.json` or a sibling `<language>_index.json`.
Index file (_index.json or <language>_index.json)
A flat JSON object keyed by lowercase UUID without braces:
{
  "0de95ae4-41ab-4d01-9eb0-67441b7c2450": {
    "ItemID": "0de95ae4-41ab-4d01-9eb0-67441b7c2450",
    "Name": "content",
    "DisplayName": null,
    "Path": "/sitecore/content",
    "ParentId": "11111111-1111-1111-1111-111111111111",
    "Sortorder": null,
    "HasLayout": false,
    "TemplateName": "Main section",
    "TemplateID": "e3e2d58c-df95-4230-adc9-279924cece84",
    ...
  }
}

Special entries:
- `00000000-0000-0000-0000-000000000000` — fake/virtual root used as a sentinel; skipped entirely
- `11111111-1111-1111-1111-111111111111` — the real Sitecore root (`/sitecore`)
The HasLayout boolean is used to determine page icons (🌐) and "Data first" sort order.
Item files
Items are stored in one of these layouts (the explorer accepts any of them, on a per-file basis — different items may live at different depths):
| Path under items/<language>/ | Notes |
|--------------------------------------------|--------------------------------------------------------|
| <uuid>.json | Flat, no shard folders |
| <x>/<uuid>.json | Sharded by first hex char |
| <x>/<y>/<uuid>.json | Sharded by first two hex chars (current default) |
| <x>/<y>/<z>/<uuid>.json | Sharded by first three hex chars |
| {uuid}.json | Legacy unsharded layout (lowercase, with curly braces) |
<x>, <y>, <z> are the leading hex characters of the UUID (e.g. for dac24edd-… they are d, a, c). <uuid> is the UUID without braces. Filename and folder casing are matched as written on disk; lookups and directory scans are case-insensitive on Windows/macOS and try lowercase by default on Linux.
The legacy {uuid}.json layout only applies at the top of items/<language>/; sharded folders only contain the no-braces form.
{
  "ItemID": "{0DE95AE4-41AB-4D01-9EB0-67441B7C2450}",
  "Name": "content",
  "DisplayName": null,
  "Path": "/sitecore/content",
  "TemplateName": "Main section",
  "TemplateID": "{E3E2D58C-DF95-4230-ADC9-279924CECE84}",
  "Language": "en",
  "Database": "web",
  "Fields": [
    {
      "ID": "some-guid",
      "Name": "__Display name",
      "DisplayName": "Display name",
      "Type": "Single-Line Text",
      "Value": "Content",
      "Section": {
        "Name": "Appearance",
        "Sortorder": 100
      }
    }
  ]
}

Subdirectories data/media and data/presentation are ignored.
Project Structure
siphon-explorer/
├── bin/
│   └── siphon-explorer.js     # CLI entry point — parses args, sets env vars, starts server
├── pages/
│   ├── _app.js                # Next.js app wrapper
│   ├── index.js               # Main UI: tree panel + detail panel
│   └── api/
│       ├── tree.js            # GET /api/tree — builds and returns the full tree
│       └── item/
│           └── [id].js        # GET /api/item/:id — returns a single item's JSON
├── styles/
│   └── globals.css            # All styles (no CSS modules, no Tailwind)
├── server.js                  # Custom Next.js HTTP server
├── next.config.js
└── package.json

Architecture Notes
Server startup
bin/siphon-explorer.js sets DATA_DIR and PORT as environment variables before require('../server'). The language is per-request: every API endpoint reads ?lang=<code> from the query string and passes it to its file-reading helper. The server exposes /api/languages which lists available languages and the resolved default; the frontend calls this on load and stamps ?lang=<code> onto every subsequent request. The tree cache and indexer state are both keyed by language so switching back-and-forth doesn't trigger a rebuild for an already-loaded language.
Tree building (pages/api/tree.js)
The tree is built entirely from item paths, not from ParentId. This is more reliable because ParentId references can point to items not present in the index.
The built tree is cached by language: treeCache: Map<language, string>. The pre-warm task on startup builds the cache for the resolved default language only — other languages are built on first request.
Single-language retention. Only the currently-active language's caches are kept hot. When a request resolves to a different language than the last one served, the server evicts the previous language's treeCache entry and calls indexer.evict(prevLanguage) to drop its contentMap and FlexSearch index. This bounds memory at roughly one language's worth of state regardless of how many times the user switches — large datasets (200k+ items) would otherwise accumulate ~1–2 GB per visited language and exhaust Node's default heap. Switching back to a previously-visited language re-reads the on-disk search-cache-<language>.json and rebuilds the FlexSearch index in seconds rather than from scratch.
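A minimal sketch of the single-language retention idea follows. It assumes an injected `buildTree(language)` function and an `indexer` object exposing `evict(language)` as described above; `createLanguageCache` itself and its return shape are hypothetical and only illustrate the eviction order.

```javascript
// Hypothetical wrapper: keep at most one language's serialized tree in memory.
function createLanguageCache(buildTree, indexer) {
  const treeCache = new Map(); // language -> pre-serialized JSON string
  let activeLanguage = null;

  return {
    getTree(language) {
      // A request for a different language evicts the previous one first.
      if (activeLanguage !== null && activeLanguage !== language) {
        treeCache.delete(activeLanguage);
        indexer.evict(activeLanguage);
      }
      activeLanguage = language;
      if (!treeCache.has(language)) {
        treeCache.set(language, JSON.stringify(buildTree(language)));
      }
      return treeCache.get(language);
    },
    size: () => treeCache.size,
  };
}
```

With this shape, repeated requests for the same language reuse the cached string, while switching languages costs one rebuild and never holds more than one tree at a time.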
Algorithm:
- Read the item index (`_index.json`).
- Walk the items directory once with `fs.readdirSync`, recursing into single-hex-char folders up to 3 levels deep, into a `Set<string>` of existing item UUIDs (lowercased, no braces). Both sharded (`<x>/<y>/<uuid>.json` etc.) and legacy (`{uuid}.json`) filenames populate the same set. Per-item `hasFile` is then an O(1) Set lookup rather than a `fs.existsSync` syscall per item — this is a ~200× speedup for large (100k+) datasets.
- Build a `byPath` map (lowercase path → node) from the index. Each node includes `hasLayout` and `hasFile`.
- For each real node, call `ensureNode(parentPath)` — this recursively creates virtual folder nodes for any missing path segments and links them upward all the way to the root.
- Roots are detected as nodes not referenced as a child by any other node.
- Children are sorted by `Sortorder`; under page nodes (`hasLayout: true`), a child named `"Data"` is always sorted first.
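The steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in `pages/api/tree.js`: `buildTreeFromPaths` and the node field names are hypothetical, the set of existing file IDs is passed in rather than read from disk, and a `null` `Sortorder` is treated as 0.

```javascript
const SENTINEL_ID = '00000000-0000-0000-0000-000000000000';

// Hypothetical sketch: build a tree purely from item paths.
function buildTreeFromPaths(index, existingIds) {
  const byPath = new Map(); // lowercase path -> node
  const makeNode = (path, fields) => ({ path, name: path.split('/').pop(), children: [], ...fields });

  for (const [id, item] of Object.entries(index)) {
    if (id === SENTINEL_ID) continue; // sentinel entry is skipped entirely
    byPath.set(item.Path.toLowerCase(), makeNode(item.Path, {
      id, virtual: false, sortorder: item.Sortorder ?? 0,
      hasLayout: item.HasLayout === true, hasFile: existingIds.has(id),
    }));
  }

  // Return the node for a path, creating a virtual folder if the index lacks it.
  function ensureNode(path) {
    const key = path.toLowerCase();
    if (!byPath.has(key)) {
      byPath.set(key, makeNode(path, { id: null, virtual: true, sortorder: 0, hasLayout: false, hasFile: false }));
    }
    return byPath.get(key);
  }

  // Attach a node to its parent, then keep linking upward to the root.
  const linked = new Set();
  function link(node) {
    const cut = node.path.lastIndexOf('/');
    if (cut <= 0 || linked.has(node)) return; // top-level or already attached
    linked.add(node);
    const parent = ensureNode(node.path.slice(0, cut));
    parent.children.push(node);
    link(parent);
  }
  for (const node of [...byPath.values()]) link(node);

  // Sort by Sortorder; under page nodes a child named "Data" goes first.
  const sortChildren = (node) => {
    node.children.sort((a, b) => {
      if (node.hasLayout) {
        const aData = a.name === 'Data';
        const bData = b.name === 'Data';
        if (aData !== bData) return aData ? -1 : 1;
      }
      return a.sortorder - b.sortorder;
    });
    node.children.forEach(sortChildren);
  };

  // Roots are the nodes that were never attached to a parent.
  const roots = [...byPath.values()].filter((node) => !linked.has(node));
  roots.forEach(sortChildren);
  return roots;
}
```

Because linking is driven by the path string alone, an item whose parent is missing from the index still ends up under a chain of virtual folders instead of being orphaned.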
The built tree is cached in memory as a pre-serialized JSON string per language. After the HTTP server starts listening, setImmediate(() => buildTree(...)) warms the cache for the default language so the user's first /api/tree request returns immediately — on a 200k-item dataset this takes the first browser load from ~18s down to ~0.4s (just the raw transfer of the ~60 MB JSON payload over localhost). Subsequent requests hit the same cached string. The cache is invalidated when the active language changes (see Single-language retention above) or on server restart.
Virtual nodes have id: null and virtual: true. They are expandable but not selectable, displayed with a 📁 icon and grey text.
Nodes whose {id}.json file does not exist on disk have hasFile: false. They appear in the tree (with no icon) but are not selectable and do not trigger a detail panel fetch.
Icon mapping summary (evaluated top-to-bottom — first match wins):
| Condition | Icon |
|-----------|------|
| Virtual path-gap folder | 📁 emoji |
| isPage (smart-counted page) | 🌐 emoji |
| isPage (non-smart page in default mode) | 🔴 emoji |
| hasVersion: false | Yellow CSS folder |
| name === "Data" and hasFile | Green CSS folder |
| hasFile with children | Yellow CSS folder |
| hasFile leaf | Green CSS paper |
| hasFile: false (index-only) | (none) |
Page detection (IsPage)
A node/item is treated as a page when all of the following hold (a port of the C# IsPage extension used by upstream tooling on these files):
- `HasLayout === true`
- `HasVersion === true` (the item has a version in the current language)
- `TemplateID` is not the Email Message Root template `{3F12D78C-B7B7-4157-98FC-DA3322EE1A5B}` (`/sitecore/templates/System/Email/Messages/Inner Content/Message Root`) — these have a layout but are content fragments, not pages.
IsPage (server-side) and node.isPage (tree node) drive page icons, "Data first" sorting, Composition ID display, and Presentation/Html/Analysis tab visibility.
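As a rough illustration of the three conditions, the check could look like the sketch below. The function name `isPage` and the brace/case normalization of `TemplateID` are assumptions; the field names follow the index and item JSON shown earlier.

```javascript
// Email Message Root template ID from the rule above, lowercased without braces.
const EMAIL_MESSAGE_ROOT = '3f12d78c-b7b7-4157-98fc-da3322ee1a5b';

// Normalize "{ABC...}" and "abc..." to the same bare lowercase form (assumption).
const bareGuid = (value) => String(value ?? '').replace(/[{}]/g, '').toLowerCase();

// Hypothetical port of the IsPage rule: all three conditions must hold.
function isPage(item) {
  return item.HasLayout === true
    && item.HasVersion === true
    && bareGuid(item.TemplateID) !== EMAIL_MESSAGE_ROOT;
}

console.log(isPage({ HasLayout: true, HasVersion: true, TemplateID: '{E3E2D58C-DF95-4230-ADC9-279924CECE84}' })); // → true
console.log(isPage({ HasLayout: true, HasVersion: false, TemplateID: 'x' })); // → false
```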
HasVersion: false items render as a yellow folder regardless of children, file presence, or HasLayout. They are excluded from descendant items and descendant page counts in tree statistics. The right panel renders normally for these items: Quick Info plus the Fields and Referrers tabs (the {id}.json file, when present, carries shared/system fields worth inspecting). The Presentation, Html, and Analysis tabs are hidden — they're gated by IsPage, which requires HasVersion === true.
The Presentation and Html tabs are further hidden when the corresponding index.json / index.html file is missing on disk (server probes for them via resolvePresentationFile and reports HasPresentationJson / HasPresentationHtml on each item).
Item loading (pages/api/item/[id].js)
Accepts any GUID format (with/without braces, any case). Normalizes the id and probes the on-disk file under <DATA_DIR>/items/<language>/ from deepest sharding to shallowest, then the legacy layout:
- `<x>/<y>/<z>/<uuid>.json` — 3-level sharded
- `<x>/<y>/<uuid>.json` — 2-level sharded
- `<x>/<uuid>.json` — 1-level sharded
- `<uuid>.json` — flat, no braces
- `{uuid}.json` — legacy unsharded layout (lowercase, with curly braces)
<language> is the value of the ?lang=<code> query string parameter on the request (resolved by the frontend from the in-app language selector).
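The probe order can be expressed as a list of candidate relative paths, deepest sharding first. `candidateItemPaths` is a hypothetical helper; it assumes the ID arrives in hyphenated form (with or without braces) and leaves the actual existence check to the caller.

```javascript
// Hypothetical helper: relative paths to try under items/<language>/, in order.
function candidateItemPaths(rawId) {
  const uuid = String(rawId).replace(/[{}]/g, '').toLowerCase();
  const [x, y, z] = uuid; // leading hex characters used as shard folders
  return [
    `${x}/${y}/${z}/${uuid}.json`, // 3-level sharded
    `${x}/${y}/${uuid}.json`,      // 2-level sharded
    `${x}/${uuid}.json`,           // 1-level sharded
    `${uuid}.json`,                // flat, no braces
    `{${uuid}}.json`,              // legacy layout with curly braces
  ];
}

console.log(candidateItemPaths('{0DE95AE4-41AB-4D01-9EB0-67441B7C2450}')[0]);
// → 0/d/e/0de95ae4-41ab-4d01-9eb0-67441b7c2450.json
```

A caller would join each candidate onto the language folder and stop at the first file that exists.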
UI state (pages/index.js)
State is managed with React hooks, no external state library.
Key state:
- `tree` — full tree array from `/api/tree`
- `flatNodes` — flat array of all selectable nodes (`hasFile: true`), used for search
- `selectedId` — currently selected item ID
- `expandedIds` — `Set` of node keys that are expanded; real nodes use their UUID, virtual nodes use their path string
- `treeLoaded` — boolean, set once when the tree fetch completes
- `hideEmpty` / `hideSystem` — field filter flags, persisted in `localStorage`
Initial expansion (important)
expandedIds is set only in the initial selection effect (not in the tree-fetch effect). This avoids a React strict-mode bug where the tree fetch runs twice — the second run would overwrite the correctly-expanded ancestors with only root IDs.
The initial selection effect runs once when both treeLoaded and router.isReady become true. It builds the full toExpand set from scratch:
- Root-level node keys
- All ancestor keys of the selected item (using `node.id ?? node.path` so virtual ancestors are included)
- The selected item's own key
Selection flow
- Page load — initial selection effect reads `?id=` from URL and expands ancestors, or falls back to `/sitecore/content`; sets `document.title` to the item path
- User click / GUID link / path breadcrumb — `selectAndExpand` updates state immediately and calls `router.push` (adds to browser history); `document.title` is updated in the `routeChangeComplete` event (fired after `history.pushState` completes) so each history entry is stamped with the correct path
- Browser back/forward — a `useEffect` on `router.query.id` detects when the URL changes externally and calls `applySelection` to sync state

Note on title timing: `document.title` is managed imperatively (no `<title>` in JSX) to prevent Next.js from resetting it to the default on every render, which would corrupt history entry titles.
Tree search
flatNodes is built once when the tree loads. The useMemo-derived searchResults splits the query on whitespace and requires all words to appear in the item name (case-insensitive substring). Capped at 60 results.
Results are ranked by rankNameMatch(name, query):
| Rank | Condition |
|------|-----------|
| 0 | Name equals the query (case-insensitive, braces stripped) |
| 1 | Name starts with the query |
| 2 | Name contains the query |
| 3 | Matched only via Item ID (no name match) |
Results are sorted by rank ascending, then alphabetically by name. The same ranking is applied to the field filter (Filter fields by name, ID or group…), the Template tab items list, and the full-text search results (server-side via indexer.js#rankNameMatch).
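A small sketch of the filter-then-rank flow is shown below. The name `rankNameMatch` comes from the text above, but its body here is a guess at the behavior, and `searchNodes` is a hypothetical wrapper. Two assumptions are worth flagging: a node matches when every query word appears in its name or every word appears in its ID, and a multi-word query that matches word by word (but not as one contiguous string) falls into the last rank in this sketch.

```javascript
const stripBraces = (s) => s.replace(/[{}]/g, '').toLowerCase();

// Guess at the ranking described in the table: 0 exact, 1 prefix, 2 substring, 3 other.
function rankNameMatch(name, query) {
  const n = stripBraces(name);
  const q = stripBraces(query.trim());
  if (n === q) return 0;
  if (n.startsWith(q)) return 1;
  if (n.includes(q)) return 2;
  return 3;
}

// Hypothetical wrapper: filter flat nodes by all query words, then rank and cap.
function searchNodes(flatNodes, query, limit = 60) {
  const words = stripBraces(query).split(/\s+/).filter(Boolean);
  if (words.length === 0) return [];
  return flatNodes
    .filter((node) => {
      const name = node.name.toLowerCase();
      const id = (node.id || '').toLowerCase();
      return words.every((w) => name.includes(w)) || words.every((w) => id.includes(w));
    })
    .map((node) => ({ node, rank: rankNameMatch(node.name, query) }))
    .sort((a, b) => a.rank - b.rank
      || (a.node.name < b.node.name ? -1 : a.node.name > b.node.name ? 1 : 0))
    .slice(0, limit)
    .map((entry) => entry.node);
}

const nodes = [
  { id: 'a1', name: 'hello1' },
  { id: 'a2', name: 'hello-world' },
  { id: 'a3', name: 'hello' },
];
console.log(searchNodes(nodes, 'hello').map((n) => n.name));
// → [ 'hello', 'hello-world', 'hello1' ]
```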
GUID detection
FieldValue handles three cases in order:
- Single standalone GUID — entire value is a GUID; rendered as one `GuidLink`
- Pipe-separated GUID list — value split by `|` where every non-empty token is a GUID (Multilist, Treelist fields); each token rendered as a `GuidLink`
- Text with embedded GUIDs — any other value (short or long, including rich text HTML and Sitecore XML) is scanned with `GUID_SCAN_RE` via `renderInlineGuids`, which splits the string at every GUID and interleaves plain text with `GuidLink` elements
In all three cases, a Links bulleted list is rendered below the field value for every GUID found. Each bullet shows:
• {GUID} /item/path
• {GUID} — Item Not Found

The GUID is a clickable link. Items present in the index show their path; items not in the index show "Item Not Found" in red.
Two GUID formats are recognised during inline scanning:
- Hyphenated — `{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}` with or without braces
- Compact 32-hex — `F8508D17FFF349AD81D91E38B71E94D6` with or without braces; negative lookarounds prevent matching a substring of a longer hex string; automatically expanded to hyphenated form for navigation
normalizeGuid handles both formats and always returns a lowercase hyphenated UUID or null.
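One way to picture that contract is the sketch below. It reuses the name `normalizeGuid` from the text, but the body is an assumption written to match the described behavior (lowercase hyphenated UUID or `null`), not the project's actual code.

```javascript
const HYPHENATED_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/;
const COMPACT_RE = /^[0-9a-f]{32}$/;

// Sketch: accept hyphenated or compact GUIDs, with or without braces, any case.
function normalizeGuid(value) {
  if (typeof value !== 'string') return null;
  const bare = value.trim().replace(/^\{|\}$/g, '').toLowerCase();
  if (HYPHENATED_RE.test(bare)) return bare;
  if (COMPACT_RE.test(bare)) {
    // Expand 32 hex characters into the 8-4-4-4-12 layout.
    return [bare.slice(0, 8), bare.slice(8, 12), bare.slice(12, 16), bare.slice(16, 20), bare.slice(20)].join('-');
  }
  return null;
}

console.log(normalizeGuid('F8508D17FFF349AD81D91E38B71E94D6'));
// → f8508d17-fff3-49ad-81d9-1e38b71e94d6
console.log(normalizeGuid('not a guid')); // → null
```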
nodeById is a useMemo-derived map (id → node) built from flatNodes in Home and passed down to ItemDetail → FieldValue to resolve paths for the bullet list without any extra API calls.
QuickInfoRow renders Item ID and Template ID as GUID links via the same normalizeGuid check.
Resizable panel splitter
A 5px .panel-resizer div sits between .tree-panel and .detail-panel. On mousedown, global mousemove/mouseup listeners track the cursor's clientX and update panelWidth state (clamped to 200–480px), which is applied as an inline width style on .tree-panel. The CSS min/max-width constraints are removed from .tree-panel; clamping is done entirely in the JS handler.
Tree panel collapse
A treeCollapsed state (persisted to localStorage under treePanelCollapsed) toggles a compact mode where .tree-panel is fixed at ~32px wide via the .tree-panel--collapsed class, all body content (search box, tree scroll, action buttons) is hidden, and the resize handle is hidden. The toggle button lives where the "CONTENT TREE" label normally sits and displays ◀ when expanded or ▶ when collapsed.
Path breadcrumb
Each segment of item.Path is rendered as a link only if its full cumulative path exists in navigablePaths — a Set<string> of lowercase paths derived from flatNodes. Segments without a matching file on disk are rendered as plain text.
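The cumulative-path check can be sketched like this. `breadcrumbSegments` and the returned object shape are hypothetical; `navigablePaths` is assumed to be a `Set` of lowercase paths, as described above.

```javascript
// Hypothetical helper: split an item path into breadcrumb segments and mark
// which ones can be rendered as links.
function breadcrumbSegments(itemPath, navigablePaths) {
  let cumulative = '';
  return itemPath.split('/').filter(Boolean).map((name) => {
    cumulative += `/${name}`;
    return {
      name,
      path: cumulative,
      clickable: navigablePaths.has(cumulative.toLowerCase()),
    };
  });
}

const navigable = new Set(['/sitecore', '/sitecore/content/home']);
console.log(breadcrumbSegments('/sitecore/content/Home', navigable).map((s) => s.clickable));
// → [ true, false, true ]
```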
CI/CD
GitHub Actions workflow (.github/workflows/ci.yml) runs on every push/PR to main:
- build job — installs dependencies (`npm ci`) and runs `npm run build` (esbuild bundle verification)
- release-and-publish job — on successful push to `main` (skips `chore: bump version` commits):
  - Publishes the current version to npm (`npm publish --access public`)
  - Tags the commit with `v{version}`
  - Bumps the patch version in `package.json` for the next release
  - Commits `chore: bump version to v{next}` and pushes with the tag
To release a major/minor version, manually edit the version in package.json before merging (e.g. 1.0.0 → 2.0.0); CI will publish that version and then bump to 2.0.1.
Requires an NPM_TOKEN repository secret for publishing.
Known Limitations / Future Work
- Read-only — no editing capability
- ~~No field-value search~~ — full text search across all field values is now supported via the Full Text Search modal
- No icons by template — page items use 🌐, others use a CSS document icon; no per-template icon mapping
- Large field values — values over 300 characters are shown as a pre-formatted block but not truncated/collapsed
- Items outside the tree — if a GUID link points to an item not in the tree (e.g. from another database), the detail panel loads correctly via the API but the tree does not scroll to or highlight it
- No media preview — `data/media` is ignored
