qtools-graph-forge-core

v1.4.6

Shared infrastructure for building queryable Neo4j graph databases from source data — container management, loading, embedding, search, bridges, and askMilo tool generation

graph-forge-core

The engine that turns source data files into queryable Neo4j graph databases with semantic search.

What It Does

Input: A source data file in any format (JSON, RDF/XML, TSV, text documents, CRM exports) plus a parser that converts it to a standard node/edge format.

Transformation: Loads nodes and edges into a Neo4j container, generates Voyage AI vector embeddings for semantic search, creates BM25 fulltext indexes for keyword search, builds cross-standard bridge edges between data sources, and generates askMilo provider.json so the graph is immediately queryable through the web UI.

Infrastructure: Each graph runs in a Docker container with Neo4j Community Edition. The containerManager creates containers on demand (docker run with --restart unless-stopped), scans for available ports starting at 7700 (using docker ps to avoid collisions), and manages the full lifecycle (start, stop, status, destroy). Docker Desktop must be running. Each container uses ~300-700MB RAM.
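
The port scan described above could be sketched as follows. This is not the library's containerManager code: `parsePublishedPorts` and `findFreePorts` are hypothetical helpers, the sample text imitates what `docker ps --format '{{.Ports}}'` prints, and asking for two ports assumes one for Neo4j's HTTP interface (7474) and one for Bolt (7687).

```javascript
// Hypothetical sketch of port scanning; not the actual containerManager code.

// Extract host ports from `docker ps --format '{{.Ports}}'` style output,
// e.g. "0.0.0.0:7700->7474/tcp, 0.0.0.0:7701->7687/tcp".
const parsePublishedPorts = (dockerPsOutput) => {
    const usedPorts = new Set();
    const pattern = /:(\d+)->/g;
    let match;
    while ((match = pattern.exec(dockerPsOutput)) !== null) {
        usedPorts.add(Number(match[1]));
    }
    return usedPorts;
};

// Return the first `count` ports at or above `startPort` that are not in use.
const findFreePorts = (usedPorts, count, startPort = 7700) => {
    const free = [];
    for (let port = startPort; free.length < count && port <= 65535; port += 1) {
        if (!usedPorts.has(port)) { free.push(port); }
    }
    return free;
};

const sampleOutput = [
    '0.0.0.0:7700->7474/tcp, 0.0.0.0:7701->7687/tcp',
    '0.0.0.0:7703->7687/tcp'
].join('\n');

const used = parsePublishedPorts(sampleOutput);
console.log(findFreePorts(used, 2)); // [ 7702, 7704 ]
```

Keeping the parsing separate from the selection makes both parts testable without Docker running.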

Standalone containers (for development/testing) are prefixed gf_ (e.g., gf_edmatrix). Production data goes into the shared rag_DataModelExplorer container via the --target=dme export path.

Output: A running Neo4j graph database with:

  • Labeled nodes with properties, relationships, and vector embeddings
  • Per-source fulltext and vector indexes for scoped search
  • Cross-standard bridge edges (similarity-based and rule-based)
  • askMilo tool definitions (search, explore, stats, raw Cypher, history)
  • A timestamped audit trail of every operation

The only custom code per data source is a parser (~50-100 lines) that reads the source format and produces standard node objects. Everything else — container management, loading, embedding, indexing, searching, bridge building, provider generation — is handled by this library.

This is NOT a CLI module — it lives at cli/graph-forge-core/ (peer of cli/lib.d/, not inside it) and is require()'d by forge CLIs via relative path.

Modules

| Module | Purpose | Used By |
|--------|---------|---------|
| forgeRunner.js | Shared CLI engine: config-driven dispatcher for all forge actions | Every forge CLI (30-line config → full CLI) |
| containerManager.js | Docker Neo4j lifecycle: create, start, stop, destroy, port scanning | forgeRunner, rebuildDme |
| graphLoader.js | Parser output → Neo4j nodes/edges with dual labels + GraphSource root | forgeRunner |
| graphEmbedder.js | Voyage-4 vector embeddings + per-source fulltext/vector index creation | forgeRunner |
| graphSearchTool.js | Generic hybrid BM25 + vector search, stats, explore, rawCypher, history | forgeRunner, generated search CLIs |
| bridgeBuilder.js | Execute bridge spec JSON files to create cross-standard edges | forgeRunner |
| providerGenerator.js | Auto-generate askMilo provider.json + search CLI from graph schema | forgeRunner |
| graphHistory.js | Timestamped audit trail: records every load, embed, bridge, wipe | graphLoader, graphEmbedder, bridgeBuilder |
| voyageClient.js | Single Voyage AI embedding client (voyage-4, 1024d), THE source of truth | graphEmbedder, bridgeBuilder, graphSearchTool |

forgeRunner — The Shared Engine

Every forge CLI is ~30 lines of config that calls forgeRunner(config). forgeRunner handles:

  • Bootstrap (config loading, process.global, targets.ini)
  • Help text generation (dynamic from config)
  • Command dispatch (all standard actions)
  • Connection resolution (standalone container or external target)
  • Super-label application post-load

// Example: forge-edmatrix/forgeEdMatrix.js (entire file)
const forgeRunner = require('../../graph-forge-core/lib/forgeRunner');

forgeRunner({
    graphName: 'EdMatrix',
    superLabel: 'EdMatrix',
    toolPrefix: 'edmatrix',
    cliName: 'forgeEdMatrix',
    displayName: 'EdMatrix education standards graph',
    description: '35 standards across categories, layers, organizations',
    parser: require('./lib/parser'),
    sourceFileName: 'edmatrix-data.json',
    sourceConfigKey: 'forge-edmatrix',
    hasBridges: true,
    hasExport: true,
    forgeDir: __dirname
});

forgeRunner Config Options

| Option | Type | Required | Description |
|--------|------|----------|-------------|
| graphName | string | yes | Internal name, used as _source and GraphSource name |
| superLabel | string | yes | Neo4j label applied to all nodes, used for per-source indexes |
| toolPrefix | string | yes | Prefix for generated askMilo tools (must be globally unique) |
| cliName | string | no | Display name in help text (default: forge{graphName}) |
| displayName | string | no | One-line description for help header |
| description | string | no | Full description for provider.json |
| parser | function | yes | Parser module: (sourcePath, options, callback) => {} |
| sourceFileName | string | no | Default source data filename |
| sourceConfigKey | string | no | Key in [sourceData] config section |
| sourceParamName | string | no | CLI param name override (default: 'source') |
| sourceRequired | boolean | no | If true, source path must be provided (no default) |
| hasBridges | boolean | no | Enable -bridge command |
| hasExport | boolean | no | Enable -export command (default: true) |
| forgeDir | string | yes | __dirname of the forge CLI (for resolving bridge specs, assets) |
| postLoadHook | function | no | Called after graphLoader: (parseResult, connInfo, callback) => {} |
| additionalLoadOptions | function | no | Returns extra parser options: (getVal, sourceDir) => ({}) |
| additionalSourceFiles | object | no | Extra source files: { resolutionMapPath: 'file.tsv' } |
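
A quick way to catch a config that is missing a required option is a small pre-flight check like the sketch below. It is derived only from the Required and Type columns of the options table; `checkForgeConfig` and `optionalCommands` are hypothetical helpers, and forgeRunner's own validation, if any, may behave differently.

```javascript
// Hypothetical pre-flight check; forgeRunner's real validation may differ.
const REQUIRED_OPTIONS = {
    graphName: 'string',
    superLabel: 'string',
    toolPrefix: 'string',
    parser: 'function',
    forgeDir: 'string'
};

// Returns a list of problems; an empty list means the required options look right.
const checkForgeConfig = (config) => {
    const problems = [];
    Object.keys(REQUIRED_OPTIONS).forEach((key) => {
        const expected = REQUIRED_OPTIONS[key];
        if (typeof config[key] !== expected) {
            problems.push(`${key} must be a ${expected}`);
        }
    });
    return problems;
};

// Optional commands per the table: -bridge is opt-in, -export defaults to on.
const optionalCommands = (config) => {
    const commands = [];
    if (config.hasBridges === true) { commands.push('-bridge'); }
    if (config.hasExport !== false) { commands.push('-export'); }
    return commands;
};

const sampleConfig = {
    graphName: 'EdMatrix',
    superLabel: 'EdMatrix',
    toolPrefix: 'edmatrix',
    parser: (sourcePath, options, callback) => callback('', { nodes: [], metadata: {} }),
    hasBridges: true
    // forgeDir intentionally omitted to show a reported problem
};

console.log(checkForgeConfig(sampleConfig)); // [ 'forgeDir must be a string' ]
console.log(optionalCommands(sampleConfig)); // [ '-bridge', '-export' ]
```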

voyageClient — Single Embedding Client

All Voyage AI calls go through this module. One model, one dimension, one place to change.

const voyageClient = require('./voyageClient');
// voyageClient.MODEL = 'voyage-4'
// voyageClient.DIMENSION = 1024
// voyageClient.BATCH_SIZE = 20
voyageClient.embed(['text1', 'text2'], apiKey, (err, embeddings) => {});
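
To illustrate what a batch size of 20 implies for callers, here is a minimal callback-style batching sketch. It is an assumption about how batching could work, not the module's implementation: `embedInBatches` is hypothetical, and `embedFn(texts, callback)` stands in for a call such as `voyageClient.embed(texts, apiKey, callback)`.

```javascript
// Hypothetical batching sketch; `embedFn` is a stand-in, not the real client.
const embedInBatches = (texts, batchSize, embedFn, callback) => {
    const allEmbeddings = [];

    // Process one slice, then recurse into the next from the callback.
    const runBatch = (start) => {
        if (start >= texts.length) { return callback('', allEmbeddings); }
        const batch = texts.slice(start, start + batchSize);
        embedFn(batch, (err, embeddings) => {
            if (err) { return callback(err); }
            allEmbeddings.push(...embeddings);
            runBatch(start + batchSize);
        });
    };

    runBatch(0);
};

// Fake embedder for demonstration: returns one tiny vector per input text.
const batchSizesSeen = [];
const fakeEmbed = (texts, callback) => {
    batchSizesSeen.push(texts.length);
    callback('', texts.map((text) => [text.length]));
};

const sampleTexts = Array.from({ length: 45 }, (unused, i) => `text ${i}`);
let embeddedCount = 0;
embedInBatches(sampleTexts, 20, fakeEmbed, (err, embeddings) => {
    if (err) { return console.error(err); }
    embeddedCount = embeddings.length;
    console.log(batchSizesSeen, embeddedCount); // [ 20, 20, 5 ] 45
});
```

Batches run one after another, so the result order matches the input order and a failure stops the run at the first bad batch.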

Key Conventions

Super-Labels and Indexes

Every source must define a super-label. Vector and fulltext indexes are created ON the super-label, not on :ForgedNode. This ensures bridge similarity queries search only the target source's embeddings.

| Source | Super-Label | Vector Index | Fulltext Index |
|--------|-------------|--------------|----------------|
| CEDS | :CEDS | ceds_vector | ceds_fulltext |
| SIF | :SifModel | sif_vector | sif_fulltext |
| EdMatrix | :EdMatrix | edmatrix_vector | edmatrix_fulltext |
| CareerStories | :CareerStoryModel | careerstories_vector | careerstories_fulltext |
| Himed | :HimedModel | himed_vector | himed_fulltext |
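
The index names in the table follow a visible pattern: the lowercased source name plus `_vector` or `_fulltext`. The sketch below derives those names and builds Neo4j 5.x index statements scoped to a super-label. `indexStatementsFor` is a hypothetical helper, and the `embedding` property name and cosine similarity are assumptions; graphEmbedder's actual statements are not shown in this README.

```javascript
// Hypothetical helper: derives index names from the pattern visible in the table.
// The `embedding` property name and cosine similarity are assumptions.
const indexStatementsFor = (sourceName, superLabel, dimension = 1024) => {
    const base = sourceName.toLowerCase();
    const vectorIndex = `${base}_vector`;
    const fulltextIndex = `${base}_fulltext`;
    return {
        vectorIndex,
        fulltextIndex,
        cypher: [
            `CREATE VECTOR INDEX ${vectorIndex} IF NOT EXISTS ` +
            `FOR (n:${superLabel}) ON (n.embedding) ` +
            'OPTIONS {indexConfig: {`vector.dimensions`: ' + dimension +
            ", `vector.similarity_function`: 'cosine'}}",
            `CREATE FULLTEXT INDEX ${fulltextIndex} IF NOT EXISTS ` +
            `FOR (n:${superLabel}) ON EACH [n.description]`
        ]
    };
};

const sif = indexStatementsFor('SIF', 'SifModel');
console.log(sif.vectorIndex, sif.fulltextIndex); // sif_vector sif_fulltext
sif.cypher.forEach((statement) => console.log(statement));
```

Because both statements use `FOR (n:SifModel)`, a query against `sif_vector` can only return SIF nodes, which is the scoping behavior described above.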

Node Labeling

Every node gets three labels:

  1. :ForgedNode — universal marker
  2. Source-specific (:CedsProperty, :SifField, :EdStandard)
  3. Super-label (:CEDS, :SifModel, :EdMatrix)

Required Node Properties

Every node: _id (unique within source), _source (matches GraphSource name), name, description.

Parser Contract — Complete Reference

The parser is the ONLY custom code per data source. It reads a source file and returns a standard array of node objects. The graphLoader handles everything else.

Signature

module.exports = (sourcePath, options, callback) => {
    // sourcePath: absolute path to the source data file
    // options:    object with parser-specific settings (chunkStrategy, resolutionMapPath, etc.)
    // callback:   (error, result) — error is a string (truthy = failure), result is { nodes, metadata }

    callback('', {
        nodes: [ /* array of node objects */ ],
        metadata: { version: '1.0', sourceFormat: 'json' }
    });
};
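
For orientation, a complete minimal parser that follows this signature might look like the sketch below. The input shape (a JSON array of records with `name` and `summary` fields) and the `buildNodes` helper are assumptions made for illustration; a real parser reads whatever its source format provides.

```javascript
// Hypothetical parser for a JSON array like [{ "name": "...", "summary": "..." }].
// The input field names (`name`, `summary`) are assumptions for illustration.
const fs = require('fs');

const slugify = (str) => str.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/(^-|-$)/g, '');

// Pure transformation: raw records in, standard node objects out.
const buildNodes = (records) => records.map((record) => ({
    id: `standard-${slugify(record.name)}`,
    label: 'EdStandard',
    properties: { name: record.name, description: record.summary },
    edges: []
}));

// Parser entry point following the (sourcePath, options, callback) contract.
const parser = (sourcePath, options, callback) => {
    fs.readFile(sourcePath, 'utf8', (readErr, text) => {
        if (readErr) { return callback(`Cannot read ${sourcePath}: ${readErr.message}`); }
        let records;
        try {
            records = JSON.parse(text);
        } catch (parseErr) {
            return callback(`Invalid JSON in ${sourcePath}: ${parseErr.message}`);
        }
        if (!Array.isArray(records)) { return callback('Expected a JSON array of records'); }
        callback('', {
            nodes: buildNodes(records),
            metadata: { version: '1.0', sourceFormat: 'json' }
        });
    });
};

module.exports = parser;

// Quick check of the pure part, no file or database needed.
const sampleNodes = buildNodes([{ name: 'CEDS', summary: 'Common Education Data Standards' }]);
console.log(sampleNodes[0].id); // standard-ceds
```

Splitting the pure transformation from the file reading keeps the parser easy to test with in-memory records.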

The Node Object

Each element in the nodes array must have this shape:

{
    id: 'unique-within-source',       // REQUIRED. String. Becomes _id in Neo4j.
                                       // Must be unique within THIS source (not globally).
                                       // Convention: 'type-slugified-name' (e.g., 'standard-ceds', 'type-organizational')

    label: 'EdStandard',              // REQUIRED. String. Becomes a Neo4j node label.
                                       // Use PascalCase. This is the source-specific label
                                       // (the super-label and :ForgedNode are added automatically).

    properties: {
        name: 'CEDS',                 // REQUIRED. String. Human-readable display name.
                                       // Used by -explore, displayed in search results.

        description: 'Common vocab…', // REQUIRED. String. Searchable text.
                                       // Indexed for BM25 fulltext search.
                                       // Embedded as a vector for semantic search.
                                       // Make this as descriptive as possible — it drives search quality.

        // Any additional properties are preserved as-is on the Neo4j node:
        url: 'http://ceds.ed.gov/',
        org: 'US Ed',
        types: 'Organizational, Personal, Event',
        // Numbers, booleans, arrays of strings — all valid Neo4j property types.
    },

    edges: [                           // OPTIONAL. Array of outgoing relationships.
        {
            type: 'HAS_TYPE',         // REQUIRED. String. Neo4j relationship type.
                                       // Convention: UPPER_SNAKE_CASE.

            targetId: 'type-organizational',  // REQUIRED. String. Should match the `id` of another
                                               // node in this same nodes array; if it does not,
                                               // targetLabel (below) controls auto-creation.

            targetLabel: 'DataCategory',      // OPTIONAL. String. If the target node doesn't exist
                                               // as a top-level node, the loader auto-creates a
                                               // minimal node with this label. Useful for creating
                                               // category/type nodes from edge references.

            properties: {}             // OPTIONAL. Object. Properties set on the relationship.
        }
    ]
}

What the Loader Does With Each Node

  1. Creates the node with labels :ForgedNode:YourLabel (super-label added post-load)
  2. Sets _id from node.id
  3. Sets _source from the forge's graphName config
  4. Copies all node.properties as Neo4j properties
  5. For each edge, matches source and target by _id + _source and creates the relationship
  6. If an edge's targetId doesn't match any top-level node, creates a minimal target node with the targetLabel
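
The six steps above can be modeled in memory to see what a given parser output would produce. This is only a sketch: `planLoad` is a hypothetical function that returns plain objects, whereas graphLoader writes to Neo4j. Skipping edges that have neither a matching target nor a `targetLabel` is an assumption.

```javascript
// Hypothetical in-memory model of the loading steps; graphLoader itself writes to Neo4j.
const planLoad = (nodes, graphName) => {
    const byId = new Map();
    const relationships = [];

    // Steps 1-4: one record per parser node, with labels, _id, _source, and properties.
    nodes.forEach((node) => {
        byId.set(node.id, {
            labels: ['ForgedNode', node.label],
            properties: Object.assign({}, node.properties, { _id: node.id, _source: graphName })
        });
    });

    // Steps 5-6: connect edges, creating a minimal target when none exists.
    nodes.forEach((node) => {
        (node.edges || []).forEach((edge) => {
            if (!byId.has(edge.targetId) && edge.targetLabel) {
                byId.set(edge.targetId, {
                    labels: ['ForgedNode', edge.targetLabel],
                    properties: { _id: edge.targetId, _source: graphName, name: edge.targetId }
                });
            }
            // Assumption: an edge with no resolvable target is skipped.
            if (byId.has(edge.targetId)) {
                relationships.push({
                    from: node.id,
                    type: edge.type,
                    to: edge.targetId,
                    properties: edge.properties || {}
                });
            }
        });
    });

    return { nodes: Array.from(byId.values()), relationships };
};

const plan = planLoad([{
    id: 'standard-ceds',
    label: 'EdStandard',
    properties: { name: 'CEDS', description: 'Common Education Data Standards' },
    edges: [{ type: 'HAS_TYPE', targetId: 'type-organizational', targetLabel: 'DataCategory' }]
}], 'EdMatrix');

console.log(plan.nodes.length, plan.relationships.length); // 2 1
console.log(plan.nodes[1].properties.name);                 // type-organizational
```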

Edge Auto-Creation

You don't need to create both ends of a relationship as top-level nodes. If your data has standards with types, you can define type nodes inline via edges:

// This standard references a DataCategory that may not exist yet
{
    id: 'standard-ceds',
    label: 'EdStandard',
    properties: { name: 'CEDS', description: '...' },
    edges: [
        { type: 'HAS_TYPE', targetId: 'type-organizational', targetLabel: 'DataCategory' }
    ]
}

If type-organizational doesn't appear as a top-level node, the loader creates:

(:ForgedNode:DataCategory { _id: 'type-organizational', _source: 'EdMatrix', name: 'type-organizational' })

But it's better to create explicit top-level nodes with proper names and descriptions:

// Explicit node — better for search and display
{ id: 'type-organizational', label: 'DataCategory', properties: { name: 'Organizational', description: 'Data category: Organizational' }, edges: [] },
// Then reference it from the standard
{ id: 'standard-ceds', label: 'EdStandard', properties: { name: 'CEDS', description: '...' }, edges: [
    { type: 'HAS_TYPE', targetId: 'type-organizational', targetLabel: 'DataCategory' }
] }

The Metadata Object

metadata: {
    version: '1.0',           // Version of the source data (shown in GraphSource node)
    sourceFormat: 'json'       // Format identifier (json, rdf, tsv, txt, etc.)
}

Rules

  1. Parser NEVER touches Neo4j. No require('neo4j-driver'). No database connections. Pure data transformation.
  2. Parser NEVER imports graph-forge-core modules. It's standalone — testable without Docker, without a running database.
  3. Use callback pattern. callback(errorString, result). Error is a truthy string on failure, empty string on success.
  4. No async/await. Use callbacks per TQ coding standards.
  5. Every node must have id, label, properties.name, properties.description. The loader and embedder depend on these.
  6. IDs must be unique within the source. Use slugified compound names: 'standard-ceds', 'type-organizational', 'layer-data-dictionary'.
  7. Edge targetIds must match an id in the same source. Cross-source edges are built by bridge specs, not parsers.
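
Rules 5 through 7 are mechanical enough to check before loading. The sketch below is one way to do that; `validateParserResult` is a hypothetical helper, not part of graph-forge-core, and it treats an edge with a `targetLabel` as acceptable because the loader can auto-create that target.

```javascript
// Hypothetical pre-load check for rules 5-7; not part of graph-forge-core.
const validateParserResult = (result) => {
    const problems = [];
    const ids = new Set();

    result.nodes.forEach((node, index) => {
        const where = `node[${index}] (${node.id || 'no id'})`;
        const props = node.properties || {};
        if (!node.id) {
            problems.push(`${where}: missing id`);
        } else if (ids.has(node.id)) {
            problems.push(`${where}: duplicate id`);
        } else {
            ids.add(node.id);
        }
        if (!node.label) { problems.push(`${where}: missing label`); }
        if (!props.name) { problems.push(`${where}: missing properties.name`); }
        if (!props.description) { problems.push(`${where}: missing properties.description`); }
    });

    // Edge targets must resolve within the source, or carry a targetLabel for auto-creation.
    result.nodes.forEach((node) => {
        (node.edges || []).forEach((edge) => {
            if (!ids.has(edge.targetId) && !edge.targetLabel) {
                problems.push(`${node.id}: edge ${edge.type} points at unknown id ${edge.targetId}`);
            }
        });
    });

    return problems;
};

const problems = validateParserResult({
    nodes: [
        {
            id: 'standard-ceds',
            label: 'EdStandard',
            properties: { name: 'CEDS', description: 'Common Education Data Standards' },
            edges: [{ type: 'HAS_TYPE', targetId: 'type-missing' }]
        },
        { id: 'standard-ceds', label: 'EdStandard', properties: { name: 'CEDS copy' }, edges: [] }
    ],
    metadata: { version: '1.0', sourceFormat: 'json' }
});
problems.forEach((problem) => console.log(problem));
```

For this sample the check reports three problems: the duplicate id, the missing description on the second node, and the unresolved edge target.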

Common Patterns

Deduplication: When multiple source records reference the same entity (e.g., many standards published by "1EdTech"), use a Set to track what you've already emitted:

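// Fragment from inside a parser: `nodes` is the output array, `rawData` holds the parsed
// source records, and `standardNode` is the node for the current entry (construction not shown).
const seenOrgs = new Set();
rawData.forEach((entry) => {
    const orgId = `org-${slugify(entry.org)}`;
    // Emit each organization node only once
    if (!seenOrgs.has(orgId)) {
        seenOrgs.add(orgId);
        nodes.push({ id: orgId, label: 'Organization', properties: { name: entry.org, description: `Publisher: ${entry.org}` }, edges: [] });
    }
    // Edge from standard to org
    standardNode.edges.push({ type: 'PUBLISHED_BY', targetId: orgId, targetLabel: 'Organization' });
});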

Slugification: Convert names to URL-safe lowercase IDs:

const slugify = (str) => str.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/(^-|-$)/g, '');
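
A few sample inputs show what this function produces, and one caveat: distinct names can collapse to the same slug, which conflicts with the ID uniqueness rule. The `uniqueSlug` helper below is a hypothetical guard, not something the library provides.

```javascript
const slugify = (str) => str.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/(^-|-$)/g, '');

console.log(slugify('1EdTech'));                        // 1edtech
console.log(slugify('Data Dictionary'));                // data-dictionary
console.log(slugify('  Organizational / Personal '));   // organizational-personal

// Distinct names can collapse to the same slug, which would break ID uniqueness.
console.log(slugify('C++') === slugify('C'));           // true

// Hypothetical guard: append a counter when a slug was already used.
const uniqueSlug = (str, seen) => {
    const base = slugify(str) || 'item';
    let candidate = base;
    for (let n = 2; seen.has(candidate); n += 1) { candidate = `${base}-${n}`; }
    seen.add(candidate);
    return candidate;
};

const seenSlugs = new Set();
console.log(uniqueSlug('C', seenSlugs), uniqueSlug('C++', seenSlugs)); // c c-2
```

Counter suffixes depend on input order, so this guard only gives stable IDs when the source records arrive in a stable order.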

Testing: Run the parser standalone to verify counts before loading:

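const parser = require('./lib/parser');
parser('/path/to/data.json', {}, (err, result) => {
    // err is a truthy string on failure, so check it before touching result
    if (err) {
        console.error(`Parser failed: ${err}`);
        return;
    }
    console.log(`Nodes: ${result.nodes.length}`);
    // Count nodes per label
    const labels = {};
    result.nodes.forEach(n => { labels[n.label] = (labels[n.label] || 0) + 1; });
    console.log('Labels:', labels);
});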

Real Parser Examples

| Parser | Lines | Source Format | Nodes Produced | Complexity |
|--------|-------|---------------|----------------|------------|
| forge-edmatrix | 87 | JSON array of standards | 83 (standards + categories + layers + orgs + formats) | Simple |
| forge-ceds-rdf | 200 | RDF/XML ontology | 23,238 (classes + properties + option sets + values) | Complex (XML parsing) |
| forge-sif-tsv | 614 | TSV specification + resolution map | 23,005 (objects + fields + types + codesets + XML elements) | Complex (RefId resolution) |
| forge-career-stories | 223 | Text documents (glob) | Variable (documents + chunks) | Medium (text chunking) |
| forge-himed | 150 | Tab-delimited CRM export | ~26,000 (call reports + customers + contacts) | Medium (deduplication) |

Directory Structure

cli/
  graph-forge-core/           ← THIS LIBRARY (peer of lib.d/, not inside it)
    lib/
      forgeRunner.js          ← shared CLI engine
      containerManager.js     ← Docker Neo4j lifecycle
      graphLoader.js          ← nodes → Neo4j
      graphEmbedder.js        ← Voyage embeddings + indexes
      graphSearchTool.js      ← search, stats, explore, rawCypher, history
      bridgeBuilder.js        ← cross-standard edge creation
      providerGenerator.js    ← askMilo provider.json generation
      graphHistory.js         ← audit trail
      voyageClient.js         ← Voyage AI client (single source of truth)
    package.json              ← dependencies (neo4j-driver, qtools-*)
    node_modules/
  lib.d/
    forge-edmatrix/           ← 30 lines + parser
    forge-ceds-rdf/           ← 30 lines + parser (will be refactored)
    forge-sif-tsv/            ← 30 lines + parser (will be refactored)
    forge-career-stories/     ← 30 lines + parser (will be refactored)
    forge-himed/              ← 30 lines + parser (will be refactored)
    rebuild-dme/              ← orchestration script
    rebuild-himed/            ← orchestration script

How Forge CLIs Require This Library

From cli/lib.d/forge-xxx/forgeXxx.js:

const forgeRunner = require('../../graph-forge-core/lib/forgeRunner');

The path ../../graph-forge-core/ goes up from lib.d/forge-xxx/ to cli/, then into graph-forge-core/.
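
The resolution can be checked with Node's `path` module. In this sketch `/repo` is a hypothetical checkout location, and `path.posix` is used so the output is the same on every platform.

```javascript
const path = require('path');

// '/repo' is a hypothetical checkout location for illustration.
const forgeDir = '/repo/cli/lib.d/forge-xxx';
const resolved = path.posix.resolve(forgeDir, '../../graph-forge-core/lib/forgeRunner');

console.log(resolved); // /repo/cli/graph-forge-core/lib/forgeRunner
```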