@healthcare-interoperability/fhir-storage-core

v1.12.0

Published

9 days ago

FHIR R4 MongoDB storage primitives — extensible repository base class, reference resolver, and ingest-engine configuration for healthcare interoperability pipelines.

darkknight

carnivalofrust

ageofai

fhirservice-guyatwork

fhir r4 storage mongodb healthcare interoperability repository hl7

Works with both require() (CommonJS) and import (ESM). Node.js ≥ 20.

Install

npm install @healthcare-interoperability/fhir-storage-core
# peer dependencies
npm install mongodb @quicore/hash

Usage

// ESM
import {
  FHIRResourceRepository,
  FHIRReferenceResolver,
  FHIREngineConfig,
  bootstrap,
  createRepositories,
  resolveResourceClasses,
} from '@healthcare-interoperability/fhir-storage-core';

// CommonJS
const {
  FHIRResourceRepository,
  FHIRReferenceResolver,
  FHIREngineConfig,
  bootstrap,
  createRepositories,
  resolveResourceClasses,
} = require('@healthcare-interoperability/fhir-storage-core');

Core concepts

Three dedup modes

Every write is scoped by one of three modes — selected per call via integrationConfig.dedupMode (or the engine config default):

| Mode | _id formula | One document per |
|---|---|---|
| integration | idHash({scopeId: integrationId, resourceType, fhirId}) | (integration, fhirId) |
| parent | idHash({scopeId: parentIntegrationId, resourceType, fhirId}) | (parentIntegration, fhirId) |
| customer | idHash({scopeId: integrationId, resourceType, fhirId, customerId}) | (integration, fhirId, customer) |

In integration and parent modes, one resource produces one document (the customerIds array accumulates across upserts). In customer mode, one resource fans out to N documents — one per customerId in the call.
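The mode table can be read as a small function. The sketch below is illustrative only: `stubIdHash` is a stand-in that joins its fields rather than hashing them, and `documentIdsFor` is a hypothetical helper, not a package export. It shows which `_id` values a single resource would produce under each mode, including the customer-mode fan-out.

```javascript
// Stand-in for the real idHash: deterministic, but NOT a real hash.
const stubIdHash = ({ scopeId, resourceType, fhirId, customerId }) =>
  [scopeId, resourceType, fhirId, customerId].filter(Boolean).join('|');

// Hypothetical helper: the _id values one resource produces under each mode.
function documentIdsFor(resource, integrationConfig, idHash = stubIdHash) {
  const {
    integrationId,
    parentIntegrationId,
    customerIds = [],
    dedupMode = 'integration',
  } = integrationConfig;
  const base = { resourceType: resource.resourceType, fhirId: resource.id };

  switch (dedupMode) {
    case 'integration':
      return [idHash({ scopeId: integrationId, ...base })];
    case 'parent':
      return [idHash({ scopeId: parentIntegrationId, ...base })];
    case 'customer':
      // Fan-out: one document per customerId in the call.
      return customerIds.map((customerId) =>
        idHash({ scopeId: integrationId, ...base, customerId }));
    default:
      throw new Error(`Unknown dedupMode: ${dedupMode}`);
  }
}

const patient = { resourceType: 'Patient', id: 'pt-1' };
const one = documentIdsFor(patient, { integrationId: 'acme', customerIds: ['C1', 'C2'] });
const many = documentIdsFor(patient, {
  integrationId: 'acme',
  customerIds: ['C1', 'C2'],
  dedupMode: 'customer',
});
console.log(one.length, many.length); // 1 2
```

In integration mode the single `_id` is the same value the document stores as integrationScopedId, which is why that field can be used to correlate documents across modes.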

Always-stored fields

Every stored document carries:

_id — hashed per the active dedup mode (see above).
integrationScopedId — always idHash({scopeId: integrationId, resourceType, fhirId}), regardless of mode. Equals _id in integration mode; differs in parent/customer. Use it to find "this logical resource within the integration" across modes.
fhirId — the source FHIR id, preserved verbatim regardless of canonicalization.
integrationId, parentIntegrationId (when applicable), customerIds, dedupMode — tenancy/mode metadata.
idsCanonical — locked on first write; whether the stored body has been canonicalized.
contentHash, meta.versionId, createdAt, updatedAt, optional deletedAt, optional workflow fields (processed, flags, attempts).

Bootstrap

The package exposes a three-step composition for initialising the ingest topology against a MongoDB instance:

import { MongoClient } from 'mongodb';
import {
  bootstrap,
  createRepositories,
  FHIREngineConfig,
} from '@healthcare-interoperability/fhir-storage-core';

const client = await MongoClient.connect(process.env.MONGO_URI);
const db = client.db('fhir');

const config = new FHIREngineConfig({
  defaultDedupMode: 'integration',
  supportedResourceTypes: ['Patient', 'Observation', 'Encounter'],
});

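// PatientRepository and ObservationRepository are FHIRResourceRepository subclasses
// defined elsewhere. Encounter has no entry, so bootstrap fills in a generic handler.
const registry = { Patient: PatientRepository, Observation: ObservationRepository };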

// 1. Bootstrap — pings DB, resolves classes, runs initialize() on each.
const resolved = await bootstrap(db, registry, config);

// 2. Instantiate repositories.
const repos = createRepositories(db, resolved, config);

// 3. Use:
const result = await repos.Patient.upsert(patient, { integrationId: 'acme' });

Each of the three exports does one thing:

resolveResourceClasses(config, registry) — pure. Merges explicit registry, supportedResourceTypes, and disabledResourceTypes into a frozen map. No I/O.
bootstrap(db, registry, config, options?) — DB ping, resolve, initialize each class. Returns the resolved map so you can pass it to createRepositories without re-resolving.
createRepositories(db, registry, config) — instantiate one repository per class. No I/O.

supportedResourceTypes entries without an explicit registry entry get filled in with a generic handler via either config.defaultResourceRegistryClass.for(type) or FHIRResourceRepository.for(type) (built-in fallback). Bootstrap warns when this happens — generic handlers lack resource-specific search indexes.
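To make the merge behaviour concrete, here is a minimal sketch of how such a resolution step could work. Everything in it is an assumption for illustration: `resolveClassesSketch`, `makeGeneric`, and the `filledIn` list are hypothetical names, and the precedence shown (disabled types always win, explicit registry entries beat generic fill-ins) is a guess at reasonable rules, not the package's documented algorithm.

```javascript
// Hypothetical re-implementation; the package's real merge rules may differ.
function resolveClassesSketch(config, registry, makeGeneric) {
  const disabled = new Set(config.disabledResourceTypes ?? []);
  const resolved = {};
  const filledIn = [];

  // Explicit registry entries come first, minus anything disabled.
  for (const [type, RepoClass] of Object.entries(registry)) {
    if (!disabled.has(type)) resolved[type] = RepoClass;
  }
  // Supported types without an explicit entry get a generic handler.
  for (const type of config.supportedResourceTypes ?? []) {
    if (disabled.has(type) || Object.hasOwn(resolved, type)) continue;
    resolved[type] = makeGeneric(type);
    filledIn.push(type); // a real bootstrap could warn about these
  }
  return { resolved: Object.freeze(resolved), filledIn };
}

class PatientRepo {}
class ObservationRepo {}
const makeGeneric = (type) => class { static RESOURCE_TYPE = type; };

const { resolved, filledIn } = resolveClassesSketch(
  {
    supportedResourceTypes: ['Patient', 'Observation', 'Encounter'],
    disabledResourceTypes: ['Observation'],
  },
  { Patient: PatientRepo, Observation: ObservationRepo },
  makeGeneric,
);
console.log(Object.keys(resolved)); // [ 'Patient', 'Encounter' ]
console.log(filledIn);              // [ 'Encounter' ]
```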

API

`FHIREngineConfig`

Normalised configuration for the ingest pipeline. Pass a plain options object; use FHIREngineConfig.from() inside pipeline constructors so callers can pass either a plain object or an existing instance.

const config = new FHIREngineConfig({
  // Mode defaults — overridden per call by integrationConfig.dedupMode etc.
  defaultDedupMode:      'integration',  // 'integration' | 'customer' | 'parent'
  defaultReferencesMode: 'minimal',      // 'all' | 'minimal' | 'none'
  defaultHistoryMode:    'off',          // 'off' | 'async' | 'transactional'

  // Storage form
  collectionPrefix: 'fhir_',
  idsCanonical:     false,               // canonicalize Resource.id + Reference.reference on write

  // Schema validation
  schemaValidationLevel:  'moderate',    // 'off' | 'moderate' | 'strict'
  schemaValidationAction: 'warn',        // 'warn' | 'error'

  // Hash algorithms (see "Hashing" below for the full algorithm table)
  contentHashAlgorithm: 'sha256',
  idHashAlgorithm:      'compact',

  // Pagination defaults
  defaultSearchLimit:     20,
  defaultHistoryLimit:    50,
  defaultExportBatchSize: 500,

  // Write defaults
  defaultBulkWriteOptions: { ordered: false, writeConcern: { w: 1, j: false } },
  defaultWriteConcern:     undefined,    // single-doc writes

  // Resource-type policy
  supportedResourceTypes:       null,    // null = allow all
  disabledResourceTypes:        [],
  allowUnknownResourceTypes:    false,
  defaultResourceRegistryClass: null,    // fallback class for supportedResourceTypes
                                          // without explicit registry entries

  // Hook called at every ingest call
  validateIntegrationConfig: null,
});

Configs are frozen after construction (shallow). Top-level reassignment throws in strict mode; nested options remain mutable (caller-owned references).
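The shallow-freeze behaviour is standard JavaScript and can be seen with a plain object. The sketch below uses `Object.freeze` on a stand-in object rather than a real FHIREngineConfig instance; `tryReassign` is a hypothetical helper that exists only to run the assignment in strict mode.

```javascript
// Runs the assignment in strict mode and returns the error, if any.
function tryReassign(target) {
  'use strict';
  try {
    target.defaultSearchLimit = 50;
    return null;
  } catch (err) {
    return err;
  }
}

// Plain-object stand-in for a frozen config; not the package's class.
const bulkOptions = { ordered: false };
const frozenConfig = Object.freeze({
  defaultSearchLimit: 20,
  defaultBulkWriteOptions: bulkOptions,
});

const topLevelError = tryReassign(frozenConfig);

// Nested objects are not frozen, so this mutation goes through.
frozenConfig.defaultBulkWriteOptions.ordered = true;

console.log(topLevelError instanceof TypeError); // true
console.log(frozenConfig.defaultSearchLimit);    // 20
console.log(bulkOptions.ordered);                // true
```

If nested options must not change after construction, pass in a copy that the caller does not keep a reference to.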

Hashing

Backed by @quicore/hash. Two closed algorithm sets:

idHashAlgorithm — used for MongoDB _id values.

| Value | Output | Notes |
|---|---|---|
| 'compact' (default) | 22-char Base62 | BLAKE2b → Base62. ~131 bits entropy. Recommended. |
| 'sha256' | 64-char hex | Larger _id. |
| 'sha512' | 128-char hex | Largest. |
| 'sha3-256' | 64-char hex | |
| 'blake2s256' | 64-char hex | |

contentHashAlgorithm — used for change-detection. Stored as "algo:hex" so a future algorithm change is detectable from existing hashes.

| Value | Output |
|---|---|
| 'sha256' (default) | 64-char hex |
| 'sha512' | 128-char hex |
| 'sha3-256' | 64-char hex |
| 'sha3-512' | 128-char hex |
| 'blake2s256' | 64-char hex |
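The "algo:hex" format is easy to illustrate with Node's built-in crypto module. This is a sketch, not the package's implementation: `prefixedContentHash` and `parseContentHash` are hypothetical helpers, and hashing `JSON.stringify` output is a simplification (key order changes the digest, so a real implementation would need a canonical serialization).

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: tag the digest with the algorithm that produced it.
function prefixedContentHash(resource, algorithm = 'sha256') {
  const hex = createHash(algorithm).update(JSON.stringify(resource)).digest('hex');
  return `${algorithm}:${hex}`;
}

// Hypothetical helper: split a stored value back into its parts.
function parseContentHash(stored) {
  const sep = stored.indexOf(':');
  return { algorithm: stored.slice(0, sep), hex: stored.slice(sep + 1) };
}

const stored = prefixedContentHash({ resourceType: 'Patient', id: 'pt-1' });
const { algorithm, hex } = parseContentHash(stored);
console.log(algorithm, hex.length); // sha256 64

// A hash produced by a different algorithm cannot be compared directly,
// so a caller could treat it as "changed" and rehash.
const needsRehash = parseContentHash('sha512:abcd').algorithm !== 'sha256';
console.log(needsRehash); // true
```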

Custom hash functions

Replace either default entirely:

const config = new FHIREngineConfig({
  idHashFunction: ({ scopeId, resourceType, fhirId, customerId }) =>
    myIdGen(scopeId, resourceType, fhirId, customerId),
  contentHashFunction: (resource) => myHashLib.hash(resource),
});

The idHash contract: required {scopeId, resourceType, fhirId}, optional customerId (present only in customer dedup mode). Required fields must be non-empty strings; the default implementation throws otherwise. Custom implementations own their own input validation.

Or subclass for deeper customisation:

class MyConfig extends FHIREngineConfig {
  contentHash(resource) { return myHashLib.hash(resource); }
  idHash({ scopeId, resourceType, fhirId, customerId }) { /* ... */ }
}

FHIREngineConfig.from(input) — accepts a plain object or an existing instance (pass-through).

`FHIRResourceRepository`

Base class for per-resource-type MongoDB collections. Subclass with at minimum a RESOURCE_TYPE static property:

import { FHIRResourceRepository } from '@healthcare-interoperability/fhir-storage-core';

class PatientRepository extends FHIRResourceRepository {
  static RESOURCE_TYPE = 'Patient';

  // Optional: add indexes specific to this resource type
  static SUGGESTED_INDEXES = [
    { key: { integrationId: 1, 'searchIdx.identifier': 1 }, name: 'idx_patient_identifier' },
  ];

  // Optional: force history mode for this type regardless of engine config
  static HISTORY_MODE = null;  // 'off' | 'async' | 'transactional' | null
}

// Initialise (creates collection + indexes). Idempotent.
await PatientRepository.initialize(db, engineConfig);

// Instantiate
const repo = new PatientRepository(db, engineConfig);

`static for(resourceType)`

Factory for generic, unspecialized handlers — used by bootstrap to fill in supportedResourceTypes that lack explicit registry entries. Returns a fresh subclass with RESOURCE_TYPE set.

const GenericObservationRepo = FHIRResourceRepository.for('Observation');
// ...or, on a custom base class to inherit subclass behavior:
const AuditedObservationRepo = MyAuditedRepository.for('Observation');
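The reason calling `for` on a custom base class inherits that class's behaviour comes down to how `this` works in static methods. The following is a minimal sketch of the pattern using stand-in classes (`BaseRepository` and `AuditedRepository` are hypothetical); the package's actual factory may do more.

```javascript
// Stand-in base class; the real FHIRResourceRepository does far more.
class BaseRepository {
  static RESOURCE_TYPE = null;

  // `this` is the class the method was called on, so calling for() on a
  // subclass yields a subclass of that subclass.
  static for(resourceType) {
    const Base = this;
    return class extends Base {
      static RESOURCE_TYPE = resourceType;
    };
  }
}

class AuditedRepository extends BaseRepository {
  audit() {
    return `audited ${this.constructor.RESOURCE_TYPE}`;
  }
}

const GenericObservation = BaseRepository.for('Observation');
const AuditedObservation = AuditedRepository.for('Observation');

console.log(GenericObservation.RESOURCE_TYPE);                         // Observation
console.log(new AuditedObservation().audit());                         // audited Observation
console.log(GenericObservation === BaseRepository.for('Observation')); // false
```

The last line shows the "fresh subclass" behaviour: each call creates a new class, so callers that need a stable identity should keep the returned class around.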

Write methods

Both upsert and bulkUpsert share a uniform argument order and return shape.

upsert(resource, integrationConfig, options?)

const result = await repo.upsert(
  fhirPatient,
  { integrationId: 'acme', customerIds: ['C1'] },
  // optional:
  { writeOptions: { session }, aggregateOutput: true }
);

bulkUpsert(resources, integrationConfig, options?)

const result = await repo.bulkUpsert(
  [patient1, patient2, patient3],
  { integrationId: 'acme', customerIds: ['C1', 'C2'], dedupMode: 'customer' },
  { aggregateOutput: true }
);

All resources in a bulkUpsert call share one integrationConfig. Mixed-mode batches are not supported by construction — group writes by integrationConfig at the caller and issue one bulkUpsert per group.
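Grouping at the caller can be done with a small helper before any repository call. The sketch below is one possible approach under stated assumptions: `groupByIntegrationConfig` is a hypothetical helper, the input shape (`{ resource, integrationConfig }` pairs) is invented for illustration, and the grouping key covers only the integrationConfig fields this README mentions.

```javascript
// Hypothetical helper: bucket { resource, integrationConfig } pairs so that
// each bucket can be sent as one bulkUpsert call.
function groupByIntegrationConfig(writes) {
  const groups = new Map();
  for (const { resource, integrationConfig } of writes) {
    const {
      integrationId,
      parentIntegrationId = null,
      dedupMode = null,
      customerIds = [],
    } = integrationConfig;
    // Key on the scoping fields; sort customerIds so their order does not matter.
    const key = JSON.stringify([
      integrationId, parentIntegrationId, dedupMode, [...customerIds].sort(),
    ]);
    if (!groups.has(key)) groups.set(key, { integrationConfig, resources: [] });
    groups.get(key).resources.push(resource);
  }
  return [...groups.values()];
}

const writes = [
  { resource: { resourceType: 'Patient', id: 'a' }, integrationConfig: { integrationId: 'acme' } },
  {
    resource: { resourceType: 'Patient', id: 'b' },
    integrationConfig: { integrationId: 'acme', dedupMode: 'customer', customerIds: ['C1'] },
  },
  { resource: { resourceType: 'Patient', id: 'c' }, integrationConfig: { integrationId: 'acme' } },
];

const groups = groupByIntegrationConfig(writes);
console.log(groups.map((g) => g.resources.length)); // [ 2, 1 ]

// With a real repository, each group would become one call:
// for (const g of groups) await repo.bulkUpsert(g.resources, g.integrationConfig);
```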

`WriteResult` — uniform return shape

{
  items: [
    { originalIndex, fhirId, _id, customerId, action }
  ],
  counts: { created, modified, matched },
  dedupMode: 'integration' | 'customer' | 'parent',
  resources?: { /* only when options.aggregateOutput is true */ }
}

items — one entry per document written. In customer-mode fan-out, length is customerIds.length × resources.length.
originalIndex — index back into the input resources[] (always 0 for single-doc upsert).
customerId — non-null only in customer mode.
action — 'created' | 'updated' | 'noop' | 'resurrected' for upsert; 'created' | 'written' for bulkUpsert (MongoDB's bulkWrite result is aggregate, not per-document).

`resources` map (opt-in)

Pass options.aggregateOutput: true to include a fhirId → _id lookup map. The leaf shape depends on dedupMode:

// integration / parent — scalar leaf
{ Patient: { 'pt-1': 'hash_xyz' } }

// customer — customerId-keyed object leaf
{ Patient: { 'pt-1': { 'C1': 'hash_aaa', 'C2': 'hash_bbb' } } }

Downstream code branches on result.dedupMode to read the structure correctly.
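A small accessor can hide that branch from the rest of the codebase. `idsFor` below is a hypothetical helper, not a package export; it normalises both leaf shapes to an array of `_id` values using the example maps shown above.

```javascript
// Hypothetical helper: look up stored _id values from a WriteResult.resources map.
// Always returns an array so callers do not have to care about the leaf shape.
function idsFor(result, resourceType, fhirId) {
  const leaf = result.resources?.[resourceType]?.[fhirId];
  if (leaf === undefined) return [];
  // customer mode: leaf is { customerId: _id }; otherwise the leaf is the _id itself.
  return result.dedupMode === 'customer' ? Object.values(leaf) : [leaf];
}

const integrationResult = {
  dedupMode: 'integration',
  resources: { Patient: { 'pt-1': 'hash_xyz' } },
};
const customerResult = {
  dedupMode: 'customer',
  resources: { Patient: { 'pt-1': { C1: 'hash_aaa', C2: 'hash_bbb' } } },
};

console.log(idsFor(integrationResult, 'Patient', 'pt-1')); // [ 'hash_xyz' ]
console.log(idsFor(customerResult, 'Patient', 'pt-1'));    // [ 'hash_aaa', 'hash_bbb' ]
console.log(idsFor(customerResult, 'Patient', 'missing')); // []
```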

Read methods

await repo.findById(_id);                                      // by internal hashed _id
await repo.findManyByIds([_id1, _id2]);                        // batch lookup by _id
await repo.findByFhirId(integrationId, fhirId);                // by source FHIR id
await repo.search(integrationId, filter, { limit, afterId });  // cursor-paginated
await repo.count(integrationId, filter);
await repo.exportCursor(integrationId, filter, { batchSize }); // for streaming exports

findById and findManyByIds take the internal _id. findByFhirId looks up by the source FHIR id (backed by the idx_base_fhirId index) — in customer mode multiple documents share the same fhirId, so this returns the first match; use search(integrationId, { fhirId }) to retrieve all customer-scoped copies.

History methods (require `historyMode !== 'off'`)

await repo.findByVersion(_id, versionId);
await repo.findHistory(_id, { limit, afterVersion });

Both take the internal _id.

Delete methods

await repo.softDelete(_id);                                           // single doc
await repo.bulkSoftDeleteByFilter(integrationId, { flags: 'stale' }); // all matching filter
await repo.bulkSoftDeleteById(integrationId, [_id1, _id2]);           // specific docs

softDelete is idempotent. Re-upserting a soft-deleted document with new content clears deletedAt ("resurrection" semantics, matching FHIR's update-as-create). bulkSoftDeleteById takes an integrationId as a safety guard against cross-tenant accidents. All bulk methods return { modified, matched }.

Summary update

// Single key
await repo.updateSummary(_id, 'resolvedName', 'Jane Doe');

// Multiple keys
await repo.updateSummary(_id, { resolvedName: 'Jane Doe', riskScore: 4.2 });

Updates fields inside the summary subdocument without re-upserting the full resource. Does NOT touch meta.lastUpdated or contentHash — summary is a derived/denormalized field.

Workflow state methods

Processing lifecycle:

await repo.markAsProcessed(_id);                     // single doc
await repo.bulkMarkAsProcessed([_id1, _id2, _id3]);  // batch — returns { modified, matched }
await repo.findUnprocessed(integrationId, {           // query the partial index
  filter: { flags: 'priority' },
  limit: 100,
  afterId: lastId,
});

Flag operations — single document:

await repo.addFlag(_id, 'needs-review');              // convenience wrapper
await repo.addFlags(_id, ['needs-review', 'high-priority']); // multiple at once
await repo.removeFlags(_id, ['stale', 'duplicate']);  // $pull
await repo.updateFlag(_id,                            // atomic add + remove in one call
  ['reviewed'],                                       //   flags to add
  ['needs-review'],                                   //   flags to remove
  { processed: true },                                //   optional processed state
);

updateFlag uses a pipeline update ($setUnion / $setDifference) so add and remove happen atomically.
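For readers unfamiliar with pipeline updates, the sketch below shows what such an update document could look like. It is an assumption, not the package's internal pipeline: `buildFlagPipeline` and `applyFlags` are hypothetical, and the order (remove first, then add) is a guess. `$setUnion`, `$setDifference`, `$ifNull`, and `$literal` are standard MongoDB aggregation operators, and `updateOne` has accepted a pipeline array since MongoDB 4.2.

```javascript
// Hypothetical builder for an update pipeline that removes and adds flags
// in a single atomic update. $literal keeps flag strings from being read
// as field paths if one ever starts with "$".
function buildFlagPipeline(add = [], remove = [], { processed } = {}) {
  const set = {
    flags: {
      $setUnion: [
        { $setDifference: [{ $ifNull: ['$flags', []] }, { $literal: remove }] },
        { $literal: add },
      ],
    },
  };
  if (processed !== undefined) set.processed = processed;
  return [{ $set: set }];
}

// Plain-JS model of the same set arithmetic, for reasoning about results.
// (MongoDB does not guarantee element order for $setUnion.)
function applyFlags(current = [], add = [], remove = []) {
  const kept = current.filter((flag) => !remove.includes(flag)); // $setDifference
  return [...new Set([...kept, ...add])];                         // $setUnion
}

const pipeline = buildFlagPipeline(['reviewed'], ['needs-review'], { processed: true });
console.log(JSON.stringify(pipeline[0].$set.processed)); // true
console.log(applyFlags(['needs-review', 'stale'], ['reviewed'], ['needs-review']));
// [ 'stale', 'reviewed' ]

// With a driver collection this would be applied as:
// await collection.updateOne({ _id }, pipeline);
```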

Flag operations — bulk:

// Add-only (faster — uses classic update operators, not pipeline)
await repo.addBulkFlags([
  { _id: id1, flags: ['flagA'], processed: false },
  { _id: id2, flags: ['flagB'] },                     // processed omitted = no change
]);

// Add + remove (pipeline update per entry)
await repo.updateBulkFlags([
  { _id: id1, add: ['reviewed'], remove: ['needs-review'], processed: true },
  { _id: id2, add: ['priority'], remove: [] },
]);

Both return { modified, matched }. addBulkFlags uses classic $addToSet operators and is faster than updateBulkFlags which uses pipeline updates — prefer it when no removal is needed.

Monitoring:

const n = await repo.countByFlag(integrationId, 'needs-review');

History modes

| Mode | Behavior |
|---|---|
| 'off' | No history. Fastest. |
| 'async' | Single-doc: findOneAndUpdate with prior-state read, then separate history insertOne. Bulk: prior-state read, bulkWrite, then insertMany for changed docs. Tiny crash window between writes; close it with periodic reconciliation. |
| 'transactional' | Both writes inside a Mongo transaction. Atomic, ~25–40% slower. NOT supported by bulkUpsert; use upsert() in a loop. |

Set via engineConfig.defaultHistoryMode or override per-class via static HISTORY_MODE.

`FHIRReferenceResolver`

Standalone class for FHIR reference handling. Importable by callers other than the repository (export servers, Bundle expanders, synthesis layers).

import { FHIRReferenceResolver } from '@healthcare-interoperability/fhir-storage-core';

// The resolver is scope-agnostic. The caller provides an idHash function whose
// closure adds whatever scope context applies.
const resolver = new FHIRReferenceResolver({
  idHash: (input) => config.idHash({ scopeId: 'integ_A', ...input }),
});

The resolver invokes idHash({resourceType, fhirId}) — those two are all it can parse from a reference string. Anything else (scopeId, customerId, etc.) is added by the caller's closure.

// Static parser — exposed so callers can do their own parsing
FHIRReferenceResolver.parseReferenceString('Patient/abc');
// → { type: 'Patient', fhirId: 'abc' }

// Static single-shot resolution
FHIRReferenceResolver.resolveOne('Patient/abc', idHashFn);
// → { id: 'hash_xyz', type: 'Patient', originalReference: 'Patient/abc' }

// Extract resolved references at specific paths (minimal mode)
resolver.extract(resource, ['subject', 'encounter', 'performer.actor']);

// Recursively extract every reference in a resource (all mode — CPU-heavy)
resolver.extractAll(resource);

// Produce a canonicalized copy: Resource.id rewritten to hashed form,
// every embedded Reference.reference rewritten to "Type/<hash>"
resolver.canonicalize(resource);

// Write a resolved reference at a nested path
resolver.placeAt(target, ['contact', 0, 'organization'], 'Organization/o1');

Reference handling rules

| Input | Behavior |
|---|---|
| Relative (Type/id) | Resolves to hashed id |
| Absolute (https://.../Type/id) | Resolves (host discarded) |
| Local fragment (#localId) | Returns null (points into contained[]) |
| Versioned (Type/id/_history/N) | Stripped to base, then resolved |
| URN (urn:uuid:...) | Throws; higher layer must assign concrete ids first |

contained[] and embedded resource keys are not recursed into — they have their own scope.
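The rules table can be approximated in a few lines of string handling. This sketch is not the package's parser: `classifyReference` is a hypothetical function that follows the table row by row and returns the same `{ type, fhirId }` shape that parseReferenceString is shown returning above. Edge cases beyond the five listed inputs are not handled.

```javascript
// Hypothetical classifier following the rules table; not the package's parser.
function classifyReference(reference) {
  if (reference.startsWith('#')) return null; // local fragment into contained[]
  if (reference.startsWith('urn:')) {
    throw new Error(`Cannot resolve URN reference: ${reference}`);
  }
  // Drop a trailing /_history/N, then keep the last two path segments.
  // For absolute URLs this discards the scheme, host, and base path.
  const base = reference.replace(/\/_history\/[^/]+$/, '');
  const segments = base.split('/').filter(Boolean);
  if (segments.length < 2) throw new Error(`Unparseable reference: ${reference}`);
  const [type, fhirId] = segments.slice(-2);
  return { type, fhirId };
}

console.log(classifyReference('Patient/abc'));
// { type: 'Patient', fhirId: 'abc' }
console.log(classifyReference('https://example.org/fhir/Patient/abc'));
// { type: 'Patient', fhirId: 'abc' }
console.log(classifyReference('Patient/abc/_history/3'));
// { type: 'Patient', fhirId: 'abc' }
console.log(classifyReference('#med1'));
// null
```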

Requirements

Node.js ≥ 20
MongoDB ≥ 6 (peer dependency)
@quicore/hash (peer dependency)

License

MIT

Changelog

`processedResetPolicy` — unified processed-field control

Breaking: None. Fully backward compatible.

Replaces the two boolean flags (defaultProcessedOnChange + forceProcessedFalse) with a single enum option that clearly expresses all three behaviours in one place.

New API

Engine config (FHIREngineConfig):

const config = new FHIREngineConfig({
  processedResetPolicy: 'always',    // 'always' | 'on-change' | 'never'
});

| Value | Behavior |
|---|---|
| 'always' | Always set processed: false on every write, regardless of content changes. Downstream processors always pick up the document. |
| 'on-change' | (default) Reset processed to false only when content hash changes. No-op writes preserve the existing value. |
| 'never' | Only initialise processed: false on first write. Never modified by the upsert pipeline; the processor owns the flag. |
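The three behaviours reduce to a small decision function. The sketch below is a reading of the table, not the package's code: `nextProcessed` is a hypothetical helper, and treating a first write as `processed: false` under every policy is an assumption (the table states it explicitly only for 'never').

```javascript
// Hypothetical helper: the value `processed` should take after a write,
// or undefined to leave the stored value untouched.
function nextProcessed(policy, { isFirstWrite, contentChanged }) {
  if (isFirstWrite) return false; // assumed: every policy initialises to false
  switch (policy) {
    case 'always':
      return false;
    case 'on-change':
      return contentChanged ? false : undefined;
    case 'never':
      return undefined;
    default:
      throw new Error(`Unknown processedResetPolicy: ${policy}`);
  }
}

const noopWrite = { isFirstWrite: false, contentChanged: false };
const changedWrite = { isFirstWrite: false, contentChanged: true };

console.log(nextProcessed('always', noopWrite));       // false
console.log(nextProcessed('on-change', noopWrite));    // undefined
console.log(nextProcessed('on-change', changedWrite)); // false
console.log(nextProcessed('never', changedWrite));     // undefined
```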

Per-resource-type (static PROCESSED_RESET_POLICY):

class PatientRepository extends FHIRResourceRepository {
  static RESOURCE_TYPE = 'Patient';

  // Static string:
  static PROCESSED_RESET_POLICY = 'always';

  // Or, instead of the string, a function. Return null to fall back to config:
  // static PROCESSED_RESET_POLICY = (op) => {
  //   if (op.integrationId === 'urgent') return 'always';
  //   return null;  // use config.processedResetPolicy
  // };
}

Resolution order:

1. static PROCESSED_RESET_POLICY on the subclass (string or function)
2. If null → deprecated static FORCE_PROCESSED_FALSE / static PROCESSED_ON_CHANGE (mapped)
3. If still null → config.processedResetPolicy

Deprecated options (still work, mapped internally)

| Old option | Maps to |
|---|---|
| config.forceProcessedFalse: true | processedResetPolicy: 'always' |
| config.defaultProcessedOnChange: true | processedResetPolicy: 'on-change' |
| config.defaultProcessedOnChange: false | processedResetPolicy: 'never' |
| static FORCE_PROCESSED_FALSE = true | PROCESSED_RESET_POLICY = 'always' |
| static PROCESSED_ON_CHANGE = true | PROCESSED_RESET_POLICY = 'on-change' |
| static PROCESSED_ON_CHANGE = false | PROCESSED_RESET_POLICY = 'never' |

Existing code using the old booleans continues to work without changes. The deprecated statics and config options are resolved in priority order and mapped to the new enum internally.
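The resolution order and the deprecated mapping can be sketched together. This is an illustration under assumptions, not the package's resolver: `mapDeprecated`, `valueOf`, and `resolvePolicy` are hypothetical names, the repository classes are bare stand-ins, and letting FORCE_PROCESSED_FALSE win over PROCESSED_ON_CHANGE when both are set is inferred from the migration example rather than stated outright.

```javascript
// Map the deprecated boolean pair onto the enum; null means "not set".
function mapDeprecated(forceProcessedFalse, processedOnChange) {
  if (forceProcessedFalse === true) return 'always'; // assumed to take priority
  if (processedOnChange === true) return 'on-change';
  if (processedOnChange === false) return 'never';
  return null;
}

// Statics may be plain values or functions of the current operation.
const valueOf = (v, op) => (typeof v === 'function' ? v(op) : v ?? null);

function resolvePolicy(RepoClass, engineConfig, op) {
  return (
    valueOf(RepoClass.PROCESSED_RESET_POLICY, op) ??
    mapDeprecated(
      valueOf(RepoClass.FORCE_PROCESSED_FALSE, op),
      valueOf(RepoClass.PROCESSED_ON_CHANGE, op),
    ) ??
    engineConfig.processedResetPolicy ??
    'on-change' // documented default
  );
}

class UrgentAware {
  static PROCESSED_RESET_POLICY = (op) => (op.integrationId === 'urgent' ? 'always' : null);
}
class Legacy {
  static PROCESSED_ON_CHANGE = false;
}
class Plain {}

const engineConfig = { processedResetPolicy: 'on-change' };
console.log(resolvePolicy(UrgentAware, engineConfig, { integrationId: 'urgent' })); // always
console.log(resolvePolicy(UrgentAware, engineConfig, { integrationId: 'other' }));  // on-change
console.log(resolvePolicy(Legacy, engineConfig, {}));                               // never
console.log(resolvePolicy(Plain, engineConfig, {}));                                // on-change
```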

Migration

Replace old usage:

- const config = new FHIREngineConfig({ defaultProcessedOnChange: true });
+ const config = new FHIREngineConfig({ processedResetPolicy: 'on-change' });

- const config = new FHIREngineConfig({ forceProcessedFalse: true });
+ const config = new FHIREngineConfig({ processedResetPolicy: 'always' });

- static PROCESSED_ON_CHANGE = true;
- static FORCE_PROCESSED_FALSE = true;
+ static PROCESSED_RESET_POLICY = 'always';

Function form:

- static FORCE_PROCESSED_FALSE = (op) => op.integrationId === 'urgent' ? true : null;
- static PROCESSED_ON_CHANGE = (op) => op.integrationId === 'batch' ? false : null;
+ static PROCESSED_RESET_POLICY = (op) => {
+   if (op.integrationId === 'urgent') return 'always';
+   if (op.integrationId === 'batch') return 'never';
+   return null;
+ };

History mode impact

None. History documents intentionally omit workflow state (processed, flags, attempts). The history write decision is based on content change / resurrection — completely independent of processedResetPolicy.
