@healthcare-interoperability/fhir-storage-core
v1.7.0
FHIR R4 MongoDB storage primitives — extensible repository base class, reference resolver, and ingest-engine configuration for healthcare interoperability pipelines.
Works with both require() (CommonJS) and import (ESM). Node.js ≥ 20.
Install
npm install @healthcare-interoperability/fhir-storage-core
# peer dependencies
npm install mongodb @quicore/hash
Usage
// ESM
import {
FHIRResourceRepository,
FHIRReferenceResolver,
FHIREngineConfig,
bootstrap,
createRepositories,
resolveResourceClasses,
} from '@healthcare-interoperability/fhir-storage-core';
// CommonJS
const {
FHIRResourceRepository,
FHIRReferenceResolver,
FHIREngineConfig,
bootstrap,
createRepositories,
resolveResourceClasses,
} = require('@healthcare-interoperability/fhir-storage-core');
Core concepts
Three dedup modes
Every write is scoped by one of three modes — selected per call via integrationConfig.dedupMode (or the engine config default):
| Mode | _id formula | One document per |
|---|---|---|
| integration | idHash({scopeId: integrationId, resourceType, fhirId}) | (integration, fhirId) |
| parent | idHash({scopeId: parentIntegrationId, resourceType, fhirId}) | (parentIntegration, fhirId) |
| customer | idHash({scopeId: integrationId, resourceType, fhirId, customerId}) | (integration, fhirId, customer) |
In integration and parent modes, one resource produces one document (the customerIds array accumulates across upserts). In customer mode, one resource fans out to N documents — one per customerId in the call.
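The table above can be read as a recipe for which fields feed the _id hash in each mode. The following is a minimal sketch of that recipe, not the package's implementation: `standInIdHash` and `idInputsFor` are hypothetical names, the SHA-256-over-JSON hash only stands in for the real idHash, and using `resource.id` as the fhirId is an assumption.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical stand-in for the package's idHash (the real default is configurable).
function standInIdHash(input) {
  const ordered = [input.scopeId, input.resourceType, input.fhirId, input.customerId ?? ''];
  return createHash('sha256').update(JSON.stringify(ordered)).digest('hex');
}

// Builds the idHash inputs for one resource according to the dedup-mode table.
function idInputsFor(mode, { integrationId, parentIntegrationId, customerIds }, resource) {
  const base = { resourceType: resource.resourceType, fhirId: resource.id };
  switch (mode) {
    case 'integration':
      return [{ scopeId: integrationId, ...base }];
    case 'parent':
      return [{ scopeId: parentIntegrationId, ...base }];
    case 'customer':
      // Fan-out: one set of inputs (and so one document) per customerId.
      return customerIds.map((customerId) => ({ scopeId: integrationId, ...base, customerId }));
    default:
      throw new Error(`Unknown dedup mode: ${mode}`);
  }
}

const patient = { resourceType: 'Patient', id: 'pt-1' };
const cfg = { integrationId: 'acme', parentIntegrationId: 'acme-parent', customerIds: ['C1', 'C2'] };

const integrationIds = idInputsFor('integration', cfg, patient).map(standInIdHash);
const customerDocIds = idInputsFor('customer', cfg, patient).map(standInIdHash);
console.log(integrationIds.length, customerDocIds.length); // 1 2
```

With two customerIds, customer mode yields two distinct _id values for the same resource, while integration and parent modes yield one.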
Always-stored fields
Every stored document carries:
- _id: hashed per the active dedup mode (see above).
- integrationScopedId: always idHash({scopeId: integrationId, resourceType, fhirId}), regardless of mode. Equals _id in integration mode; differs in parent and customer modes. Use it to find "this logical resource within the integration" across modes.
- fhirId: the source FHIR id, preserved verbatim regardless of canonicalization.
- integrationId, parentIntegrationId (when applicable), customerIds, dedupMode: tenancy and mode metadata.
- idsCanonical: locked on first write; records whether the stored body has been canonicalized.
- contentHash, meta.versionId, createdAt, updatedAt, optional deletedAt, and optional workflow fields (processed, flags, attempts).
Bootstrap
The package exposes a three-step composition for initialising the ingest topology against a MongoDB instance:
import { MongoClient } from 'mongodb';
import {
bootstrap,
createRepositories,
FHIREngineConfig,
} from '@healthcare-interoperability/fhir-storage-core';
const client = await MongoClient.connect(process.env.MONGO_URI);
const db = client.db('fhir');
const config = new FHIREngineConfig({
defaultDedupMode: 'integration',
supportedResourceTypes: ['Patient', 'Observation', 'Encounter'],
});
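// PatientRepository and ObservationRepository are FHIRResourceRepository subclasses
// defined elsewhere (see FHIRResourceRepository below). Encounter has no entry here,
// so bootstrap fills in a generic handler for it.
const registry = { Patient: PatientRepository, Observation: ObservationRepository };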
// 1. Bootstrap — pings DB, resolves classes, runs initialize() on each.
const resolved = await bootstrap(db, registry, config);
// 2. Instantiate repositories.
const repos = createRepositories(db, resolved, config);
// 3. Use:
const result = await repos.Patient.upsert(patient, { integrationId: 'acme' });
Three exports, each does one thing:
- resolveResourceClasses(config, registry): pure. Merges the explicit registry, supportedResourceTypes, and disabledResourceTypes into a frozen map. No I/O.
- bootstrap(db, registry, config, options?): pings the DB, resolves the classes, and initializes each one. Returns the resolved map so you can pass it to createRepositories without re-resolving.
- createRepositories(db, registry, config): instantiates one repository per class. No I/O.
supportedResourceTypes entries without an explicit registry entry get filled in with a generic handler via either config.defaultResourceRegistryClass.for(type) or FHIRResourceRepository.for(type) (built-in fallback). Bootstrap warns when this happens — generic handlers lack resource-specific search indexes.
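The merge and fill-in behavior described above might look roughly like the sketch below. This is an illustration under assumptions, not the package's resolveResourceClasses: `resolveSketch` and `genericFor` are hypothetical names, and the precedence rules (disabled types always excluded, explicit registry entries kept even when absent from supportedResourceTypes) are guesses.

```javascript
// Hypothetical sketch of the merge. genericFor stands in for
// FHIRResourceRepository.for(type) or config.defaultResourceRegistryClass.for(type).
function resolveSketch(config, registry, genericFor) {
  const disabled = new Set(config.disabledResourceTypes ?? []);
  const resolved = {};
  const filledIn = [];

  // Explicit registry entries are taken as-is unless disabled (assumed precedence).
  for (const [type, RepoClass] of Object.entries(registry)) {
    if (!disabled.has(type)) resolved[type] = RepoClass;
  }
  // Supported types with no explicit entry get a generic handler.
  for (const type of config.supportedResourceTypes ?? []) {
    if (disabled.has(type) || resolved[type]) continue;
    resolved[type] = genericFor(type);
    filledIn.push(type);
  }
  return { resolved: Object.freeze(resolved), filledIn };
}

class PatientRepo {}
const { resolved, filledIn } = resolveSketch(
  { supportedResourceTypes: ['Patient', 'Encounter'], disabledResourceTypes: [] },
  { Patient: PatientRepo },
  (type) => class GenericRepo { static RESOURCE_TYPE = type; }
);
console.log(Object.keys(resolved), filledIn);
```

The `filledIn` list corresponds to the types bootstrap would warn about, since their generic handlers lack resource-specific search indexes.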
API
FHIREngineConfig
Normalised configuration for the ingest pipeline. Pass a plain options object; use FHIREngineConfig.from() inside pipeline constructors so callers can pass either a plain object or an existing instance.
const config = new FHIREngineConfig({
// Mode defaults — overridden per call by integrationConfig.dedupMode etc.
defaultDedupMode: 'integration', // 'integration' | 'customer' | 'parent'
defaultReferencesMode: 'minimal', // 'all' | 'minimal' | 'none'
defaultHistoryMode: 'off', // 'off' | 'async' | 'transactional'
// Storage form
collectionPrefix: 'fhir_',
idsCanonical: false, // canonicalize Resource.id + Reference.reference on write
// Schema validation
schemaValidationLevel: 'moderate', // 'off' | 'moderate' | 'strict'
schemaValidationAction: 'warn', // 'warn' | 'error'
// Hash algorithms (see "Hashing" below for the full algorithm table)
contentHashAlgorithm: 'sha256',
idHashAlgorithm: 'compact',
// Pagination defaults
defaultSearchLimit: 20,
defaultHistoryLimit: 50,
defaultExportBatchSize: 500,
// Write defaults
defaultBulkWriteOptions: { ordered: false, writeConcern: { w: 1, j: false } },
defaultWriteConcern: undefined, // single-doc writes
// Resource-type policy
supportedResourceTypes: null, // null = allow all
disabledResourceTypes: [],
allowUnknownResourceTypes: false,
defaultResourceRegistryClass: null, // fallback class for supportedResourceTypes
// without explicit registry entries
// Hook called at every ingest call
validateIntegrationConfig: null,
});
Configs are frozen after construction (shallow). Top-level reassignment throws in strict mode; nested options remain mutable (caller-owned references).
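The shallow-freeze behavior is standard JavaScript and can be illustrated without the package. The object below is a plain stand-in frozen with Object.freeze, not a real FHIREngineConfig instance; `demoShallowFreeze` is a hypothetical name.

```javascript
// Generic illustration of shallow freezing with a plain object.
function demoShallowFreeze() {
  'use strict';
  const config = Object.freeze({
    collectionPrefix: 'fhir_',
    defaultBulkWriteOptions: { ordered: false },
  });

  let topLevelError = null;
  try {
    config.collectionPrefix = 'other_'; // throws TypeError in strict mode
  } catch (err) {
    topLevelError = err;
  }

  // Nested objects are not frozen, so this mutation succeeds.
  config.defaultBulkWriteOptions.ordered = true;

  return { config, topLevelError };
}

const { config, topLevelError } = demoShallowFreeze();
console.log(topLevelError instanceof TypeError, config.defaultBulkWriteOptions.ordered); // true true
```

Callers who want nested options to be immutable as well can freeze those objects themselves before passing them in.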
Hashing
Backed by @quicore/hash. Two closed algorithm sets:
idHashAlgorithm — used for MongoDB _id values.
| Value | Output | Notes |
|---|---|---|
| 'compact' (default) | 22-char Base62 | BLAKE2b → Base62. ~131 bits entropy. Recommended. |
| 'sha256' | 64-char hex | Larger _id. |
| 'sha512' | 128-char hex | Largest. |
| 'sha3-256' | 64-char hex | |
| 'blake2s256' | 64-char hex | |
contentHashAlgorithm — used for change-detection. Stored as "algo:hex" so a future algorithm change is detectable from existing hashes.
| Value | Output |
|---|---|
| 'sha256' (default) | 64-char hex |
| 'sha512' | 128-char hex |
| 'sha3-256' | 64-char hex |
| 'sha3-512' | 128-char hex |
| 'blake2s256' | 64-char hex |
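As a rough illustration of the "algo:hex" storage form, the sketch below builds and parses such a value with Node's built-in crypto module. `contentHashSketch` and `parseContentHash` are hypothetical helpers, and hashing `JSON.stringify(resource)` is an assumption; the package's actual serialization is not described here.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: formats a content hash as "algo:hex".
// JSON.stringify is used for brevity; key order affects the digest, so a real
// implementation would need a stable serialization.
function contentHashSketch(resource, algorithm = 'sha256') {
  const hex = createHash(algorithm).update(JSON.stringify(resource)).digest('hex');
  return `${algorithm}:${hex}`;
}

// Splits a stored value back into its algorithm and digest.
function parseContentHash(stored) {
  const idx = stored.indexOf(':');
  return { algorithm: stored.slice(0, idx), hex: stored.slice(idx + 1) };
}

const stored = contentHashSketch({ resourceType: 'Patient', id: 'pt-1' });
const parsed = parseContentHash(stored);
console.log(parsed.algorithm, parsed.hex.length); // sha256 64
```

Because the algorithm name travels with the digest, a reader can tell a hash produced under an older algorithm setting from one produced under the current setting before comparing digests.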
Custom hash functions
Replace either default entirely:
const config = new FHIREngineConfig({
idHashFunction: ({ scopeId, resourceType, fhirId, customerId }) =>
myIdGen(scopeId, resourceType, fhirId, customerId),
contentHashFunction: (resource) => myHashLib.hash(resource),
});
The idHash contract: required {scopeId, resourceType, fhirId}, optional customerId (present only in customer dedup mode). Required fields must be non-empty strings; the default implementation throws otherwise. Custom implementations own their own input validation.
Or subclass for deeper customisation:
class MyConfig extends FHIREngineConfig {
contentHash(resource) { return myHashLib.hash(resource); }
idHash({ scopeId, resourceType, fhirId, customerId }) { /* ... */ }
}
FHIREngineConfig.from(input) accepts a plain object or an existing instance (pass-through).
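Since custom implementations own their input validation, a custom idHash that follows the contract stated above could look like this sketch. `validatingIdHash` is a hypothetical function; the SHA-256 encoding is an arbitrary choice for illustration and does not reproduce the package's 'compact' algorithm.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical custom idHash that enforces the documented contract itself:
// scopeId, resourceType, and fhirId must be non-empty strings; customerId is optional.
function validatingIdHash({ scopeId, resourceType, fhirId, customerId }) {
  for (const [name, value] of Object.entries({ scopeId, resourceType, fhirId })) {
    if (typeof value !== 'string' || value.length === 0) {
      throw new TypeError(`idHash: ${name} must be a non-empty string`);
    }
  }
  const parts = [scopeId, resourceType, fhirId];
  if (customerId !== undefined) parts.push(customerId);
  // JSON-encode the parts so that field boundaries stay unambiguous.
  return createHash('sha256').update(JSON.stringify(parts)).digest('hex');
}

const a = validatingIdHash({ scopeId: 'acme', resourceType: 'Patient', fhirId: 'pt-1' });
const b = validatingIdHash({ scopeId: 'acme', resourceType: 'Patient', fhirId: 'pt-1', customerId: 'C1' });
console.log(a !== b); // true
```

A function of this shape would be passed as idHashFunction in the FHIREngineConfig options.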
FHIRResourceRepository
Base class for per-resource-type MongoDB collections. Subclass with at minimum a RESOURCE_TYPE static property:
import { FHIRResourceRepository } from '@healthcare-interoperability/fhir-storage-core';
class PatientRepository extends FHIRResourceRepository {
static RESOURCE_TYPE = 'Patient';
// Optional: add indexes specific to this resource type
static SUGGESTED_INDEXES = [
{ key: { integrationId: 1, 'searchIdx.identifier': 1 }, name: 'idx_patient_identifier' },
];
// Optional: force history mode for this type regardless of engine config
static HISTORY_MODE = null; // 'off' | 'async' | 'transactional' | null
}
// Initialise (creates collection + indexes). Idempotent.
await PatientRepository.initialize(db, engineConfig);
// Instantiate
const repo = new PatientRepository(db, engineConfig);
static for(resourceType)
Factory for generic, unspecialized handlers — used by bootstrap to fill in supportedResourceTypes that lack explicit registry entries. Returns a fresh subclass with RESOURCE_TYPE set.
const GenericObservationRepo = FHIRResourceRepository.for('Observation');
// ...or, on a custom base class to inherit subclass behavior:
const AuditedObservationRepo = MyAuditedRepository.for('Observation');
Write methods
Both upsert and bulkUpsert share a uniform argument order and return shape.
upsert(resource, integrationConfig, options?)
const result = await repo.upsert(
fhirPatient,
{ integrationId: 'acme', customerIds: ['C1'] },
// optional:
{ writeOptions: { session }, aggregateOutput: true }
);
bulkUpsert(resources, integrationConfig, options?)
const result = await repo.bulkUpsert(
[patient1, patient2, patient3],
{ integrationId: 'acme', customerIds: ['C1', 'C2'], dedupMode: 'customer' },
{ aggregateOutput: true }
);
All resources in a bulkUpsert call share one integrationConfig. Mixed-mode batches are not supported by construction: group writes by integrationConfig at the caller and issue one bulkUpsert per group.
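One way to do the caller-side grouping is sketched below. `groupByIntegrationConfig` is a hypothetical helper, and the choice of fields used for the grouping key (integrationId, parentIntegrationId, dedupMode, customerIds) is an assumption about what distinguishes one integrationConfig from another.

```javascript
// Hypothetical helper: groups { resource, integrationConfig } pairs so that each
// group can be sent through a single bulkUpsert call.
function groupByIntegrationConfig(entries) {
  const groups = new Map();
  for (const { resource, integrationConfig } of entries) {
    const {
      integrationId,
      parentIntegrationId = null,
      dedupMode = null,
      customerIds = [],
    } = integrationConfig;
    // Sort a copy of customerIds so that their order does not split groups.
    const key = JSON.stringify([integrationId, parentIntegrationId, dedupMode, [...customerIds].sort()]);
    if (!groups.has(key)) groups.set(key, { integrationConfig, resources: [] });
    groups.get(key).resources.push(resource);
  }
  return [...groups.values()];
}

const groups = groupByIntegrationConfig([
  { resource: { resourceType: 'Patient', id: 'p1' }, integrationConfig: { integrationId: 'acme', customerIds: ['C1'] } },
  { resource: { resourceType: 'Patient', id: 'p2' }, integrationConfig: { integrationId: 'acme', customerIds: ['C1'] } },
  { resource: { resourceType: 'Patient', id: 'p3' }, integrationConfig: { integrationId: 'acme', customerIds: ['C1', 'C2'], dedupMode: 'customer' } },
]);
console.log(groups.map((g) => g.resources.length)); // [ 2, 1 ]
```

Each group would then be written with `await repo.bulkUpsert(g.resources, g.integrationConfig)`.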
WriteResult — uniform return shape
{
items: [
{ originalIndex, fhirId, _id, customerId, action }
],
counts: { created, modified, matched },
dedupMode: 'integration' | 'customer' | 'parent',
resources?: { /* only when options.aggregateOutput is true */ }
}
- items: one entry per document written. In customer-mode fan-out, length is customerIds.length × resources.length.
- originalIndex: index back into the input resources[] (always 0 for single-doc upsert).
- customerId: non-null only in customer mode.
- action: 'created' | 'updated' | 'noop' | 'resurrected' for upsert; 'created' | 'written' for bulkUpsert (MongoDB's bulkWrite result is aggregate, not per-document).
resources map (opt-in)
Pass options.aggregateOutput: true to include a fhirId → _id lookup map. The leaf shape depends on dedupMode:
// integration / parent — scalar leaf
{ Patient: { 'pt-1': 'hash_xyz' } }
// customer — customerId-keyed object leaf
{ Patient: { 'pt-1': { 'C1': 'hash_aaa', 'C2': 'hash_bbb' } } }
Downstream code branches on result.dedupMode to read the structure correctly.
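A small reader for the resources map, branching on dedupMode as described, might look like the following. `lookupStoredId` is a hypothetical helper; the result objects are hand-written to match the two leaf shapes shown above.

```javascript
// Hypothetical helper: reads an _id out of a WriteResult's `resources` map.
function lookupStoredId(result, resourceType, fhirId, customerId) {
  const leaf = result.resources?.[resourceType]?.[fhirId];
  if (leaf === undefined) return null;
  if (result.dedupMode === 'customer') {
    // Leaf is an object keyed by customerId.
    return leaf[customerId] ?? null;
  }
  // integration / parent: the leaf is the _id itself.
  return leaf;
}

const integrationResult = {
  dedupMode: 'integration',
  resources: { Patient: { 'pt-1': 'hash_xyz' } },
};
const customerResult = {
  dedupMode: 'customer',
  resources: { Patient: { 'pt-1': { C1: 'hash_aaa', C2: 'hash_bbb' } } },
};

console.log(lookupStoredId(integrationResult, 'Patient', 'pt-1'));    // hash_xyz
console.log(lookupStoredId(customerResult, 'Patient', 'pt-1', 'C2')); // hash_bbb
```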
Read methods
await repo.findById(_id); // by internal hashed _id
await repo.search(integrationId, filter, { limit, afterId }); // cursor-paginated
await repo.count(integrationId, filter);
await repo.exportCursor(integrationId, filter, { batchSize }); // for streaming exports
findById takes the internal _id, not the FHIR id. If a subclass needs FHIR-id lookup, it adds its own index on {integrationId, fhirId} plus a finder method.
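Draining a cursor-paginated search could be sketched as below. This rests on two assumptions this README does not confirm: that search() resolves to an array of documents, and that afterId is the _id of the last document on the previous page. `collectAll` and `fakeRepo` are hypothetical; the fake repository exists only so the loop can run without MongoDB.

```javascript
// Collects every page of a cursor-paginated search (see assumptions above).
async function collectAll(repo, integrationId, filter, limit = 20) {
  const all = [];
  let afterId;
  for (;;) {
    const page = await repo.search(integrationId, filter, { limit, afterId });
    all.push(...page);
    // A short (or empty) page means there is nothing further to fetch.
    if (page.length < limit) return all;
    afterId = page[page.length - 1]._id;
  }
}

// In-memory stand-in for a repository, for demonstration only.
const fakeRepo = {
  docs: ['a', 'b', 'c', 'd', 'e'].map((_id) => ({ _id })),
  async search(_integrationId, _filter, { limit, afterId }) {
    const start = afterId === undefined ? 0 : this.docs.findIndex((d) => d._id === afterId) + 1;
    return this.docs.slice(start, start + limit);
  },
};

const allDocs = collectAll(fakeRepo, 'acme', {}, 2);
allDocs.then((docs) => console.log(docs.map((d) => d._id)));
```

For large result sets, exportCursor is likely the better fit, since collecting every page holds all documents in memory.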
History methods (require historyMode !== 'off')
await repo.findByVersion(_id, versionId);
await repo.findHistory(_id, { limit, afterVersion });
Both take the internal _id.
Lifecycle / workflow methods
await repo.softDelete(_id, { session });
await repo.markAsProcessed(_id); // sets processed: true, increments attempts
await repo.addFlag(_id, 'needs-review'); // $addToSet on flags array
softDelete is idempotent. Re-upserting a soft-deleted document with new content clears deletedAt ("resurrection" semantics, matching FHIR's update-as-create).
History modes
| Mode | Behavior |
|---|---|
| 'off' | No history. Fastest. |
| 'async' | Single-doc: findOneAndUpdate with prior-state read, then separate history insertOne. Bulk: prior-state read, bulkWrite, then insertMany for changed docs. Tiny crash window between writes; close it with periodic reconciliation. |
| 'transactional' | Both writes inside a Mongo transaction. Atomic, ~25–40% slower. NOT supported by bulkUpsert — use upsert() in a loop. |
Set via engineConfig.defaultHistoryMode or override per-class via static HISTORY_MODE.
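The precedence between the engine default and the per-class override can be sketched as a small pure function. `effectiveHistoryMode` and `canBulkUpsert` are hypothetical helpers, and the classes are bare stand-ins rather than real repository subclasses; treating a null HISTORY_MODE as "use the engine default" follows the null option shown in the subclass example.

```javascript
const HISTORY_MODES = ['off', 'async', 'transactional'];

// A non-null static HISTORY_MODE on the class wins; otherwise the engine default applies.
function effectiveHistoryMode(RepoClass, engineConfig) {
  const mode = RepoClass.HISTORY_MODE ?? engineConfig.defaultHistoryMode;
  if (!HISTORY_MODES.includes(mode)) {
    throw new Error(`Unknown history mode: ${mode}`);
  }
  return mode;
}

// Guard reflecting the table above: bulkUpsert does not support 'transactional'.
function canBulkUpsert(RepoClass, engineConfig) {
  return effectiveHistoryMode(RepoClass, engineConfig) !== 'transactional';
}

class PatientRepo { static HISTORY_MODE = null; }
class AuditEventRepo { static HISTORY_MODE = 'transactional'; }
const engineConfig = { defaultHistoryMode: 'async' };

console.log(effectiveHistoryMode(PatientRepo, engineConfig)); // async
console.log(canBulkUpsert(AuditEventRepo, engineConfig));     // false
```

A caller could use a guard like this to fall back to a loop of upsert() calls for types whose history mode is transactional.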
FHIRReferenceResolver
Standalone class for FHIR reference handling. Importable by callers other than the repository (export servers, Bundle expanders, synthesis layers).
import { FHIRReferenceResolver } from '@healthcare-interoperability/fhir-storage-core';
// The resolver is scope-agnostic. The caller provides an idHash function whose
// closure adds whatever scope context applies.
const resolver = new FHIRReferenceResolver({
idHash: (input) => config.idHash({ scopeId: 'integ_A', ...input }),
});
The resolver invokes idHash({resourceType, fhirId}); those two are all it can parse from a reference string. Anything else (scopeId, customerId, etc.) is added by the caller's closure.
// Static parser — exposed so callers can do their own parsing
FHIRReferenceResolver.parseReferenceString('Patient/abc');
// → { type: 'Patient', fhirId: 'abc' }
// Static single-shot resolution
FHIRReferenceResolver.resolveOne('Patient/abc', idHashFn);
// → { id: 'hash_xyz', type: 'Patient', originalReference: 'Patient/abc' }
// Extract resolved references at specific paths (minimal mode)
resolver.extract(resource, ['subject', 'encounter', 'performer.actor']);
// Recursively extract every reference in a resource (all mode — CPU-heavy)
resolver.extractAll(resource);
// Produce a canonicalized copy: Resource.id rewritten to hashed form,
// every embedded Reference.reference rewritten to "Type/<hash>"
resolver.canonicalize(resource);
// Write a resolved reference at a nested path
resolver.placeAt(target, ['contact', 0, 'organization'], 'Organization/o1');
Reference handling rules
| Input | Behavior |
|---|---|
| Relative (Type/id) | Resolves to hashed id |
| Absolute (https://.../Type/id) | Resolves (host discarded) |
| Local fragment (#localId) | Returns null (points into contained[]) |
| Versioned (Type/id/_history/N) | Stripped to base, then resolved |
| URN (urn:uuid:...) | Throws — higher layer must assign concrete ids first |
contained[] and embedded resource keys are not recursed into — they have their own scope.
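The rules in the table can be approximated by a short parser. The sketch below is not the library's parseReferenceString and may differ from it on edge cases; `parseReferenceSketch` is a hypothetical name, and the handling of absolute URLs with a base path (taking the last two path segments) is an assumption.

```javascript
// Hypothetical parser that follows the reference handling rules table.
function parseReferenceSketch(reference) {
  if (reference.startsWith('#')) return null; // local fragment into contained[]
  if (reference.startsWith('urn:')) {
    throw new Error(`Cannot resolve URN reference: ${reference}`);
  }
  // Drop scheme and host from absolute URLs, then split the path.
  const path = reference.replace(/^https?:\/\/[^/]+\//, '');
  const segments = path.split('/').filter(Boolean);
  // Strip a trailing "_history/N" version suffix.
  const historyAt = segments.indexOf('_history');
  const base = historyAt === -1 ? segments : segments.slice(0, historyAt);
  if (base.length < 2) throw new Error(`Unparseable reference: ${reference}`);
  // The last two segments are Type and id (absolute URLs may carry a base path).
  const [type, fhirId] = base.slice(-2);
  return { type, fhirId };
}

console.log(parseReferenceSketch('Patient/abc'));
console.log(parseReferenceSketch('https://example.org/fhir/Patient/abc'));
console.log(parseReferenceSketch('Patient/abc/_history/3'));
console.log(parseReferenceSketch('#localId'));
```

The first three calls all yield the same type and fhirId, and the fragment yields null; a urn:uuid reference would throw, matching the last row of the table.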
Requirements
- Node.js ≥ 20
- MongoDB ≥ 6 (peer dependency)
- @quicore/hash (peer dependency)
License
MIT
