apollo-cache-audit

v0.5.3

Published

20 days ago

Detect missing id fields on entity-shaped GraphQL types that would otherwise break Apollo Client cache normalization.

Downloads

785


yuki2006

apollo apollo-client graphql cache normalization relay audit lint

Detect GraphQL Object types that look like entities but lack an id field. Apollo Client's InMemoryCache silently inlines such types into their parent instead of normalizing them, which is the root cause of stale-after-mutation bugs, "Cache data may be lost" warnings, and infinite-loop pagination.

Community project. Not affiliated with or endorsed by Apollo GraphQL, Inc.

Why this exists

Apollo's InMemoryCache normalizes any Object type whose schema declares an id (or _id) field, or whose type policy provides custom key fields. Types without any of those are inlined into their parent. That's correct for value objects, but silently wrong for types that are conceptually entities — the symptoms only appear at runtime, often after a refactor, often only in mutations.
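The default rule described above can be illustrated with a small, dependency-free model. This is a simplified sketch, not Apollo's implementation: `defaultCacheId` and `StoreObjectLike` are hypothetical names, and custom key fields from type policies are deliberately ignored. In real code, `cache.identify()` is the authority.

```typescript
// Simplified model of the default keying rule: __typename plus id (or _id).
// Hypothetical helper for illustration; it ignores typePolicies entirely.
type StoreObjectLike = {
  __typename?: string;
  id?: unknown;
  _id?: unknown;
  [field: string]: unknown;
};

function defaultCacheId(obj: StoreObjectLike): string | undefined {
  const key = obj.id ?? obj._id;
  if (obj.__typename === undefined || key === undefined || key === null) {
    return undefined; // no identity: the object is inlined into its parent
  }
  const printed =
    typeof key === "string" || typeof key === "number"
      ? String(key)
      : JSON.stringify(key);
  return `${obj.__typename}:${printed}`;
}

// An entity with an id gets a cache ID; a child without one does not.
const postId = defaultCacheId({ __typename: "Post", id: 7 });
const statsId = defaultCacheId({ __typename: "PostStats", views: 10 });
```

Here `postId` is `"Post:7"` and `statsId` is `undefined`, which is the situation the audit looks for when `PostStats` is conceptually an entity.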

The existing tools cover adjacent problems:

| Tool | Covers | Does not cover |
|---|---|---|
| @graphql-eslint/require-selections | Operation forgets to select id | Schema-side id is missing |
| @graphql-eslint/strict-id-in-types | All types must have id | False positives on value objects, suffix allowlist is too coarse |
| Apollo dev warnings | Runtime detection after a merge collision | Pre-merge static prevention, CI gating |

apollo-cache-audit uses Apollo's own InMemoryCache.identify() as the source of truth, combined with the Relay Node convention as the contract layer:

Apollo-grounded probe. For each Object type, the tool constructs a real InMemoryCache from your statically-extracted config and calls cache.identify(synthInstance). This is the same logic Apollo runs at request time — if identify() returns undefined, the type will not be normalized at runtime, period.
Reference graph. Types that fail to normalize but are referenced as a field from a normalizing parent are reported as promotion candidates — the symptoms (stale-after-mutation, key collision, fetchMore loops) only manifest in this configuration.
Invalid-keyFields detection. If typePolicies[T].keyFields lists a field name that isn't declared on T in the schema, Apollo throws InvariantError the first time it sees that type. The audit catches this statically as a high-confidence misconfiguration.
Node interface contract check. Types listed in dataIdFromObject / typePolicies.keyFields that don't implement the Node interface are reported as customButNotNode — Apollo treats them as entities but the schema disagrees with itself.
Suffix backstop. A small allowlist (Edge, Connection, PageInfo, Payload, etc.) avoids false positives on Relay/GraphQL structural types. This is not the primary detection mechanism.

What distinguishes this tool from strict-id-in-types is that detection is grounded in Apollo's actual normalization logic. The Node-interface and suffix lists only adjust how findings are bucketed.
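The reference-graph and suffix-backstop steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's code: `FieldGraph` is a hypothetical, simplified schema model, the `normalizes` callback stands in for the `cache.identify()` probe, and the suffix list is only the subset named in the text.

```typescript
// Hypothetical schema model: type name -> Object types returned by its fields.
type FieldGraph = Record<string, string[]>;

// Subset of the structural suffixes mentioned above (assumed for illustration).
const STRUCTURAL_SUFFIXES = ["Edge", "Connection", "PageInfo", "Payload"];

function findPromotionCandidates(
  graph: FieldGraph,
  normalizes: (typeName: string) => boolean, // stand-in for the identify() probe
): Map<string, string[]> {
  const candidates = new Map<string, string[]>();
  for (const [parent, children] of Object.entries(graph)) {
    if (!normalizes(parent)) continue; // only normalizing parents count (1-hop)
    for (const child of children) {
      if (normalizes(child)) continue; // already an entity in the cache
      if (STRUCTURAL_SUFFIXES.some((s) => child.endsWith(s))) continue; // backstop
      const parents = candidates.get(child) ?? [];
      if (!parents.includes(parent)) parents.push(parent);
      candidates.set(child, parents);
    }
  }
  return candidates;
}

const graph: FieldGraph = {
  Post: ["Author", "PostConnection"],
  Comment: ["Author"],
  Author: [],
  PostConnection: [],
};
const normalized = new Set(["Post", "Comment"]);
const found = findPromotionCandidates(graph, (t) => normalized.has(t));
```

In this example `Author` is the only candidate, referenced from `Post` and `Comment`; `PostConnection` is filtered out by the suffix backstop.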

Install

npm install -D apollo-cache-audit
# or
pnpm add -D apollo-cache-audit

Peer dependencies: graphql >= 16, @apollo/client >= 3.

Quickstart

npx apollo-cache-audit \
  --schema ./schema.graphql \
  --cache-config ./src/apollo/cache.ts

Sample output:

apollo-cache-audit
==================

schema sha256: 9f3a7b21c4d8…
node-implemented:        42
apollo-ok-not-node:      3  ℹ
value-objects:           18
custom-handled:          3
custom-but-not-node:     1  ←
node-promotion-candidate:4  ←
invalid-keyfields:       1  ←

⚠ Types with custom cache config but no Node interface
   (these are treated as entities by the cache but the schema declares no id)
   - Organization  (dataIdFromObject)

⚠ Node-promotion candidates
   (referenced from a Node-implementing type; likely entities)
   - Author (./schema.graphql:42) ← Post, Comment
   - Membership (./schema.graphql:71) ← Workspace
   - Subscription (./schema.graphql:88) ← Account
   - WebhookConfig (./schema.graphql:104) ← Project

CLI

apollo-cache-audit --schema <path> --cache-config <path> [options]

| Option | Default | Description |
|---|---|---|
| --schema <path> | (required) | GraphQL SDL file |
| --cache-config <path> | (required) | TS/JS file with new InMemoryCache({...}) |
| --cache-config <list> | (required) | Comma-separate to merge multiple cache-config files (conflicts reported) |
| --ts-config <path> | auto-detected | tsconfig.json for cross-file resolution |
| --node-interface <name> | Node | Entity-marker interface name |
| --ignore-suffixes <list> | Response,Result,Payload,Edge,Connection,PageInfo,Aggregation,Csv,Report | Suffix list for value-object backstop |
| --ignore-types <list> | (empty) | Type names to skip entirely (third-party/legacy) |
| --baseline <path> | (none) | Known-violation JSON; only findings beyond this are surfaced as new |
| --update-baseline | false | Rewrite the --baseline file with current findings |
| --format <text\|json\|github> | text | Output format. github emits ::warning:: annotations |
| --format jsonschema | (n/a) | Emit the AuditResult JSON Schema (no audit run; for downstream tooling) |
| --fail-on <none\|new\|suspect\|all> | none | Exit non-zero condition |
| --fail-on-custom-without-node | false | Exit non-zero on customButNotNode findings (high-confidence) |
| --fail-on-invalid-keyfields | false | Exit non-zero when typePolicies.keyFields references a missing schema field |
| --fail-on-not-node | false | Exit non-zero on types that normalize via id but lack Node interface (strict Relay) |
| --strict-recommend | false | Omit low-confidence recommendations from output (only medium/high are emitted) |
| --multi-hop | false | Walk transitively through non-normalized intermediates (Normalized→ValueObject→Candidate) |
| --report <path> | (none) | Write rendered output to a file instead of stdout |
| --verbose | false | Verbose logging |

Exit codes

| Code | Meaning |
|---|---|
| 0 | Success / no failure condition triggered |
| 1 | Findings exist and a --fail-on* condition triggered |
| 2 | Invocation error (missing args, file not found, etc.) |

`--fail-on` semantics

| Value | Triggers exit 1 when… |
|---|---|
| none | never |
| new | a --baseline is provided and there are candidates outside the baseline |
| suspect | any nodePromotionCandidate exists (with or without baseline) |
| all | any candidate or any customButNotNode finding exists |

--fail-on-custom-without-node independently triggers exit 1 when customButNotNode is non-empty — useful as a hard gate even when adopting the candidate list gradually.
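The gating rules in the two tables above can be read as a small decision function. The sketch below is an interpretation of those tables, not the tool's source: `Findings`, `GateOptions`, and `exitCode` are hypothetical names, and exit code 2 (invocation errors) and the other --fail-on-* flags are left out.

```typescript
type FailOn = "none" | "new" | "suspect" | "all";

// Hypothetical, trimmed-down view of the findings that affect gating.
interface Findings {
  nodePromotionCandidate: string[];
  customButNotNode: string[];
  newSinceBaseline: string[];
}

interface GateOptions {
  failOn: FailOn;
  hasBaseline: boolean;
  failOnCustomWithoutNode: boolean;
}

function exitCode(f: Findings, o: GateOptions): 0 | 1 {
  const hasCustom = f.customButNotNode.length > 0;
  const hasCandidates = f.nodePromotionCandidate.length > 0;
  let failed = false;
  switch (o.failOn) {
    case "none":
      break;
    case "new": // only meaningful when a baseline was supplied
      failed = o.hasBaseline && f.newSinceBaseline.length > 0;
      break;
    case "suspect":
      failed = hasCandidates;
      break;
    case "all":
      failed = hasCandidates || hasCustom;
      break;
  }
  // Independent hard gate, regardless of the --fail-on value.
  if (o.failOnCustomWithoutNode && hasCustom) failed = true;
  return failed ? 1 : 0;
}

const sample: Findings = {
  nodePromotionCandidate: ["Author"],
  customButNotNode: ["Organization"],
  newSinceBaseline: [],
};
const gradual = exitCode(sample, { failOn: "new", hasBaseline: true, failOnCustomWithoutNode: false });
const hardGate = exitCode(sample, { failOn: "new", hasBaseline: true, failOnCustomWithoutNode: true });
```

With the sample findings, `gradual` is 0 because nothing is new relative to the baseline, while `hardGate` is 1 because the customButNotNode gate fires on its own.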

Programmatic API

import { audit } from "apollo-cache-audit";

const result = await audit({
  schema: "./schema.graphql",      // path or SDL string
  cacheConfig: "./src/cache.ts",   // path to TS/JS file
  nodeInterface: "Node",
  ignoreSuffixes: ["Edge", "Connection"],
  ignoreTypes: ["LegacyType"],
  baseline: "./apollo-cache-audit.baseline.json",
});

// Shape:
// {
//   nodeImplemented: string[]
//   valueObject: { name, reason }[]
//   customHandled: { name, via, keyFields? }[]
//   customButNotNode: { name, via, keyFields? }[]
//   nodePromotionCandidate: { name, referencedFrom, line?, file? }[]
//   newSinceBaseline: NodeCandidateInfo[]
//   resolvedSinceBaseline: string[]
//   schemaHash: string
// }

buildBaseline, writeBaseline, and loadBaseline are also exported for custom CI setups.
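A custom CI script will typically post-process the result object. The following sketch shows one way to do that without importing the package: the interfaces are transcribed from the shape comment above, the field types (`line: number`, `via: string`) are assumptions, and `summarize` is a hypothetical helper rather than an exported API.

```typescript
// Types transcribed from the documented shape; field types are assumed.
interface CandidateInfo {
  name: string;
  referencedFrom: string[];
  line?: number;
  file?: string;
}

interface AuditResultLike {
  nodePromotionCandidate: CandidateInfo[];
  customButNotNode: { name: string; via: string }[];
}

// Hypothetical helper: one human-readable line per finding.
function summarize(result: AuditResultLike): string[] {
  const lines: string[] = [];
  for (const c of result.nodePromotionCandidate) {
    const where =
      c.file !== undefined && c.line !== undefined ? ` (${c.file}:${c.line})` : "";
    lines.push(`${c.name}${where} <- ${c.referencedFrom.join(", ")}`);
  }
  for (const c of result.customButNotNode) {
    lines.push(`${c.name} (${c.via})`);
  }
  return lines;
}

const summary = summarize({
  nodePromotionCandidate: [
    { name: "Author", referencedFrom: ["Post", "Comment"], file: "./schema.graphql", line: 42 },
  ],
  customButNotNode: [{ name: "Organization", via: "dataIdFromObject" }],
});
```

In a real script the argument would be the value resolved by `audit(...)`; the literal object here only keeps the example self-contained.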

Baseline workflow

Projects adopting the audit rarely start clean. Adopt incrementally:

Run once to discover all current candidates:

apollo-cache-audit --schema ./schema.graphql --cache-config ./src/cache.ts \
  --baseline ./apollo-cache-audit.baseline.json --update-baseline

Commit the baseline JSON.

In CI, fail only on new candidates:

apollo-cache-audit --schema ./schema.graphql --cache-config ./src/cache.ts \
  --baseline ./apollo-cache-audit.baseline.json --fail-on new \
  --fail-on-custom-without-node

As types are migrated to Node, rerun with --update-baseline to shrink the file.

The baseline records the schema SHA-256; when the schema changes substantially, schemaChanged: true is included in JSON output so reviewers know to revisit.

Baseline JSON shape

{
  "tool": "apollo-cache-audit@<version>",
  "generated": "2026-05-25T00:00:00.000Z",
  "schemaHash": "9f3a7b…",
  "nodePromotionCandidate": [
    { "type": "Author", "referencedFrom": ["Post"], "addedAt": "2026-05-25T00:00:00.000Z" }
  ],
  "customButNotNode": [
    { "type": "Organization", "referencedFrom": [], "addedAt": "2026-05-25T00:00:00.000Z" }
  ]
}

addedAt is preserved across --update-baseline runs to track aging.
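The baseline update described above amounts to a keyed merge. The sketch below shows the idea under stated assumptions: `BaselineEntry` follows the JSON shape shown, while `updateBaseline` and its return value are hypothetical and not the exported `buildBaseline` / `writeBaseline` functions.

```typescript
interface BaselineEntry {
  type: string;
  referencedFrom: string[];
  addedAt: string;
}

// Hypothetical merge: keep addedAt for known types, stamp new ones with `now`.
function updateBaseline(
  previous: BaselineEntry[],
  current: { type: string; referencedFrom: string[] }[],
  now: string,
): { entries: BaselineEntry[]; added: string[]; resolved: string[] } {
  const prevByType = new Map(previous.map((e) => [e.type, e] as const));
  const currentTypes = new Set(current.map((c) => c.type));
  const entries = current.map((c) => ({
    type: c.type,
    referencedFrom: c.referencedFrom,
    addedAt: prevByType.get(c.type)?.addedAt ?? now, // original timestamp survives
  }));
  const added = current.filter((c) => !prevByType.has(c.type)).map((c) => c.type);
  const resolved = previous.filter((e) => !currentTypes.has(e.type)).map((e) => e.type);
  return { entries, added, resolved };
}

const merged = updateBaseline(
  [
    { type: "Author", referencedFrom: ["Post"], addedAt: "2026-05-25T00:00:00.000Z" },
    { type: "Membership", referencedFrom: ["Workspace"], addedAt: "2026-05-25T00:00:00.000Z" },
  ],
  [
    { type: "Author", referencedFrom: ["Post", "Comment"] },
    { type: "WebhookConfig", referencedFrom: ["Project"] },
  ],
  "2026-06-01T00:00:00.000Z",
);
```

Here `Author` keeps its original `addedAt`, `WebhookConfig` is reported as added, and `Membership` is reported as resolved.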

CI integration

GitHub Actions

- name: Apollo cache audit
  run: |
    npx apollo-cache-audit \
      --schema ./schema.graphql \
      --cache-config ./src/apollo/cache.ts \
      --baseline ./apollo-cache-audit.baseline.json \
      --fail-on new \
      --fail-on-custom-without-node \
      --format github

--format github emits ::warning file=...:: annotations rendered inline in the PR diff.
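For readers building their own reporter, the annotation line is a GitHub Actions workflow command. The sketch below formats one candidate as a `::warning` command; the message wording and the `Annotated` / `toGithubWarning` names are assumptions for illustration and may differ from what the tool emits. The escaping mirrors what the GitHub actions toolkit does for command data and properties.

```typescript
interface Annotated {
  name: string;
  referencedFrom: string[];
  file?: string;
  line?: number;
}

// Escape rules for workflow command data and property values.
const escapeData = (s: string): string =>
  s.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
const escapeProp = (s: string): string =>
  escapeData(s).replace(/:/g, "%3A").replace(/,/g, "%2C");

function toGithubWarning(c: Annotated): string {
  const props: string[] = [];
  if (c.file !== undefined) props.push(`file=${escapeProp(c.file)}`);
  if (c.line !== undefined) props.push(`line=${c.line}`);
  // Message text is hypothetical; only the ::warning ...:: envelope is standard.
  const message = `${c.name} is not normalized but is referenced from ${c.referencedFrom.join(", ")}`;
  const head = props.length > 0 ? `::warning ${props.join(",")}::` : "::warning::";
  return head + escapeData(message);
}

const annotation = toGithubWarning({
  name: "Author",
  referencedFrom: ["Post", "Comment"],
  file: "schema.graphql",
  line: 42,
});
```

Printing `annotation` to stdout inside a workflow step is enough for GitHub to attach the warning to line 42 of schema.graphql.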

GitLab CI

apollo-cache-audit:
  script:
    - npx apollo-cache-audit --schema schema.graphql --cache-config src/cache.ts --fail-on suspect --format json --report apollo-cache-audit.report.json
  artifacts:
    when: always
    paths:
      - apollo-cache-audit.report.json

Examples

Concrete test fixtures live under test/fixtures/. Each one is a minimal schema + cache config you can paste into your own project to see how the audit responds.

Bug-pattern reproductions

These reproduce real-world Apollo cache bugs that the audit detects ahead of time. The schema files include detailed comments explaining the symptom and root cause:

| Fixture | Symptom this reproduces |
|---|---|
| bug-stale-after-mutation/ | UI shows old value after mutation succeeds; refresh fixes it. Child stats type lacks id |
| bug-key-collision/ | "Cache data may be lost when replacing the X field" warning; nested object overwritten by sibling |
| bug-cursorless-pagination/ | fetchMore keeps returning the same items in an infinite loop |
| invalid-keyfields/ | Apollo throws Invariant Violation: Missing field 'X' while extracting keyFields at runtime |

Configuration shape examples

How the audit handles various cache-config patterns:

| Fixture | What it demonstrates |
|---|---|
| basic/ | Mixed schema: Node-implementing entity, value object via suffix, candidate flagged |
| custom-handled/ | dataIdFromObject switch case for a type that doesn't implement Node (customButNotNode) |
| function-keyfields/ | Both keyFields: ['orgId', 'userId'] array form and keyFields: (obj) => ... function form |
| spread-policies/ | typePolicies: { ...basePolicies, ...extraPolicies } resolved across object spreads |
| interface-name-custom/ | Non-standard Node interface name (INode) via --node-interface |
| all-nodes/ | Every type implements Node; audit produces zero findings |
| value-objects-only/ | Schema with no entities; audit treats everything as value object |

All fixtures are validated by test/audit.test.ts and test/baseline.test.ts — those files double as executable documentation.

FAQ

Q: How is this different from @graphql-eslint/strict-id-in-types? strict-id-in-types flags every Object without an id field and asks you to disable it for value objects via suffix. This tool inverts the rule: a type only earns a warning if it is reachable as a field from a Node entity — so genuine value objects produce no noise. The suffix list is a small final filter, not the primary mechanism.

Q: My schema has no node(id: ID!) query — can I still use this? Yes. The tool only requires the Node interface (or whatever you name via --node-interface). The Relay query field is unrelated.

Q: Does this work with urql / Relay framework / GraphQL Yoga? This audit targets @apollo/client's InMemoryCache normalization rules specifically. urql's Graphcache has its own keying model; Relay enforces Node by design. We may add adapters in future versions.

Q: My cache config is split across many files. Supported. apollo-cache-audit uses the TypeScript Compiler API (via ts-morph) and follows identifier references and object spread through imports, as long as your tsconfig.json resolves them. Use --ts-config to point at the right project.

Q: Should I always promote every candidate to a Node? No. The list is "candidates for review" — a multi-field nested object that genuinely never needs identity (a Money { amount, currency } value object) should stay un-normalized. Add such types to --ignore-types or accept them in the baseline.

Q: A customButNotNode finding — what does it mean? You added a type to dataIdFromObject or typePolicies.keyFields, meaning at runtime Apollo treats it as an entity, but the schema declares no id / Node membership. This is almost always a missed schema update.

Q: An invalidKeyFields finding — what does it mean? Your typePolicies[T].keyFields references a field name that doesn't exist on T in the schema. Apollo throws InvariantError the first time it tries to normalize a T (e.g. Missing field 'orgId' while extracting keyFields). This is the highest-confidence finding — it's not a heuristic, it's a runtime crash the tool reproduces ahead of time.
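The check behind this finding is a set difference between configured key fields and declared schema fields. A minimal sketch, assuming simplified inputs: `SchemaFields` and `KeyFieldsConfig` are hypothetical flattened models, only array-form keyFields with top-level field names are handled, and `findInvalidKeyFields` is not the tool's API.

```typescript
type SchemaFields = Record<string, string[]>; // type name -> declared field names
type KeyFieldsConfig = Record<string, string[]>; // type name -> keyFields (array form)

function findInvalidKeyFields(
  schema: SchemaFields,
  policies: KeyFieldsConfig,
): { type: string; missing: string[] }[] {
  const findings: { type: string; missing: string[] }[] = [];
  for (const [type, keyFields] of Object.entries(policies)) {
    const declared = new Set(schema[type] ?? []);
    const missing = keyFields.filter((field) => !declared.has(field));
    if (missing.length > 0) findings.push({ type, missing });
  }
  return findings;
}

const invalid = findInvalidKeyFields(
  { Membership: ["userId", "role"], Workspace: ["slug", "name"] },
  { Membership: ["orgId", "userId"], Workspace: ["slug"] },
);
```

Here `Membership` is reported with `orgId` missing, while `Workspace` passes because `slug` is declared.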

Q: What does apolloCompatibleNotNode mean? The type has an id (or _id) field declared in the schema, so Apollo normalizes it via the default identifier, but it does not implement Node. The Apollo cache works correctly for such types. The category exists so teams adopting Relay's Global Object Identification spec can see which types still need formal Node membership. Gate with --fail-on-not-node to make this a hard requirement.

Q: What is the recommendation field on candidates for? Each nodePromotionCandidate carries a heuristic suggestion (add-id, mark-as-value-object, or add-suffix-rule), a confidence level (low / medium / high), the list of signals that contributed to the verdict, and a reason string. Multiple weighted signals vote across the three categories; the winner is returned with the margin determining confidence. The recommendation is advisory, not authoritative — schema authors know intent the tool cannot infer. Pass --strict-recommend to drop low-confidence suggestions from output if you only want the heuristic to speak when it's confident.

Q: What signals does the recommendation engine use? Entity-leaning (vote add-id): id-like field names (slug, uuid, ...), timestamp fields (createdAt, ...), foreign-key field names (userId, orgId, ...), parent count ≥ 2, non-Node interface membership. Value-object-leaning (vote mark-as-value-object): value-object field names (amount, lat, lng, ...), small flat shape (≤ 4 leaf fields, single parent). Suffix-leaning (vote add-suffix-rule): name ends in a structural suffix (Stats, Meta, Detail, ...) not already in --ignore-suffixes, or 2+ sibling candidates share the same suffix.
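The voting scheme described in the two answers above can be sketched as a weighted tally. Everything numeric here is an assumption: the weights, the confidence thresholds (3 and 1.5), and the names `Signal` and `recommend` are hypothetical and chosen only to show how a margin can map to a confidence level.

```typescript
type Verdict = "add-id" | "mark-as-value-object" | "add-suffix-rule";
type Confidence = "low" | "medium" | "high";

interface Signal {
  name: string;
  votesFor: Verdict;
  weight: number; // hypothetical weight
}

const VERDICTS: readonly Verdict[] = ["add-id", "mark-as-value-object", "add-suffix-rule"];

function recommend(
  signals: Signal[],
): { verdict: Verdict; confidence: Confidence; margin: number } | undefined {
  if (signals.length === 0) return undefined;
  const totals: Record<Verdict, number> = {
    "add-id": 0,
    "mark-as-value-object": 0,
    "add-suffix-rule": 0,
  };
  for (const s of signals) totals[s.votesFor] += s.weight;

  // Track the winner and the runner-up score to compute the margin.
  let best: Verdict = "add-id";
  let bestScore = -Infinity;
  let runnerUp = -Infinity;
  for (const v of VERDICTS) {
    const score = totals[v];
    if (score > bestScore) {
      runnerUp = bestScore;
      bestScore = score;
      best = v;
    } else if (score > runnerUp) {
      runnerUp = score;
    }
  }
  const margin = bestScore - runnerUp;
  // Thresholds are illustrative assumptions, not the tool's values.
  const confidence: Confidence = margin >= 3 ? "high" : margin >= 1.5 ? "medium" : "low";
  return { verdict: best, confidence, margin };
}

const verdict = recommend([
  { name: "foreign-key field (userId)", votesFor: "add-id", weight: 2 },
  { name: "timestamp field (createdAt)", votesFor: "add-id", weight: 1 },
  { name: "small flat shape", votesFor: "mark-as-value-object", weight: 1 },
]);
```

With these assumed weights, add-id totals 3 against 1 for mark-as-value-object, so the margin is 2 and the sketch reports medium confidence.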

Q: Function-form keyFields? Detected. The fields list will be reported as "fn" since static analysis can't enumerate the keys, but the type is correctly recognized as custom-handled.

Limitations (v0.1)

Apollo Client only. urql / Relay framework / Graphcache out of scope.
Static analysis of cache config: dynamic property names ([CONST]: {...}) and conditional keyFields based on runtime values aren't resolved.
Reference graph is 1-hop by default. A type reachable only through an intermediate value object (Normalized → ValueObject → Candidate) is treated as a value object unless --multi-hop is passed.
addTypename: false configurations or per-query cacheRedirects (Apollo Client v2 only) are out of scope.
Function-form keyFields are treated as opaque "custom-handled": the tool cannot enumerate which fields the function reads, so it cannot validate them against the schema.
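The difference between the 1-hop reference graph and the transitive walk enabled by the --multi-hop option in the CLI table can be sketched as a breadth-first traversal. This is an illustration only: `FieldGraph` and `reachableNonNormalized` are hypothetical, and the sketch returns every reachable non-normalized type, leaving the split between value objects and candidates to a later classification step.

```typescript
// Hypothetical schema model: type name -> Object types returned by its fields.
type FieldGraph = Record<string, string[]>;

function reachableNonNormalized(
  graph: FieldGraph,
  normalized: Set<string>,
  multiHop: boolean,
): string[] {
  const found = new Set<string>();
  for (const root of Array.from(normalized)) {
    const queue: string[] = [...(graph[root] ?? [])];
    while (queue.length > 0) {
      const typeName = queue.shift();
      if (typeName === undefined) break;
      if (normalized.has(typeName) || found.has(typeName)) continue;
      found.add(typeName);
      // In multi-hop mode, keep walking through non-normalized intermediates.
      if (multiHop) queue.push(...(graph[typeName] ?? []));
    }
  }
  return Array.from(found).sort();
}

const nested: FieldGraph = {
  Workspace: ["Settings"],
  Settings: ["Membership"],
  Membership: [],
};
const oneHop = reachableNonNormalized(nested, new Set(["Workspace"]), false);
const transitive = reachableNonNormalized(nested, new Set(["Workspace"]), true);
```

In this example `oneHop` contains only `Settings`, while `transitive` also reaches `Membership` through the intermediate `Settings` type.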

License

MIT
