sandbox-seed

v0.2.8

Published

a month ago

Salesforce sandbox seeding CLI + MCP server: SOQL-scoped org-to-org copy with dependency-graph walking, cross-org FK remapping, deterministic PII masking, and a hard AI-never-sees-your-data boundary.


pranavnpm

salesforce sandbox seeding soql mcp mcp-server model-context-protocol data-migration data-masking test-data sfdx salesforce-cli

sandbox-seed

Salesforce sandbox seeding, built for AI agents. SOQL-driven org-to-org record copy with automatic dependency-graph walking, cross-org FK remapping, and a hard AI-never-sees-your-data boundary.

Why this exists

If you've ever tried to seed a Salesforce sandbox from production through an AI agent, you already know the two problems:

Record IDs don't match across orgs. Any "just query source, bulk insert target" recipe — whether you wrote it yourself or you're driving SFDMU or sf data import tree through tool calls — breaks on FK remapping, circular references like Account↔Contact, and master-detail parent ordering.
Agents see everything they touch. Running SOQL through a generic MCP server or a shell wrapper streams rows straight into your prompt context: real customer data, real account balances, and real PII all end up in the LLM's context window.

sandbox-seed is designed around both:

A proper dependency-graph walker with two-phase inserts for every strongly-connected component, cross-org ID remapping, and a mandatory dry-run gate before any write.
A hard boundary in the tool surface itself: record data never leaves disk. The AI sees counts, schemas, file paths, and plan hashes. It does not see the rows.
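
To make the dependency-ordering idea concrete, here is a minimal Python sketch. It is an illustration only, not the tool's actual implementation (see docs/ARCHITECTURE.md for that). It runs Tarjan's algorithm over a graph of "object references parent" edges, which yields strongly-connected components with parents first, and marks each cyclic component for a two-phase insert. The function names and the example graph (including an Account → Contact back-reference that creates the cycle) are assumptions made for the example.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. `graph` maps object -> set of objects it references.

    Components come out with referenced (parent) objects first, so the
    returned order can double as an insert order.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = low[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for parent in sorted(graph.get(node, ())):
            if parent not in index:
                visit(parent)
                low[node] = min(low[node], low[parent])
            elif parent in on_stack:
                low[node] = min(low[node], index[parent])
        if low[node] == index[node]:
            # `node` is the root of a component: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in sorted(graph):
        if node not in index:
            visit(node)
    return components


def insert_plan(graph):
    """Pair each component with the insert strategy it needs."""
    plan = []
    for component in strongly_connected_components(graph):
        first = component[0]
        cyclic = len(component) > 1 or first in graph.get(first, ())
        plan.append((component, "two-phase" if cyclic else "single-pass"))
    return plan


# Hypothetical graph: Case references Account and Contact; Account and
# Contact reference each other, which forms a cycle.
graph = {
    "Case": {"Account", "Contact"},
    "Contact": {"Account"},
    "Account": {"Contact"},
}
print(insert_plan(graph))
```

In a two-phase insert, the cyclic component's rows are first inserted with the cycle-forming lookups left empty, then updated once every row in the component has a target ID to point at.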

If you're building AI workflows that touch Salesforce, the AI-boundary contract is the property you actually care about, and it's non-negotiable in this tool.

30-second setup (MCP, the primary surface)

Add this to your MCP host config (~/.cursor/mcp.json, ~/Library/Application Support/Claude/claude_desktop_config.json, or your project's .mcp.json):

{
  "mcpServers": {
    "sandbox-seed": {
      "command": "npx",
      "args": ["-y", "-p", "sandbox-seed", "sandbox-seed-mcp"]
    }
  }
}

Restart your host. Then try:

"Seed my dev-full sandbox with Opportunities that closed this quarter over $50k, from the prod org."

The model will call the seed tool and walk you through the five-step flow (analyze → select → dry-run → confirm → run). No record data ever appears in chat.

What to expect on first run:

analyze on a standard object (Case, Opportunity) against a managed-package-heavy org: ~5–20 seconds cold, ~1–3 seconds warm. Describe calls fan out in parallel (6 at a time, with backoff on throttling) across the dependency graph (~20–40 describes for Case), and results are cached at .sandbox-seeding/cache/ with a 24h TTL.
extract rate depends on record count × relationship breadth; small seeds (<500 rows) finish in seconds.
Sessions live at ~/.sandbox-seed/sessions/ and are garbage-collected after 7 days. The cross-run project id-map at ~/.sandbox-seed/id-maps/ is persistent.
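
The "6 at a time, with backoff on throttling" behaviour described above can be pictured with a small Python sketch. It is not taken from the tool's source: `describe_all`, `ThrottledError`, the retry count, and the delay values are all hypothetical, and the `describe` coroutine is injected so that no real Salesforce call is assumed.

```python
import asyncio
import random


class ThrottledError(Exception):
    """Hypothetical signal that the org asked the caller to slow down."""


async def describe_all(names, describe, max_concurrency=6, max_retries=4, base_delay=0.5):
    """Describe every object in `names`, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def describe_one(name):
        async with semaphore:
            for attempt in range(max_retries + 1):
                try:
                    return name, await describe(name)
                except ThrottledError:
                    if attempt == max_retries:
                        raise
                    # Exponential backoff with jitter before the next attempt.
                    delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
                    await asyncio.sleep(delay)

    pairs = await asyncio.gather(*(describe_one(name) for name in names))
    return dict(pairs)
```

The sketch keeps its semaphore slot while it backs off, so a throttled call also reduces overall pressure on the org. Whether the real walker does the same is not stated in this README.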

For deeper MCP config, tool reference, and host-specific tips, see docs/MCP.md.

Install (CLI)

npm i -g sandbox-seed

Requires Node.js ≥ 20.

The CLI ships inspect (read-only schema/dependency-graph exploration) and a first-class seed command — the same engine and safety gates as the MCP tool, usable from a terminal or CI:

sandbox-seed inspect --object Case
sandbox-seed inspect --object Opportunity --include-counts --format mermaid

# Full seed flow: dry run → interactive confirm → run
sandbox-seed seed --source-org prod --target-org dev-full \
  --object Case --where "IsClosed = false AND CreatedDate = THIS_YEAR" --sample-size 100

# Stop at the dry-run report; execute later after review
sandbox-seed seed ... --mask --dry-run-only
sandbox-seed seed resume <sessionId>

Full CLI reference: docs/CLI.md.

The AI-boundary contract

The single most important property of this tool.

SOQL results are written to an on-disk session store the tool owns. They do not flow back through the MCP response envelope. No tool call ever returns a row.
Every tool response is metadata only: object names, row counts, relationship graphs, file paths, plan hashes.
WHERE clauses must be SOQL typed by a human. The tool refuses to invent predicates from natural language ("the biggest", "recent ones"). Ambiguity surfaces a prompt asking for a real SOQL predicate.
Targets must be sandboxes (Organization.IsSandbox = true). Writes to production orgs are rejected at the tool layer.

The tests under tests/mcp/ encode these rules — start there if you're auditing the boundary. Full writeup in docs/AI_BOUNDARY.md.
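
As a rough picture of what "metadata only" means, the Python sketch below writes rows to a session directory and returns a response with only counts, field names, a file path, and a content hash. The function name, file layout, and response keys are assumptions for illustration; the real session store format is described in docs/AI_BOUNDARY.md, not here.

```python
import hashlib
import json
from pathlib import Path


def store_rows(session_dir, object_name, rows):
    """Write `rows` (a list of dicts) to disk and return metadata only."""
    session_dir = Path(session_dir)
    session_dir.mkdir(parents=True, exist_ok=True)
    path = session_dir / f"{object_name}.jsonl"

    digest = hashlib.sha256()
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            line = json.dumps(row, sort_keys=True)
            handle.write(line + "\n")
            digest.update(line.encode("utf-8"))

    # Field names are schema, not data, so they may appear in the response.
    fields = sorted({field for row in rows for field in row})
    return {
        "object": object_name,
        "rowCount": len(rows),
        "fields": fields,
        "path": str(path),
        "contentHash": digest.hexdigest(),
    }
```

A boundary test in this style would serialize the response and assert that no field value from the stored rows appears anywhere in it.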

Walkthrough — copy the prompts from a real session

The fastest way to learn this tool is to read a real chat transcript and copy the prompts into your own agent. docs/WALKTHROUGH.md contains four full verbatim sessions covering:

A happy-path Case seed (306/308 inserts, 2 target-validation-rule failures)
A clean 1,271-row run with zero errors
Three common first-time mistakes the tool catches (full SELECT strings, limit too low, WHERE matches zero)
How to ask the agent to describe the tool before you call it

One prompt shape you can paste straight into Cursor, Claude Desktop, or any MCP-aware host:

action: "start"
sourceOrg: "<your source alias>"
targetOrg: "<your target sandbox alias>"
object: "Case"
whereClause: "IsClosed = false AND CreatedDate = THIS_YEAR"
sampleSize: 100
disableValidationRulesOnRun: true

USE MCP: sandbox-seed

Then for each subsequent step, paste back the JSON the agent suggests:

{ "action": "analyze", "sessionId": "<session-id-from-start>" }

{
  "action": "select",
  "sessionId": "<session-id>",
  "includeOptionalParents": ["Account", "Contact"],
  "includeOptionalChildren": ["CaseComment", "Task"]
}

{ "action": "dry_run", "sessionId": "<session-id>" }

{ "action": "run", "sessionId": "<session-id>", "confirm": true }
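
The ordering of these calls matters because the dry run is a mandatory gate before any write. One way to picture that gate is the Python sketch below. It assumes strict step ordering and a plan hash computed from a JSON plan; both are illustrative guesses, and `SeedSession` is a hypothetical class rather than part of the tool.

```python
import hashlib
import json

STEPS = ["start", "analyze", "select", "dry_run", "run"]


class SeedSession:
    """Hypothetical session gate: steps run in order, and `run` needs confirm."""

    def __init__(self):
        self.done = []
        self.plan_hash = None

    def step(self, action, plan=None, confirm=False):
        if len(self.done) == len(STEPS):
            raise RuntimeError("session already ran")
        expected = STEPS[len(self.done)]
        if action != expected:
            raise RuntimeError(f"expected '{expected}' before '{action}'")
        if action == "run" and confirm is not True:
            raise PermissionError("run requires confirm: true")
        if action == "dry_run":
            # Hash the plan so the later run can be tied to what was reviewed.
            canonical = json.dumps(plan, sort_keys=True).encode("utf-8")
            self.plan_hash = hashlib.sha256(canonical).hexdigest()
        self.done.append(action)
        return {"action": action, "planHash": self.plan_hash}
```

With this shape, calling `run` straight after `select` fails because `dry_run` has not happened yet, and calling `run` without `confirm=True` fails even after a dry run.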

Prerequisites:

Salesforce CLI installed (brew install sfdx-cli or npm i -g @salesforce/cli) and both orgs logged in with sf org login web --alias <name>.
sandbox-seed registered in your MCP host config (see the 30-second setup block above).

Troubleshooting (from real sessions)

"WHERE clause matched 0 records on <object>" — your SOQL matches nothing. Test it in Workbench or sf data query first, or relax the filter.
"WHERE clause matched N records, exceeding limit M" — limit is a safety cap, not a SOQL LIMIT. Either raise limit, tighten the predicate, or use sampleSize: N (a deterministic sample: the first N rows ordered by Id).
"sandbox_seed_seed does not take a full SELECT string" — whereClause is the predicate only. Drop the SELECT … FROM … LIMIT … parts.
FIELD_CUSTOM_VALIDATION_EXCEPTION in execute.log — target-org validation rules rejected some rows. Pass disableValidationRulesOnRun: true in the start call; the tool then snapshots and deactivates the rules around the insert and restores them afterwards.
DUPLICATE_VALUE on re-run — you've already seeded those rows. Objects with an external-id field auto-route through UPSERT on re-runs; otherwise run against a clean sandbox.
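
The first three messages above come from input checks that run before anything is extracted. The Python sketch below shows one plausible shape for such checks. The function names, the error type, and the plain string sort standing in for ORDER BY Id are assumptions, not the tool's code.

```python
import re


class SeedInputError(ValueError):
    """Hypothetical error type for rejected seed inputs."""


def check_where_clause(where_clause):
    """Reject a full query; accept a bare predicate (semi-joins included)."""
    if re.match(r"\s*SELECT\b", where_clause, re.IGNORECASE):
        raise SeedInputError("whereClause is the predicate only; drop SELECT/FROM/LIMIT")
    return where_clause.strip()


def choose_root_ids(matched_ids, limit, sample_size=None):
    """Apply the zero-match check, optional sampling, then the safety cap."""
    if not matched_ids:
        raise SeedInputError("WHERE clause matched 0 records")
    ordered = sorted(matched_ids)  # stand-in for ORDER BY Id
    if sample_size is not None:
        ordered = ordered[:sample_size]
    if len(ordered) > limit:
        raise SeedInputError(
            f"WHERE clause matched {len(matched_ids)} records, exceeding limit {limit}"
        )
    return ordered
```

Because the sample is taken from a sorted list, the same predicate and sampleSize select the same rows on every run, which is what makes the sample deterministic.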

What's shipped in 0.2.x

First-class CLI seed command (new in 0.2.8) — sandbox-seed seed / seed resume / seed recover drive the same engine and safety gates as the MCP tool from a terminal or CI (--dry-run-only, --yes, --json).
Parallel describe walk (new in 0.2.8) — cold-cache analyze/inspect fan describes out 6 at a time; full-graph analyze on managed-package-heavy orgs dropped from 30–90s to ~5–20s.
Masking production-blessed (0.2.8) — the real-org acceptance gate passed G1–G6 on three consecutive runs; adds a digit-shaped postal-code preset, masked values cap to the shorter of source/target field length, and the dry-run report warns when a masked field is shorter on the target.
Sticky upsert keys (new in 0.2.8) — the external-id each object UPSERTs on persists per (source, target) pair and is reused (re-validated) by later sessions, so re-seeds keep matching the rows earlier runs created.
MCP seed tool — five-step SF → SF copy flow (analyze → select → dry-run → confirm → run) with cross-org FK remapping, two-phase cycle inserts, validation-rule snapshot/restore, and the AI-boundary contract.
Child + 1 user-selected lookups (new in 0.2.0) — at start, name specific reference fields on direct children of the root; the walker follows each exactly one hop to pull that target object into scope. Multi-path objects (reachable via direct FK and child-lookup) union their ID sets rather than picking one path.
Semi-joins in whereClause now supported (fixed in 0.2.0) — root predicates like Id IN (SELECT … FROM …) work end-to-end. Root IDs are materialized once and spliced into downstream scopes as literal Id IN ('…','…'), sidestepping SOQL's one-level semi-join limit.
CLI inspect command — read-only schema and dependency-graph exploration (tree / mermaid / dot / json).
Salesforce CLI auth integration (reads ~/.sf/).
Deterministic field masking (opt-in) — pass mask: true (and/or maskFields) at start to replace PII with deterministic, format-preserving fakes before insert (emails stay email-shaped, phones phone-shaped, …). Keyed by a per-(source, target) salt, so the same value masks identically everywhere and across re-runs — value joins and external-id UPSERT survive. Detected sensitive fields mask by default; the dry-run report lists exactly what will mask so you can add anything auto-detection missed. Reference fields are never masked (the id-map owns FKs). See docs/AI_BOUNDARY.md.
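
To illustrate what "deterministic, format-preserving" masking can look like, here is a small Python sketch built on a keyed hash (HMAC-SHA256). It is a sketch under stated assumptions, not the tool's masking code: the function names, the `example.invalid` domain, and the way the salt is supplied are all hypothetical. The only behaviours taken from this README are that the same value masks identically under the same per-(source, target) salt and that masked values are capped to the shorter of the source and target field lengths.

```python
import hashlib
import hmac

DIGITS = "0123456789"


def _keyed_digest(salt: bytes, value: str) -> bytes:
    """Same salt + same value always gives the same digest."""
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).digest()


def mask_email(salt: bytes, value: str) -> str:
    """Replace an email with an email-shaped fake."""
    token = _keyed_digest(salt, value.strip().lower()).hex()[:12]
    return f"user-{token}@example.invalid"


def mask_digits(salt: bytes, value: str) -> str:
    """Replace each ASCII digit; keep punctuation, spacing, and length."""
    stream = _keyed_digest(salt, value)
    out = []
    used = 0
    for ch in value:
        if ch in DIGITS:
            out.append(DIGITS[stream[used % len(stream)] % 10])
            used += 1
        else:
            out.append(ch)
    return "".join(out)


def fit_to_field(masked: str, source_length: int, target_length: int) -> str:
    """Cap a masked value to the shorter of the two field lengths."""
    return masked[: min(source_length, target_length)]
```

Determinism is what lets value joins and external-id UPSERT survive masking: two rows that shared an email in the source still share one (fake) email in the target.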

Not yet shipped: synthetic data generation, CSV import, multi-target fan-out. These are roadmap.

Scope & limitations to know about in 0.2.x

One session = one id-map. The source→target ID map is session-scoped (written to the session dir). Seeds cannot yet be composed across runs — if you seed Accounts in one session and then seed Applications in a second session, the second session does not recognize the Accounts from the first. Required lookups are skipped; nillable lookups are inserted as null. Workaround: seed everything you need in a single session by rooting the seed on the highest shared object (e.g. Account), so the id-map is populated in dependency order within one run. A project-level id-map that composes across runs is on the roadmap.
Child-lookup walking is one hop only, user-selected. 0.2.0 walks user-named reference fields on direct children exactly one hop further. No transitive expansion, no auto-discovery. If you need a two-hop chain (child → parent → grandparent), root the seed on the child instead so its parents are walked transitively.
Ownership and queue/group lookups default to the target user. References to User, Group, or Queue (most commonly OwnerId, but also fields like AssignedToId or QueueId) are not walked into the dependency graph, because those records are org-global, often privileged, and rarely safe to materialize cross-org. At run time these fields are left null on insert, which causes Salesforce to default them to the target-org user performing the seed. The dry-run report surfaces a "Defaulted owner/user/group references" count per object so you can see how many rows this affects before running. If you need to preserve specific owners, set them manually in the target after seeding, or pre-populate the id-map with explicit User mappings.
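
The lookup rules in this section can be summarized as a per-row remapping policy. The Python sketch below is one reading of those rules, not the tool's code: `remap_record` and its data shapes are hypothetical, the IDs are fake placeholders, and "required lookups are skipped" is interpreted here as "the row is skipped when a required lookup cannot be resolved".

```python
ORG_GLOBAL_TARGETS = {"User", "Group", "Queue"}


def remap_record(record, reference_fields, id_map):
    """Return (row_to_insert_or_None, defaulted_owner_reference_count).

    `reference_fields` maps field name -> (target object, nillable).
    `id_map` maps source-org IDs to target-org IDs.
    """
    row = dict(record)
    defaulted = 0
    for field, (target_object, nillable) in reference_fields.items():
        source_id = record.get(field)
        if target_object in ORG_GLOBAL_TARGETS:
            # Omit the field so Salesforce defaults it to the seeding user.
            row.pop(field, None)
            if source_id is not None:
                defaulted += 1
        elif source_id is None:
            continue
        elif source_id in id_map:
            row[field] = id_map[source_id]
        elif nillable:
            row[field] = None
        else:
            # Required lookup with no known target ID: skip the whole row.
            return None, defaulted
    return row, defaulted
```

The second return value corresponds to the "Defaulted owner/user/group references" count that the dry-run report surfaces per object.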

Authentication

Reads ~/.sf/ auth files if you already use the Salesforce CLI (sf). Zero config.
Falls back to an OAuth device flow if you don't have sf installed.
Target org must be a sandbox. Production targets are refused.

More: docs/AUTH.md.
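
The sandbox-only rule is a check against the target org's Organization record. A minimal Python sketch of such a guard follows. `assert_sandbox_target` and the injected `run_query` callable are hypothetical; only the `Organization.IsSandbox` field comes from this README.

```python
def assert_sandbox_target(run_query, target_alias):
    """Refuse to continue unless the target org reports IsSandbox = true.

    `run_query(alias, soql)` is an injected, hypothetical callable that
    returns a list of row dicts from the named org.
    """
    rows = run_query(target_alias, "SELECT IsSandbox FROM Organization")
    if len(rows) != 1 or rows[0].get("IsSandbox") is not True:
        raise PermissionError(f"{target_alias} is not a sandbox; refusing to write")
```

The guard fails closed: an empty result, an unexpected row count, or a missing field is treated the same as a production org.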

Status

Pre-release (0.2.8). APIs and flags may change before 1.0. Use in sandboxes only — never point this at a production org as the target (the tool refuses, but don't test the refusal with real money).

Roadmap: [BACKLOG in project notes, soon to be moved into GitHub Issues].

Documentation

docs/LIMITATIONS.md — what the tool doesn't do today (read before adopting)
docs/MCP.md — MCP server setup, tool reference, host-specific notes
docs/CLI.md — CLI command reference
docs/AI_BOUNDARY.md — the boundary contract, in depth
docs/AUTH.md — authentication setup
docs/ARCHITECTURE.md — how the dependency graph, SCC handling, and FK remapping work
CONTRIBUTING.md — development setup, tests, release flow

Contributing

Issues and PRs welcome. See CONTRIBUTING.md for dev setup.

Security issues: please do not file public issues. Email [email protected].
