@jackchuka/gql-ingest
v4.2.7
A CLI tool for ingesting data from files into a GraphQL API. Supports CSV, JSON, JSONL, and YAML file formats.
GQL Ingest
A TypeScript library and CLI tool that reads data from multiple formats (CSV, JSON, YAML, JSONL) and ingests it into GraphQL APIs through configurable mutations.
Features
- ✅ Supported data formats: CSV, JSON, YAML, JSONL
- ✅ Complex nested data support for sophisticated GraphQL mutations
- ✅ External GraphQL mutation definitions (separate .graphql files)
- ✅ Flexible data-to-GraphQL variable mapping via JSON configuration
- ✅ Configurable GraphQL endpoint and headers
- ✅ Parallel processing with dependency management
- ✅ Entity-level and row-level concurrency control
- ✅ Retry capabilities with exponential backoff and configurable error handling
- ✅ Comprehensive metrics and progress tracking
- ✅ Event-based progress monitoring with real-time callbacks
- ✅ Cancellation support via AbortController pattern
Installation
For End Users
# Install globally
npm install -g @jackchuka/gql-ingest
# Or use with npx (no installation required)
npx @jackchuka/gql-ingest -e <url> [-c config.yaml] entity1/entity.json entity2/entity.json

For Development
git clone https://github.com/jackchuka/gql-ingest.git
cd gql-ingest
pnpm install
pnpm run build

Quick Start
Initialize a new configuration and start ingesting data in minutes:
# Create a new configuration directory
gql-ingest init ./my-project
# Add a new entity
gql-ingest add users -p ./my-project -f json --fields "id,name,email"
# Run ingestion
gql-ingest -e https://your-api.com/graphql ./my-project/users/entity.json

Usage
CLI Commands
Initialize Configuration
Create a new configuration directory with example files:
gql-ingest init [path] [options]
Options:
--no-example Skip creating example entity files
--no-config Skip creating config.yaml
-f, --force Overwrite existing files
-q, --quiet Suppress output

This creates:
my-project/
├── config.yaml
└── example/
├── entity.json # entity definition (name: "example")
├── example.csv
    └── example.graphql

Add Entity
Add a new entity to an existing configuration:
gql-ingest add <entity-name> [options]
Options:
-p, --path <path> Config directory path (default: current directory)
-f, --format <format> Data format (csv, json, yaml, jsonl)
--fields <fields> Comma-separated field names
--mutation <name> GraphQL mutation name
--no-interactive Skip prompts, use defaults only
-q, --quiet Suppress output

Interactive mode prompts for format, fields, and mutation name. Use --no-interactive with flags for CI/CD.
Run Ingestion
Ingest data from entity files into GraphQL API:
gql-ingest [options] <entity files...>
Options:
-e, --endpoint <url> GraphQL endpoint URL (required)
-c, --config <path> Path to config.yaml (optional, for orchestration settings)
-h, --headers <headers> JSON string of headers
-f, --format <format> Override data format detection
-q, --quiet Suppress output

CLI Examples
# Basic usage — pass entity files as positional arguments
gql-ingest \
-e https://your-graphql-api.com/graphql \
./examples/demo/items/entity.json
# With authentication headers
gql-ingest \
-e https://your-graphql-api.com/graphql \
-h '{"Authorization": "Bearer YOUR_TOKEN"}' \
./examples/demo/users/entity.json ./examples/demo/items/entity.json
# With optional config for orchestration (retry, parallelism, dependencies)
gql-ingest \
-e https://your-graphql-api.com/graphql \
-c ./examples/demo/config.yaml \
./examples/demo/users/entity.json ./examples/demo/items/entity.json

Programmatic API
GQL Ingest provides a full programmatic API for integration into your Node.js applications.
Installation for API Usage
npm install @jackchuka/gql-ingest

Basic API Usage
import { GQLIngest, createConsoleLogger } from "@jackchuka/gql-ingest";
// Initialize the client
const client = new GQLIngest({
endpoint: "https://your-graphql-api.com/graphql",
headers: {
Authorization: "Bearer YOUR_TOKEN",
},
logger: createConsoleLogger({ prefix: "my-app" }), // Optional: enable logging with prefix
});
// Ingest all data from entity files
const result = await client.ingest(["./users/entity.json"]);
// Check if ingestion was successful
if (result.success) {
console.log("Ingestion completed successfully");
console.log("Metrics:", result.metrics);
} else {
console.error("Ingestion failed:", result.errors);
}

Processing Multiple Entities
// Process multiple entity files
const result = await client.ingest(["./users/entity.json", "./products/entity.json"]);

// With options
const resultWithOptions = await client.ingest(["./users/entity.json", "./products/entity.json"], {
  format: "csv", // Optional: override format detection
});

Advanced API Usage
For more control, you can access the underlying components directly:
import {
GraphQLClientWrapper,
DataMapper,
DependencyResolver,
MetricsCollector,
loadConfig,
createConsoleLogger,
} from "@jackchuka/gql-ingest";
// Create your own custom workflow
const logger = createConsoleLogger();
const metrics = new MetricsCollector();
const client = new GraphQLClientWrapper(endpoint, headers, metrics, logger);
const mapper = new DataMapper(client, metrics, logger);
// Load configuration
const config = loadConfig();
// Process entities with custom logic
// ... your custom implementation

API Methods
GQLIngest Class Methods:
- constructor(options: GQLIngestOptions) - Initialize the client
- ingest(entityPaths: string[], options?: IngestOptions) - Ingest data from entity files
- getMetrics() - Get current processing metrics
- getMetricsSummary() - Get formatted metrics summary
- setLogger(logger: Logger) - Set custom logger
- setHeaders(headers: Record<string, string>) - Update request headers
- cancel(reason?: string) - Cancel in-progress ingestion
- processing - Property indicating if ingestion is in progress
Event-Based Progress Monitoring
GQLIngest extends EventEmitter, enabling real-time progress tracking and cancellation:
import { GQLIngest } from "@jackchuka/gql-ingest";
const client = new GQLIngest({
endpoint: "https://your-api.com/graphql",
eventOptions: {
emitRowEvents: true, // Emit events for each row
emitProgressEvents: true, // Emit periodic progress
progressInterval: 1000, // Progress every 1 second
},
});
// Listen for events
client.on("started", (p) => console.log(`Starting ${p.totalEntities} entities`));
client.on("progress", (p) => console.log(`${p.progressPercent.toFixed(1)}% complete`));
client.on("entityStart", (p) => console.log(`Processing ${p.entityName}`));
client.on("entityComplete", (p) =>
console.log(`${p.entityName}: ${p.metrics.successfulRows} rows`),
);
client.on("rowSuccess", (p) => console.log(`Row ${p.rowIndex} OK`));
client.on("rowFailure", (p) => console.error(`Row ${p.rowIndex} failed: ${p.error.message}`));
client.on("finished", (p) => console.log(`Done in ${p.durationMs}ms`));
client.on("errored", (p) => console.error(`Error: ${p.error.message}`));
client.on("cancelled", (p) => console.log(`Cancelled: ${p.reason}`));
// Handle graceful shutdown
process.on("SIGINT", () => client.cancel("User interrupted"));
await client.ingest(["./users/entity.json"]);

Available Events:
| Event | When Emitted | Key Payload Fields |
| ---------------- | ------------------------ | ------------------------------------------------- |
| started | Ingestion begins | entityNames, totalWaves |
| progress | Periodic interval | progressPercent, successfulRows, failedRows |
| entityStart | Entity processing begins | entityName, totalRows, waveIndex |
| entityComplete | Entity processing ends | entityName, metrics, success |
| rowSuccess | Row mutation succeeds | entityName, rowIndex, row, result |
| rowFailure | Row mutation fails | entityName, rowIndex, error |
| cancelled | Processing cancelled | reason, metrics, elapsedMs |
| finished | Processing completes | metrics, durationMs, allSuccessful |
| errored | Fatal error occurs | error, metrics, elapsedMs |
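The periodic progress event controlled by progressInterval can be modeled as a simple time-based throttle. The sketch below is an illustrative stand-in, not the library's internal implementation; the class name and method are invented for the example:

```typescript
import { EventEmitter } from "node:events";

// Sketch: emit a "progress" event at most once per interval, mirroring
// the progressInterval option described above. Illustrative model only.
class ProgressReporter extends EventEmitter {
  private lastEmit = 0;

  constructor(private intervalMs: number) {
    super();
  }

  // Call this after each unit of work; events within the interval are dropped
  report(done: number, total: number, now: number): void {
    if (now - this.lastEmit < this.intervalMs) return; // throttle
    this.lastEmit = now;
    this.emit("progress", { progressPercent: (100 * done) / total });
  }
}

const reporter = new ProgressReporter(1000);
reporter.on("progress", (p) => console.log(`${p.progressPercent.toFixed(1)}%`));
reporter.report(5, 10, 1000); // emitted: 50.0%
reporter.report(6, 10, 1500); // suppressed (within the 1s interval)
reporter.report(10, 10, 2000); // emitted: 100.0%
```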
Cancellation Support
Cancel in-progress ingestion using the cancel() method or external AbortController:
// Method 1: Using cancel()
const client = new GQLIngest({ endpoint: "..." });
process.on("SIGINT", () => client.cancel("User interrupted"));
await client.ingest(["./users/entity.json"]);
// Method 2: Using external AbortController
const controller = new AbortController();
setTimeout(() => controller.abort("Timeout"), 60000);
await client.ingest(["./users/entity.json"], { signal: controller.signal });

TypeScript Support
Full TypeScript support is included with comprehensive type definitions:
import type {
GQLIngestOptions,
IngestOptions,
IngestResult,
ProcessingMetrics,
EntityMetrics,
// Event types
EventOptions,
StartedEventPayload,
ProgressEventPayload,
EntityStartEventPayload,
EntityCompleteEventPayload,
RowSuccessEventPayload,
RowFailureEventPayload,
CancelledEventPayload,
FinishedEventPayload,
ErroredEventPayload,
} from "@jackchuka/gql-ingest";

Parallel Processing 🚀
GQL Ingest supports advanced parallel processing with dependency management for high-performance data ingestion:
Key Capabilities
- Entity-level parallelism: Process multiple entities (users, products, orders) concurrently
- Row-level parallelism: Process multiple CSV rows within an entity concurrently
- Dependency management: Ensure entities process in the correct order (e.g., users before orders)
- Smart batching: Control exactly how many entities/rows process simultaneously
- Real-time metrics: Track progress, success rates, and performance
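Row-level concurrency of this kind is typically a small worker pool: at most `limit` mutations are in flight at once while workers pull the next pending row. This is an illustrative sketch of the idea, not the library's implementation:

```typescript
// Sketch: run fn over items with at most `limit` concurrent invocations,
// as the `concurrency` setting does for rows. Illustrative model only.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index (safe: single-threaded JS)
      results[i] = await fn(items[i]);
    }
  }

  // Start `limit` workers that each pull rows until none remain
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// e.g. process 4 rows with at most 2 in flight
mapWithConcurrency([1, 2, 3, 4], 2, async (n) => n * 2).then((out) =>
  console.log(out), // [ 2, 4, 6, 8 ]
);
```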
Quick Example
# config.yaml - Add to your configuration directory
parallelProcessing:
concurrency: 10 # Process up to 10 CSV rows per entity concurrently
entityConcurrency: 3 # Process up to 3 entities simultaneously
preserveRowOrder: false # Allow rows to complete out of order for speed
# Define dependencies between entities
entityDependencies:
products: ["users"] # Products must wait for users to complete
orders: ["products"] # Orders must wait for products to complete

Performance Impact: This configuration can process data 10-50x faster than sequential processing, depending on your GraphQL API's capabilities.
👉 Full Parallel Processing Guide - Detailed configuration options, performance tuning, and examples.
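The entityDependencies setting implies wave-based scheduling: an entity runs only after everything it depends on has completed, and independent entities share a wave. A simplified model of that scheduling (not the library's internals):

```typescript
// Sketch: group entities into sequential waves so that each entity's
// declared dependencies complete in an earlier wave. Illustrative only.
function computeWaves(
  entities: string[],
  deps: Record<string, string[]>,
): string[][] {
  const waves: string[][] = [];
  const done = new Set<string>();
  let remaining = [...entities];
  while (remaining.length > 0) {
    // Ready = all dependencies done (or not part of this run at all)
    const ready = remaining.filter((e) =>
      (deps[e] ?? []).every((d) => done.has(d) || !entities.includes(d)),
    );
    if (ready.length === 0) throw new Error("Circular dependency detected");
    waves.push(ready);
    ready.forEach((e) => done.add(e));
    remaining = remaining.filter((e) => !done.has(e));
  }
  return waves;
}

// The config above yields three sequential waves:
const waves = computeWaves(["users", "products", "orders"], {
  products: ["users"],
  orders: ["products"],
});
console.log(waves); // [ [ 'users' ], [ 'products' ], [ 'orders' ] ]
```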
Retry Capabilities 🔄
GQL Ingest includes robust retry functionality to handle transient failures and improve reliability:
Key Features
- Automatic retries: Failed GraphQL mutations are retried automatically
- Exponential backoff: Intelligent delay increases between retry attempts
- Jitter: Randomization prevents thundering herd problems
- Configurable error codes: Control which HTTP status codes trigger retries
- Per-entity overrides: Different retry settings for different entities
- Metrics tracking: Monitor retry success rates and attempt counts
Quick Example
# config.yaml - Add to your configuration directory
retry:
maxAttempts: 5 # Retry up to 5 times (default: 3)
baseDelay: 2000 # Start with 2s delay (default: 1000ms)
maxDelay: 60000 # Cap delays at 60s (default: 30000ms)
exponentialBackoff: true # Double delay each retry (default: true)
retryableStatusCodes: # Which HTTP errors to retry (defaults shown)
- 408 # Request Timeout
- 429 # Too Many Requests
- 500 # Internal Server Error
- 502 # Bad Gateway
- 503 # Service Unavailable
- 504 # Gateway Timeout
# Per-entity retry overrides
entityConfig:
critical-orders:
retry:
maxAttempts: 10 # More retries for critical data
baseDelay: 500 # Faster initial retry

Reliability Impact: Retry capabilities can improve success rates from 95% to 99.9%+ for APIs with transient failures.
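The delay schedule implied by these settings can be sketched as capped exponential backoff with optional jitter. This is an illustrative model of the configured behavior, not the library's exact code; the jitterRatio parameter is invented for the example:

```typescript
// Sketch: delay before the Nth retry, given the config options above.
// attempt is 1-based; jitterRatio (e.g. 0.2) randomizes by up to ±20%.
function retryDelay(
  attempt: number,
  baseDelay = 1000,
  maxDelay = 30000,
  exponentialBackoff = true,
  jitterRatio = 0,
): number {
  const raw = exponentialBackoff ? baseDelay * 2 ** (attempt - 1) : baseDelay;
  const capped = Math.min(raw, maxDelay); // never exceed maxDelay
  const jitter = capped * jitterRatio * (Math.random() * 2 - 1);
  return Math.round(capped + jitter);
}

// With baseDelay 2000 / maxDelay 60000 from the example config:
for (let attempt = 1; attempt <= 5; attempt++) {
  console.log(retryDelay(attempt, 2000, 60000));
}
// 2000, 4000, 8000, 16000, 32000
```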
Selective Entity Processing
Pass only the entity files you want to process as positional arguments:
- Process multiple entities:
  gql-ingest -e <url> users/entity.json products/entity.json
- Process a single entity:
  gql-ingest -e <url> items/entity.json
- Entities are processed in dependency order automatically when -c config.yaml is provided
- Missing dependencies will trigger a warning but not prevent execution
Note: When using entity dependencies defined in config.yaml, the tool will warn you about any missing dependencies but will still attempt to process the selected entities. Ensure dependent data exists in your GraphQL API before processing entities with unmet dependencies.
Configuration
Each entity is defined by an entity.json file colocated with its data and GraphQL mutation files. The entity name comes from the name field inside the file, not from the filename or directory name. Paths in the entity file resolve relative to the entity file's directory.
The optional -c flag points to a config.yaml file for orchestration settings (retry, parallelism, dependencies).
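A flat mapping resolves each GraphQL variable name to a column of the data row. A minimal sketch of that step (illustrative only, not the library's implementation):

```typescript
// Sketch: turn one parsed data row into mutation variables using a flat
// mapping of variable name -> column name. Illustrative model only.
function applyMapping(
  row: Record<string, string>,
  mapping: Record<string, string>,
): Record<string, string> {
  const variables: Record<string, string> = {};
  for (const [variable, column] of Object.entries(mapping)) {
    variables[variable] = row[column];
  }
  return variables;
}

// With mapping { name: "item_name", sku: "item_sku" } and one CSV row:
const vars = applyMapping(
  { item_name: "Item1", item_sku: "item-1-sku" },
  { name: "item_name", sku: "item_sku" },
);
console.log(vars); // { name: 'Item1', sku: 'item-1-sku' }
```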
Example Configuration
examples/demo/items/entity.json (entity definition):
{
"name": "items",
"dataFile": "items.csv",
"dataFormat": "csv",
"graphqlFile": "items.graphql",
"mapping": {
"name": "item_name",
"sku": "item_sku"
}
}

examples/demo/items/items.csv:
item_name,item_sku
Item1,item-1-sku
Item2,item-2-sku

examples/demo/items/items.graphql:
mutation CreateItem($name: String!, $sku: String!) {
createItem(input: { name: $name, sku: $sku }) {
id
name
sku
}
}

examples/demo/config.yaml (Optional - for parallel processing and retry configuration):
# Parallel processing configuration
parallelProcessing:
concurrency: 5 # Process 5 rows per entity concurrently
entityConcurrency: 2 # Process 2 entities simultaneously
preserveRowOrder: false # Allow faster out-of-order completion
# Global retry configuration
retry:
maxAttempts: 3 # Retry failed requests up to 3 times
baseDelay: 1000 # Start with 1s delay between retries
exponentialBackoff: true # Double delay each retry
# Entity dependencies
entityDependencies:
items: ["users"] # Items depend on users being processed first
# Per-entity overrides (optional)
entityConfig:
users:
retry:
maxAttempts: 5 # More retries for user creation
items:
concurrency: 10 # Higher concurrency for items

Supported Data Formats 📄
GQL Ingest now supports multiple data formats beyond CSV for more flexible data ingestion, especially for complex nested GraphQL mutations:
Supported Formats
- CSV - Traditional flat file format
- JSON - Perfect for nested/complex data structures
- YAML - Human-friendly alternative to JSON
- JSONL - JSON Lines format for streaming large datasets
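Detection maps file extensions to formats, with an explicit override available. A rough sketch of the assumed lookup (the exact extension table is an assumption, not confirmed by the source):

```typescript
// Sketch: extension-based format detection. The set of recognized
// extensions here is assumed for illustration; --format or the entity's
// dataFormat field takes precedence over detection.
type DataFormat = "csv" | "json" | "yaml" | "jsonl";

function detectFormat(path: string): DataFormat | undefined {
  const ext = path.slice(path.lastIndexOf(".") + 1).toLowerCase();
  const byExt: Record<string, DataFormat> = {
    csv: "csv",
    json: "json",
    yaml: "yaml",
    yml: "yaml", // assumed alias
    jsonl: "jsonl",
  };
  return byExt[ext]; // undefined for unrecognized extensions
}

console.log(detectFormat("items.csv")); // csv
console.log(detectFormat("products.yml")); // yaml
```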
Format Selection
The tool automatically detects the format based on file extension, or you can specify it explicitly:
# Auto-detect from entity file configuration
gql-ingest -e <url> ./items/entity.json
# Force specific format
gql-ingest -e <url> --format json ./items/entity.json

JSON/YAML Format Examples
Direct Mapping (Entire Object)
For complex GraphQL mutations with nested input types, you can map the entire data object:
products/products.json (data file):
[
{
"name": "Premium T-Shirt",
"type": "PHYSICAL",
"options": [
{
"name": "Color",
"values": ["Red", "Blue", "Green"]
},
{
"name": "Size",
"values": ["S", "M", "L", "XL"]
}
],
"variants": [
{
"name": "Red Small",
"sku": "TS-RED-S",
"optionMappings": [
{ "name": "Color", "value": "Red" },
{ "name": "Size", "value": "S" }
]
}
]
}
]

products/entity.json (entity definition):
{
"name": "products",
"dataFile": "products.json",
"dataFormat": "json",
"graphqlFile": "newProduct.graphql",
"mapping": {
"input": "$" // Map entire object to input variable
}
}

Path-Based Mapping
For transforming flat JSON into nested structures:
products/products-flat.json (data file):
[
{
"product_name": "Notebook",
"product_type": "PHYSICAL",
"brand": "ACME"
}
]

products/entity.json (entity definition with path-based mapping):
{
"name": "products",
"dataFile": "products-flat.json",
"graphqlFile": "newProduct.graphql",
"mapping": {
"input": {
"name": "$.product_name",
"type": "$.product_type",
"brandCode": "$.brand"
}
}
}

Cross-Entity References
When one entity needs values produced by another (e.g., a server-generated ID), use outputCapture and $ref:
users/entity.json (captures the created ID):
{
"name": "users",
"dataFile": "users.csv",
"graphqlFile": "users.graphql",
"mapping": { "name": "user_name", "email": "user_email" },
"outputCapture": {
"key": "$.user_name",
"fields": { "id": "$.createUser.id" }
}
}

- outputCapture.key — JSONPath into the input row, used as the lookup key
- outputCapture.fields — map of field names to JSONPaths into the mutation response
items/entity.json (references the captured user ID):
{
"name": "items",
"dataFile": "items.csv",
"graphqlFile": "items.graphql",
"mapping": {
"name": "item_name",
"sku": "item_sku",
"createdBy": { "$ref": "users", "key": "$.owner_name", "field": "id" }
}
}

- $ref — the entity name that captured the value
- key — JSONPath into the current row to get the lookup key
- field — the captured field name to retrieve
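The capture-and-reference flow amounts to a keyed store: the "users" entity records fields from each mutation response under a key, and "items" rows look them up. A simplified model for illustration (not the library's implementation):

```typescript
// Sketch: captured values per entity, keyed by the outputCapture key.
// Illustrative model of the $ref resolution described above.
const captured: Record<string, Record<string, Record<string, string>>> = {};

// After each mutation, store the captured response fields under a key
function capture(entity: string, key: string, fields: Record<string, string>): void {
  (captured[entity] ??= {})[key] = fields;
}

// When mapping a row, resolve a $ref against the captured values
function resolveRef(entity: string, key: string, field: string): string {
  const value = captured[entity]?.[key]?.[field];
  if (value === undefined) throw new Error(`Unresolved $ref: ${entity}/${key}/${field}`);
  return value;
}

// e.g. createUser returned { id: "user-123" } for the row keyed "alice"
capture("users", "alice", { id: "user-123" });
console.log(resolveRef("users", "alice", "id")); // user-123
```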
The referenced entity must be processed first. Entity files are processed in the order they are passed, but you can use entityDependencies in config.yaml to make the ordering explicit:
entityDependencies:
items: ["users"]

YAML Format
YAML provides a more readable alternative:
products/products.yaml:
- name: Premium T-Shirt
type: PHYSICAL
options:
- name: Color
values: [Red, Blue, Green]
- name: Size
values: [S, M, L, XL]
variants:
- name: Red Small
sku: TS-RED-S
optionMappings:
- name: Color
value: Red
- name: Size
value: S

Development
Scripts
pnpm run build # Build CLI bundle with esbuild
pnpm run build:types # Generate TypeScript declarations
pnpm run build:all # Build bundle + types
pnpm run dev # Run in development mode
pnpm run test # Run test suite

How It Works
- Entity Loading: The tool reads the entity JSON files passed as arguments
- Path Resolution: Data files and GraphQL mutations are resolved relative to each entity file's directory
- Dependency Resolution: If -c config.yaml is provided, analyzes entityDependencies to create execution waves
- Parallel Processing: For each dependency wave:
  - Processes up to entityConcurrency entities simultaneously
  - Within each entity, processes up to concurrency rows concurrently
  - Waits for the entire wave to complete before starting the next wave
- GraphQL Execution: For each data row:
  - Loads the GraphQL mutation definition
  - Maps data fields to GraphQL variables using the mapping configuration
  - Executes the mutation against the GraphQL endpoint
- Error Handling & Retries:
  - Failed mutations are automatically retried with exponential backoff
  - Non-retryable errors (e.g., validation failures) are logged and skipped
  - Configurable retry policies per entity type
- Metrics & Monitoring:
  - Real-time progress tracking and success/failure rates
  - Retry attempt counts and success rates
  - Detailed per-entity performance breakdown
License
MIT
