@jackchuka/gql-ingest

v3.1.4

Published 3 days ago

A CLI tool for ingesting data from files into a GraphQL API. Supports CSV, JSON, JSONL, and YAML file formats.


Maintainer: jackchuka

Keywords: api, cli, csv, graphql, ingest

GQL Ingest

A TypeScript library and CLI tool that reads data from multiple formats (CSV, JSON, YAML, JSONL) and ingests it into GraphQL APIs through configurable mutations.

Features

✅ Supported data formats: CSV, JSON, YAML, JSONL
✅ Complex nested data support for sophisticated GraphQL mutations
✅ External GraphQL mutation definitions (separate .graphql files)
✅ Flexible data-to-GraphQL variable mapping via JSON configuration
✅ Configurable GraphQL endpoint and headers
✅ Parallel processing with dependency management
✅ Entity-level and row-level concurrency control
✅ Retry capabilities with exponential backoff and configurable error handling
✅ Comprehensive metrics and progress tracking
✅ Event-based progress monitoring with real-time callbacks
✅ Cancellation support via AbortController pattern

Installation

For End Users

# Install globally
npm install -g @jackchuka/gql-ingest

# Or use with npx (no installation required)
npx @jackchuka/gql-ingest --endpoint <url> --config <path>

For Development

git clone https://github.com/jackchuka/gql-ingest.git
cd gql-ingest
pnpm install
pnpm run build

Quick Start

Initialize a new configuration and start ingesting data in minutes:

# Create a new configuration directory
gql-ingest init ./my-config

# Add a new entity
gql-ingest add users -p ./my-config -f json --fields "id,name,email"

# Run ingestion
gql-ingest -e https://your-api.com/graphql -c ./my-config

Usage

CLI Commands

Initialize Configuration

Create a new configuration directory with example files:

gql-ingest init [path] [options]

Options:
  --no-example  Skip creating example entity files
  --no-config   Skip creating config.yaml
  -f, --force   Overwrite existing files
  -q, --quiet   Suppress output

This creates:

data/ - Data files directory
graphql/ - GraphQL mutation files
mappings/ - Mapping configuration files
config.yaml - Processing configuration
Example entity files (by default)

Add Entity

Add a new entity to an existing configuration:

gql-ingest add <entity-name> [options]

Options:
  -p, --path <path>      Config directory path (default: current directory)
  -f, --format <format>  Data format (csv, json, yaml, jsonl)
  --fields <fields>      Comma-separated field names
  --mutation <name>      GraphQL mutation name
  --no-interactive       Skip prompts, use defaults only
  -q, --quiet            Suppress output

Interactive mode prompts for format, fields, and mutation name. Use --no-interactive with flags for CI/CD.

Run Ingestion

Ingest data from configuration into GraphQL API:

gql-ingest [options]

Options:
  -e, --endpoint <url>     GraphQL endpoint URL (required)
  -c, --config <path>      Path to configuration directory (required)
  -n, --entities <list>    Comma-separated list of entities to process
  -h, --headers <headers>  JSON string of headers
  -f, --format <format>    Override data format detection
  -q, --quiet              Suppress output

CLI Examples

# Basic usage
gql-ingest \
  -e https://your-graphql-api.com/graphql \
  -c ./examples/demo

# With authentication headers
gql-ingest \
  -e https://your-graphql-api.com/graphql \
  -c ./examples/demo \
  -h '{"Authorization": "Bearer YOUR_TOKEN"}'

# Process specific entities only
gql-ingest \
  -e https://your-graphql-api.com/graphql \
  -c ./examples/demo \
  -n users,products
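The -h/--headers flag takes a JSON string. To show what that string has to look like, here is a hypothetical `parseHeadersFlag` helper (not part of gql-ingest's exported API) that turns the flag value into a `Record<string, string>`, the same shape the programmatic API's `headers` option uses. It assumes every header value must be a string; the CLI's actual validation may differ.

```typescript
// Hypothetical helper, for illustration only: converts the JSON string passed to
// -h/--headers into a plain header map. Not part of @jackchuka/gql-ingest.
function parseHeadersFlag(raw: string): Record<string, string> {
  const parsed: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("--headers must be a JSON object");
  }
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(parsed)) {
    if (typeof value !== "string") {
      throw new Error(`Header "${name}" must be a string`);
    }
    headers[name] = value;
  }
  return headers;
}

// Same string as the authentication example above.
const headers = parseHeadersFlag('{"Authorization": "Bearer YOUR_TOKEN"}');
console.log(headers.Authorization); // Bearer YOUR_TOKEN
```

Wrapping the JSON in single quotes on the command line keeps the shell from interpreting the inner double quotes.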

Programmatic API

GQL Ingest provides a full programmatic API for integration into your Node.js applications.

Installation for API Usage

npm install @jackchuka/gql-ingest

Basic API Usage

import { GQLIngest, createConsoleLogger } from "@jackchuka/gql-ingest";

// Initialize the client
const client = new GQLIngest({
  endpoint: "https://your-graphql-api.com/graphql",
  headers: {
    Authorization: "Bearer YOUR_TOKEN",
  },
  logger: createConsoleLogger({ prefix: "my-app" }), // Optional: enable logging with prefix
});

// Ingest all data from a configuration
const result = await client.ingest("./config");

// Check if ingestion was successful
if (result.success) {
  console.log("Ingestion completed successfully");
  console.log("Metrics:", result.metrics);
} else {
  console.error("Ingestion failed:", result.errors);
}

Processing Specific Entities

// Process only specific entities (client created as in Basic API Usage)
const subset = await client.ingestEntities("./config", ["users", "products"]);

// Or use the ingest method with options
const csvSubset = await client.ingest("./config", {
  entities: ["users", "products"],
  format: "csv", // Optional: override format detection
});

Advanced API Usage

For more control, you can access the underlying components directly:

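import {
  GraphQLClientWrapper,
  DataMapper,
  DependencyResolver,
  MetricsCollector,
  loadConfig,
  createConsoleLogger,
} from "@jackchuka/gql-ingest";

// Inputs for your workflow (example values)
const endpoint = "https://your-graphql-api.com/graphql";
const headers = { Authorization: "Bearer YOUR_TOKEN" };
const basePath = "./config"; // presumably the configuration directory (data/, graphql/, mappings/)

// Create your own custom workflow
const logger = createConsoleLogger();
const metrics = new MetricsCollector();
const client = new GraphQLClientWrapper(endpoint, headers, metrics, logger);
const mapper = new DataMapper(client, basePath, metrics, logger);

// Load configuration
const config = loadConfig(basePath);

// Process entities with custom logic (for example, ordering them with DependencyResolver)
// ... your custom implementation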

API Methods

GQLIngest Class Methods:

constructor(options: GQLIngestOptions) - Initialize the client
ingest(configPath: string, options?: IngestOptions) - Ingest data from a configuration
ingestEntities(configPath: string, entities: string[]) - Process specific entities
getMetrics() - Get current processing metrics
getMetricsSummary() - Get formatted metrics summary
setLogger(logger: Logger) - Set custom logger
setHeaders(headers: Record<string, string>) - Update request headers
cancel(reason?: string) - Cancel in-progress ingestion
processing - Property indicating if ingestion is in progress

Event-Based Progress Monitoring

GQLIngest extends EventEmitter, enabling real-time progress tracking and cancellation:

import { GQLIngest } from "@jackchuka/gql-ingest";

const client = new GQLIngest({
  endpoint: "https://your-api.com/graphql",
  eventOptions: {
    emitRowEvents: true, // Emit events for each row
    emitProgressEvents: true, // Emit periodic progress
    progressInterval: 1000, // Progress every 1 second
  },
});

// Listen for events
client.on("started", (p) => console.log(`Starting ${p.totalEntities} entities`));
client.on("progress", (p) => console.log(`${p.progressPercent.toFixed(1)}% complete`));
client.on("entityStart", (p) => console.log(`Processing ${p.entityName}`));
client.on("entityComplete", (p) =>
  console.log(`${p.entityName}: ${p.metrics.successfulRows} rows`),
);
client.on("rowSuccess", (p) => console.log(`Row ${p.rowIndex} OK`));
client.on("rowFailure", (p) => console.error(`Row ${p.rowIndex} failed: ${p.error.message}`));
client.on("finished", (p) => console.log(`Done in ${p.durationMs}ms`));
client.on("errored", (p) => console.error(`Error: ${p.error.message}`));
client.on("cancelled", (p) => console.log(`Cancelled: ${p.reason}`));

// Handle graceful shutdown
process.on("SIGINT", () => client.cancel("User interrupted"));

await client.ingest("./config");

Available Events:

| Event            | When Emitted             | Key Payload Fields                                |
| ---------------- | ------------------------ | ------------------------------------------------- |
| started          | Ingestion begins         | configPath, entityNames, totalWaves               |
| progress         | Periodic interval        | progressPercent, successfulRows, failedRows       |
| entityStart      | Entity processing begins | entityName, totalRows, waveIndex                  |
| entityComplete   | Entity processing ends   | entityName, metrics, success                      |
| rowSuccess       | Row mutation succeeds    | entityName, rowIndex, row, result                 |
| rowFailure       | Row mutation fails       | entityName, rowIndex, error                       |
| cancelled        | Processing cancelled     | reason, metrics, elapsedMs                        |
| finished         | Processing completes     | metrics, durationMs, allSuccessful                |
| errored          | Fatal error occurs       | error, metrics, elapsedMs                         |
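Because GQLIngest is an EventEmitter, listeners can build their own statistics from these events. The sketch below uses a plain `EventEmitter` from `node:events` as a stand-in for the client (so it runs without a GraphQL endpoint) and hand-built payloads carrying only the `entityName`, `rowIndex`, and `error` fields from the table; `trackRows` is a hypothetical helper. With a real client, row events presumably require `emitRowEvents: true`, as in the example above.

```typescript
import { EventEmitter } from "node:events";

// Per-entity tally built purely from rowSuccess / rowFailure events.
type Tally = { ok: number; failed: number; failedRows: number[] };

function trackRows(emitter: EventEmitter): Map<string, Tally> {
  const tallies = new Map<string, Tally>();
  const get = (entity: string): Tally => {
    let t = tallies.get(entity);
    if (!t) {
      t = { ok: 0, failed: 0, failedRows: [] };
      tallies.set(entity, t);
    }
    return t;
  };
  emitter.on("rowSuccess", (p: { entityName: string; rowIndex: number }) => {
    get(p.entityName).ok += 1;
  });
  emitter.on("rowFailure", (p: { entityName: string; rowIndex: number; error: Error }) => {
    const t = get(p.entityName);
    t.failed += 1;
    t.failedRows.push(p.rowIndex);
  });
  return tallies;
}

// Stand-in for a GQLIngest client; in real code, pass the client itself.
const fake = new EventEmitter();
const tallies = trackRows(fake);
fake.emit("rowSuccess", { entityName: "users", rowIndex: 0 });
fake.emit("rowFailure", { entityName: "users", rowIndex: 1, error: new Error("boom") });
fake.emit("rowSuccess", { entityName: "products", rowIndex: 0 });
console.log(tallies.get("users")); // { ok: 1, failed: 1, failedRows: [ 1 ] }
```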

Cancellation Support

Cancel in-progress ingestion using the cancel() method or external AbortController:

// Method 1: Using cancel()
const client = new GQLIngest({ endpoint: "..." });
process.on("SIGINT", () => client.cancel("User interrupted"));
await client.ingest("./config");

// Method 2: Using external AbortController
const controller = new AbortController();
setTimeout(() => controller.abort("Timeout"), 60000);
await client.ingest("./config", { signal: controller.signal });
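Conceptually, cancellation means the processing loop checks the signal between rows and stops starting new work once it is aborted. The following is a simplified, synchronous sketch of that pattern with the standard `AbortController`; `processRows` is hypothetical and does not describe how gql-ingest is implemented internally. It assumes a Node.js version where `abort()` accepts a reason (17.2 or later).

```typescript
// Hypothetical row loop that honours an AbortSignal between rows.
function processRows<T>(
  rows: T[],
  handle: (row: T, index: number) => void,
  signal: AbortSignal,
): { processed: number; cancelled: boolean; reason?: string } {
  let processed = 0;
  for (let i = 0; i < rows.length; i++) {
    if (signal.aborted) {
      // Stop before starting the next row; rows already handled stay handled.
      return { processed, cancelled: true, reason: String(signal.reason) };
    }
    handle(rows[i], i);
    processed += 1;
  }
  return { processed, cancelled: false };
}

const controller = new AbortController();
const result = processRows(
  ["a", "b", "c", "d"],
  (_row, index) => {
    if (index === 1) controller.abort("User interrupted"); // simulate Ctrl+C during row 1
  },
  controller.signal,
);
console.log(result); // { processed: 2, cancelled: true, reason: 'User interrupted' }
```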

TypeScript Support

Full TypeScript support is included with comprehensive type definitions:

import type {
  GQLIngestOptions,
  IngestOptions,
  IngestResult,
  ProcessingMetrics,
  EntityMetrics,
  // Event types
  EventOptions,
  StartedEventPayload,
  ProgressEventPayload,
  EntityStartEventPayload,
  EntityCompleteEventPayload,
  RowSuccessEventPayload,
  RowFailureEventPayload,
  CancelledEventPayload,
  FinishedEventPayload,
  ErroredEventPayload,
} from "@jackchuka/gql-ingest";

Parallel Processing 🚀

GQL Ingest supports advanced parallel processing with dependency management for high-performance data ingestion:

Key Capabilities

Entity-level parallelism: Process multiple entities (users, products, orders) concurrently
Row-level parallelism: Process multiple rows (CSV rows, JSON/YAML objects, JSONL lines) within an entity concurrently
Dependency management: Ensure entities process in the correct order (e.g., users before orders)
Smart batching: Control exactly how many entities/rows process simultaneously
Real-time metrics: Track progress, success rates, and performance
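Row-level parallelism boils down to keeping at most N mutations in flight per entity. A minimal sketch of such a limiter follows; `mapWithConcurrency` is hypothetical and gql-ingest's own scheduler may work differently. Here results are always stored by input index, even when rows finish out of order.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be >= 1");
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to start
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so no two runners share an index
      results[index] = await worker(items[index], index);
    }
  }
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Example: 10 fake "rows", at most 3 in flight, tracking peak concurrency.
let inFlight = 0;
let peak = 0;
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

const done = mapWithConcurrency(Array.from({ length: 10 }, (_, i) => i), 3, async (row) => {
  inFlight += 1;
  peak = Math.max(peak, inFlight);
  await sleep(5 * (row % 3)); // vary latency so rows finish out of order
  inFlight -= 1;
  return row * 2;
});
// Values come back in input order: [0, 2, 4, ..., 18]
done.then((values) => console.log(values, "peak:", peak));
```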

Quick Example

# config.yaml - Add to your configuration directory
parallelProcessing:
  concurrency: 10 # Process up to 10 rows per entity concurrently
  entityConcurrency: 3 # Process up to 3 entities simultaneously
  preserveRowOrder: false # Allow rows to complete out of order for speed

# Define dependencies between entities
entityDependencies:
  products: ["users"] # Products must wait for users to complete
  orders: ["products"] # Orders must wait for products to complete

Performance Impact: This configuration can process data 10-50x faster than sequential processing, depending on your GraphQL API's capabilities.
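The entityDependencies map is turned into execution waves: each wave contains entities whose dependencies all sit in earlier waves, and entities within one wave can run in parallel up to entityConcurrency. A minimal sketch of that layering (a layered form of Kahn's topological sort) follows. `buildWaves` is hypothetical, and it assumes dependencies on entities outside the current run are ignored; the library's `DependencyResolver` may differ, for example in how it reports cycles.

```typescript
type Dependencies = Record<string, string[]>;

// Group entities into waves; every entity's dependencies sit in an earlier wave.
function buildWaves(entities: string[], deps: Dependencies): string[][] {
  const known = new Set(entities);
  const remaining = new Set(entities);
  const done = new Set<string>();
  const waves: string[][] = [];
  while (remaining.size > 0) {
    // Ready = every dependency that is part of this run has already completed.
    const wave = [...remaining].filter((e) =>
      (deps[e] ?? []).every((d) => !known.has(d) || done.has(d)),
    );
    if (wave.length === 0) {
      throw new Error(`Circular dependency among: ${[...remaining].join(", ")}`);
    }
    for (const e of wave) {
      remaining.delete(e);
      done.add(e);
    }
    waves.push(wave);
  }
  return waves;
}

// Dependencies from the config.yaml example above.
console.log(buildWaves(["orders", "products", "users"], {
  products: ["users"],
  orders: ["products"],
}));
// [ [ 'users' ], [ 'products' ], [ 'orders' ] ]
```

In this chain every wave holds a single entity, so entityConcurrency has no effect; independent entities would land in the same wave and share it.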

👉 Full Parallel Processing Guide - Detailed configuration options, performance tuning, and examples.

Retry Capabilities 🔄

GQL Ingest includes robust retry functionality to handle transient failures and improve reliability:

Key Features

Automatic retries: Failed GraphQL mutations are retried automatically
Exponential backoff: Delays between retry attempts grow exponentially
Jitter: Randomization prevents thundering herd problems
Configurable error codes: Control which HTTP status codes trigger retries
Per-entity overrides: Different retry settings for different entities
Metrics tracking: Monitor retry success rates and attempt counts

Quick Example

# config.yaml - Add to your configuration directory
retry:
  maxAttempts: 5 # Retry up to 5 times (default: 3)
  baseDelay: 2000 # Start with 2s delay (default: 1000ms)
  maxDelay: 60000 # Cap delays at 60s (default: 30000ms)
  exponentialBackoff: true # Double delay each retry (default: true)
  retryableStatusCodes: # Which HTTP errors to retry (defaults shown)
    - 408 # Request Timeout
    - 429 # Too Many Requests
    - 500 # Internal Server Error
    - 502 # Bad Gateway
    - 503 # Service Unavailable
    - 504 # Gateway Timeout

# Per-entity retry overrides
entityConfig:
  critical-orders:
    retry:
      maxAttempts: 10 # More retries for critical data
      baseDelay: 500 # Faster initial retry

Reliability Impact: Retry capabilities can improve success rates from 95% to 99.9%+ for APIs with transient failures.
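The delay schedule implied by baseDelay, maxDelay, and exponentialBackoff can be sketched as below. The doubling and the cap follow the comments in the config example; the jitter strategy (adding up to 10% of the delay at random) is an assumption, since the README does not say how gql-ingest randomizes delays. maxAttempts is left out because the config comments do not pin down whether it counts the first attempt. `retryDelay` and `isRetryableStatus` are hypothetical helpers.

```typescript
interface RetryConfig {
  baseDelay: number; // ms
  maxDelay: number; // ms
  exponentialBackoff: boolean;
  retryableStatusCodes: number[];
}

// Defaults as documented in the config example.
const defaults: RetryConfig = {
  baseDelay: 1000,
  maxDelay: 30000,
  exponentialBackoff: true,
  retryableStatusCodes: [408, 429, 500, 502, 503, 504],
};

// Delay before retry number `retry` (1 = first retry). `random` is injectable for testing.
function retryDelay(retry: number, cfg: RetryConfig, random: () => number = Math.random): number {
  const raw = cfg.exponentialBackoff ? cfg.baseDelay * 2 ** (retry - 1) : cfg.baseDelay;
  const jitter = raw * 0.1 * random(); // ASSUMPTION: up to +10% random jitter
  return Math.min(cfg.maxDelay, Math.round(raw + jitter));
}

function isRetryableStatus(status: number, cfg: RetryConfig): boolean {
  return cfg.retryableStatusCodes.includes(status);
}

// The config.yaml example above: baseDelay 2000, maxDelay 60000 (jitter disabled for readability).
const cfg: RetryConfig = { ...defaults, baseDelay: 2000, maxDelay: 60000 };
console.log([1, 2, 3, 4, 5, 6].map((r) => retryDelay(r, cfg, () => 0)));
// [ 2000, 4000, 8000, 16000, 32000, 60000 ]
```

The sixth delay would be 64000 ms without the cap, which is why it comes out as 60000.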

Selective Entity Processing

The --entities flag allows you to process specific entities instead of all discovered mappings:

Process multiple entities: --entities users,products,orders
Process a single entity: --entities items
Entities are processed in dependency order automatically
Missing dependencies will trigger a warning but not prevent execution

Note: When using --entities with entity dependencies defined in config.yaml, the tool will warn you about any missing dependencies but will still attempt to process the selected entities. Ensure dependent data exists in your GraphQL API before processing entities with unmet dependencies.
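The check behind that warning can be pictured as follows: for each selected entity, list the dependencies that are not part of the selection. `findUnselectedDependencies` is a hypothetical helper that looks only at direct dependencies, and the warning text it prints is illustrative, not gql-ingest's actual output.

```typescript
// For each selected entity, list direct dependencies that were not selected in this run.
function findUnselectedDependencies(
  selected: string[],
  deps: Record<string, string[]>,
): Record<string, string[]> {
  const chosen = new Set(selected);
  const missing: Record<string, string[]> = {};
  for (const entity of selected) {
    const absent = (deps[entity] ?? []).filter((d) => !chosen.has(d));
    if (absent.length > 0) missing[entity] = absent;
  }
  return missing;
}

const deps = { products: ["users"], orders: ["products"] };
// Equivalent of: --entities products,orders
const missing = findUnselectedDependencies(["products", "orders"], deps);
for (const [entity, absent] of Object.entries(missing)) {
  console.warn(`Warning: ${entity} depends on ${absent.join(", ")}, which is not selected`);
}
// Warning: products depends on users, which is not selected
```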

Configuration

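The --config flag points to a configuration directory containing:

data/ - Data files (CSV, JSON, YAML, or JSONL)
graphql/ - GraphQL mutation files
mappings/ - JSON files that map data fields to GraphQL variables
config.yaml - (Optional) Parallel processing, retry, and dependency configuration

By convention, each entity has one file in each of data/, graphql/, and mappings/ with matching names (for example, data/items.csv, graphql/items.graphql, and mappings/items.json). The mapping file's dataFile and graphqlFile fields point to the other two.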

Example Configuration

examples/demo/mappings/items.json:

{
  "dataFile": "data/items.csv",
  "dataFormat": "csv",
  "graphqlFile": "graphql/items.graphql",
  "mapping": {
    "name": "item_name",
    "sku": "item_sku"
  }
}

examples/demo/data/items.csv:

item_name,item_sku
Item1,item-1-sku
Item2,item-2-sku

examples/demo/graphql/items.graphql:

mutation CreateItem($name: String!, $sku: String!) {
  createItem(input: { name: $name, sku: $sku }) {
    id
    name
    sku
  }
}
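For a flat CSV file, each key in mapping names a GraphQL variable and each value names a CSV column. A hypothetical sketch of that step for the items example follows; it assumes the row has already been parsed into an object keyed by header, and it splits on commas naively (a real CSV parser is needed for quoted fields). The resulting variables are what would be sent with the CreateItem mutation.

```typescript
type Row = Record<string, string>;

// Build GraphQL variables from one CSV row: { variable: row[column] }.
function mapRow(row: Row, mapping: Record<string, string>): Record<string, string> {
  const variables: Record<string, string> = {};
  for (const [variable, column] of Object.entries(mapping)) {
    // This sketch fails loudly on a missing column; the real tool's behaviour may differ.
    if (!(column in row)) throw new Error(`Column "${column}" not found in row`);
    variables[variable] = row[column];
  }
  return variables;
}

// Header and first data row of examples/demo/data/items.csv, split naively on commas.
const [header, firstRow] = ["item_name,item_sku", "Item1,item-1-sku"].map((l) => l.split(","));
const row: Row = Object.fromEntries(header.map((h, i) => [h, firstRow[i]]));

console.log(mapRow(row, { name: "item_name", sku: "item_sku" }));
// { name: 'Item1', sku: 'item-1-sku' }
```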

examples/demo/config.yaml (Optional - for parallel processing and retry configuration):

# Parallel processing configuration
parallelProcessing:
  concurrency: 5 # Process 5 rows per entity concurrently
  entityConcurrency: 2 # Process 2 entities simultaneously
  preserveRowOrder: false # Allow faster out-of-order completion

# Global retry configuration
retry:
  maxAttempts: 3 # Retry failed requests up to 3 times
  baseDelay: 1000 # Start with 1s delay between retries
  exponentialBackoff: true # Double delay each retry

# Entity dependencies
entityDependencies:
  items: ["users"] # Items depend on users being processed first

# Per-entity overrides (optional)
entityConfig:
  users:
    retry:
      maxAttempts: 5 # More retries for user creation
  items:
    concurrency: 10 # Higher concurrency for items
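Per-entity settings under entityConfig override the global values. A minimal sketch of that precedence follows, assuming a field-by-field merge in which anything not set for an entity falls back to the global value; the README does not spell out the exact merge rules, and `settingsFor` is hypothetical.

```typescript
interface RetrySettings { maxAttempts: number; baseDelay: number; exponentialBackoff: boolean }
interface EntitySettings { concurrency: number; retry: RetrySettings }
interface EntityOverride { concurrency?: number; retry?: Partial<RetrySettings> }

// Global values from examples/demo/config.yaml.
const globalSettings: EntitySettings = {
  concurrency: 5,
  retry: { maxAttempts: 3, baseDelay: 1000, exponentialBackoff: true },
};

// The entityConfig section of the same file.
const entityConfig: Record<string, EntityOverride> = {
  users: { retry: { maxAttempts: 5 } },
  items: { concurrency: 10 },
};

// ASSUMPTION: shallow merge, per-entity fields win, unset fields fall back to global.
function settingsFor(entity: string): EntitySettings {
  const o = entityConfig[entity] ?? {};
  return {
    concurrency: o.concurrency ?? globalSettings.concurrency,
    retry: { ...globalSettings.retry, ...o.retry },
  };
}

console.log(settingsFor("users").retry.maxAttempts, settingsFor("items").concurrency);
// 5 10
```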

Supported Data Formats 📄

GQL Ingest now supports multiple data formats beyond CSV for more flexible data ingestion, especially for complex nested GraphQL mutations:

Supported Formats

CSV - Traditional flat file format
JSON - Perfect for nested/complex data structures
YAML - Human-friendly alternative to JSON
JSONL - JSON Lines format for streaming large datasets
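JSONL stores one JSON value per line, which is what makes it suited to large datasets: each line can be parsed on its own. A minimal sketch of that parsing (blank lines skipped, line numbers reported on errors); `parseJsonl` is illustrative and not gql-ingest's reader.

```typescript
// Parse JSON Lines text: one JSON value per non-blank line.
function parseJsonl(text: string): unknown[] {
  const records: unknown[] = [];
  text.split(/\r?\n/).forEach((line, i) => {
    if (line.trim() === "") return; // tolerate blank lines, e.g. a trailing newline
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      throw new Error(`Invalid JSON on line ${i + 1}: ${(err as Error).message}`);
    }
  });
  return records;
}

const sample = '{"name":"Notebook","type":"PHYSICAL"}\n{"name":"Pen","type":"PHYSICAL"}\n';
console.log(parseJsonl(sample).length); // 2
```

For very large files, the same per-line logic can be applied while reading the file incrementally, for example with Node's `readline` module.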

Format Selection

The tool automatically detects the format based on file extension, or you can specify it explicitly:

# Auto-detect from mapping configuration
gql-ingest --endpoint <url> --config ./config

# Force specific format
gql-ingest --endpoint <url> --config ./config --format json
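Extension-based detection with an explicit override might look like the sketch below. The precedence (--format flag, then the mapping's dataFormat, then the file extension) and the .yml alias are assumptions for illustration; `resolveFormat` is a hypothetical helper.

```typescript
import { extname } from "node:path";

type DataFormat = "csv" | "json" | "yaml" | "jsonl";

const byExtension: Record<string, DataFormat> = {
  ".csv": "csv",
  ".json": "json",
  ".yaml": "yaml",
  ".yml": "yaml", // ASSUMPTION: .yml treated as YAML
  ".jsonl": "jsonl",
};

// ASSUMED precedence: --format flag, then mapping "dataFormat", then file extension.
function resolveFormat(dataFile: string, declared?: DataFormat, override?: DataFormat): DataFormat {
  if (override) return override;
  if (declared) return declared;
  const format = byExtension[extname(dataFile).toLowerCase()];
  if (!format) throw new Error(`Cannot infer format of ${dataFile}; set dataFormat or --format`);
  return format;
}

// A mapping without dataFormat (like mappings/products-flat.json): the extension decides.
console.log(resolveFormat("data/products-flat.json")); // json
```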

JSON/YAML Format Examples

Direct Mapping (Entire Object)

For complex GraphQL mutations with nested input types, you can map the entire data object:

data/products.json:

[
  {
    "name": "Premium T-Shirt",
    "type": "PHYSICAL",
    "options": [
      {
        "name": "Color",
        "values": ["Red", "Blue", "Green"]
      },
      {
        "name": "Size",
        "values": ["S", "M", "L", "XL"]
      }
    ],
    "variants": [
      {
        "name": "Red Small",
        "sku": "TS-RED-S",
        "optionMappings": [
          { "name": "Color", "value": "Red" },
          { "name": "Size", "value": "S" }
        ]
      }
    ]
  }
]

mappings/products.json (the "$" path maps the entire object to the input variable):

{
  "dataFile": "data/products.json",
  "dataFormat": "json",
  "graphqlFile": "graphql/newProduct.graphql",
  "mapping": {
    "input": "$"
  }
}

Path-Based Mapping

For transforming flat JSON into nested structures:

data/products-flat.json:

[
  {
    "product_name": "Notebook",
    "product_type": "PHYSICAL",
    "brand": "ACME"
  }
]

mappings/products-flat.json:

{
  "dataFile": "data/products-flat.json",
  "graphqlFile": "graphql/newProduct.graphql",
  "mapping": {
    "input": {
      "name": "$.product_name",
      "type": "$.product_type",
      "brandCode": "$.brand"
    }
  }
}
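A path-based mapping can be read as a template: "$" stands for the whole record, "$.field" for one of its fields, and nested objects in mapping produce nested variables. The hypothetical resolver below supports only those forms plus dotted paths such as "$.a.b"; gql-ingest's path syntax may support more (array indexes, for instance), and the handling of bare strings and non-string values is an assumption marked in the code.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Resolve "$" (the whole record) or a dotted path such as "$.a.b" against a record.
function resolvePath(record: Json, path: string): Json | undefined {
  if (path === "$") return record;
  if (!path.startsWith("$.")) throw new Error(`Unsupported path: ${path}`);
  let current: Json | undefined = record;
  for (const key of path.slice(2).split(".")) {
    if (current === null || typeof current !== "object" || Array.isArray(current)) return undefined;
    current = current[key];
  }
  return current;
}

// Walk the mapping: strings are paths, plain objects are mapped key by key.
function applyMapping(record: Json, mapping: Json): Json | undefined {
  if (typeof mapping === "string") {
    // ASSUMPTION: a bare string names a top-level field, like a CSV column name.
    return resolvePath(record, mapping.startsWith("$") ? mapping : `$.${mapping}`);
  }
  if (mapping !== null && typeof mapping === "object" && !Array.isArray(mapping)) {
    const out: { [key: string]: Json } = {};
    for (const [key, sub] of Object.entries(mapping)) {
      const value = applyMapping(record, sub);
      if (value !== undefined) out[key] = value; // unresolved paths are omitted
    }
    return out;
  }
  return mapping; // ASSUMPTION: numbers, booleans, null, and arrays pass through unchanged
}

// The data/products-flat.json record and its mapping from above.
const record: Json = { product_name: "Notebook", product_type: "PHYSICAL", brand: "ACME" };
const mapping: Json = {
  input: { name: "$.product_name", type: "$.product_type", brandCode: "$.brand" },
};
console.log(JSON.stringify(applyMapping(record, mapping)));
// {"input":{"name":"Notebook","type":"PHYSICAL","brandCode":"ACME"}}
```

The same resolver covers direct mapping: `applyMapping(record, { input: "$" })` yields `{ input: record }`.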

YAML Format

YAML provides a more readable alternative:

data/products.yaml:

- name: Premium T-Shirt
  type: PHYSICAL
  options:
    - name: Color
      values: [Red, Blue, Green]
    - name: Size
      values: [S, M, L, XL]
  variants:
    - name: Red Small
      sku: TS-RED-S
      optionMappings:
        - name: Color
          value: Red
        - name: Size
          value: S

Development

Scripts

pnpm run build       # Build CLI bundle with esbuild
pnpm run build:types # Generate TypeScript declarations
pnpm run build:all   # Build bundle + types
pnpm run dev         # Run in development mode
pnpm run test        # Run test suite

How It Works

Discovery: The tool scans the mappings/ directory for .json files
Dependency Resolution: Analyzes entityDependencies to create execution waves
Parallel Processing: For each dependency wave:
- Processes up to entityConcurrency entities simultaneously
- Within each entity, processes up to concurrency rows concurrently
- Waits for the entire wave to complete before starting the next wave
GraphQL Execution: For each data row (CSV row, JSON/YAML object, or JSONL line):
- Loads the GraphQL mutation definition
- Maps data fields to GraphQL variables using the mapping configuration
- Executes the mutation against the GraphQL endpoint
Error Handling & Retries:
- Failed mutations are automatically retried with exponential backoff
- Non-retryable errors (e.g., validation failures) are logged and skipped
- Configurable retry policies per entity type
Metrics & Monitoring:
- Real-time progress tracking and success/failure rates
- Retry attempt counts and success rates
- Detailed per-entity performance breakdown

License

MIT
