erp-datagen
v0.4.3
Generate realistic synthetic procurement data for SAP ECC, Microsoft D365, and JDE, for testing, AI training, and demos.
The problem
Developers and data engineers building on ERP systems constantly need realistic test data. Real production data cannot be shared. Existing tools do not understand procurement domain structure — duplicate vendor names, multi-currency POs, three-way match scenarios, credit memos, invoice reversals, and the messy edge cases that actually matter.
This tool does.
Supported ERP systems
| ERP | Tables generated |
|-----|-----------------|
| SAP ECC | LFA1, LFB1, LFM1, EKKO, EKPO, MKPF, MSEG, RBKP, RSEG, BSAK, BSIK, EKBE, BSET, BKPF, BSEG, COEP, REGUH, REGUP, MARA, MAKT, ESSR, ESLL (22 tables) |
| D365 F&O | VendTable, PurchTable, PurchLine, VendPackingSlipJour, VendInvoiceJour, VendInvoiceTrans, VendTrans (7 tables) |
| JDE E1 | F0101, F4301, F4311, F43121, F0411, F0413 (6 tables) |
Requirements
- Node.js >= 18
Install
# Run without installing
npx erp-datagen --help
# Or install globally
npm install -g erp-datagen
# Or clone and run locally
git clone https://github.com/kundanshar-cell/erp-datagen.git
cd erp-datagen
npm install
Two ways to use it
1. Generate a single table
Use this when you need one specific table in isolation — vendors, PO headers, invoices, etc.
npx erp-datagen generate --erp=<erp> --entity=<entity> [options]

| Option | Description | Default |
|---|---|---|
| --erp | ERP system: sap-ecc, jde, d365 | required |
| --entity | Table to generate (see list below) | required |
| --rows | Number of rows | 100 |
| --output | Format: csv, json, jsonl | csv |
| --file | Write to file instead of stdout | stdout |
| --missing-rate | Proportion of optional fields left blank (0–1) | 0 |
| --seed | Random seed for reproducible output | none |
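
The way `--seed` and `--missing-rate` interact can be pictured with a small sketch. This is not the tool's actual implementation: the generator below is a mulberry32-style seeded PRNG chosen only for illustration, and `blankOptionalFields`, `generate`, and the field names are hypothetical.

```javascript
// Illustrative only: NOT erp-datagen's internals.
// A mulberry32-style seeded PRNG: the same seed yields the same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Blank each optional field with probability `missingRate` (0..1).
function blankOptionalFields(row, optionalFields, missingRate, rand) {
  const out = { ...row };
  for (const field of optionalFields) {
    if (rand() < missingRate) out[field] = "";
  }
  return out;
}

// Hypothetical generator: five rows with two optional fields.
function generate(seed, missingRate) {
  const rand = mulberry32(seed);
  const rows = [];
  for (let i = 1; i <= 5; i++) {
    const row = { id: String(i), phone: "555-010" + i, fax: "555-020" + i };
    rows.push(blankOptionalFields(row, ["phone", "fax"], missingRate, rand));
  }
  return rows;
}

const firstRun = JSON.stringify(generate(42, 0.2));
const secondRun = JSON.stringify(generate(42, 0.2));
console.log(firstRun === secondRun); // true: both runs start from the same seed
```

Because the blanking decisions draw from the same seeded sequence, a fixed seed would pin down which fields go missing as well as the values themselves, at least in this sketch.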
Examples:
# 500 SAP vendors as CSV
npx erp-datagen generate --erp=sap-ecc --entity=vendors --rows=500
# 200 SAP PO headers as JSON, written to file
npx erp-datagen generate --erp=sap-ecc --entity=po-headers --rows=200 --output=json --file=./ekko.json
# JDE PO lines with 20% missing fields (simulates messy source data)
npx erp-datagen generate --erp=jde --entity=po-lines --rows=100 --missing-rate=0.2
# Reproducible output — same seed always produces same data
npx erp-datagen generate --erp=sap-ecc --entity=vendors --rows=100 --seed=42
Supported entities per ERP
| ERP | --entity value | Table |
|-----|-----------------|-------|
| sap-ecc | vendors | LFA1 |
| sap-ecc | po-headers | EKKO |
| sap-ecc | po-lines | EKPO |
| sap-ecc | gr-headers | MKPF |
| sap-ecc | gr-lines | MSEG |
| sap-ecc | invoice-headers | RBKP |
| sap-ecc | invoice-lines | RSEG |
| jde | vendors | F0101 |
| jde | po-headers | F4301 |
| jde | po-lines | F4311 |
| jde | gr-lines | F43121 |
| jde | invoices | F0411 |
| d365 | vendors | VendTable |
| d365 | po-headers | PurchTable |
| d365 | po-lines | PurchLine |
| d365 | gr-headers | VendPackingSlipJour |
| d365 | invoice-headers | VendInvoiceJour |
| d365 | invoice-lines | VendInvoiceTrans |
2. Generate a full linked dataset (scenarios)
Scenarios generate all tables for an ERP linked by real document keys — vendor → PO → goods receipt → invoice → payment — in one command. One file per table, written to an output directory.
npx erp-datagen scenario --erp=<erp> --name=<scenario> [options]

| Option | Description | Default |
|---|---|---|
| --erp | ERP system: sap-ecc, jde, d365 | required |
| --name | Scenario name (see below) | required |
| --rows | Approximate number of PO lines to anchor the dataset | 100 |
| --output | Format: csv, json, jsonl | csv |
| --output-dir | Directory to write files | ./output |
| --missing-rate | Proportion of optional fields left blank (0–1) | 0 |
| --seed | Random seed for reproducible output | none |
Scenarios
full-p2p
Generates the complete procure-to-pay chain for any ERP. All tables are linked by real document keys — the same referential integrity you would find in a live system.
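
One way to sanity-check the linkage after a run is to confirm that every child row points at an existing parent. The sketch below assumes the files have already been parsed into arrays of objects and that the columns use the standard SAP field names `EBELN` (PO number), `EBELP` (PO item), and `LIFNR` (vendor number). This README does not list the output column names, so treat those, the sample values, and the helper `findOrphans` as assumptions.

```javascript
// Return child rows whose foreign key has no matching parent key.
function findOrphans(children, childKey, parents, parentKey) {
  const known = new Set(parents.map((p) => p[parentKey]));
  return children.filter((c) => !known.has(c[childKey]));
}

// Tiny hand-written sample standing in for parsed LFA1 / EKKO / EKPO files.
const vendors = [{ LIFNR: "0000100001" }, { LIFNR: "0000100002" }];
const poHeaders = [
  { EBELN: "4500000001", LIFNR: "0000100001" },
  { EBELN: "4500000002", LIFNR: "0000100002" },
];
const poLines = [
  { EBELN: "4500000001", EBELP: "00010" },
  { EBELN: "4500000001", EBELP: "00020" },
  { EBELN: "4500000002", EBELP: "00010" },
];

// PO lines must reference an existing PO header; headers must reference a vendor.
const orphanLines = findOrphans(poLines, "EBELN", poHeaders, "EBELN");
const orphanHeaders = findOrphans(poHeaders, "LIFNR", vendors, "LIFNR");
console.log(orphanLines.length, orphanHeaders.length); // 0 0 for this fully linked sample
```

The same helper could be pointed at any parent/child pair in the chain, for example goods receipt lines against PO lines, once the actual key columns are known.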
# SAP ECC — 1000 PO lines, all 22 tables, CSV
npx erp-datagen scenario --erp=sap-ecc --name=full-p2p --rows=1000 --output-dir=./output
# JDE — full P2P as JSON
npx erp-datagen scenario --erp=jde --name=full-p2p --rows=500 --output=json --output-dir=./output/jde
# D365 — messy data with 30% missing fields
npx erp-datagen scenario --erp=d365 --name=full-p2p --rows=1000 --missing-rate=0.3 --output-dir=./output/d365
What gets generated (SAP ECC, 1000 rows):
LFA1_vendors.csv ~100 rows Vendor master
LFB1_vendor_company.csv ~100 rows Vendor per company code
LFM1_vendor_purchasing.csv ~100 rows Vendor purchasing data
EKKO_po_headers.csv ~200 rows Purchase order headers
EKPO_po_lines.csv ~1000 rows Purchase order lines
MKPF_gr_headers.csv ~140 rows Goods receipt headers
MSEG_gr_lines.csv ~520 rows Goods receipt lines
RBKP_invoice_headers.csv ~155 rows Invoice headers
RSEG_invoice_lines.csv ~570 rows Invoice lines
BSAK_cleared_items.csv ~108 rows Cleared AP items (paid)
BSIK_open_items.csv ~47 rows Open AP items (unpaid)
EKBE_po_history.csv ~690 rows PO history (GR + invoice events)
BSET_tax_lines.csv ~120 rows Tax document lines
BKPF_fi_headers.csv ~400 rows FI document headers
BSEG_fi_lines.csv ~900 rows FI document lines
COEP_cost_lines.csv ~280 rows CO cost elements
REGUH_payment_runs.csv ~80 rows Payment run headers
REGUP_payment_items.csv ~108 rows Payment run items
MARA_material_master.csv ~300 rows Material master
MAKT_material_desc.csv ~300 rows Material descriptions
ESSR_service_sheets.csv ~40 rows Service entry sheets
ESLL_service_lines.csv ~80 rows Service line items
spend-cube (SAP ECC only)
Generates the same 22 SAP ECC tables as full-p2p, but with two additions designed for spend analytics training:
- Every invoice and GR row carries a SCENARIO label, so downstream models can learn to classify spend types
- Six company codes with deliberate spend profiles: each company has a fixed PO vs non-PO ratio to represent different procurement maturity levels
# SAP ECC spend cube — 500 rows
npx erp-datagen scenario --erp=sap-ecc --name=spend-cube --rows=500 --output=json --output-dir=./output/spend
Company spend profiles:
| Company | Country | Currency | PO% | Non-PO% | Story |
|---------|---------|----------|-----|---------|-------|
| 1000 | Germany | EUR | 85% | 15% | Mature, SAP-native procurement |
| GB01 | UK | GBP | 35% | 65% | Maverick spend: the problem company |
| 2000 | USA | USD | 50% | 50% | Transitioning, partial compliance |
| US01 | USA | USD | 60% | 40% | Compliance improving |
| IN01 | India | INR | 80% | 20% | High PO discipline |
| 3000 | Europe | EUR | 75% | 25% | Regional shared services centre |
SCENARIO labels on every row:
| SCENARIO | What it represents |
|---|---|
| PO_NORMAL | Standard PO → GR → Invoice → Payment |
| PO_SERVICE | Service PO: no goods receipt, ESSR/ESLL instead |
| PO_FRAMEWORK | Framework order drawdown |
| PO_CONSIGNMENT | Consignment settlement |
| NON_PO_STANDARD | Non-PO invoice: rent, utilities, subscriptions |
| NON_PO_CREDIT | Credit memo against a non-PO invoice |
| CREDIT_MEMO | KG: price dispute, quality claim, returns |
| DEBIT_MEMO | KA: vendor underbilled, price corrected upward |
| INVOICE_REVERSAL | Invoice cancelled and reposted (wrong vendor or amount) |
| SUBSEQUENT_CREDIT | Price corrected down after invoice was posted |
| SUBSEQUENT_DEBIT | Price corrected up after invoice was posted |
| SPLIT_INVOICE | One PO line invoiced across two separate invoices |
| GR_REVERSAL | Movement 102: wrong delivery returned to stock |
| RETURN_TO_VENDOR | Movement 122: physical return, vendor credit expected |
| PARTIAL_GR | Goods receipt for less than the PO quantity |
| PO_LINE_CANCELLED | PO line cancelled (LOEKZ=L); committed spend removed |
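
As a rough illustration of how the labels might be used downstream, the sketch below computes the non-PO share per company code from labelled rows. It assumes each row carries the company code in a `BUKRS` column (the standard SAP field name, not confirmed by this README) alongside `SCENARIO`, and it treats only labels starting with `NON_PO_` as non-PO spend. That classification rule and the helper `nonPoShare` are assumptions made for the example.

```javascript
// Share of non-PO rows per company code, based on the SCENARIO label.
// Assumption: labels starting with "NON_PO_" are non-PO; everything else counts as PO-backed.
function nonPoShare(rows) {
  const totals = new Map();
  for (const row of rows) {
    const t = totals.get(row.BUKRS) ?? { all: 0, nonPo: 0 };
    t.all += 1;
    if (row.SCENARIO.startsWith("NON_PO_")) t.nonPo += 1;
    totals.set(row.BUKRS, t);
  }
  const shares = {};
  for (const [company, t] of totals) shares[company] = t.nonPo / t.all;
  return shares;
}

// Hand-written sample rows, not real tool output.
const labelledRows = [
  { BUKRS: "1000", SCENARIO: "PO_NORMAL" },
  { BUKRS: "1000", SCENARIO: "PO_SERVICE" },
  { BUKRS: "1000", SCENARIO: "PARTIAL_GR" },
  { BUKRS: "1000", SCENARIO: "NON_PO_STANDARD" },
  { BUKRS: "GB01", SCENARIO: "NON_PO_STANDARD" },
  { BUKRS: "GB01", SCENARIO: "NON_PO_CREDIT" },
  { BUKRS: "GB01", SCENARIO: "PO_NORMAL" },
  { BUKRS: "GB01", SCENARIO: "NON_PO_STANDARD" },
];

const shares = nonPoShare(labelledRows);
console.log(shares["1000"], shares["GB01"]); // 0.25 0.75
```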
Edge cases included
- Duplicate vendor names with different formats (ACME Ltd, Acme Limited, ACME LIMITED), for dedup testing
- Configurable missing fields (--missing-rate) to simulate messy source data
- Multi-currency: GBP, USD, EUR, INR, SGD, JPY, with realistic exchange rates
- Multi-language vendor names (English, German, French, Japanese, Hindi)
- Realistic document numbering per ERP convention
- Three-way match (PO → GR → Invoice) with referential integrity enforced
- Partial deliveries — GR quantity less than PO quantity
- Invoice price variance against PO price
- Credit memos, debit memos, invoice reversals
- Subsequent credits and debits referencing original invoices
- GR reversals (movement 102) and returns to vendor (movement 122)
- Service lines — 2-way match only (no goods receipt)
- Blocked and deletion-flagged vendors
- Cancelled PO lines
- Realistic VAT/tax amounts per country
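
To show what the duplicate-vendor edge case is meant to exercise, here is a minimal dedup sketch that groups names by a normalised key. The normalisation rules, the `LEGAL_FORMS` mapping, and both helper functions are illustrative assumptions. A real matching pipeline would likely need fuzzier comparison than this.

```javascript
// Map common legal-form spellings to one canonical token (illustrative, not exhaustive).
const LEGAL_FORMS = new Map([
  ["limited", "ltd"],
  ["ltd", "ltd"],
  ["incorporated", "inc"],
  ["inc", "inc"],
  ["corporation", "corp"],
  ["corp", "corp"],
]);

// Lowercase, strip punctuation, and canonicalise legal-form tokens.
function normalizeVendorName(name) {
  return name
    .toLowerCase()
    .replace(/[^\p{L}\p{M}\p{N}\s]/gu, " ") // keep letters, marks, digits in any script
    .split(/\s+/)
    .filter(Boolean)
    .map((token) => LEGAL_FORMS.get(token) ?? token)
    .join(" ");
}

// Group names that share a normalised key; return only groups with more than one member.
function groupDuplicates(names) {
  const groups = new Map();
  for (const name of names) {
    const key = normalizeVendorName(name);
    groups.set(key, [...(groups.get(key) ?? []), name]);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}

const duplicateGroups = groupDuplicates(["ACME Ltd", "Acme Limited", "ACME LIMITED", "Globex GmbH"]);
console.log(duplicateGroups.length, duplicateGroups[0]); // one group holding the three ACME variants
```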
Output formats
| Format | Flag | Use case |
|--------|------|----------|
| CSV | --output=csv | Excel, database import, BI tools |
| JSON | --output=json | APIs, application testing |
| JSONL | --output=jsonl | AI/ML training pipelines, streaming |
SQL and Parquet formats are on the roadmap.
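
For JSONL output, each non-empty line is one JSON object, so consuming it needs nothing beyond `JSON.parse`. The sketch below parses a small hand-written sample. The field names `LIFNR` and `NAME1` are standard SAP vendor fields used here as an assumption about the output, and `parseJsonl` is a hypothetical helper.

```javascript
// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Hand-written sample standing in for a generated vendors file.
const jsonlSample =
  '{"LIFNR":"0000100001","NAME1":"ACME Ltd"}\n' +
  '{"LIFNR":"0000100002","NAME1":"Globex GmbH"}\n';

const records = parseJsonl(jsonlSample);
console.log(records.length, records[1].NAME1); // 2 Globex GmbH
```

For large files, reading line by line with a stream avoids loading the whole file into memory. The in-memory version above is only meant to show the format.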
Roadmap
- [x] SAP ECC — 22 linked tables
- [x] JDE E1 — 6 linked tables
- [x] D365 F&O — 7 linked tables
- [x] full-p2p scenario — all ERPs
- [x] spend-cube scenario — SAP ECC with 16 spend scenario labels
- [x] JSONL output for AI training pipelines
- [x] Reproducible output with --seed
- [ ] spend-cube scenario for JDE and D365
- [ ] SQL and Parquet output formats
- [ ] Oracle Fusion Procurement
- [ ] Coupa supplier and PO entities
- [ ] Web UI for no-code data generation
Who is this for
- Developers building ERP integrations who need realistic test data without touching production
- Data engineers building procurement analytics pipelines
- AI/ML teams training models on procurement data — classification, extraction, dedup
- Consultants demoing ERP tools without production data
Contributing
PRs welcome — especially for Oracle Fusion, Coupa, and Ariba schemas. See CONTRIBUTING.md for guidelines.
Author
Built by Kundan Sharma — IT & Digital Solution Architect specialising in procurement data transformation and agentic AI in enterprise supply chains.
15+ years designing and delivering digital transformation programmes across enterprise.
If this saved you time, leave a star.
License
MIT — see LICENSE for details.
