@symbiosedb/auto-seed
v1.0.1
Auto-seed functionality for SymbioseDB - generate realistic mock data for all database types
Auto-Seed: Smart Mock Data Generation
Intelligent mock data generation across all 4 database types with cross-DB relationship awareness
Generate realistic test data automatically for PostgreSQL, Vector, Graph, and Blockchain databases with full foreign key awareness and cross-database consistency.
✨ Features
- ✅ Foreign Key Intelligence - Automatically picks existing IDs for FK columns
- ✅ Cross-Database Aware - Maintains ID consistency across SQL/Vector/Graph/Blockchain
- ✅ Dependency Resolution - Automatically determines correct seed order using topological sort
- ✅ Realistic Data - Context-aware generation (emails, names, addresses, etc.)
- ✅ Reproducible - Seed support for consistent test data
- ✅ Locale Support - Generate region-specific data (en_US, fr_FR, etc.)
- ✅ 100% Test Coverage - 44/44 tests passing, strict TDD methodology
🚀 Quick Start
import { SeedOrchestrator } from '@symbiosedb/auto-seed';
const orchestrator = new SeedOrchestrator();
// Define your schema
const usersSchema = {
dbType: 'sql',
tableName: 'users',
columns: [
{ name: 'id', type: 'uuid', nullable: false, isPrimaryKey: true },
{ name: 'email', type: 'string', nullable: false },
{ name: 'name', type: 'string', nullable: false },
],
primaryKeys: [],
foreignKeys: [],
uniqueConstraints: [],
};
// Seed 100 users
const result = await orchestrator.seedTable(usersSchema, 100);
console.log(result);
// {
// tableName: 'users',
// dbType: 'sql',
// recordsCreated: 100,
// duration: 45, // milliseconds
// preview: [
// { id: '123e4567-e89b-12d3-a456-426614174000', email: 'john.doe@example.com', name: 'John Doe' },
// { id: '223e4567-e89b-12d3-a456-426614174001', email: 'jane.smith@example.com', name: 'Jane Smith' },
// // ... first 5 records
// ]
// }
📦 Components
1. SchemaAnalyzer
Analyzes table/collection schemas to detect constraints and relationships.
import { SchemaAnalyzer } from '@symbiosedb/auto-seed';
const analyzer = new SchemaAnalyzer();
// Detect primary keys
const primaryKeys = analyzer.detectPrimaryKeys(schema);
// Detect foreign keys
const foreignKeys = analyzer.detectForeignKeys(schema);
// Detect unique constraints
const constraints = analyzer.detectUniqueConstraints(schema);
// Works with all 4 DB types: SQL, Vector, Graph, Blockchain
2. DependencyGraph
Builds dependency graph and determines correct seed order.
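To make the ordering step concrete, here is a minimal sketch of a topological sort over foreign-key dependencies using Kahn's algorithm. It is not the package's implementation: the `topologicalOrder` helper and its `deps` input shape are hypothetical, and returning an empty array on a cycle is an assumption chosen to match the behaviour described in the circular dependency test.

```typescript
// Hypothetical helper, not part of @symbiosedb/auto-seed.
// `deps` maps each table to the tables it references through foreign keys.
function topologicalOrder(deps: Record<string, string[]>): string[] {
  const tables = Object.keys(deps);
  // Number of parents that still have to be seeded before each table.
  const pending = new Map<string, number>();
  // Reverse edges: parent table -> tables that reference it.
  const children = new Map<string, string[]>();
  for (const table of tables) {
    pending.set(table, 0);
    children.set(table, []);
  }
  for (const [table, parents] of Object.entries(deps)) {
    for (const parent of parents) {
      const dependents = children.get(parent);
      if (!dependents) continue; // ignore references to tables we do not know
      dependents.push(table);
      pending.set(table, (pending.get(table) ?? 0) + 1);
    }
  }
  // Start with tables that reference nothing.
  const ready = tables.filter((table) => pending.get(table) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const table = ready.shift() as string;
    order.push(table);
    for (const child of children.get(table) ?? []) {
      const left = (pending.get(child) ?? 0) - 1;
      pending.set(child, left);
      if (left === 0) ready.push(child);
    }
  }
  // If some table never became ready, the graph contains a cycle.
  return order.length === tables.length ? order : [];
}

const seedOrder = topologicalOrder({
  comments: ['posts', 'users'],
  posts: ['users'],
  users: [],
});
console.log(seedOrder); // ['users', 'posts', 'comments']
```

Parents always appear before the tables that reference them, so foreign keys can be filled from IDs that already exist.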
import { DependencyGraph } from '@symbiosedb/auto-seed';
const graph = new DependencyGraph();
// Add tables
graph.addTable('users', usersSchema);
graph.addTable('posts', postsSchema);
graph.addTable('comments', commentsSchema);
// Get topological order (users → posts → comments)
const order = graph.getTopologicalOrder();
// ['users', 'posts', 'comments']
// Detect circular dependencies
const hasCycles = graph.hasCycles(); // false
3. CrossDBRegistry
Tracks generated IDs across all 4 database types for consistency.
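As a rough mental model, such a registry can be pictured as a map from table name to the IDs generated for it. The sketch below is a hypothetical stand-in (`SketchRegistry` is not exported by the package, and the real `CrossDBRegistry` may store and select IDs differently); the injectable `random` parameter is an assumption added so the selection can be made deterministic.

```typescript
type DBType = 'sql' | 'vector' | 'graph' | 'blockchain';

// Hypothetical stand-in for illustration; the real CrossDBRegistry may differ.
class SketchRegistry {
  private ids = new Map<string, { dbType: DBType; values: string[] }>();

  // Remember an ID generated for a table in a given database type.
  registerID(table: string, dbType: DBType, id: string): void {
    const entry = this.ids.get(table) ?? { dbType, values: [] };
    entry.values.push(id);
    this.ids.set(table, entry);
  }

  // Pick one registered ID; returns undefined when the table has none yet.
  getRandomID(table: string, random: () => number = Math.random): string | undefined {
    const values = this.ids.get(table)?.values ?? [];
    if (values.length === 0) return undefined;
    return values[Math.floor(random() * values.length)];
  }
}

const sketch = new SketchRegistry();
sketch.registerID('users', 'sql', 'user-1');
sketch.registerID('users', 'sql', 'user-2');
const picked = sketch.getRandomID('users'); // 'user-1' or 'user-2'
const missing = sketch.getRandomID('orders'); // undefined: nothing registered
console.log(picked, missing);
```

Because every foreign key is drawn from IDs that were registered earlier, a child record cannot point at a parent that was never created.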
import { CrossDBRegistry } from '@symbiosedb/auto-seed';
const registry = new CrossDBRegistry();
// Register IDs for SQL users
registry.registerID('users', 'sql', 'user-1');
registry.registerID('users', 'sql', 'user-2');
// When seeding Vector collection, pick random user ID
const userId = registry.getRandomID('users'); // 'user-1' or 'user-2'
// Ensure same user has same ID across all DBs
registry.registerID('user_embeddings', 'vector', 'emb-1');
registry.registerID('user_nodes', 'graph', 'user-1'); // Same ID!
4. SmartDataGenerator
Generates realistic data with context awareness and FK intelligence.
import { SmartDataGenerator } from '@symbiosedb/auto-seed';
const generator = new SmartDataGenerator();
// Context-aware generation
const emailColumn = { name: 'email', type: 'string', nullable: false };
const email = generator.generateValue(emailColumn, registry);
// e.g. 'john.doe@example.com'
const nameColumn = { name: 'name', type: 'string', nullable: false };
const name = generator.generateValue(nameColumn, registry);
// 'John Doe'
// FK-aware generation
const postsSchema = {
// ... schema with user_id FK
foreignKeys: [
{ columnName: 'user_id', referencedTable: 'users', referencedColumn: 'id' }
]
};
const post = generator.generateRecord(postsSchema, registry);
// { id: '...', user_id: 'user-1', title: '...' }
// user_id automatically picked from registry!
// Locale support
generator.setLocale('fr_FR'); // French cities, names, etc.
// Reproducible with seed
generator.setSeed(12345);
Supported column types:
- uuid → UUIDs
- string → Context-aware (email, name, phone, address, etc.)
- text → Paragraphs, descriptions
- integer, bigint, float, decimal → Numbers
- boolean → True/false
- date, timestamp → Dates
- json → JSON objects
- vector, embedding → Arrays of random numbers (configurable dimensions)
Context-aware column names:
- email → faker.internet.email()
- name, first_name, last_name → faker.person.*
- phone → faker.phone.number()
- address, city, country → faker.location.*
- title → faker.lorem.sentence()
- description, content → faker.lorem.paragraph()
- price, amount → faker.commerce.price()
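One way to picture the column-name matching is a small rule table that is checked before falling back to type-based generation. The sketch below is an assumption about how such a lookup could work, not the package's code: `resolveGenerator`, the `GeneratorKind` labels, and the exact-match, case-insensitive rules are all hypothetical, and it returns a label rather than calling faker so that it stays dependency-free.

```typescript
type GeneratorKind =
  | 'email'
  | 'personName'
  | 'phone'
  | 'location'
  | 'sentence'
  | 'paragraph'
  | 'price'
  | 'fallback';

// Hypothetical rule table mirroring the column-name list above.
const nameRules: Array<[RegExp, GeneratorKind]> = [
  [/^email$/, 'email'],
  [/^(name|first_name|last_name)$/, 'personName'],
  [/^phone$/, 'phone'],
  [/^(address|city|country)$/, 'location'],
  [/^title$/, 'sentence'],
  [/^(description|content)$/, 'paragraph'],
  [/^(price|amount)$/, 'price'],
];

// Return the generator kind for a column name, or 'fallback' when no rule
// matches and generation should be driven by the column type instead.
function resolveGenerator(columnName: string): GeneratorKind {
  const normalized = columnName.toLowerCase();
  for (const [pattern, kind] of nameRules) {
    if (pattern.test(normalized)) return kind;
  }
  return 'fallback';
}

console.log(resolveGenerator('first_name')); // 'personName'
console.log(resolveGenerator('user_id')); // 'fallback'
```

Keeping the rules in data rather than in branching code makes it easy to add a new column name without touching the lookup itself.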
5. SeedOrchestrator
Coordinates multi-table seeding with automatic dependency resolution.
import { SeedOrchestrator } from '@symbiosedb/auto-seed';
const orchestrator = new SeedOrchestrator();
// Seed multiple tables in correct order
const results = await orchestrator.seedMultipleTables([
{ tableName: 'comments', dbType: 'sql', count: 100, schema: commentsSchema },
{ tableName: 'users', dbType: 'sql', count: 10, schema: usersSchema },
{ tableName: 'posts', dbType: 'sql', count: 50, schema: postsSchema },
]);
// Automatically seeds in order: users → posts → comments
// results[0].tableName === 'users'
// results[1].tableName === 'posts'
// results[2].tableName === 'comments'
// Seed with related tables automatically
orchestrator.registerSchema(usersSchema);
orchestrator.registerSchema(postsSchema);
const relatedResults = await orchestrator.seedRelatedTables(postsSchema, 20);
// Automatically seeds users first, then posts
// Options
await orchestrator.seedTable(schema, 100, {
locale: 'fr_FR', // French data
seed: 12345, // Reproducible
reset: true, // Clear existing data first
});
🔗 Cross-Database Seeding
Auto-Seed intelligently handles relationships across all 4 database types:
// SQL: users table
const usersSchema = {
dbType: 'sql',
tableName: 'users',
columns: [
{ name: 'id', type: 'uuid', nullable: false, isPrimaryKey: true },
{ name: 'name', type: 'string', nullable: false },
],
primaryKeys: [],
foreignKeys: [],
uniqueConstraints: [],
};
// Vector: user embeddings (references SQL users)
const embeddingsSchema = {
dbType: 'vector',
tableName: 'user_embeddings',
columns: [
{ name: 'id', type: 'uuid', nullable: false, isPrimaryKey: true },
{ name: 'user_id', type: 'uuid', nullable: false }, // Cross-DB FK!
{ name: 'embedding', type: 'vector', nullable: false, dimensions: 128 },
],
primaryKeys: [],
foreignKeys: [
{
columnName: 'user_id',
referencedTable: 'users',
referencedColumn: 'id',
referencedDBType: 'sql', // Cross-DB reference!
},
],
uniqueConstraints: [],
};
// Graph: user nodes (same IDs as SQL)
const userNodesSchema = {
dbType: 'graph',
tableName: 'user_nodes',
columns: [
{ name: 'id', type: 'uuid', nullable: false, isPrimaryKey: true },
{ name: 'label', type: 'string', nullable: false },
],
primaryKeys: [],
foreignKeys: [],
uniqueConstraints: [],
};
// Blockchain: user creation attestations
const attestationsSchema = {
dbType: 'blockchain',
tableName: 'user_attestations',
columns: [
{ name: 'id', type: 'uuid', nullable: false, isPrimaryKey: true },
{ name: 'user_id', type: 'uuid', nullable: false },
{ name: 'hash', type: 'string', nullable: false },
],
primaryKeys: [],
foreignKeys: [],
uniqueConstraints: [],
};
// Seed all 4 DBs with consistent IDs
const results = await orchestrator.seedMultipleTables([
{ tableName: 'users', dbType: 'sql', count: 10, schema: usersSchema },
{ tableName: 'user_embeddings', dbType: 'vector', count: 10, schema: embeddingsSchema },
{ tableName: 'user_nodes', dbType: 'graph', count: 10, schema: userNodesSchema },
{ tableName: 'user_attestations', dbType: 'blockchain', count: 10, schema: attestationsSchema },
]);
// Result:
// - 10 users in SQL with IDs: user-1, user-2, ..., user-10
// - 10 embeddings in Vector with user_id referencing user-1 to user-10
// - 10 nodes in Graph with same IDs: user-1, user-2, ..., user-10
// - 10 attestations in Blockchain referencing user-1 to user-10
// All have consistent IDs across all 4 databases!
📊 Test Coverage
44/44 tests passing (100%)
| Component | Tests | Status |
|-----------|-------|--------|
| SchemaAnalyzer | 8 | ✅ |
| DependencyGraph | 5 | ✅ |
| CrossDBRegistry | 6 | ✅ |
| SmartDataGenerator | 10 | ✅ |
| SeedOrchestrator | 9 | ✅ |
| Integration Tests | 6 | ✅ |
| Total | 44 | ✅ |
All tests follow strict TDD methodology (RED → GREEN → REFACTOR).
🔗 Integration Testing
The Auto-Seed system includes comprehensive integration tests that verify cross-database seeding scenarios:
SQL + Vector Integration
// Seed SQL users, then Vector embeddings with matching user_id FKs
await orchestrator.seedTable(usersSchema, 10);
await orchestrator.seedTable(embeddingsSchema, 10);
// ✅ All embeddings reference valid user IDs from SQL table
SQL + Graph Integration
// Seed SQL users, then Graph nodes with same user IDs
await orchestrator.seedTable(usersSchema, 5);
await orchestrator.seedTable(graphNodesSchema, 5);
// ✅ All graph nodes use same user IDs as SQL table
SQL + Blockchain Integration
// Seed SQL transactions, then Blockchain attestations
await orchestrator.seedTable(transactionsSchema, 20);
await orchestrator.seedTable(attestationsSchema, 20);
// ✅ All attestations reference valid transaction IDs
All 4 DB Types Integration
// Seed users across all 4 database types with consistent IDs
await orchestrator.seedTable(sqlUsersSchema, 5); // SQL
await orchestrator.seedTable(vectorEmbeddings, 5); // Vector
await orchestrator.seedTable(graphNodes, 5); // Graph
await orchestrator.seedTable(blockchainAttestations, 5); // Blockchain
// ✅ Same user IDs across ALL 4 databases
Complex FK Constraints
// departments → employees → tasks (3-level FK chain)
await orchestrator.seedTable(departmentsSchema, 3); // 3 departments
await orchestrator.seedTable(employeesSchema, 10); // 10 employees
await orchestrator.seedTable(tasksSchema, 20); // 20 tasks
// ✅ FK integrity maintained: tasks → employees → departments
Circular Dependency Detection
// table_a references table_b, table_b references table_a
orchestrator.registerSchema(tableASchema);
orchestrator.registerSchema(tableBSchema);
const graph = orchestrator['dependencyGraph'];
graph.hasCycles(); // returns true
// ✅ Cycle detected, topological sort returns empty array
// Manual intervention required (seed with nullable FKs first)
🧪 Running Tests
# Run all tests
npm test
# Run with coverage
npm run test:coverage
# Watch mode
npm run test:watch
🏗️ Architecture
Auto-Seed System
├─ SchemaAnalyzer (FK detection, constraint analysis)
├─ DependencyGraph (Topological sort, cycle detection)
├─ CrossDBRegistry (ID tracking across all 4 DBs)
├─ SmartDataGenerator (Realistic data with FK awareness)
└─ SeedOrchestrator (Multi-table coordination)
How it works:
- Schema Analysis - Detect primary keys, foreign keys, constraints
- Dependency Resolution - Build dependency graph, determine seed order
- Data Generation - Generate realistic data respecting FK constraints
- ID Tracking - Register all generated IDs in cross-DB registry
- Cross-DB Consistency - Ensure same entity has same ID across all 4 DBs
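The generation and ID-tracking steps above can be sketched as follows, together with seed-based reproducibility. Everything here is illustrative and assumed: `mulberry32` is a small, commonly used seedable PRNG rather than whatever the package uses internally, and `pick`, `seedUsersAndPosts`, and the `User`/`Post` shapes are hypothetical names for this example only.

```typescript
// mulberry32-style seedable PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick one element; fails loudly when no parent IDs exist yet.
function pick<T>(items: readonly T[], random: () => number): T {
  const item = items[Math.floor(random() * items.length)];
  if (item === undefined) throw new Error('no parent IDs registered yet');
  return item;
}

interface User { id: string }
interface Post { id: string; user_id: string }

// Hypothetical two-table pipeline: parents first, then children.
function seedUsersAndPosts(seed: number, userCount: number, postCount: number) {
  const random = mulberry32(seed);
  // Generate parent rows and track their IDs.
  const users: User[] = Array.from({ length: userCount }, (_, i) => ({ id: `user-${i + 1}` }));
  const userIds = users.map((user) => user.id);
  // Child rows draw their foreign key from the tracked parent IDs.
  const posts: Post[] = Array.from({ length: postCount }, (_, i) => ({
    id: `post-${i + 1}`,
    user_id: pick(userIds, random),
  }));
  return { users, posts };
}

const firstRun = seedUsersAndPosts(12345, 3, 10);
const secondRun = seedUsersAndPosts(12345, 3, 10);
// Same seed, same data: the two runs serialize identically.
console.log(JSON.stringify(firstRun) === JSON.stringify(secondRun)); // true
```

Routing every random choice through one seeded generator is what makes a run repeatable; any call to `Math.random()` along the way would break that guarantee.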
📝 TypeScript Types
export type DBType = 'sql' | 'vector' | 'graph' | 'blockchain';
export type DataType =
| 'uuid'
| 'integer'
| 'bigint'
| 'float'
| 'decimal'
| 'string'
| 'text'
| 'boolean'
| 'date'
| 'timestamp'
| 'json'
| 'vector'
| 'embedding';
export interface TableSchema {
dbType: DBType;
tableName: string;
columns: Column[];
primaryKeys: Column[];
foreignKeys: ForeignKey[];
uniqueConstraints: Constraint[];
}
export interface SeedResult {
tableName: string;
dbType: DBType;
recordsCreated: number;
duration: number; // milliseconds
preview: Record<string, any>[]; // First 5 records
errors?: string[];
}
🎯 Use Cases
1. Unit Testing
// Seed test data for unit tests
beforeEach(async () => {
await orchestrator.seedTable(usersSchema, 10, { seed: 12345 });
await orchestrator.seedTable(postsSchema, 50, { seed: 12345 });
});
2. Integration Testing
// Seed realistic data for integration tests
const results = await orchestrator.seedMultipleTables([
{ tableName: 'users', dbType: 'sql', count: 100, schema: usersSchema },
{ tableName: 'posts', dbType: 'sql', count: 500, schema: postsSchema },
{ tableName: 'comments', dbType: 'sql', count: 2000, schema: commentsSchema },
]);
3. Demo Environments
// Seed demo database with realistic data
await orchestrator.seedTable(usersSchema, 1000, { locale: 'en_US' });
await orchestrator.seedTable(productsSchema, 500, { locale: 'en_US' });
await orchestrator.seedTable(ordersSchema, 5000, { locale: 'en_US' });
4. Performance Testing
// Seed large datasets for performance testing
await orchestrator.seedTable(schema, 1000000, { reset: true });
📖 API Reference
See PLAN.md for the full implementation plan and architecture details.
🚀 Next Steps (Phase 2)
- REST API endpoints (POST /api/.../tables/:tableID/mock)
- CLI command (symbiosedb seed <table> --count 100)
- Studio UI component ("Generate Mock Data" button)
📄 License
MIT
Built with SymbioseDB - The Beautiful Database for Everything™
Auto-Seed Phase 1 Complete ✓ (November 2024)
