aggo-ai
v1.0.0
AI plugin for aggo.js - Natural language to MongoDB pipeline conversion using OpenAI
@aggo/plugin-ai
🤖 AI-powered natural language to MongoDB pipeline conversion for aggo.js
Convert natural language queries into MongoDB aggregation pipelines using OpenAI, with automatic schema inference and optimized execution via aggo.
Installation
npm install @aggo/plugin-ai
Requirements:
- Node.js 18+
- OpenAI API key
- aggo as a peer dependency
Quick Start
Environment Setup
export OPENAI_API_KEY="your-openai-api-key"
CLI Usage
# Basic natural language query
cat sales.jsonl | npx aggo-ai "total revenue by product category"
# Show inferred schema
cat data.jsonl | npx aggo-ai --schema-only
# Generate pipeline without executing
npx aggo-ai "average rating by genre" --file movies.jsonl --show-pipeline
# Use specific OpenAI model
cat logs.jsonl | npx aggo-ai "error count by service" --model gpt-4
# Get detailed explanation
npx aggo-ai "top 10 customers by order value" --file orders.jsonl --explain
Programmatic Usage
import { aiQuery, getSchema, generatePipeline } from '@aggo/plugin-ai';
const data = [
  { name: 'Alice', age: 30, department: 'Engineering', salary: 95000 },
  { name: 'Bob', age: 25, department: 'Marketing', salary: 75000 },
  { name: 'Carol', age: 35, department: 'Engineering', salary: 110000 },
];
// Execute natural language query
const result = await aiQuery(data, 'average salary by department');
console.log(result.results);
// Output: [
// { _id: 'Engineering', avgSalary: 102500 },
// { _id: 'Marketing', avgSalary: 75000 }
// ]
// Get schema information
const schema = getSchema(data);
console.log(schema);
// Output: {
// name: 'string',
// age: 'integer',
// department: 'string',
// salary: 'integer'
// }
// Generate pipeline only
const pipeline = await generatePipeline(
  'count employees by department',
  schema
);
console.log(pipeline.pipeline);
// Output: [
//   { $group: { _id: '$department', count: { $sum: 1 } } }
// ]
CLI Options
Usage: aggo-ai [query] [options]
Options:
  -f, --file <path>     Read data from file instead of stdin
  --schema-only         Show inferred schema without querying
  --show-pipeline       Print generated pipeline but don't run it
  --limit-sample <n>    Control rows sampled for schema inference
  --model <model>       Override default OpenAI model
  --explain             Include explanation of the generated pipeline
  --pretty              Pretty-print JSON output
  --api-key <key>       OpenAI API key (or use OPENAI_API_KEY env var)
  -h, --help            Show help message
API Reference
aiQuery(documents, query, options)
Executes a natural language query against documents.
Parameters:
- documents: Array of documents to query
- query: Natural language query string
- options: Configuration options
Returns: AIQueryResult with pipeline, results, schema, and performance metrics
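When calling aiQuery from code, it can help to attach the hints from the Error Handling section to failures. The sketch below is illustrative only: it assumes aiQuery rejects with an Error whose message contains the phrases shown in that section, which this README does not confirm for the programmatic API. The names hintForError and queryWithHints are hypothetical helpers, not part of the plugin.

```javascript
// Hints copied from the Error Handling section of this README.
// Assumption: errors raised by aiQuery carry these phrases in error.message.
const ERROR_HINTS = [
  {
    match: 'API key is required',
    hint: 'Set OPENAI_API_KEY environment variable or use --api-key option',
  },
  {
    match: 'Unable to generate valid pipeline',
    hint: 'Try rephrasing your query or check the schema with --schema-only',
  },
  {
    match: 'quota exceeded',
    hint: 'Check your OpenAI account usage and billing',
  },
];

// Returns the matching hint, or null when the error is not a documented one.
function hintForError(error) {
  const message = error instanceof Error ? error.message : String(error);
  const entry = ERROR_HINTS.find(({ match }) => message.includes(match));
  return entry ? entry.hint : null;
}

// queryFn is any function shaped like aiQuery:
// (documents, query, options) => Promise<result>.
async function queryWithHints(queryFn, documents, query, options) {
  try {
    return { ok: true, value: await queryFn(documents, query, options) };
  } catch (error) {
    return { ok: false, error, hint: hintForError(error) };
  }
}
```

Passing aiQuery as queryFn keeps the wrapper independent of the plugin, so it can be exercised with a stub in tests.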
getSchema(documents, options)
Infers schema from documents without executing a query.
Parameters:
- documents: Array of documents to analyze
- options: Schema inference options
Returns: Simplified schema object
generatePipeline(query, schema, samples, options)
Generates a MongoDB pipeline from a natural language query without executing it.
Parameters:
- query: Natural language query
- schema: Data schema or documents to infer from
- samples: Sample documents for context
- options: OpenAI configuration
Returns: Generated pipeline with metadata
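Because generatePipeline returns a model-generated pipeline without running it, a structural check before execution may be worthwhile. The following is a minimal sketch, not something the plugin is documented to provide: validatePipeline and ALLOWED_STAGES are hypothetical names, and the allowlist holds only the stage operators that appear in this README's examples.

```javascript
// Stage operators that appear in this README's examples; extend as needed.
const ALLOWED_STAGES = new Set(['$match', '$group', '$sort', '$limit', '$count']);

// Returns a list of human-readable problems; an empty list means the
// pipeline passed these structural checks (it says nothing about semantics).
function validatePipeline(pipeline) {
  if (!Array.isArray(pipeline)) {
    return ['pipeline must be an array of stages'];
  }
  const problems = [];
  pipeline.forEach((stage, index) => {
    if (stage === null || typeof stage !== 'object' || Array.isArray(stage)) {
      problems.push(`stage ${index} is not an object`);
      return;
    }
    const keys = Object.keys(stage);
    // Each MongoDB aggregation stage is an object with exactly one operator key.
    if (keys.length !== 1) {
      problems.push(`stage ${index} must have exactly one operator, found ${keys.length}`);
      return;
    }
    if (!ALLOWED_STAGES.has(keys[0])) {
      problems.push(`stage ${index} uses unexpected operator ${keys[0]}`);
    }
  });
  return problems;
}
```

A caller could run this on the pipeline property of the generatePipeline result and refuse to execute when the returned list is non-empty.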
Natural Language Examples
| Query | Generated Pipeline |
| ---------------------------------------- | --------------------------------------------------------------------------------------------- |
| "sum revenue" | [{"$group": {"_id": null, "total": {"$sum": "$revenue"}}}] |
| "average score by category" | [{"$group": {"_id": "$category", "avg": {"$avg": "$score"}}}] |
| "top 5 users by points" | [{"$sort": {"points": -1}}, {"$limit": 5}] |
| "count orders where status is shipped" | [{"$match": {"status": "shipped"}}, {"$count": "total"}] |
| "revenue by month for 2023" | [{"$match": {"year": 2023}}, {"$group": {"_id": "$month", "revenue": {"$sum": "$amount"}}}] |
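To make the pipelines in the table concrete, here is a small stand-alone evaluator for just the stage shapes shown above. It is a sketch for illustration and is not aggo's implementation: runPipeline, runGroup, and fieldValue are hypothetical helpers, $match handles only top-level equality, and $group handles only the $sum and $avg accumulators over numeric fields.

```javascript
// "$revenue" reads doc.revenue; any other value is treated as a literal.
function fieldValue(doc, expr) {
  return typeof expr === 'string' && expr.startsWith('$') ? doc[expr.slice(1)] : expr;
}

// Groups documents by the _id expression and applies $sum / $avg accumulators.
function runGroup(docs, spec) {
  const { _id: idExpr, ...accumulators } = spec;
  const groups = new Map();
  for (const doc of docs) {
    const key = fieldValue(doc, idExpr);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }
  return [...groups].map(([key, members]) => {
    const out = { _id: key };
    for (const [name, acc] of Object.entries(accumulators)) {
      const [op, expr] = Object.entries(acc)[0];
      const total = members.reduce((sum, doc) => sum + Number(fieldValue(doc, expr)), 0);
      if (op === '$sum') out[name] = total;
      else if (op === '$avg') out[name] = total / members.length;
      else throw new Error(`unsupported accumulator ${op}`);
    }
    return out;
  });
}

// Applies each stage in order to the output of the previous stage.
function runPipeline(docs, pipeline) {
  return pipeline.reduce((current, stage) => {
    const [op, spec] = Object.entries(stage)[0];
    switch (op) {
      case '$match':
        return current.filter((doc) =>
          Object.entries(spec).every(([field, value]) => doc[field] === value));
      case '$group':
        return runGroup(current, spec);
      case '$sort':
        return [...current].sort((a, b) => {
          for (const [field, dir] of Object.entries(spec)) {
            if (a[field] < b[field]) return -dir;
            if (a[field] > b[field]) return dir;
          }
          return 0;
        });
      case '$limit':
        return current.slice(0, spec);
      case '$count':
        // Like MongoDB, emit no document when the input is empty.
        return current.length ? [{ [spec]: current.length }] : [];
      default:
        throw new Error(`unsupported stage ${op}`);
    }
  }, docs);
}

const orders = [
  { status: 'shipped', category: 'books', revenue: 20 },
  { status: 'pending', category: 'books', revenue: 10 },
  { status: 'shipped', category: 'toys', revenue: 5 },
];

console.log(runPipeline(orders, [{ $match: { status: 'shipped' } }, { $count: 'total' }]));
// [ { total: 2 } ]
console.log(runPipeline(orders, [{ $group: { _id: null, total: { $sum: '$revenue' } } }]));
// [ { _id: null, total: 35 } ]
```

Reading a pipeline as a left-to-right chain of array transformations, as the reduce call does here, is also a practical way to review what --show-pipeline prints before running it.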
Configuration
OpenAI Models
Supported models (default: gpt-4-turbo-preview):
- gpt-4-turbo-preview - Best accuracy, higher cost
- gpt-4 - Good balance of accuracy and cost
- gpt-3.5-turbo - Faster, lower cost, less accurate
Schema Inference
Options for getSchema() and aiQuery():
- sampleSize: Number of documents to sample (default: 100)
- maxDepth: Maximum depth for nested objects (default: 5)
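One way to picture how sampleSize and maxDepth could interact is the sketch below. It is not the plugin's algorithm: inferSchemaSketch, inferFields, and typeName are hypothetical, the sampling strategy (first sampleSize documents), the 'mixed' label, and the nested-object representation are all assumptions, and input is assumed to be plain JSON values. Only the 'string' and 'integer' type names and the two defaults come from this README.

```javascript
// Maps a JSON value to a type name; nested objects recurse until maxDepth.
function typeName(value, depth, maxDepth) {
  if (value === null) return 'null';
  if (Array.isArray(value)) return 'array';
  if (typeof value === 'number') return Number.isInteger(value) ? 'integer' : 'number';
  if (typeof value === 'object') {
    // Stop descending once maxDepth is reached and report an opaque object.
    return depth >= maxDepth ? 'object' : inferFields([value], depth + 1, maxDepth);
  }
  return typeof value; // 'string', 'boolean', ...
}

// Builds a field -> type map; fields with conflicting types become 'mixed'.
function inferFields(documents, depth, maxDepth) {
  const schema = {};
  for (const doc of documents) {
    for (const [field, value] of Object.entries(doc)) {
      const inferred = typeName(value, depth, maxDepth);
      if (!(field in schema)) {
        schema[field] = inferred;
      } else if (JSON.stringify(schema[field]) !== JSON.stringify(inferred)) {
        schema[field] = 'mixed';
      }
    }
  }
  return schema;
}

// Assumption: sampling simply takes the first sampleSize documents.
function inferSchemaSketch(documents, { sampleSize = 100, maxDepth = 5 } = {}) {
  return inferFields(documents.slice(0, sampleSize), 1, maxDepth);
}
```

In this sketch a smaller sampleSize reads fewer documents and so can miss fields or type conflicts that appear later in the data, and a smaller maxDepth collapses deeper nesting into 'object'.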
Performance
The plugin provides detailed performance metrics:
const result = await aiQuery(data, 'query');
console.log(result.performance);
// {
//   schemaInferenceMs: 15,
//   pipelineGenerationMs: 850,
//   executionMs: 5,
//   totalMs: 870
// }
Error Handling
Common errors and solutions:
Missing API Key
❌ Error: OpenAI API key is required
💡 Set OPENAI_API_KEY environment variable or use --api-key option
Invalid Query
❌ Error: Unable to generate valid pipeline
💡 Try rephrasing your query or check the schema with --schema-only
API Limits
❌ Error: OpenAI API quota exceeded
💡 Check your OpenAI account usage and billing
Advanced Usage
Custom Model Configuration
const result = await aiQuery(data, 'query', {
  model: 'gpt-3.5-turbo',
  temperature: 0.2,
  maxTokens: 800,
});
Schema-First Development
// First, understand your data
const schema = getSchema(documents);
console.log('Available fields:', Object.keys(schema));
// Then craft specific queries
const result = await aiQuery(documents, 'average price by category');
Batch Processing
const queries = ['total sales', 'average order value', 'top selling products'];
const results = await Promise.all(
  queries.map(query => aiQuery(documents, query))
);
Contributing
See CONTRIBUTING_PLUGINS.md for plugin development guidelines.
License
MIT
