# openapi2csv
A Node.js utility that converts large OpenAPI specification files into CSV format, specifically designed for use with RAG (Retrieval-Augmented Generation) systems. The tool handles large specifications efficiently through batch processing and smart schema selection.
## Features
- Processes large OpenAPI specifications (tested with 30MB+ files)
- Memory-efficient batch processing
- Smart schema selection (only includes relevant schemas per endpoint; see the sketch after this list)
- Handles both JSON and YAML OpenAPI specifications
- Configurable batch size for memory optimization
- Automatic Node.js heap size management
- Progress tracking and detailed logging
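To give a sense of what "smart schema selection" means in practice, here is a minimal sketch of the idea: collect every `#/components/schemas/...` `$ref` reachable from a single operation, including refs nested inside the selected schemas, and keep only those. The helper names are hypothetical; the tool's internal implementation may differ.

```js
// Hypothetical sketch of per-endpoint schema selection, not the tool's
// actual source: gather schema names referenced by one operation object.
function collectSchemaRefs(node, refs = new Set()) {
  if (Array.isArray(node)) {
    node.forEach((item) => collectSchemaRefs(item, refs));
  } else if (node && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      if (key === '$ref' && typeof value === 'string') {
        refs.add(value.replace('#/components/schemas/', ''));
      } else {
        collectSchemaRefs(value, refs);
      }
    }
  }
  return refs;
}

function relevantSchemas(operation, allSchemas) {
  const refs = collectSchemaRefs(operation);
  let previousSize = 0;
  // Expand transitively until no new schema names appear.
  while (refs.size !== previousSize) {
    previousSize = refs.size;
    for (const name of [...refs]) {
      if (allSchemas[name]) collectSchemaRefs(allSchemas[name], refs);
    }
  }
  return Object.fromEntries(
    [...refs].filter((name) => allSchemas[name]).map((name) => [name, allSchemas[name]])
  );
}
```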
## Installation

### Global Installation (Recommended)

```bash
npm install -g openapi2csv
```

### Local Installation

- Clone the repository:

```bash
git clone https://github.com/javimosch/openapi2csv.git
cd openapi2csv
```

- Install dependencies:

```bash
npm install
```

## Usage
### Using Global Command

```bash
openapi2csv -i ./spec.json
```

### Using Local Installation

```bash
npm start -- -i ./spec.json
```

All available options:
```
openapi2csv [options]

Options:
  -i, --input <file>               Input OpenAPI specification file (JSON or YAML)
  -o, --output <dir>               Output directory for CSV files (default: "./output")
  -f, --format <format>            Input format: json or yaml (default: "json")
  --output-format <format>         Output format: default or csv-to-rag (default: "default")
  -b, --batch-size <number>        Batch size for processing (default: 100)
  -d, --delimiter <char>           CSV delimiter character (default: ";")
  -dh, --delimiter-header <char>   CSV delimiter for header row (defaults to data delimiter)
  -c, --control                    Pre-check for data delimiter conflicts and abort if found
  -v, --verbose                    Enable verbose logging
```
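For example, converting a YAML spec with a smaller batch size, verbose logging, and the RAG-oriented output format:

```bash
openapi2csv -i ./spec.yaml -f yaml -o ./output --output-format csv-to-rag -b 50 -v
```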
### Output Format Options
- **default**: The standard format with the following columns:
  - ENDPOINT
  - METHOD
  - SUMMARY
  - DESCRIPTION
  - PARAMETERS
  - REQUEST_BODY
  - RESPONSES
  - TAGS
  - SECURITY
  - SERVERS
  - SCHEMAS
- **csv-to-rag**: Optimized format for RAG systems with the following columns:
  - code
  - metadata_small
  - metadata_big_1
  - metadata_big_2
  - metadata_big_3
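Since the JSON columns are stringified, a downstream RAG pipeline has to parse them back. A minimal sketch for the default format, using the `csv-parse` package (which is not a dependency of this tool) and assuming the default `;` delimiter:

```js
// Sketch: read the generated CSV back and restore the JSON columns.
const fs = require('fs');
const { parse } = require('csv-parse/sync');

const rows = parse(fs.readFileSync('./output/api_spec.csv', 'utf8'), {
  delimiter: ';',
  columns: true, // use the header row (ENDPOINT, METHOD, ...) as keys
});

for (const row of rows) {
  const parameters = JSON.parse(row.PARAMETERS || 'null');
  console.log(`${row.METHOD} ${row.ENDPOINT}: ${row.SUMMARY}`, parameters);
}
```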
### Custom Delimiters
The tool supports any delimiter character or string for CSV output:
```bash
# Use pipe delimiter
openapi2csv -i spec.json -d "|"
# Use tab delimiter
openapi2csv -i spec.json -d "\t"
# Use different delimiters for header vs data
openapi2csv -i spec.json -d "|" -dh ","
# Use multi-character delimiter
openapi2csv -i spec.json -d "###"
```

### Delimiter Conflict Detection
Use the `--control` option to pre-check for delimiter conflicts in your data:

```bash
# Check for conflicts before processing
openapi2csv -i spec.json -d "|" --control
```

If conflicts are found, you'll see detailed information:

```
DELIMITER CONFLICT DETECTED!
Found 6 conflict(s) with delimiter "|":
1. Location: parameter description
   Path: GET /api/path.parameters[0]
   Value: "Use vehicle|driver|round"
```

Use a safe delimiter instead:

```bash
openapi2csv -i spec.json -d "###" --control
```

### Delimiter Option

You can specify a custom delimiter using the `--delimiter` option. The default is `;`.
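Conceptually, the pre-check just walks every string value in the parsed spec and flags the ones containing the delimiter. A rough sketch of that idea (hypothetical, not the tool's actual implementation):

```js
// Hypothetical sketch of a delimiter pre-check: recursively visit every
// string in the parsed spec and record where the delimiter occurs.
function findDelimiterConflicts(node, delimiter, path = '', conflicts = []) {
  if (typeof node === 'string') {
    if (node.includes(delimiter)) conflicts.push({ path, value: node });
  } else if (Array.isArray(node)) {
    node.forEach((item, i) =>
      findDelimiterConflicts(item, delimiter, `${path}[${i}]`, conflicts)
    );
  } else if (node && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      findDelimiterConflicts(value, delimiter, path ? `${path}.${key}` : key, conflicts);
    }
  }
  return conflicts;
}

// Example: findDelimiterConflicts(spec, '|') might return
// [{ path: 'paths./api/path.get.parameters[0].description',
//    value: 'Use vehicle|driver|round' }]
```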
## Output Format

The tool generates a CSV file (`api_spec.csv`) with the following columns:

- ENDPOINT: The API endpoint path
- METHOD: HTTP method (GET, POST, etc.)
- SUMMARY: Brief description of the endpoint
- DESCRIPTION: Detailed description of the endpoint
- PARAMETERS: JSON-stringified object containing all parameters
- REQUEST_BODY: JSON-stringified schema of the request body
- RESPONSES: JSON-stringified object containing possible responses
- TAGS: Array of endpoint tags
- SECURITY: JSON-stringified security requirements
- SERVERS: JSON-stringified server configurations
- SCHEMAS: JSON-stringified relevant schemas
For large objects (>1MB), the tool provides a summary instead of the full object:

```json
{
  "note": "Object too large, showing summary",
  "type": "object",
  "length": 42
}
```
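The cap can be implemented with a guard around `JSON.stringify`; a minimal sketch follows (assumed helper name and summary semantics, not the tool's actual code):

```js
// Sketch: stringify a value, but fall back to the documented summary
// object once the serialized form exceeds 1MB. "length" is assumed to
// mean the number of keys (or array elements) of the summarized value.
const MAX_JSON_SIZE = 1024 * 1024; // 1MB

function safeStringify(value) {
  const json = JSON.stringify(value);
  if (json === undefined || json.length <= MAX_JSON_SIZE) return json;
  return JSON.stringify({
    note: 'Object too large, showing summary',
    type: Array.isArray(value) ? 'array' : typeof value,
    length: Array.isArray(value) ? value.length : Object.keys(value).length,
  });
}
```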
## Memory Management

The tool automatically manages memory usage through:
- Batch processing of endpoints (see the sketch after this list)
- Limiting JSON string sizes to 1MB
- Automatic Node.js heap size increase (8GB)
- Smart schema selection
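A simplified view of the batch loop, under the assumption that endpoints are processed in slices of `--batch-size` and flushed to the CSV between slices:

```js
// Sketch: process endpoints in fixed-size batches so only one batch's
// rows are held in memory at a time. handleBatch is a hypothetical
// callback that converts a slice of endpoints and appends it to the CSV.
async function processInBatches(endpoints, batchSize, handleBatch) {
  for (let start = 0; start < endpoints.length; start += batchSize) {
    const batch = endpoints.slice(start, start + batchSize);
    await handleBatch(batch);
    const done = Math.min(start + batchSize, endpoints.length);
    console.log(`Processed ${done}/${endpoints.length} endpoints`);
  }
}
```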
## Error Handling
The tool includes comprehensive error handling:
- Graceful handling of large objects
- Detailed error messages and stack traces
- Safe JSON stringification
- Progress tracking for debugging
## Requirements
- Node.js v14 or higher
- Sufficient system memory (recommended: 8GB+)
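The 8GB recommendation corresponds to the automatic heap increase noted under Memory Management. A common way for a Node.js CLI to raise its own heap limit is to re-execute itself with `--max-old-space-size`; a sketch of that pattern (not necessarily how openapi2csv does it):

```js
// Sketch: re-launch the CLI with an 8GB heap unless it was already
// relaunched (guarded by a hypothetical environment variable).
const { spawnSync } = require('child_process');

if (!process.env.OPENAPI2CSV_HEAP_RAISED) {
  const result = spawnSync(
    process.execPath,
    ['--max-old-space-size=8192', ...process.argv.slice(1)],
    { stdio: 'inherit', env: { ...process.env, OPENAPI2CSV_HEAP_RAISED: '1' } }
  );
  process.exit(result.status ?? 0);
}
// ...the normal CLI logic runs here on the second launch...
```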
## Dependencies

- `commander`: CLI argument parsing
- `fs-extra`: Enhanced file system operations
- `csv-writer`: CSV file generation
- `js-yaml`: YAML parsing support
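For context, generating the default-format file with `csv-writer` looks roughly like this (a sketch using the documented columns and delimiter, not the tool's actual source):

```js
// Sketch: write rows with csv-writer using the default columns and the
// ";" delimiter documented above.
const { createObjectCsvWriter } = require('csv-writer');

const writer = createObjectCsvWriter({
  path: './output/api_spec.csv',
  fieldDelimiter: ';',
  header: [
    { id: 'endpoint', title: 'ENDPOINT' },
    { id: 'method', title: 'METHOD' },
    { id: 'summary', title: 'SUMMARY' },
    // ...remaining documented columns...
  ],
});

writer
  .writeRecords([{ endpoint: '/pets', method: 'GET', summary: 'List all pets' }])
  .then(() => console.log('CSV written'));
```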
## License
MIT
## Contributing
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a new Pull Request
