# 🌊 Dataweave (v1.0.1)

**AI-Assisted CLI for Modern Data Pipelines**

Dataweave is an intelligent command-line interface that accelerates data pipeline development by combining DBT (data transformation), Dagster (orchestration), and Supabase (backend) with AI-powered code generation and scaffolding.
## ✨ Features
### 🚀 Project Scaffolding
- Initialize complete data pipeline projects with one command
- Automatic setup of DBT, Dagster, and Supabase integrations
- Configurable project templates and structures
### 🤖 AI-Powered Development
- Generate DBT models from natural language descriptions
- Create Dagster assets with intelligent dependency mapping
- AI-driven code explanation and optimization suggestions
- Automatic documentation generation
### 🔧 Modern Stack Integration
- DBT: Model generation, testing, schema management
- Dagster: Asset creation, job scheduling, pipeline validation
- Supabase: Database integration, migration management
- Cross-tool workflows: DBT models → Dagster assets
### 📦 Developer Experience
- Comprehensive CLI with 20+ commands
- Intelligent error handling and validation
- Built-in testing and coverage tools
- Professional terminal UI with progress indicators
## 🚀 Installation

### Global Installation (Recommended)

```sh
npm install -g dataweave
```

### Direct Usage with npx

```sh
npx dataweave init my-data-project
```

## 🎯 Quick Start
### 1. Initialize a New Project

```sh
# Create a full-featured data pipeline project
dataweave init my-pipeline

# Initialize with specific features
dataweave init my-pipeline --no-supabase --template minimal
```

### 2. Generate DBT Models
```sh
cd my-pipeline

# Create a basic model
dataweave dbt:model:new user_metrics --materialized table

# Generate with AI assistance
dataweave ai:generate:dbt "Create a model that calculates monthly active users"

# Generate with custom SQL
dataweave dbt:model:new revenue_model --sql "select sum(amount) from orders"
```

### 3. Create Dagster Assets
```sh
# Generate a data processing asset
dataweave dagster:asset:new data_processor --deps "raw_users,raw_orders"

# Create with AI assistance
dataweave ai:generate:dagster "Build an asset that processes customer data"

# Generate DBT-Dagster integration
dataweave dagster:dbt:asset user_metrics
```
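The generated file itself isn't shown in this README, but a Dagster asset with the dependencies named above might look roughly like the sketch below. The placeholder data, the body of `data_processor`, and the no-op fallback decorator (so the sketch imports even where dagster isn't installed) are all illustrative assumptions, not Dataweave's actual output:

```python
# Illustrative sketch of a generated asset file, not Dataweave's real output.
# The @asset decorator comes from dagster; a no-op stand-in is defined so the
# sketch also runs in environments without dagster installed.
try:
    from dagster import asset
except ImportError:
    def asset(*args, **kwargs):
        def wrap(fn):
            return fn
        return wrap

@asset(deps=["raw_users", "raw_orders"])  # upstream assets named on the CLI
def data_processor():
    """Join raw users and orders into one record set (placeholder logic)."""
    users = [{"id": 1, "name": "Ada"}]                 # stand-in for raw_users
    orders = [{"user_id": 1, "amount": 42.0}]          # stand-in for raw_orders
    return [
        {**u, "total_spent": sum(o["amount"] for o in orders if o["user_id"] == u["id"])}
        for u in users
    ]
```

In a real project the function body would read the upstream assets rather than hard-coded lists; the point is the shape: one decorated function per asset, with dependencies declared in the decorator.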
### 4. AI-Powered Development

```sh
# Explain existing code
dataweave ai:explain data/dbt/models/marts/fct_orders.sql

# Get optimization suggestions
dataweave ai:optimize data/dagster/assets/user_processor.py

# Generate documentation
dataweave ai:document user_metrics
```

## 📋 Command Reference
### Project Management

```sh
dataweave init [name]   # Initialize new project
dataweave info          # Display project information
```

### DBT Integration

```sh
dataweave dbt:model:new <name>   # Generate new DBT model
dataweave dbt:run [model]        # Run DBT models
dataweave dbt:test [model]       # Test DBT models
dataweave dbt:compile [model]    # Compile DBT models
dataweave dbt:docs               # Generate documentation
dataweave dbt:introspect         # Analyze database schema
```

### Dagster Integration

```sh
dataweave dagster:asset:new <name>    # Create Dagster asset
dataweave dagster:job:new <name>      # Create Dagster job
dataweave dagster:dbt:asset <model>   # DBT-Dagster integration
dataweave dagster:dev                 # Start development server
dataweave dagster:validate            # Validate pipeline config
```

### AI-Powered Features

```sh
dataweave ai:generate:dbt <prompt>       # Generate DBT model with AI
dataweave ai:generate:dagster <prompt>   # Generate Dagster asset with AI
dataweave ai:explain <file>              # Explain code with AI
dataweave ai:optimize <file>             # Get optimization suggestions
dataweave ai:document <model>            # Generate documentation
```

## 🏗️ Project Structure
Dataweave creates a comprehensive project structure:

```
my-pipeline/
├── .dataweave/                # Configuration
├── data/
│   ├── dbt/                   # DBT models, tests, docs
│   │   ├── models/
│   │   │   ├── staging/       # Raw data models
│   │   │   ├── intermediate/  # Business logic
│   │   │   └── marts/         # Final data products
│   │   ├── macros/            # Reusable SQL
│   │   └── tests/             # Data tests
│   ├── dagster/               # Orchestration
│   │   ├── assets/            # Data assets
│   │   ├── jobs/              # Pipeline jobs
│   │   ├── schedules/         # Automation
│   │   └── sensors/           # Event triggers
│   └── assets/                # Shared resources
├── supabase/                  # Database & backend
│   ├── migrations/            # Schema changes
│   └── functions/             # Edge functions
├── config/                    # Configuration files
└── README.md                  # Project documentation
```

## 🔧 Configuration
### Environment Variables

```sh
# Database connection
DATABASE_URL=postgresql://user:pass@host:5432/db

# Supabase integration
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your-anon-key

# AI/LLM integration
OPENAI_API_KEY=your-openai-key

# Dagster configuration
DAGSTER_HOME=./data/dagster
```
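As a sketch of how these variables might be consumed, the snippet below reads them via Python's `os.environ`. Which variables Dataweave actually treats as required is an assumption here, and `load_env_config` is a hypothetical helper; the default for `DAGSTER_HOME` mirrors the value shown above:

```python
import os

def load_env_config(env=os.environ):
    """Read the variables listed above, failing fast on missing required ones.

    The required/optional split is an assumption for illustration, not
    Dataweave's documented behavior.
    """
    required = ["DATABASE_URL", "SUPABASE_URL", "SUPABASE_ANON_KEY"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
    return {
        "database_url": env["DATABASE_URL"],
        "supabase_url": env["SUPABASE_URL"],
        "supabase_anon_key": env["SUPABASE_ANON_KEY"],
        "openai_api_key": env.get("OPENAI_API_KEY"),               # optional: AI features
        "dagster_home": env.get("DAGSTER_HOME", "./data/dagster"),  # default from above
    }
```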
### Project Configuration

```json
{
  "name": "my-pipeline",
  "version": "1.0.0",
  "dbt": {
    "enabled": true,
    "profile": "dataweave",
    "target": "dev"
  },
  "dagster": {
    "enabled": true,
    "workspace": "./data/dagster"
  },
  "supabase": {
    "enabled": true
  },
  "ai": {
    "enabled": true,
    "provider": "openai",
    "model": "gpt-4"
  }
}
```
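A loader for this file could overlay user-supplied values onto per-section defaults, as sketched below. The `load_project_config` helper and its default values are hypothetical, mirroring the sample above rather than Dataweave's real implementation:

```python
import json

# Defaults mirroring the sample config above; assumed, not documented by the CLI.
DEFAULTS = {
    "dbt": {"enabled": True, "profile": "dataweave", "target": "dev"},
    "dagster": {"enabled": True, "workspace": "./data/dagster"},
    "supabase": {"enabled": True},
    "ai": {"enabled": True, "provider": "openai", "model": "gpt-4"},
}

def load_project_config(text):
    """Parse config JSON and overlay each section onto its defaults."""
    raw = json.loads(text)
    config = {
        "name": raw.get("name", "unnamed"),
        "version": raw.get("version", "0.0.0"),
    }
    for section, defaults in DEFAULTS.items():
        # user values win; anything unspecified falls back to the default
        config[section] = {**defaults, **raw.get(section, {})}
    return config
```

Overlaying section by section means a config that sets only `"ai": {"model": "gpt-4o"}` still inherits `"provider": "openai"` from the defaults.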
## 🧪 Testing

Dataweave includes comprehensive testing tools:

```sh
# Run all tests
npm test

# Run specific test types
npm run test:unit
npm run test:integration
npm run test:coverage

# Manual testing
./test-runner.sh
```

## 📚 Documentation
- Getting Started Guide - Comprehensive setup and usage
- API Reference - Complete command documentation
- Testing Guide - Testing strategies and examples
- Contributing Guide - Development guidelines
## 🌟 Examples
### E-commerce Analytics Pipeline

```sh
# Initialize project
dataweave init ecommerce-analytics
cd ecommerce-analytics

# Generate staging models
dataweave dbt:model:new stg_customers --materialized view
dataweave dbt:model:new stg_orders --materialized view

# Create business logic
dataweave ai:generate:dbt "Calculate customer lifetime value" --name customer_ltv

# Build orchestration
dataweave dagster:asset:new customer_segmentation --deps "stg_customers,customer_ltv"

# Run the pipeline
dataweave dbt:run
dataweave dagster:dev
```

### Real-time Analytics
```sh
# AI-powered model generation
dataweave ai:generate:dbt "Create hourly active user metrics with real-time updates"

# Event-driven processing
dataweave dagster:asset:new event_processor --schedule "*/5 * * * *"

# Supabase integration
dataweave supabase:connect
```

## 🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
### Development Setup

```sh
# Clone the repository
git clone https://github.com/yourusername/dataweave.git
cd dataweave

# Install dependencies
npm install

# Run tests
npm test

# Build the CLI
npm run build
```

## 📝 Changelog
See CHANGELOG.md for release history.
## 🆘 Support
- GitHub Issues: Report bugs or request features
- Documentation: Complete guides and examples
- Community: Join our discussions
## 📄 License
MIT © Dataweave Contributors
*Built with ❤️ for the modern data stack*

*Accelerating data pipeline development through intelligent automation*
