@arela/uploader

v1.0.12

Published

10 days ago

CLI to upload files/directories to Arela


inspiracode

arela cli uploader record-keeping

arela-uploader

CLI tool to upload files and directories to Arela API or Supabase Storage with automatic file processing, detection, and organization.

✨ What's New in v0.4.0

🏢 Simplified Multi-Tenant API: Only 3 targets: default, agencia, cliente
🔀 Cross-Tenant Mode: Read from one API, write to another with --source-api and --target-api
⚙️ Dynamic Client Config: Switch clients by updating .env - no code changes needed!
👁️ Enhanced Watch Mode: Full cross-tenant support in automatic processing pipeline
⚡ Optimized Connections: HTTP Agent with connection pooling for high performance

🚀 OPTIMIZED 4-PHASE WORKFLOW

New in v0.2.0: The tool now supports an optimized 4-phase workflow designed for maximum performance when processing large file collections:

Phase 1: Filesystem Stats Collection 📊

arela --stats-only

⚡ ULTRA FAST: Only reads filesystem metadata (no file content)
📈 Bulk database operations: Processes 1000+ files per batch
🔄 Upsert optimization: Handles duplicates efficiently
💾 Minimal memory usage: No file content loading
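
As an illustration of the metadata-only idea behind Phase 1, here is a minimal Python sketch. It is not the tool's implementation: the function name `collect_stats`, the record fields, and the default batch size of 1000 are assumptions chosen to mirror the description above. It walks a directory, calls `os.stat` on each file without opening it, and yields the records in batches.

```python
import os
from typing import Dict, Iterator, List


def collect_stats(root: str, batch_size: int = 1000) -> Iterator[List[Dict]]:
    """Yield batches of filesystem metadata without reading file contents."""
    batch: List[Dict] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)  # metadata only, the file is never opened
            batch.append(
                {"path": full_path, "size": info.st_size, "modified": info.st_mtime}
            )
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if batch:  # emit the final, possibly smaller batch
        yield batch
```

Each yielded batch could then be written to the database in a single bulk upsert, which is the pattern the bullets above describe.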

Phase 2: PDF Detection 🔍

arela --detect-pdfs

🎯 Targeted processing: Only processes PDF files from database
Pedimento-simplificado detection: Extracts RFC, pedimento numbers, and metadata
🔄 Batched processing: Handles large datasets efficiently
📊 Progress tracking: Real-time detection statistics

Phase 3: Path Propagation 📁

arela --propagate-arela-path

🎯 Smart path copying: Propagates arela_path from pedimento documents to related files
📦 Batch updates: Processes files in groups for optimal database performance
🔗 Relationship mapping: Links supporting documents to their pedimento
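
The README does not say how "related files" are identified. The Python sketch below assumes, purely for illustration, that files sharing a directory with a detected pedimento are related. The function name `propagate_arela_path` is hypothetical; the `original_path` and `arela_path` fields are the ones named elsewhere in this document.

```python
import posixpath
from typing import Dict, List, Optional


def propagate_arela_path(records: List[Dict[str, Optional[str]]]) -> int:
    """Copy arela_path to records in the same directory as a pedimento.

    Assumption: "related" means "same directory". The real tool may use
    a different relationship. Returns the number of records updated.
    """
    # Map each directory that holds a detected pedimento to its arela_path.
    path_by_dir: Dict[str, str] = {}
    for rec in records:
        if rec.get("arela_path"):
            directory = posixpath.dirname(rec["original_path"])
            path_by_dir.setdefault(directory, rec["arela_path"])

    updated = 0
    for rec in records:
        if rec.get("arela_path"):
            continue  # already has a path, nothing to propagate
        inherited = path_by_dir.get(posixpath.dirname(rec["original_path"]))
        if inherited:
            rec["arela_path"] = inherited
            updated += 1
    return updated
```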

Phase 4: RFC-based Upload 🚀

arela --upload-by-rfc

🎯 Targeted uploads: Only uploads files for specified RFCs
📋 Supporting documents: Includes all related files, not just pedimentos
🏗️ Structure preservation: Maintains proper folder hierarchy

Combined Workflow 🎯

# Run all 4 phases in sequence (recommended)
arela --run-all-phases

# Or run phases individually for more control
arela --stats-only           # Phase 1: Collect filesystem stats
arela --detect-pdfs          # Phase 2: Detect pedimento documents  
arela --propagate-arela-path # Phase 3: Propagate paths to related files
arela --upload-by-rfc        # Phase 4: Upload by RFC

Performance Benefits

Before optimization (single phase with detection):

🐌 Read every file for detection
💾 High memory usage
🔄 Slow database operations
❌ Process unsupported files

After optimization (4-phase approach):

⚡ 10x faster: Phase 1 only reads filesystem metadata
📊 Bulk operations: Database inserts up to 1000 records per batch
🎯 Targeted processing: Phase 2 only processes PDFs needing detection
💾 Memory efficient: No unnecessary file content loading
🔄 Optimized I/O: Separates filesystem, database, and network operations

Features

📁 Upload entire directories or individual files
🤖 Automatic file detection and organization (API mode)
🗂️ Smart year/pedimento auto-detection from file paths
🏗️ Custom folder structure support
🔄 Automatic file renaming to handle problematic characters
📝 Comprehensive logging (local and remote)
⚡ Retry mechanism for failed uploads
🎯 Skip duplicate files automatically
📊 Progress bars and detailed summaries
📂 Preserve directory structure with auto-organization
🚀 Batch processing with configurable concurrency
🔧 Performance optimizations with caching
📋 Upload files by specific RFC values
🔍 Propagate arela_path from pedimento documents to related files
⚡ 4-Phase optimized workflow for maximum performance
👁️ Watch Mode - Monitor directories for changes and upload automatically
- Multiple watch strategies (batch, individual, full-structure)
- Multi-tenant and cross-tenant support ⭐ NEW
- Debounce and polling support
- Auto-processing pipeline
- Dry-run mode for testing
- Pattern-based file ignoring

🏢 Multi-Tenant API Support

Connect to different API instances: default, agencia, or cliente.

# Upload to client API
arela upload --api cliente --upload-by-rfc

# Collect stats on agencia API
arela stats --api agencia

# Watch mode with specific API target
arela watch --api cliente

Cross-Tenant Mode

Process files from one tenant and upload to another:

# Read data from agencia, upload files to client
arela watch --source-api agencia --target-api cliente

# Same for upload command
arela upload --source-api agencia --target-api cliente --upload-by-rfc

How Cross-Tenant Works:

| Phase | Description | API Used |
|-------|-------------|----------|
| Phase 1 | Stats Collection | --source-api |
| Phase 2 | PDF Detection | --source-api |
| Phase 3 | Path Propagation | --source-api |
| Phase 4 | File Upload | --target-api |
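
The phase-to-API routing can be summarized in a few lines of Python. This is a sketch, not the tool's code: the function name `api_for_phase` is hypothetical, and the fallback order (phase-specific flag, then `--api`, then `default`) is an assumption the README does not spell out.

```python
from typing import Optional

VALID_TARGETS = ("default", "agencia", "cliente")


def api_for_phase(
    phase: int,
    api: Optional[str] = None,
    source_api: Optional[str] = None,
    target_api: Optional[str] = None,
) -> str:
    """Return the API target a phase would use under the routing above."""
    if not 1 <= phase <= 4:
        raise ValueError(f"unknown phase: {phase}")
    # Phases 1-3 read and process via the source API; phase 4 uploads to the target.
    specific = target_api if phase == 4 else source_api
    chosen = specific or api or "default"  # assumed fallback order
    if chosen not in VALID_TARGETS:
        raise ValueError(f"unknown API target: {chosen}")
    return chosen
```

With `source_api="agencia"` and `target_api="cliente"`, phases 1 to 3 resolve to `agencia` and phase 4 resolves to `cliente`.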

Available API Targets

Only 3 API targets are available: default, agencia, cliente

Configure in your .env file:

# Default API (--api default or no flag)
ARELA_API_URL=http://localhost:3010
ARELA_API_TOKEN=your_token

# Agencia API (--api agencia)
ARELA_API_AGENCIA_URL=http://localhost:4012
ARELA_API_AGENCIA_TOKEN=your_agencia_token

# Cliente API (--api cliente)
# Configure the URL/Token for the specific client you need
ARELA_API_CLIENTE_URL=http://localhost:4014
ARELA_API_CLIENTE_TOKEN=your_cliente_token

# Examples for different clients:
# Cliente AUM9207011CA: ARELA_API_CLIENTE_URL=http://localhost:4014
# Cliente KTJ931117P55: ARELA_API_CLIENTE_URL=http://localhost:4013

💡 Tip: To switch between clients, just update ARELA_API_CLIENTE_URL and ARELA_API_CLIENTE_TOKEN in your .env file. No code changes needed!
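
To make the target-to-variable mapping concrete, here is a small Python sketch that resolves a target name to its URL and token. The environment variable names come from the configuration above; the function name `resolve_api` and its error handling are assumptions made for illustration.

```python
import os
from typing import Mapping, Tuple

# Environment variable names as documented above, keyed by API target.
ENV_KEYS = {
    "default": ("ARELA_API_URL", "ARELA_API_TOKEN"),
    "agencia": ("ARELA_API_AGENCIA_URL", "ARELA_API_AGENCIA_TOKEN"),
    "cliente": ("ARELA_API_CLIENTE_URL", "ARELA_API_CLIENTE_TOKEN"),
}


def resolve_api(target: str, env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (url, token) for a target, or raise if it is not configured."""
    if target not in ENV_KEYS:
        raise ValueError(f"unknown API target: {target}")
    url_key, token_key = ENV_KEYS[target]
    url, token = env.get(url_key), env.get(token_key)
    if not url or not token:
        raise RuntimeError(f"{url_key} and {token_key} must both be set")
    return url, token
```

Because the `cliente` entry always reads the same two variables, pointing it at a different client only requires changing their values.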

Installation

npm install -g @arela/uploader

Usage

🚀 Optimized 4-Phase Workflow (Recommended)

# Run all phases automatically (most efficient)
arela upload --run-all-phases --batch-size 20

# Or run phases individually for fine-grained control
arela stats                           # Phase 1: Filesystem stats only
arela detect                          # Phase 2: PDF detection
arela detect --propagate-arela-path   # Phase 3: Path propagation
arela upload --upload-by-rfc          # Phase 4: RFC-based upload

Available Commands

1. upload - Upload files to Arela

# Basic upload with auto-processing (API Mode)
arela upload --batch-size 10

# Upload with auto-detection of year/pedimento from file paths
arela upload --auto-detect-structure --batch-size 10

# Upload with custom folder structure
arela upload --folder-structure "2024/4023260" --batch-size 10

# Upload to Supabase directly (skip API)
arela upload --force-supabase --prefix "my-folder"

# Upload files by specific RFC values
arela upload --upload-by-rfc --batch-size 5

# Upload RFC files with custom folder prefix
arela upload --upload-by-rfc --folder-structure "palco" --batch-size 5

# Upload RFC files with nested folder structure
arela upload --upload-by-rfc --folder-structure "2024/Q1/processed" --batch-size 15

# Upload with performance statistics
arela upload --batch-size 10 --show-stats

# Upload with client path tracking
arela upload --client-path "/client/documents" --batch-size 10

2. stats - Collect file statistics without uploading

# Collect filesystem statistics only (Phase 1)
arela stats --batch-size 10

# Stats with custom folder organization
arela stats --folder-structure "2023/3019796" --batch-size 10

# Stats with client path tracking
arela stats --client-path "/client/documents" --batch-size 10

3. detect - Run document detection and path propagation

# Run PDF detection on existing database records (Phase 2)
arela detect --batch-size 10

# Propagate arela_path from pedimento records to related files (Phase 3)
arela detect --propagate-arela-path

4. watch - Monitor directories and upload automatically ⭐ NEW

# Watch directories for changes with automatic upload
arela watch --directories "/path/to/watch1,/path/to/watch2"

# Watch with specific API target (single tenant)
arela watch --api cliente

# Watch with cross-tenant mode (read from agencia, upload to client)
arela watch --source-api agencia --target-api cliente

# Watch with custom upload strategy (default: batch)
arela watch --directories "/path/to/watch" --strategy individual
arela watch --directories "/path/to/watch" --strategy full-structure

# Watch with custom debounce delay (default: 1000ms)
arela watch --directories "/path/to/watch" --debounce 2000

# Watch with automatic 4-step pipeline
arela watch --directories "/path/to/watch" --auto-processing --batch-size 10

# Watch with polling instead of native file system events
arela watch --directories "/path/to/watch" --poll 5000

# Watch with pattern ignoring
arela watch --directories "/path/to/watch" --ignore "node_modules,*.log,*.tmp"

# Watch in dry-run mode (simulate without uploading)
arela watch --directories "/path/to/watch" --dry-run

# Watch with verbose logging
arela watch --directories "/path/to/watch" --verbose

Watch Strategies:

batch (default): Groups files and uploads periodically
individual: Uploads each file immediately as it changes
full-structure: Preserves directory structure during upload

Multi-Tenant Options:

--api <target>: Use a single API for all operations
--source-api <target>: API for reading/processing (phases 1-3)
--target-api <target>: API for uploading (phase 4)

5. query - Query database for file status

# Show files ready for upload
arela query --ready-files

6. config - Show current configuration

# Display all configuration settings
arela config

Legacy Syntax (Still Supported)

The old flag-based syntax is still supported for backward compatibility:

# These are equivalent to the commands above
arela --stats-only                    # Same as: arela stats
arela --detect-pdfs                   # Same as: arela detect
arela --propagate-arela-path          # Same as: arela detect --propagate-arela-path
arela --upload-by-rfc                 # Same as: arela upload --upload-by-rfc

Phase Control

--stats-only: Phase 1 - Only collect filesystem stats (no file reading)
--detect-pdfs: Phase 2 - Process PDF files for pedimento-simplificado detection
--propagate-arela-path: Phase 3 - Propagate arela_path from pedimento records to related files
--upload-by-rfc: Phase 4 - Upload files based on RFC values from UPLOAD_RFCS
--run-all-phases: All Phases - Run complete optimized workflow

Global Options (all commands)

-v, --verbose: Enable verbose logging
--clear-log: Clear the log file before starting
-h, --help: Display help information
--version: Display version number

Upload Command Options

-b, --batch-size <size>: API batch size (default: 10)
--folder-structure <structure>: Custom folder structure (e.g., "2024/4023260")
--client-path <path>: Client path for metadata tracking
--auto-detect-structure: Automatically detect year/pedimento from file paths
--auto-detect: Enable automatic document type detection
--auto-organize: Enable automatic file organization
--force-supabase: Force direct Supabase upload (skip API)
--skip-processed: Skip files already processed
--show-stats: Show performance statistics
--upload-by-rfc: Upload files based on RFC values from UPLOAD_RFCS
--run-all-phases: Run all processing phases sequentially

Stats Command Options

-b, --batch-size <size>: Batch size for processing (default: 10)
--client-path <path>: Client path for metadata tracking
--show-stats: Show performance statistics

Detect Command Options

-b, --batch-size <size>: Batch size for PDF detection (default: 10)
--propagate-arela-path: Propagate arela_path from pedimento records to related files

Watch Command Options

-d, --directories <paths>: Comma-separated directories to watch (required)
-s, --strategy <strategy>: Upload strategy (default: batch)
- batch: Groups files and uploads periodically
- individual: Uploads each file immediately
- full-structure: Preserves directory structure
--api <target>: Use a single API target for all operations
--source-api <target>: API for reading/processing (phases 1-3)
--target-api <target>: API for uploading (phase 4)
--debounce <ms>: Debounce delay in milliseconds (default: 1000)
-b, --batch-size <size>: Batch size for uploads (default: 10)
--poll <ms>: Use polling instead of native file system events (interval in ms)
--ignore <patterns>: Comma-separated patterns to ignore
--auto-detect: Enable automatic document type detection
--auto-organize: Enable automatic file organization
--auto-processing: Enable automatic 4-step pipeline (stats, detect, propagate, upload)
--dry-run: Simulate changes without uploading
--verbose: Enable verbose logging
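
For readers unfamiliar with debouncing, the following Python sketch shows the general technique behind `--debounce`: change events are collected, and a batch is released only after no new event has arrived for the configured delay. The class `DebouncedBatch` is hypothetical and is not taken from the tool; timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import List, Optional, Set


class DebouncedBatch:
    """Collect changed paths and release them after a quiet period."""

    def __init__(self, delay_ms: int = 1000) -> None:
        self.delay_ms = delay_ms
        self._pending: Set[str] = set()
        self._last_event_ms: Optional[int] = None

    def record(self, path: str, now_ms: int) -> None:
        """Register a change event; repeated events for a path are merged."""
        self._pending.add(path)
        self._last_event_ms = now_ms

    def flush_if_quiet(self, now_ms: int) -> List[str]:
        """Return pending paths if the delay has elapsed since the last event."""
        if self._last_event_ms is None:
            return []
        if now_ms - self._last_event_ms < self.delay_ms:
            return []  # still receiving events, keep waiting
        paths = sorted(self._pending)
        self._pending.clear()
        self._last_event_ms = None
        return paths
```

A larger delay groups bursts of writes (for example, a folder being copied) into one upload batch at the cost of extra latency.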

Environment Variables

Create a .env file in your project root:

# Default API (--api default or no flag)
ARELA_API_URL=http://localhost:3010
ARELA_API_TOKEN=your_api_token

# Agencia API (--api agencia)
ARELA_API_AGENCIA_URL=http://localhost:4012
ARELA_API_AGENCIA_TOKEN=your_agencia_token

# Cliente API (--api cliente)
# Configure for the specific client you need
ARELA_API_CLIENTE_URL=http://localhost:4014
ARELA_API_CLIENTE_TOKEN=your_cliente_token

# For Direct Supabase Mode (fallback)
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_anon_key
SUPABASE_BUCKET=your_bucket_name

# Required for both modes
UPLOAD_BASE_PATH=/path/to/your/files
UPLOAD_SOURCES=folder1|folder2|file.pdf

# RFC-based Upload Configuration
# Pipe-separated list of RFCs to upload files for
UPLOAD_RFCS=MMJ0810145N1|ABC1234567XY|DEF9876543ZZ

# Watch Mode Configuration (JSON format)
WATCH_DIRECTORY_CONFIGS={"../../Documents/2022":"palco","../../Documents/2023":"palco"}

Environment Variable Details:

ARELA_API_URL: Base URL for default API service
ARELA_API_AGENCIA_URL: URL for agencia API
ARELA_API_CLIENTE_URL: URL for client API (configure per client)
ARELA_API_TOKEN: Authentication token for default API
ARELA_API_AGENCIA_TOKEN: Token for agencia API
ARELA_API_CLIENTE_TOKEN: Token for client API
SUPABASE_URL: Your Supabase project URL
SUPABASE_KEY: Supabase anonymous key for direct uploads
SUPABASE_BUCKET: Target bucket name in Supabase Storage
UPLOAD_BASE_PATH: Root directory containing files to upload
UPLOAD_SOURCES: Pipe-separated list of folders/files to process
UPLOAD_RFCS: Pipe-separated list of RFC values for targeted uploads
WATCH_DIRECTORY_CONFIGS: JSON mapping directories to folder structures
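
The two value formats used above (pipe-separated lists and a JSON object) can be parsed as in this Python sketch. The helper names `split_pipe_list` and `parse_watch_configs` are hypothetical; only the variable formats come from this README.

```python
import json
from typing import Dict, List


def split_pipe_list(value: str) -> List[str]:
    """Split a pipe-separated value such as UPLOAD_RFCS or UPLOAD_SOURCES."""
    return [item.strip() for item in value.split("|") if item.strip()]


def parse_watch_configs(value: str) -> Dict[str, str]:
    """Parse WATCH_DIRECTORY_CONFIGS: a JSON object of directory -> folder structure."""
    data = json.loads(value)
    if not isinstance(data, dict):
        raise ValueError("WATCH_DIRECTORY_CONFIGS must be a JSON object")
    return {str(directory): str(structure) for directory, structure in data.items()}
```

Note that the JSON value must stay on a single line in the .env file, as in the example above.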

RFC-Based File Upload

The --upload-by-rfc feature allows you to upload files to the Arela API based on specific RFC values. This is useful when you want to upload only files associated with certain companies or entities.

How it works:

Configure RFCs: Set the UPLOAD_RFCS environment variable with pipe-separated RFC values
Query Database: The tool searches the Supabase database for files matching the specified RFCs
Include Supporting Documents: Finds all files sharing the same arela_path as the RFC matches (not just the pedimento files)
Apply Folder Structure: Optionally applies custom folder prefix using --folder-structure
Group and Upload: Files are grouped by their final destination path and uploaded with proper structure

Folder Structure Options:

Default Behavior (no --folder-structure):

Uses original arela_path: CAD890407NK7/2023/3429/070/230734293000421/

With Custom Prefix (--folder-structure "palco"):

Results in: palco/CAD890407NK7/2023/3429/070/230734293000421/

With Nested Prefix (--folder-structure "2024/client1/pedimentos"):

Results in: 2024/client1/pedimentos/CAD890407NK7/2023/3429/070/230734293000421/
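
The three cases above amount to prefixing the stored arela_path with the optional folder structure. A minimal Python sketch of that concatenation follows; the function name `destination_path` is hypothetical, and the slash normalization is an assumption.

```python
from typing import Optional


def destination_path(arela_path: str, folder_structure: Optional[str] = None) -> str:
    """Join an optional folder prefix with an arela_path, keeping one trailing slash."""
    parts = []
    if folder_structure:
        parts.append(folder_structure.strip("/"))
    parts.append(arela_path.strip("/"))
    return "/".join(parts) + "/"
```

For example, `destination_path("CAD890407NK7/2023/3429/070/230734293000421/", "palco")` returns `palco/CAD890407NK7/2023/3429/070/230734293000421/`.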

Prerequisites:

Files must have been previously processed (have entries in the uploader table)
Files must have rfc field populated (from document detection)
Files must have arela_path populated (from pedimento processing)
Original files must still exist at their original_path locations

Example:

# Set RFCs in environment
export UPLOAD_RFCS="MMJ0810145N1|ABC1234567XY|DEF9876543ZZ"

# Upload files for these RFCs (original folder structure)
arela --upload-by-rfc --batch-size 5 --show-stats

# Upload with custom folder prefix
arela --upload-by-rfc --folder-structure "palco" --batch-size 10

# Upload with nested organization
arela --upload-by-rfc --folder-structure "2024/Q1/processed" --batch-size 15

The tool will:

Find all database records matching the specified RFCs
Include ALL supporting documents that share the same arela_path
Apply the optional folder structure prefix if specified
Group files by their final destination folder structure
Upload each group maintaining the correct Arela folder hierarchy
Provide detailed progress and summary statistics
Handle large datasets with automatic pagination (no 1000-file limit)
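
The selection logic described above (match by RFC, then pull in every file that shares an arela_path) can be sketched in Python over in-memory records. This is an illustration only: the function name `select_for_upload` is hypothetical, and the real tool queries the database instead of a list. The field names `rfc`, `arela_path`, and `original_path` are the ones listed under Prerequisites.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def select_for_upload(
    records: Iterable[Dict[str, Optional[str]]], rfcs: Iterable[str]
) -> Dict[str, List[str]]:
    """Group files to upload by arela_path, including supporting documents."""
    records = list(records)
    wanted = set(rfcs)
    # Step 1: arela_paths of records whose rfc matches the requested list.
    matched_paths = {
        rec["arela_path"]
        for rec in records
        if rec.get("rfc") in wanted and rec.get("arela_path")
    }
    # Step 2: every record sharing one of those paths, with or without an rfc.
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        if rec.get("arela_path") in matched_paths:
            groups[rec["arela_path"]].append(rec["original_path"])
    return dict(groups)
```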

File Processing Modes

API Mode (Default)

When ARELA_API_URL and ARELA_API_TOKEN are configured:

✅ Automatic file detection and classification
✅ Intelligent file organization
✅ Smart year/pedimento auto-detection from paths
✅ Custom folder structure support
✅ Batch processing with progress tracking
✅ Advanced error handling and retry logic
✅ Performance optimizations with file sanitization caching

Auto-Detection Features

The tool can automatically detect year and pedimento numbers from file paths using multiple patterns:

Pattern 1: Direct Structure

/path/to/2024/4023260/file.pdf
/path/to/pedimentos/2024/4023260/file.pdf

Pattern 2: Named Patterns

/path/to/docs/año2024/ped4023260/file.pdf
/path/to/files/year2024/pedimento4023260/file.pdf

Pattern 3: Loose Detection

Year: Any 4-digit number starting with "202" (2020-2029)
Pedimento: Any 4-8 consecutive digits in path

Use --auto-detect-structure to enable automatic detection:

arela --auto-detect-structure --batch-size 10
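
The three patterns can be approximated with regular expressions, as in the Python sketch below. The expressions are assumptions written to match the examples above, not the tool's actual patterns. In particular, restricting the year to 2020-2029 in all three patterns and skipping the year itself during loose pedimento matching are choices made here for illustration. The function name `detect_structure` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Pattern 1: .../2024/4023260/...
DIRECT = re.compile(r"/(202\d)/(\d{4,8})/")
# Pattern 2: .../año2024/ped4023260/... or .../year2024/pedimento4023260/...
NAMED = re.compile(r"(?:año|year)(202\d)/(?:pedimento|ped)(\d{4,8})/")
# Pattern 3: any standalone 202x year, and any standalone run of 4-8 digits
LOOSE_YEAR = re.compile(r"(?<!\d)(202\d)(?!\d)")
LOOSE_PEDIMENTO = re.compile(r"(?<!\d)(\d{4,8})(?!\d)")


def detect_structure(path: str) -> Optional[Tuple[str, str]]:
    """Return (year, pedimento) detected from a path, or None."""
    for pattern in (DIRECT, NAMED):
        match = pattern.search(path)
        if match:
            return match.group(1), match.group(2)
    year = LOOSE_YEAR.search(path)
    if year is None:
        return None
    for candidate in LOOSE_PEDIMENTO.finditer(path):
        if candidate.span() != year.span():  # do not reuse the year as pedimento
            return year.group(1), candidate.group(1)
    return None
```

Trying the stricter patterns first keeps the loose pattern from misreading unrelated digit runs when a clear structure is present.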

Custom Folder Structure

Specify a custom organization pattern:

# Static structure
arela --folder-structure "2024/4023260" --batch-size 10

# Client-based structure  
arela --folder-structure "cliente1/pedimentos" --batch-size 10

Directory Structure Preservation

Use --preserve-structure to maintain your original folder structure even with auto-organization:

# Without --preserve-structure
# Files organized by API: bucket/filename.pdf

# With --preserve-structure  
# Files keep structure: bucket/2024/4023260/filename.pdf
arela --preserve-structure --batch-size 10

Supabase Direct Mode (Fallback)

When API is unavailable or --force-supabase is used:

✅ Direct upload to Supabase Storage
✅ File sanitization and renaming
✅ Basic progress tracking
✅ Optimized sanitization with pre-compiled regex patterns
✅ Performance caching for file name sanitization

Performance Features

Database Pagination

No Upload Limits: Handles datasets larger than 1000 files through automatic pagination
Efficient Querying: Uses Supabase .range() method to fetch data in batches
Memory Optimization: Processes large datasets without memory overflow
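
The pagination approach can be sketched generically in Python. The `fetch_page(start, end)` callable stands in for a range query and is hypothetical; the sketch assumes both bounds are inclusive, as with Supabase's `.range()`, and that a short page signals the end of the data.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def fetch_all(
    fetch_page: Callable[[int, int], List[T]], page_size: int = 1000
) -> Iterator[T]:
    """Yield every row by requesting consecutive inclusive ranges."""
    start = 0
    while True:
        rows = fetch_page(start, start + page_size - 1)  # inclusive end index
        yield from rows
        if len(rows) < page_size:  # a short (or empty) page means we are done
            return
        start += page_size
```

Because rows are yielded page by page, the caller never needs to hold the full result set in memory.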

File Processing

Pre-compiled Regex: Sanitization patterns are compiled once for optimal performance
Caching System: File name sanitization results are cached to avoid re-processing
Batch Processing: Configurable batch sizes for optimal upload throughput

RFC Upload Optimizations

Smart Querying: Three-step query process to efficiently find related files
Supporting Document Inclusion: Automatically includes all related documents, not just pedimentos
Path Concatenation: Efficiently combines custom folder structures with arela_paths

File Sanitization

The tool automatically handles problematic characters using advanced sanitization:

Character Replacements:

Accents: á→a, é→e, í→i, ó→o, ú→u, ñ→n, ç→c
Korean characters: 멕→meok, 시→si, 코→ko, 용→yong, others→kr
Special symbols: &→and, {}[]~^|"<>?*: →-
Email symbols: @→(removed), spaces→-
Multiple dashes: collapsed to single dash
Leading/trailing: dashes and dots removed

Performance Features:

Pre-compiled regex patterns for faster processing
Sanitization result caching to avoid re-processing
Unicode normalization (NFD) for consistent handling
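
The sketch below illustrates the listed rules in Python: pre-compiled patterns, NFD normalization to strip accents, and a cache for repeated names. It is an approximation under stated assumptions, not the tool's code. The name `sanitize_file_name` is hypothetical, the order of the steps is a guess, and Korean transliteration and any handling of parentheses are deliberately left out.

```python
import re
import unicodedata
from functools import lru_cache

# Compiled once at import time and reused for every file name.
SPECIAL_SYMBOLS = re.compile(r'[{}\[\]~^|"<>?*:]')
REPEATED_DASHES = re.compile(r"-{2,}")


@lru_cache(maxsize=None)  # cache results so repeated names are not re-processed
def sanitize_file_name(name: str) -> str:
    # NFD splits accented letters into base letter + combining mark,
    # so dropping the combining marks turns "ó" into "o" and "ñ" into "n".
    decomposed = unicodedata.normalize("NFD", name)
    result = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    result = result.replace("&", "and").replace("@", "")
    result = SPECIAL_SYMBOLS.sub("-", result)
    result = result.replace(" ", "-")
    result = REPEATED_DASHES.sub("-", result)
    return result.strip("-.")  # remove leading/trailing dashes and dots
```

For instance, `sanitize_file_name("Facturas Importación.pdf")` returns `Facturas-Importacion.pdf`.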

Examples

| Original | Renamed |
|----------|---------|
| Facturas Importación.pdf | Facturas-Importacion.pdf |
| File{with}brackets.pdf | File-with-brackets.pdf |
| Document ^& symbols.pdf | Document-and-symbols.pdf |
| CI & PL-20221212(멕시코용).xls | CI-and-PL-20221212.xls |
| [email protected]_file.xml | impresoranereprint.com_file.xml |
| 07-3429-3000430 HC.pdf | 07-3429-3000430-HC.pdf |
| FACTURA IN 3000430.pdf | FACTURA-IN-3000430.pdf |

Logging and Monitoring

The tool maintains comprehensive logs both locally and remotely:

Local Logging (arela-upload.log):

Upload status (SUCCESS/ERROR/SKIPPED/SANITIZED)
File paths and sanitization changes
Error messages and timestamps
Rename operations with before/after names
Processing statistics and performance metrics

Log Entry Examples:

[2025-09-04T01:17:00.141Z] SUCCESS: /Users/.../file.xml -> 2023/2003180/file.xml
[2025-09-04T01:17:00.822Z] SANITIZED: file name.pdf → file-name.pdf
[2025-09-04T01:17:00.856Z] SKIPPED: /Users/.../duplicate.pdf (already exists)
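
Entries in this shape are straightforward to parse for auditing or scripting. The Python sketch below is based only on the three sample lines above; the regular expression and the function name `parse_log_line` are assumptions, and real log files may contain other line formats.

```python
import re
from typing import Dict, Optional

# [timestamp] STATUS: detail
LOG_LINE = re.compile(r"^\[(?P<timestamp>[^\]]+)\] (?P<status>[A-Z]+): (?P<detail>.*)$")


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Split a log entry into timestamp, status, and detail; None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None
```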

Remote Logging:

Integration with Supabase database for centralized logging
Upload tracking and audit trails
Error reporting and monitoring

Performance Features

Version 0.2.0 introduced several performance optimizations:

Pre-compiled Regex Patterns: Sanitization patterns are compiled once and reused
Sanitization Caching: File name sanitization results are cached to avoid reprocessing
Batch Processing: Configurable batch sizes for optimal API usage
Concurrent Processing: Adjustable concurrency levels for file processing
Smart Skip Logic: Efficiently skips already processed files using log analysis
Memory Optimization: Large file outputs are truncated to prevent memory issues

Version History

v0.4.0 - Current Release 🆕

✨ Simplified Multi-Tenant API: Only 3 targets: default, agencia, cliente
✨ Cross-Tenant Mode: Read from one API, upload to another
✨ Dynamic Client Config: Change client by updating .env (no code changes)
✨ New --api flag for single API target
✨ New --source-api flag for source API (phases 1-3)
✨ New --target-api flag for target API (phase 4)
✨ WATCH_DIRECTORY_CONFIGS environment variable for watch mode
🔧 Enhanced pipeline routing for cross-tenant operations
📝 Simplified documentation for multi-tenant configuration

v0.3.0 - Watch Mode Release

✨ Added watch command with chokidar integration
✨ Automatic 4-step pipeline (stats → detect → propagate → upload)
✨ Multiple upload strategies (batch, individual, full-structure)
✨ Configurable debounce and polling options
🔧 Signal handling for graceful shutdown

v0.2.0 - Pipeline Automation

✨ Added smart year/pedimento auto-detection from file paths
✨ Custom folder structure support with --folder-structure option
✨ Client path tracking with --client-path option
✨ Performance optimizations with regex pre-compilation
✨ Sanitization result caching for improved speed
✨ Enhanced file sanitization with Korean character support
✨ Improved email character handling in file names
✨ Better error handling and logging
📝 Comprehensive logging with SANITIZED status
🔧 Memory optimization for large file processing

v0.1.0 - Initial Release

📦 Basic upload functionality
🔌 API and Supabase direct mode support
📂 RFC-based file upload

Troubleshooting

Connection Issues:

Verify ARELA_API_URL and ARELA_API_TOKEN are correct
Check network connectivity to the API endpoint
The tool automatically falls back to Supabase direct mode if the API is unavailable

Performance Issues:

Adjust --batch-size for optimal API performance (default: 10)
Modify --concurrency to control parallel processing (default: 10)
Use --show-stats to monitor sanitization cache performance

File Issues:

Check file permissions in UPLOAD_BASE_PATH
Verify UPLOAD_SOURCES paths exist and are accessible
Review arela-upload.log for detailed error information

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

ISC License - see LICENSE file for details.
