# @holistics/aml-regression-tests

v2.0.0

AML regression tests repository for comparing behavior between stable and latest versions of the AML compiler.
## High-level Architecture Design

### Overview

This test suite compares the compilation results of AML files between two versions:

- **Stable version:** `@holistics/[email protected]` with `@holistics/[email protected]`
- **Latest version:** `@holistics/[email protected]` with workspace `@holistics/aml-std`
The tests ensure that changes to the AML compiler don't break existing customer code by:

- Fetching AML files from customer repositories (via git service or local filesystem)
- Compiling the same files with both stable and latest versions
- Comparing the compilation results (excluding known differences such as `__type__` fields)
- Testing serialized cache backward compatibility, ensuring the latest compiler works with the stable cache
- Generating detailed diff reports for any discrepancies
## Test Types

The regression test suite includes two distinct test types per tenant:

### 1. Regression Tests (compare results)

- **Purpose:** Ensure the latest compiler produces the same results as the stable compiler
- **Comparison:** stable results vs. latest results
- **Detects:** Breaking changes in compilation logic

### 2. Cache Compatibility Tests (check serialized cache compatibility)

- **Purpose:** Ensure the latest compiler works consistently with stable's serialized cache
- **Comparison:** latest results (fresh) vs. latest results (using stable cache)
- **Detects:** Serialized cache format changes or backward-compatibility issues

**Key insight:** A tenant can pass regression tests but fail cache compatibility (or vice versa), providing targeted debugging information.

Detailed implementation: `docs/serialized-cache-compatibility-plan.md`
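Conceptually, the two checks per tenant reduce to three compilations and two comparisons. Here is a minimal sketch; the helper names follow the flow described in this document, but their exact signatures are assumptions:

```typescript
// Sketch of the per-tenant checks. Helper signatures are assumptions
// based on this document's flow (compileStable returns results + cache).
type CompileOutput = Record<string, unknown>;

interface TenantReport {
  regressionPassed: boolean;
  cacheCompatible: boolean;
}

function runTenantChecks(
  compileStable: () => { results: CompileOutput; cache: unknown },
  compileLatest: () => CompileOutput,
  compileLatestWithStableCache: (cache: unknown) => CompileOutput,
  isEqualExclude: (a: CompileOutput, b: CompileOutput) => boolean,
): TenantReport {
  const stable = compileStable();
  const latest = compileLatest();
  const latestWithCache = compileLatestWithStableCache(stable.cache);

  return {
    // Regression test: stable results vs. latest results
    regressionPassed: isEqualExclude(stable.results, latest),
    // Cache compatibility: latest (fresh) vs. latest (using stable cache)
    cacheCompatible: isEqualExclude(latest, latestWithCache),
  };
}
```

Because the two booleans are independent, a tenant can fail one check while passing the other, which is exactly the targeted signal described above.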
## How It Works

### Test Flow Architecture

```mermaid
sequenceDiagram
    participant User
    participant TestController
    participant Vitest
    participant Database
    participant GitService
    participant Artifacts
    participant Slack

    alt Controller Mode
        User->>TestController: node testController.ts
        TestController->>Slack: Send start notification
        TestController->>TestController: getRepoTenantPaths()
        TestController->>Database: Query tenant repositories
        Database-->>TestController: Tenant data
        loop For each tenant (sequential)
            TestController->>Vitest: Spawn vitest process for tenant
            alt Git Mode
                Vitest->>GitService: Fetch AML files
                GitService-->>Vitest: AML file contents
            else Local Mode
                Vitest->>Vitest: Read local files
            end
            Vitest->>Vitest: compileLatest(files)
            Vitest->>Vitest: compileStable(files) → {results, cache}
            Vitest->>Vitest: compileLatestWithStableCache(files, cache)
            Vitest->>Vitest: isEqualExclude(stable, latest)
            Vitest->>Vitest: isEqualExclude(latest, latest-with-cache)
            Vitest->>Artifacts: Write diff files (on mismatch)
            Vitest->>Artifacts: Write tenant results JSON
        end
        TestController->>Slack: Aggregate results & notify end
        TestController->>Slack: Upload artifact files
    else Direct Mode
        User->>Vitest: pnpm vitest run
        Note over Vitest: Same tenant processing loop
        Vitest->>Artifacts: Write single output file
    end
```

### Core Components
- **Connector** (`tests/connector.ts`):
  - Database queries to find tenant repositories
  - Git service integration for fetching AML files
  - Local filesystem reading for development
- **Helpers** (`tests/helpers.ts`):
  - AML compilation using stable and latest versions
  - Serialized cache compatibility testing with `compileLatestWithStableCache()`
  - Deep object comparison with exclusion rules
  - Diff collection integration with the artifact system
- **Artifact Management** (`tests/artifact.ts`):
  - Consolidated diff collection and per-tenant flushing
  - Directory structure management (`diffs/` subfolder)
  - Memory-efficient diff aggregation across test runs
- **Test Controller** (`tests/testController.ts`):
  - Parallel test execution management (bounded via `p-limit`)
  - Process spawning and coordination
  - Result aggregation with failure reporting
- **Slack Integration** (`tests/slack.ts`):
  - Test start/end notifications
  - Granular reporting with separate regression and cache compatibility results
  - Result reporting with pass/fail counts per test type
  - Test report uploads (individual JSON files)
  - Compressed diff archive uploads (`diffs.zip`)
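The "deep object comparison with exclusion rules" could look like the following sketch. The real `isEqualExclude` lives in `tests/helpers.ts`; the excluded `__type__` key comes from the Overview, and everything else here is illustrative:

```typescript
// Hypothetical sketch of a deep comparison that ignores certain keys,
// in the spirit of isEqualExclude from tests/helpers.ts.
function isEqualExclude(
  a: unknown,
  b: unknown,
  excludedKeys: string[] = ["__type__"],
): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  // Compare only the keys that are not excluded, recursively.
  const keysA = Object.keys(a as object).filter((k) => !excludedKeys.includes(k));
  const keysB = Object.keys(b as object).filter((k) => !excludedKeys.includes(k));
  if (keysA.length !== keysB.length) return false;
  return keysA.every((k) =>
    isEqualExclude((a as any)[k], (b as any)[k], excludedKeys),
  );
}
```

This lets two compilation results count as equal even when they disagree only on known-noisy fields such as `__type__`.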
## Installation

```bash
pnpm install
```

### Dependencies Added for Artifact Management

- `adm-zip`: Compression library for creating `diffs.zip` archives; enables clean Slack uploads with consolidated diff files
## Configuration

Create environment config file:

```bash
cp .env.sample .env
```

## Operating Modes
The system operates in two distinct modes with different dependencies and use cases:
### 1. Local Development Mode

**Use case:** Testing changes during development with local AML files.

```bash
# Enable local mode
READ_LOCAL=true

# Local directory paths containing AML files
LOCAL_PATHS=["./local/tenant1", "./local/tenant2"]
```

**Features:**

- Reads AML files directly from the local filesystem
- No database or git service dependencies
- Tenant IDs derived from directory names
- Fast iteration for development
- Ideal for testing compiler changes
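In local mode, tenant discovery can be as simple as listing the configured directories. A sketch, assuming the tenant ID is the directory's base name as described above (the helper name and `.aml` filtering are illustrative):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Sketch of local-mode discovery: each entry of LOCAL_PATHS is one tenant,
// whose ID is derived from the directory name. Helper name is hypothetical.
function readLocalTenants(localPaths: string[]): Map<string, string[]> {
  const tenants = new Map<string, string[]>();
  for (const dir of localPaths) {
    const tenantId = path.basename(dir); // "./local/tenant1" -> "tenant1"
    // Recursively list files and keep only AML sources.
    const files = fs
      .readdirSync(dir, { recursive: true })
      .map(String)
      .filter((f) => f.endsWith(".aml"));
    tenants.set(tenantId, files);
  }
  return tenants;
}
```

Note that `fs.readdirSync(dir, { recursive: true })` requires Node 18.17 or later.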
### 2. Production/Git Mode (Default)

**Use case:** Testing against customer repositories via database and git service.

```bash
# Database connection (required for this mode)
DB_USERNAME=your_username
DB_PASSWORD=your_password
DB_HOST=localhost
DB_PORT=5432
DB_DATABASE=your_database

# Git service configuration
GIT_SERVICE_URL=http://0.0.0.0:8080
REPO_PATH_PREFIX=/opt/holistics/git_data/repositories/

# Test targeting
TENANT_IDS=["123", "456", "789"]  # Specific tenant IDs
# OR
RUN_ALL=true                      # All active tenants

# Concurrency (optional)
NUMBER_OF_TEST_PROCESSES=4        # Spawn up to 4 tenants in parallel
```

**Features:**

- Queries customer repositories from the database
- Fetches AML files via the git service using commit IDs
- Supports both specific tenant testing and bulk processing
- Production-like testing environment
## Scaling & Distribution

Multiple instances can run simultaneously with different tenant ranges:

```bash
# Instance 1: Handles tenants 1-100
OFFSET=0
LIMIT=100

# Instance 2: Handles tenants 101-200
OFFSET=100
LIMIT=100

# Instance 3: Handles tenants 201-300
OFFSET=200
LIMIT=100
```

Within a single controller process, the pool size is controlled by `NUMBER_OF_TEST_PROCESSES`. Increase it to drive more concurrent tenants per node, or drop it back to 1 when memory-constrained.

**Key features:**

- Only applies when `RUN_ALL=true` (ignored with `TENANT_IDS`)
- Database query uses `ORDER BY tenant_id` for consistent tenant allocation
- Enables distributed processing across multiple containers/machines
- Each instance processes tenants sequentially for memory efficiency
- Artifact files labeled: `container-${OFFSET}-${LIMIT}_tenant_${tenantId}.json`
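The way `OFFSET`/`LIMIT` partition the tenant set can be sketched as a paginated query. The table and column names below are assumptions for illustration; the actual query lives in `tests/connector.ts`:

```typescript
// Illustrative sketch of the tenant pagination query. Table/column names
// are assumptions; only ORDER BY tenant_id + OFFSET/LIMIT are from the doc.
function buildTenantQuery(offset: number, limit: number): string {
  // ORDER BY tenant_id gives every instance the same stable ordering, so
  // non-overlapping OFFSET/LIMIT windows partition tenants without overlap.
  return (
    `SELECT tenant_id, repo_path, commit_id FROM tenant_repositories ` +
    `ORDER BY tenant_id OFFSET ${offset} LIMIT ${limit}`
  );
}
```

The stable ordering is the key design point: without it, two instances could pick overlapping (or missing) tenants between runs.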
### Scaling Example

```bash
# Deploy 3 instances for distributed processing
# Instance 1: Processes 100 tenants sequentially
OFFSET=0 LIMIT=100

# Instance 2: Processes 100 tenants sequentially
OFFSET=100 LIMIT=100

# Instance 3: Processes 100 tenants sequentially
OFFSET=200 LIMIT=100
```

## Output & Notifications
### Artifacts Configuration

```bash
# Directory for test results and diff files
ARTIFACT_DIR=./test-results

# Output file name (optional for single process mode)
ARTIFACT_FILE=test-output.json
```

### Slack Integration (Optional)

```bash
SLACK_CHANNEL_ID=C1234567890
SLACK_TOKEN=xoxb-your-slack-token
```

**Notification features:**

- Test start/end notifications with container labels
- Pass/fail summary with tenant ID lists
- Automatic artifact file uploads for detailed analysis

### Diff Generation Options

```bash
# Use system 'diff' command for better diff quality (default: true)
OFFLINE_DIFF=true
```

**Offline diff features:**

- **Enhanced diff quality:** Uses the system `diff` command instead of a JavaScript diff library
- **Better performance:** More efficient for large JSON files
- **Unified format:** Generates standard unified diff format
- **Automatic fallback:** Falls back to the JavaScript diff if the system `diff` is unavailable
- **Local temp files:** Creates temporary files in a `.tmp/` directory within the repo (auto-cleaned)
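The system-diff path with a JavaScript fallback could be sketched like this. File layout, cleanup, and the fallback text are illustrative; the real implementation in this repo may differ:

```typescript
import { spawnSync } from "node:child_process";
import * as fs from "node:fs";
import * as path from "node:path";

// Sketch: serialize both results to temp files, shell out to `diff -u`,
// fall back to a placeholder JS-side diff when `diff` is unavailable.
function unifiedDiff(expected: unknown, actual: unknown, tmpDir = ".tmp"): string {
  fs.mkdirSync(tmpDir, { recursive: true });
  const expectedPath = path.join(tmpDir, "expected.json");
  const actualPath = path.join(tmpDir, "actual.json");
  fs.writeFileSync(expectedPath, JSON.stringify(expected, null, 2) + "\n");
  fs.writeFileSync(actualPath, JSON.stringify(actual, null, 2) + "\n");
  try {
    const res = spawnSync("diff", ["-u", expectedPath, actualPath], {
      encoding: "utf8",
    });
    if (res.error) throw res.error; // `diff` binary not found -> fallback
    return res.stdout; // exit code 1 only means the files differ
  } catch {
    // Stand-in for a JavaScript diff library fallback (not a real diff).
    return "--- expected\n+++ actual\n(system diff unavailable)";
  } finally {
    // Auto-clean the temp directory, mirroring the behavior described above.
    fs.rmSync(tmpDir, { recursive: true, force: true });
  }
}
```

The key detail is treating `diff`'s exit code 1 as "files differ" rather than as an error, and only falling back when the binary itself is missing.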
## How to Run Tests

### 1. Direct Vitest Execution

**Best for:** Development, debugging, small tenant sets

```bash
# Configure your mode (see Operating Modes section above)
export READ_LOCAL=true  # or configure production mode

# Run single process
pnpm vitest run
```

**Characteristics:**

- Direct vitest execution with detailed console output
- Single output file: `${ARTIFACT_FILE}` (default: `test-output.json`)
- Easier debugging and development iteration
- Suitable for small tenant sets or development

### 2. Test Controller Execution

**Best for:** Large tenant sets, production testing

```bash
# Configure your mode
export RUN_ALL=true

# Run via test controller (processes tenants sequentially)
node tests/testController.ts
```

**Characteristics:**

- Processes tenants sequentially for memory efficiency
- Multiple output files: `container-${OFFSET}-${LIMIT}_tenant_${tenantId}.json`
- Automatic Slack notifications (if configured)
- Avoids memory accumulation issues
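The controller's bounded parallelism (configured via `NUMBER_OF_TEST_PROCESSES`) can be sketched without the `p-limit` dependency; here `runTenant` is a stand-in for spawning a vitest process per tenant:

```typescript
// Minimal bounded-concurrency sketch in the spirit of p-limit: at most
// `concurrency` tenants run at once, each in its own runTenant call.
async function runAllTenants<T>(
  tenantIds: string[],
  runTenant: (id: string) => Promise<T>, // stand-in for spawning vitest
  concurrency: number, // NUMBER_OF_TEST_PROCESSES
): Promise<T[]> {
  const results: T[] = new Array(tenantIds.length);
  let next = 0;
  // Each worker pulls the next tenant index until none remain.
  async function worker(): Promise<void> {
    while (next < tenantIds.length) {
      const i = next++;
      results[i] = await runTenant(tenantIds[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(concurrency, tenantIds.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```

With `concurrency = 1` this degenerates to the strictly sequential processing described above, which is the memory-efficient default.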
### 3. Docker Execution

**Best for:** Production-like environment, CI/CD

```bash
# Setup credentials
echo '@holistics:registry=https://npm.pkg.github.com/
//npm.pkg.github.com/:_authToken=<token>' > .npmrc

# Build and run
docker compose build
docker compose up --abort-on-container-exit
```

**Characteristics:**

- Includes git service container automatically
- Isolated environment with consistent dependencies
- Volume mounting for accessing results
- Production-like configuration
### Execution Method Comparison

| Aspect | Direct Vitest | Test Controller | Docker |
|--------|---------------|-----------------|--------|
| Command | `pnpm vitest run` | `node tests/testController.ts` | `docker compose up` |
| Processes | 1 vitest process | 1 vitest process per tenant | Sequential processing |
| Output Files | Single file | Multiple files | Multiple files |
| Git Service | External dependency | External dependency | Included |
| Best For | Development/Debug | Large-scale testing | Production/CI |
| Slack Notifications | ❌ | ✅ | ✅ |
## Test Results & Artifacts

### Output Structure

Each test run generates:

1. **Test Output JSON:** Vitest results with tenant metadata
   - Location: `${ARTIFACT_DIR}/${ARTIFACT_FILE}` or `${ARTIFACT_DIR}/container-${OFFSET}-${LIMIT}_process_${N}.json`
   - Contains: Test results, pass/fail status, tenant IDs
2. **Consolidated Diff Files:** Aggregated comparison reports per tenant
   - Location: `${ARTIFACT_DIR}/diffs/diff_${tenantId}_${repoName}.json`
   - Contains: Multiple diff entries per tenant, consolidated metadata
   - Structure: All failed comparisons for a tenant grouped in a single file
3. **Compressed Diff Archive:** Zip file for Slack upload
   - Location: `${ARTIFACT_DIR}/diffs.zip`
   - Contains: All files from the `diffs/` folder, compressed
   - Purpose: Single-file upload to Slack for easy download/sharing
### Example Consolidated Diff File Structure

```json
{
  "tenantId": "123",
  "repoName": "789",
  "repoPath": "tenant123/projects/456/789",
  "commitId": "abc123def456",
  "timestamp": "2025-01-23T10:30:00.000Z",
  "diffs": [
    {
      "label": "/models/users.aml",
      "expected": { /* stable compilation result */ },
      "actual": { /* latest compilation result */ },
      "unifiedDiff": "--- expected\n+++ actual\n@@ -1,4 +1,4 @@\n..."
    },
    {
      "label": "/datasets/orders.aml",
      "expected": { /* stable compilation result */ },
      "actual": { /* latest compilation result */ },
      "unifiedDiff": "--- expected\n+++ actual\n@@ -10,2 +10,3 @@\n..."
    }
  ]
}
```

**Key features:**

- **Consolidated:** Multiple failed comparisons per tenant in a single file
- **Organized:** All diffs for a tenant/repository grouped together
- **Compressed:** `diffs.zip` contains all tenant diff files for Slack upload
- **Memory-efficient:** Diffs collected in memory and flushed per tenant
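The per-tenant aggregation described above can be sketched as a small in-memory collector. Class and method names are illustrative; the real logic lives in `tests/artifact.ts`:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Sketch of per-tenant diff aggregation: entries accumulate in memory and
// are flushed once per tenant to a single consolidated JSON file. Field
// names follow the example structure in this document.
interface DiffEntry {
  label: string;
  expected: unknown;
  actual: unknown;
  unifiedDiff: string;
}

class DiffCollector {
  private diffs: DiffEntry[] = [];

  add(entry: DiffEntry): void {
    this.diffs.push(entry);
  }

  // Writes ${artifactDir}/diffs/diff_${tenantId}_${repoName}.json and
  // clears the buffer so the next tenant starts fresh.
  flush(artifactDir: string, tenantId: string, repoName: string): string {
    const dir = path.join(artifactDir, "diffs");
    fs.mkdirSync(dir, { recursive: true });
    const file = path.join(dir, `diff_${tenantId}_${repoName}.json`);
    const payload = {
      tenantId,
      repoName,
      timestamp: new Date().toISOString(),
      diffs: this.diffs,
    };
    fs.writeFileSync(file, JSON.stringify(payload, null, 2));
    this.diffs = [];
    return file;
  }
}
```

Flushing per tenant is what keeps memory bounded: only one tenant's diffs are ever held in memory at a time.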
### Slack Notifications

When configured, the system sends:

- **Start notification:** Container label and test initiation
- **End notification:**
  - Granular pass/fail summary with separate regression and cache compatibility results
  - Format: `✅ Regression Passed: [tenant1, tenant2]` and `✅ Cache Compatible: [tenant1, tenant3]`
- **Test reports:** Individual JSON files uploaded for quick inspection
- **Diff archive:** A single `diffs.zip` upload containing all diff files
- Clean thread organization (test reports + compressed diffs)
## Troubleshooting

### Common Issues

**Git Service Connection Failed**

```
Error: connect ECONNREFUSED 127.0.0.1:8080
```

- Ensure the git service is running
- Check the `GIT_SERVICE_URL` configuration
- For Docker: verify the service dependency

**Database Connection Failed**

```
Error: password authentication failed
```

- Verify database credentials in `.env`
- Ensure the database is accessible
- For Docker: check `host.docker.internal` connectivity

**Out of Memory Errors**

```
JavaScript heap out of memory
```

- The testController uses `--max-old-space-size=8192` and processes tenants sequentially
- Consider testing fewer tenants per batch using `OFFSET` and `LIMIT`
- Each tenant gets a fresh process to avoid memory accumulation

**Permission Errors (Docker)**

```
Error: EACCES: permission denied
```

- Ensure proper file permissions on mounted volumes
- Consider using `user: "1000:1000"` in docker-compose.yml
### Debug Mode

For detailed debugging:

```bash
# Enable verbose output
export DEBUG=1

# Run single tenant with local files
export READ_LOCAL=true
export LOCAL_PATHS='["./debug-tenant"]'
pnpm vitest run --reporter=verbose
```

### Performance Tuning

- **For large tenant sets:** Use `OFFSET` and `LIMIT` to process in batches across multiple instances
- **For distributed processing:** Deploy multiple instances with different tenant ranges
- **For memory efficiency:** Sequential processing eliminates memory accumulation issues
