signalk-parquet

v0.6.5-beta.2

SignalK plugin and webapp that archives SignalK data to Parquet files, with a regimen-based control system, advanced querying, Claude-integrated AI analysis, spatial capabilities, and a REST API.

SignalK Parquet Data Store

A comprehensive SignalK plugin and webapp that writes SignalK data directly to Parquet files, with manual and automated regimen-based archiving and advanced querying features, including a REST API built on the SignalK History API, Claude AI analysis of historical data, and spatial (geographic) analysis capabilities.

Features

Core Data Management

  • Smart Data Types: Intelligent Parquet schema detection preserves native data types (DOUBLE, BOOLEAN) instead of forcing everything to strings
  • Multiple File Formats: Support for Parquet, JSON, and CSV output formats (querying is supported for Parquet only)
  • Daily Consolidation: Automatic daily file consolidation with S3 upload capabilities
  • Near Real-time Buffering: Efficient data buffering with configurable thresholds

Data Validation & Schema Repair

  • NEW Schema Validation: Comprehensive validation of Parquet file schemas against SignalK metadata standards
  • NEW Automated Repair: One-click repair of schema violations with proper data type conversion
  • NEW Type Correction: Automatic conversion of incorrectly stored data types (e.g., numeric strings → DOUBLE, boolean strings → BOOLEAN); see the sketch after this list
  • NEW Metadata Integration: Uses SignalK metadata (units, types) to determine correct data types for marine measurements
  • NEW Safe Operations: Creates backups before repair and quarantines corrupted files for safety
  • NEW Progress Tracking: Real-time progress monitoring with cancellation support for large datasets
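
The type-correction step can be pictured as a small coercion helper. The sketch below is illustrative only (the function name and signature are hypothetical, not the plugin's actual API); it assumes the expected Parquet type has already been derived from SignalK metadata.

// Hypothetical helper illustrating the repair conversions described above.
// expectedType is assumed to come from SignalK metadata (units/types).
type ParquetType = 'DOUBLE' | 'BOOLEAN' | 'UTF8';

function coerceValue(raw: unknown, expectedType: ParquetType): number | boolean | string | null {
  switch (expectedType) {
    case 'DOUBLE': {
      // Numeric strings such as "12.5" become real doubles.
      const n = typeof raw === 'number' ? raw : Number(raw);
      return Number.isFinite(n) ? n : null; // unparseable values are left to the quarantine logic
    }
    case 'BOOLEAN':
      if (typeof raw === 'boolean') return raw;
      if (raw === 'true') return true;   // boolean strings -> BOOLEAN
      if (raw === 'false') return false;
      return null;
    default:
      // Everything else stays as UTF8 text.
      return raw == null ? null : String(raw);
  }
}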

Benefits of Proper Data Types

Using correct data types in Parquet files provides significant advantages:

  • Storage Efficiency: Numeric data stored as DOUBLE uses ~50% less space than string representations
  • Query Performance: Native numeric operations are 5-10x faster than string parsing during analysis
  • Data Integrity: Type validation prevents data corruption and ensures consistent analysis results
  • Analytics Compatibility: Proper types enable advanced statistical analysis and machine learning applications
  • Compression: Parquet's columnar compression works optimally with correctly typed data

Validation Process

The validation system checks each Parquet file for:

  • Field Type Consistency: Ensures numeric marine data (position, speed, depth) is stored as DOUBLE
  • Boolean Representation: Validates true/false values are stored as BOOLEAN, not strings
  • Metadata Alignment: Compares file schemas against SignalK metadata for units like meters, volts, amperes
  • Schema Standards: Enforces data best practices for long-term data integrity

Advanced Querying

  • SignalK History API Compliance: Full compliance with SignalK History API specifications
    • Standard Time Parameters: All 5 standard query patterns supported
    • Time-Filtered Discovery: Paths and contexts filtered by time range
    • Optional Analytics: Moving averages (EMA/SMA) available on demand
  • 🔄 NEW: Automatic Unit Conversion: Optional integration with signalk-units-preference plugin
    • Server-side conversion to user's preferred units (knots, km/h, °F, °C, etc.)
    • Add ?convertUnits=true to any history query
    • Respects all unit preferences configured in units-preference plugin
    • Configurable cache (1-60 minutes) balances performance vs. responsiveness
    • Conversion metadata included in response
  • 🌍 NEW: Timezone Conversion: Convert UTC timestamps to local or specified timezone
    • Add ?convertTimesToLocal=true to convert timestamps to local time
    • Optional &timezone=America/New_York for custom IANA timezone
    • Automatic daylight saving time handling
    • Clean ISO 8601 format with offset (e.g., 2025-10-20T12:34:04-04:00)
  • Flexible Time Querying: Multiple ways to specify time ranges
    • Query from now, from specific times, or between time ranges
    • Duration-based windows (1h, 30m, 2d) for easy relative queries
    • Forward and backward time querying support
  • Time Alignment: Automatic alignment of data from different sensors using time bucketing
  • DuckDB Integration: Direct SQL querying of Parquet files with type-safe operations
  • 🌍 Spatial Analysis: Advanced geographic analysis with DuckDB spatial extension
    • Track Analysis: Calculate vessel tracks, distances, and movement patterns
    • Proximity Detection: Multi-vessel distance calculations and collision risk analysis
    • Geographic Visualization: Generate movement boundaries, centroids, and spatial statistics
    • Route Planning: Historical track analysis for route optimization and performance analysis

Management & Control

  • Command Management: Register, execute, and manage SignalK commands with automatic path configuration
  • Regimen-Based Data Collection: Control data collection with command-based regimens
  • Multi-Vessel Support: Wildcard vessel contexts (vessels.*) with MMSI-based exclusion filtering
  • Source Filtering: Filter data by SignalK source labels (bypasses server arbitration for raw data access)
  • Comprehensive REST API: Full programmatic control of queries and configuration

User Interface & Integration

  • Responsive Web Interface: Complete web-based management interface
  • S3 Integration: Upload files to Amazon S3 with configurable timing and conflict resolution
  • Context Support: Support for multiple vessel contexts with exclusion controls

Regimen System (Advanced)

  • Operational Context Tracking: Define regimens for operational states (mooring, anchoring, racing, passage-making)
  • Command-Based Episodes: Track state transitions using SignalK commands as regimen triggers
  • Keyword Mapping: Associate keywords with commands for intelligent Claude AI context matching
  • Episode Boundary Detection: Sophisticated SQL-based detection of operational periods using CTEs and window functions
  • Contextual Data Collection: Link SignalK paths to regimens for targeted data analysis during specific operations
  • Web Interface Management: Create, edit, and manage regimens and command keywords through the web UI

NEW Threshold Automation

  • NEW Per-Command Conditions: Each regimen/command can define one or more thresholds that watch a single SignalK path.
  • NEW True-Only Actions: On every path update the condition is evaluated; when it is true, the command is set to the threshold's activateOnMatch state (ON/OFF). False evaluations leave the command untouched, so add a second threshold if you want a different level to switch it back (see the sketch after this list).
  • NEW Stable Triggers: Optional hysteresis (seconds) suppresses re-firing while the condition remains true, preventing rapid toggling in noisy data.
  • NEW Multiple Thresholds Per Path: Unique monitor keys allow several thresholds to observe the same SignalK path without cancelling each other.
  • NEW Unit Handling: Threshold values must match the live SignalK units (e.g., fractional 0–1 SoC values). Angular thresholds are entered in degrees in the UI and stored as radians automatically.
  • NEW Automation State Machine: When automation is enabled, the command is set to OFF and all thresholds are immediately evaluated. When automation is disabled, threshold monitoring stops and the command state remains unchanged. The default command state is hardcoded to OFF on the server side.
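
A minimal sketch of the true-only evaluation with hysteresis described above. The threshold shape is simplified for illustration; apart from activateOnMatch, the field names are assumptions and not the plugin's actual schema.

// Illustrative only: simplified true-only threshold evaluation with hysteresis.
interface ThresholdSketch {
  operator: '>' | '<' | '==';
  value: number;
  activateOnMatch: 'ON' | 'OFF';   // state the command is set to when the condition is true
  hysteresisSeconds?: number;      // suppress re-firing while the condition remains true
}

function evaluateThreshold(
  t: ThresholdSketch,
  pathValue: number,
  lastTrueAt: number | undefined,              // tracked per unique monitor key
  setCommand: (state: 'ON' | 'OFF') => void
): number | undefined {
  const matches =
    (t.operator === '>' && pathValue > t.value) ||
    (t.operator === '<' && pathValue < t.value) ||
    (t.operator === '==' && pathValue === t.value);

  if (!matches) return undefined; // false evaluations never touch the command

  const now = Date.now();
  const withinHysteresis =
    t.hysteresisSeconds !== undefined &&
    lastTrueAt !== undefined &&
    now - lastTrueAt < t.hysteresisSeconds * 1000;

  if (!withinHysteresis) setCommand(t.activateOnMatch); // fire the state change
  return now; // caller stores this; repeated true evaluations keep the suppression window open
}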

Claude AI Integration

  • AI-Powered Analysis: Advanced maritime data analysis using Claude AI models (Opus 4, Sonnet 4)
  • Regimen-Based Analysis: Context-aware episode detection for operational states (mooring, anchoring, sailing)
  • Command Integration: Keyword-based regimen matching with customizable command configurations
  • Episode Detection: Sophisticated boundary detection for operational transitions
  • Multi-Vessel Support: Real-time data access from self vessel and other vessels via SignalK
  • Conversation Continuity: Follow-up questions with preserved context and specialized tools
  • Timezone Intelligence: Automatic UTC-to-local time conversion based on system timezone
  • Custom Analysis: Create custom analysis prompts for specific operational needs

Requirements

Core Requirements

  • SignalK Server v1.x or v2.x
  • Node.js 18+ (included with SignalK)

Optional Plugin Integration

  • signalk-units-preference (v0.7.0+): Required for automatic unit conversion feature
    • Install from: https://github.com/motamman/signalk-units-preference
    • Provides server-side unit conversion based on user preferences
    • The history API will work without this plugin, but convertUnits=true will have no effect

Installation

Install from GitHub

# Navigate to folder
cd ~/.signalk/node_modules/

# Install from npm
npm install signalk-parquet

# Or install from GitHub
npm install motamman/signalk-parquet
cd ~/.signalk/node_modules/signalk-parquet
npm run build

# Restart SignalK
sudo systemctl restart signalk

⚠️ IMPORTANT IF UPGRADING FROM 0.5.0-beta.3: Consolidation Bug Fix

This release fixes a recursive bug that created nested processed directories and repeatedly processed the same files. The bug itself is fixed, but any processed folders nested inside a processed folder must still be deleted manually.

Cleaning Up Nested Processed Directories

No action is likely needed if you are upgrading from 0.5.0-beta.4 or later. If you are upgrading from an earlier version, you may have nested processed directories that need cleanup:

# Check for nested processed directories
find data -name "*processed*" -type d | head -20

# See the deepest nesting levels
find data -name "*processed*" -type d | awk -F'/' '{print NF-1, $0}' | sort -nr | head -5

# Count files in nested processed directories
find data -path "*/processed/processed/*" -type f | wc -l

# Remove ALL processed directories, including nested ones (RECOMMENDED)
find data -name "processed" -type d -exec rm -rf {} +

# Verify cleanup completed
find data -path "*/processed/processed/*" -type f | wc -l  # Should show 0

Note: The processed directories only contain files that were moved during consolidation - removing them does not delete your original data.

Development Setup

# Clone or copy the signalk-parquet directory
cd signalk-parquet

# Install dependencies
npm install

# Build the TypeScript code
npm run build

# Copy to SignalK plugins directory
cp -r . ~/.signalk/node_modules/signalk-parquet/

# Restart SignalK
sudo systemctl restart signalk

Production Build

# Build for production
npm run build

# The compiled JavaScript will be in the dist/ directory

Configuration

Plugin Configuration

Navigate to SignalK Admin → Server → Plugin Config → SignalK Parquet Data Store

Configure basic plugin settings (path configuration is managed separately in the web interface):

| Setting | Description | Default |
|---------|-------------|---------|
| Buffer Size | Number of records to buffer before writing | 1000 |
| Save Interval | How often to save buffered data (seconds) | 30 |
| Output Directory | Directory to save data files | SignalK data directory |
| Filename Prefix | Prefix for generated filenames | signalk_data |
| File Format | Output format (parquet, json, csv) | parquet |
| Retention Days | Days to keep processed files | 7 |
| Unit Conversion Cache Duration 🆕 | How long to cache unit conversions before reloading (minutes) | 5 |

Note: The Unit Conversion Cache Duration setting controls how quickly changes to unit preferences (in the signalk-units-preference plugin) are reflected in the history API. Lower values (1-2 minutes) reflect changes faster but use more resources. Higher values (30-60 minutes) reduce overhead but take longer to reflect changes. The default of 5 minutes provides a good balance for most users.

S3 Upload Configuration

Configure S3 upload settings in the plugin configuration:

| Setting | Description | Default |
|---------|-------------|---------|
| Enable S3 Upload | Enable uploading to Amazon S3 | false |
| Upload Timing | When to upload (realtime/consolidation) | consolidation |
| S3 Bucket | Name of S3 bucket | - |
| AWS Region | AWS region for S3 bucket | us-east-1 |
| Key Prefix | S3 object key prefix | - |
| Access Key ID | AWS credentials (optional) | - |
| Secret Access Key | AWS credentials (optional) | - |
| Delete After Upload | Delete local files after upload | false |

Claude AI Configuration

Configure Claude AI integration in the plugin configuration for advanced data analysis:

| Setting | Description | Default |
|---------|-------------|---------|
| Enable Claude Integration | Enable AI-powered data analysis | false |
| API Key | Anthropic Claude API key (required) | - |
| Model | Claude model to use for analysis | claude-3-7-sonnet-20250219 |
| Max Tokens | Maximum tokens for AI responses | 4000 |
| Temperature | AI creativity level (0-1) | 0.3 |

Supported Claude Models

| Model | Description | Use Case |
|-------|-------------|----------|
| claude-opus-4-1-20250805 | Latest Opus model - highest intelligence | Complex analysis, detailed insights |
| claude-opus-4-20250514 | Opus model - very high intelligence | Advanced analysis |
| claude-sonnet-4-20250514 | Sonnet model - balanced performance | Recommended default |

Getting a Claude API Key

  1. Visit Anthropic Console
  2. Create an account or sign in
  3. Navigate to API Keys section
  4. Generate a new API key
  5. Copy the key and paste it in the plugin configuration

Note: Claude AI analysis requires an active Anthropic API subscription. Usage is billed based on tokens consumed during analysis.

Path Configuration

Important: Path configuration is managed exclusively through the web interface, not in the SignalK admin interface. This provides a more intuitive interface for managing data collection paths.

Accessing Path Configuration

  1. Navigate to: http://localhost:3000/plugins/signalk-parquet
  2. Click the ⚙️ Path Configuration tab

Adding Data Paths

Use the web interface to configure which SignalK paths to collect:

  1. Click ➕ Add New Path
  2. Configure the path settings:
    • SignalK Path: The SignalK data path (e.g., navigation.position)
    • Always Enabled: Collect data regardless of regimen state
    • Regimen Control: Command name that controls collection
    • Source Filter: Only collect from specific sources
    • Context: SignalK context (vessels.self, vessels.*, or specific vessel)
    • Exclude MMSI: For vessels.* context, exclude specific MMSI numbers
  3. Click ✅ Add Path

Managing Existing Paths

  • Edit Path: Click ✏️ Edit button to modify path settings
  • Delete Path: Click 🗑️ Remove button to delete a path
  • Refresh: Click 🔄 Refresh Paths to reload configuration
  • Show/Hide Commands: Toggle button to show/hide command paths in the table

Command Management

The plugin streamlines command management with automatic path configuration:

  1. Register Command: Commands are automatically registered with enabled path configurations
  2. Start Command: Click Start button to activate a command regimen
  3. Stop Command: Click Stop button to deactivate a command regimen
  4. Remove Command: Click Remove button to delete a command and its path configuration

This eliminates the previous 3-step process of registering commands, adding paths, and enabling them separately.

Path Configuration Storage

Path configurations are stored separately from plugin configuration in:

~/.signalk/signalk-parquet/webapp-config.json

This allows for:

  • Independent management of path configurations
  • Better separation of concerns
  • Easier backup and migration of path settings
  • More intuitive web-based configuration interface

Regimen-Based Control

Regimens allow you to control data collection based on SignalK commands:

Example: Weather data collection with source filtering

{
  "path": "environment.wind.angleApparent",
  "enabled": false,
  "regimen": "captureWeather",
  "source": "mqtt-weatherflow-udp",
  "context": "vessels.self"
}

Note: Source filtering accesses raw data before SignalK server arbitration, allowing collection of data from specific sources that might otherwise be filtered out.

Multi-Vessel Example: Collect navigation data from all vessels except specific MMSI numbers

{
  "path": "navigation.position",
  "enabled": true,
  "context": "vessels.*",
  "excludeMMSI": ["123456789", "987654321"]
}

Command Path: Command paths are automatically created when registering commands

{
  "path": "commands.captureWeather",
  "enabled": true,
  "context": "vessels.self"
}

This path will only collect data when the command commands.captureWeather is active.
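Collection gating can be summarized as: a path is recorded when it is always-enabled, or when its controlling regimen command is currently active. A minimal sketch of this logic, inferred from the examples above (illustrative, not the plugin's actual implementation):

// Illustrative gating logic inferred from the path configuration examples above.
function shouldCollect(
  config: { enabled: boolean; regimen?: string },
  activeRegimens: Set<string>
): boolean {
  if (config.enabled) return true;                 // "always enabled" paths are collected regardless of regimens
  return config.regimen !== undefined && activeRegimens.has(config.regimen); // regimen-controlled paths
}

// shouldCollect({ enabled: false, regimen: 'captureWeather' }, new Set(['captureWeather'])) === true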

TypeScript Architecture

Type Safety

The plugin uses comprehensive TypeScript interfaces:

interface PluginConfig {
  bufferSize: number;
  saveIntervalSeconds: number;
  outputDirectory: string;
  filenamePrefix: string;
  fileFormat: 'json' | 'csv' | 'parquet';
  paths: PathConfig[];
  s3Upload: S3UploadConfig;
}

interface PathConfig {
  path: string;
  enabled: boolean;
  regimen?: string;
  source?: string;
  context: string;
  excludeMMSI?: string[];
}

interface DataRecord {
  received_timestamp: string;
  signalk_timestamp: string;
  context: string;
  path: string;
  value: any;
  source_label?: string;
  meta?: string;
}

Plugin State Management

The plugin maintains typed state:

interface PluginState {
  unsubscribes: Array<() => void>;
  dataBuffers: Map<string, DataRecord[]>;
  activeRegimens: Set<string>;
  subscribedPaths: Set<string>;
  parquetWriter?: ParquetWriter;
  s3Client?: any;
  currentConfig?: PluginConfig;
}

Express Router Types

API routes are fully typed:

router.get('/api/paths', 
  (_: TypedRequest, res: TypedResponse<PathsApiResponse>) => {
    // Typed request/response handling
  }
);

Data Output Structure

File Organization

output_directory/
├── vessels/
│   └── self/
│       ├── navigation/
│       │   ├── position/
│       │   │   ├── signalk_data_20250716T120000.parquet
│       │   │   └── signalk_data_20250716_consolidated.parquet
│       │   └── speedOverGround/
│       └── environment/
│           └── wind/
│               └── angleApparent/
└── processed/
    └── [moved files after consolidation]

Data Schema

Each record contains:

| Field | Type | Description |
|-------|------|-------------|
| received_timestamp | string | When the plugin received the data |
| signalk_timestamp | string | Original SignalK timestamp |
| context | string | SignalK context (e.g., vessels.self) |
| path | string | SignalK path |
| value | DOUBLE/BOOLEAN/INT64/UTF8 | Smart typed values - numbers stored as DOUBLE, booleans as BOOLEAN, etc. |
| value_json | string | JSON representation for complex values |
| source | string | Complete source information |
| source_label | string | Source label |
| source_type | string | Source type |
| source_pgn | number | PGN number (if applicable) |
| meta | string | Metadata information |

Smart Data Types

The plugin now intelligently detects and preserves native data types:

  • Numbers: Stored as DOUBLE (floating point) or INT64 (integers)
  • Booleans: Stored as BOOLEAN
  • Strings: Stored as UTF8
  • Objects: Serialized to JSON and stored as UTF8
  • Mixed Types: Falls back to UTF8 when a path contains multiple data types

This provides better compression, faster queries, and proper type safety for data analysis.
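
A minimal sketch of the detection rules listed above (illustrative only, not the plugin's actual schema-inference code):

type ParquetValueType = 'DOUBLE' | 'INT64' | 'BOOLEAN' | 'UTF8';

// Map a single value to a Parquet type following the rules listed above.
function detectValueType(value: unknown): ParquetValueType {
  if (typeof value === 'boolean') return 'BOOLEAN';
  if (typeof value === 'number') return Number.isInteger(value) ? 'INT64' : 'DOUBLE';
  if (typeof value === 'object' && value !== null) return 'UTF8'; // objects are serialized to JSON first
  return 'UTF8'; // strings and anything else
}

// If a path has produced more than one type, fall back to UTF8 as described above.
function resolvePathType(observedTypes: Set<ParquetValueType>): ParquetValueType {
  return observedTypes.size === 1 ? [...observedTypes][0] : 'UTF8';
}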

Web Interface

Features

  • Path Configuration: Manage data collection paths with multi-vessel support
  • Command Management: Streamlined command registration and control
  • Data Exploration: Browse available data paths
  • SQL Queries: Execute DuckDB queries against Parquet files
  • History API: Query historical data using SignalK History API endpoints
  • S3 Status: Test S3 connectivity and configuration
  • Responsive Design: Works on desktop and mobile
  • MMSI Filtering: Exclude specific vessels from wildcard contexts

API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/paths | GET | List available data paths |
| /api/files/:path | GET | List files for a path |
| /api/sample/:path | GET | Sample data from a path |
| /api/query | POST | Execute SQL query |
| /api/config/paths | GET/POST/PUT/DELETE | Manage path configurations |
| /api/test-s3 | POST | Test S3 connection |
| /api/health | GET | Health check |
| Claude AI Analysis API | | |
| /api/analyze | POST | Perform AI analysis on data |
| /api/analyze/templates | GET | Get available analysis templates |
| /api/analyze/followup | POST | Follow-up analysis questions |
| /api/analyze/history | GET | Get analysis history |
| /api/analyze/test-connection | POST | Test Claude API connection |
| SignalK History API | | |
| /signalk/v1/history/values | GET | SignalK History API - Get historical values |
| /signalk/v1/history/contexts | GET | SignalK History API - Get available contexts |
| /signalk/v1/history/paths | GET | SignalK History API - Get available paths |

DuckDB Integration

Query Examples

Basic Queries

-- Get latest 10 records from navigation position
SELECT * FROM read_parquet('/path/to/navigation/position/*.parquet', union_by_name=true)
ORDER BY received_timestamp DESC LIMIT 10;

-- Count total records
SELECT COUNT(*) FROM read_parquet('/path/to/navigation/position/*.parquet', union_by_name=true);

-- Filter by source
SELECT * FROM read_parquet('/path/to/environment/wind/*.parquet', union_by_name=true)
WHERE source_label = 'mqtt-weatherflow-udp'
ORDER BY received_timestamp DESC LIMIT 100;

-- Aggregate by hour
SELECT
  DATE_TRUNC('hour', received_timestamp::timestamp) as hour,
  AVG(value::double) as avg_value,
  COUNT(*) as record_count
FROM read_parquet('/path/to/data/*.parquet', union_by_name=true)
GROUP BY hour
ORDER BY hour;

🌍 Spatial Analysis Queries

-- Calculate distance traveled over time
WITH ordered_positions AS (
  SELECT
    signalk_timestamp,
    ST_Point(value_longitude, value_latitude) as position,
    LAG(ST_Point(value_longitude, value_latitude)) OVER (ORDER BY signalk_timestamp) as prev_position
  FROM read_parquet('data/vessels/urn_mrn_imo_mmsi_368396230/navigation/position/*.parquet', union_by_name=true)
  WHERE signalk_timestamp >= '2025-09-27T16:00:00Z'
    AND signalk_timestamp <= '2025-09-27T23:59:59Z'
    AND value_latitude IS NOT NULL AND value_longitude IS NOT NULL
),
distances AS (
  SELECT *,
    CASE
      WHEN prev_position IS NOT NULL
      THEN ST_Distance_Sphere(position, prev_position)
      ELSE 0
    END as distance_meters
  FROM ordered_positions
)
SELECT
  strftime(date_trunc('hour', signalk_timestamp::TIMESTAMP), '%Y-%m-%dT%H:%M:%SZ') as time_bucket,
  AVG(value_latitude) as avg_lat,
  AVG(value_longitude) as avg_lon,
  ST_AsText(ST_Centroid(ST_Collect(position))) as centroid,
  SUM(distance_meters) as total_distance_meters,
  COUNT(*) as position_records,
  ST_AsText(ST_ConvexHull(ST_Collect(position))) as movement_area
FROM distances
GROUP BY time_bucket
ORDER BY time_bucket;

-- Multi-vessel proximity analysis
SELECT
  v1.context as vessel1,
  v2.context as vessel2,
  ST_Distance_Sphere(
    ST_Point(v1.value_longitude, v1.value_latitude),
    ST_Point(v2.value_longitude, v2.value_latitude)
  ) as distance_meters,
  v1.signalk_timestamp
FROM read_parquet('data/vessels/*/navigation/position/*.parquet', union_by_name=true) v1
JOIN read_parquet('data/vessels/*/navigation/position/*.parquet', union_by_name=true) v2
  ON v1.signalk_timestamp = v2.signalk_timestamp AND v1.context != v2.context
WHERE v1.signalk_timestamp >= '2025-09-27T00:00:00Z'
  AND ST_Distance_Sphere(
    ST_Point(v1.value_longitude, v1.value_latitude),
    ST_Point(v2.value_longitude, v2.value_latitude)
  ) < 1000  -- Within 1km
ORDER BY distance_meters;

-- Advanced movement analysis with bounding boxes
WITH ordered_positions AS (
  SELECT
    signalk_timestamp,
    ST_Point(value_longitude, value_latitude) as position,
    value_latitude,
    value_longitude,
    LAG(ST_Point(value_longitude, value_latitude)) OVER (ORDER BY signalk_timestamp) as prev_position,
    strftime(date_trunc('hour', signalk_timestamp::TIMESTAMP), '%Y-%m-%dT%H:%M:%SZ') as time_bucket
  FROM read_parquet('data/vessels/urn_mrn_imo_mmsi_368396230/navigation/position/*.parquet', union_by_name=true)
  WHERE signalk_timestamp >= '2025-09-27T16:00:00Z'
    AND signalk_timestamp <= '2025-09-27T23:59:59Z'
    AND value_latitude IS NOT NULL AND value_longitude IS NOT NULL
),
distances AS (
  SELECT *,
    CASE
      WHEN prev_position IS NOT NULL
      THEN ST_Distance_Sphere(position, prev_position)
      ELSE 0
    END as distance_meters
  FROM ordered_positions
)
SELECT
  time_bucket,
  AVG(value_latitude) as avg_lat,
  AVG(value_longitude) as avg_lon,
  -- Calculate bounding box manually
  MIN(value_latitude) as min_lat,
  MAX(value_latitude) as max_lat,
  MIN(value_longitude) as min_lon,
  MAX(value_longitude) as max_lon,
  -- Distance and movement metrics
  SUM(distance_meters) as total_distance_meters,
  ROUND(SUM(distance_meters) / 1000.0, 2) as total_distance_km,
  COUNT(*) as position_records,
  -- Movement area approximation using bounding box
  (MAX(value_latitude) - MIN(value_latitude)) * 111320 *
  (MAX(value_longitude) - MIN(value_longitude)) * 111320 *
  COS(RADIANS(AVG(value_latitude))) as approx_area_m2
FROM distances
GROUP BY time_bucket
ORDER BY time_bucket;

Available Spatial Functions

  • ST_Point(longitude, latitude) - Create point geometries
  • ST_Distance_Sphere(point1, point2) - Calculate distances in meters
  • ST_AsText(geometry) - Convert to Well-Known Text format
  • ST_Centroid(ST_Collect(points)) - Find center of multiple points
  • ST_ConvexHull(ST_Collect(points)) - Create movement boundary polygons

History API Integration

The plugin provides full SignalK History API compliance, allowing you to query historical data using standard SignalK API endpoints with enhanced performance and filtering capabilities.

Available Endpoints

| Endpoint | Description | Parameters |
|----------|-------------|------------|
| /signalk/v1/history/values | Get historical values for specified paths | Standard patterns (see below); optional: resolution, refresh, includeMovingAverages, useUTC |
| /signalk/v1/history/contexts | Get available vessel contexts for time range | Any standard time range pattern (see below); returns only contexts with data in the specified range |
| /signalk/v1/history/paths | Get available SignalK paths for time range | Any standard time range pattern (see below); returns only paths with data in the specified range |

Standard Time Range Patterns

The History API supports 5 standard SignalK time query patterns:

| Pattern | Parameters | Description | Example |
|---------|-----------|-------------|---------|
| 1 | duration | Query back from now | ?duration=1h |
| 2 | from + duration | Query forward from start | ?from=2025-01-01T00:00:00Z&duration=1h |
| 3 | to + duration | Query backward to end | ?to=2025-01-01T12:00:00Z&duration=1h |
| 4 | from | From start to now | ?from=2025-01-01T00:00:00Z |
| 5 | from + to | Specific range | ?from=2025-01-01T00:00:00Z&to=2025-01-02T00:00:00Z |

Legacy Support: The start parameter (used with duration) is deprecated but still supported for backward compatibility. A console warning will be shown. Use standard patterns instead.

Query Parameters

| Parameter | Description | Format | Examples |
|-----------|-------------|---------|----------|
| Required for /values: | | | |
| paths | SignalK paths with optional aggregation | path:method,path:method | navigation.position:first,wind.speed:average |
| Time Range: | Use one of the 5 standard patterns above | | |
| duration | Time period | [number][unit] | 1h, 30m, 15s, 2d |
| from | Start time (ISO 8601) | ISO datetime | 2025-01-01T00:00:00Z |
| to | End time (ISO 8601) | ISO datetime | 2025-01-01T06:00:00Z |
| Optional: | | | |
| context | Vessel context | vessels.self or vessels.<id> | vessels.self (default) |
| resolution | Time bucket size in milliseconds | Number | 60000 (1 minute buckets) |
| refresh | Enable auto-refresh (pattern 1 only) | true or 1 | refresh=true |
| includeMovingAverages | Include EMA/SMA calculations | true or 1 | includeMovingAverages=true |
| useUTC | Treat datetime inputs as UTC | true or 1 | useUTC=true |
| convertUnits | 🆕 Convert to preferred units (requires signalk-units-preference plugin) | true or 1 | convertUnits=true |
| convertTimesToLocal | 🆕 Convert timestamps to local/specified timezone | true or 1 | convertTimesToLocal=true |
| timezone | 🆕 IANA timezone ID (used with convertTimesToLocal) | IANA timezone | timezone=America/New_York |
| Deprecated: | | | |
| start | ⚠️ Use standard patterns instead | now or ISO datetime | Deprecated, use duration or from/to |

Query Examples

Pattern 1: Duration Only (Query back from now)

# Last hour of wind data
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=environment.wind.speedApparent"

# Last 30 minutes with moving averages
curl "http://localhost:3000/signalk/v1/history/values?duration=30m&paths=environment.wind.speedApparent&includeMovingAverages=true"

# Real-time with auto-refresh
curl "http://localhost:3000/signalk/v1/history/values?duration=15m&paths=navigation.position&refresh=true"

Pattern 2: From + Duration (Query forward)

# 6 hours forward from specific time
curl "http://localhost:3000/signalk/v1/history/values?from=2025-01-01T00:00:00Z&duration=6h&paths=navigation.position"

Pattern 3: To + Duration (Query backward)

# 2 hours backward to specific time
curl "http://localhost:3000/signalk/v1/history/values?to=2025-01-01T12:00:00Z&duration=2h&paths=environment.wind.speedApparent"

Pattern 4: From Only (From start to now)

# From specific time until now
curl "http://localhost:3000/signalk/v1/history/values?from=2025-01-01T00:00:00Z&paths=navigation.speedOverGround"

Pattern 5: From + To (Specific range)

# Specific 24-hour period
curl "http://localhost:3000/signalk/v1/history/values?from=2025-01-01T00:00:00Z&to=2025-01-02T00:00:00Z&paths=navigation.position"

Advanced Query Examples

Multiple paths with time alignment:

curl "http://localhost:3000/signalk/v1/history/values?duration=6h&paths=environment.wind.angleApparent,environment.wind.speedApparent,navigation.position&resolution=60000"

Multiple aggregations of same path:

curl "http://localhost:3000/signalk/v1/history/values?from=2025-01-01T00:00:00Z&to=2025-01-01T06:00:00Z&paths=environment.wind.speedApparent:average,environment.wind.speedApparent:min,environment.wind.speedApparent:max&resolution=60000"

With moving averages for trend analysis:

curl "http://localhost:3000/signalk/v1/history/values?duration=24h&paths=electrical.batteries.512.voltage&includeMovingAverages=true&resolution=300000"

Different temporal samples:

curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=navigation.position:first,navigation.position:middle_index,navigation.position:last&resolution=60000"

Context and Path Discovery

Get contexts with data in last hour:

curl "http://localhost:3000/signalk/v1/history/contexts?duration=1h"

Get contexts for specific time range:

curl "http://localhost:3000/signalk/v1/history/contexts?from=2025-01-01T00:00:00Z&to=2025-01-07T00:00:00Z"

Get available paths with recent data:

curl "http://localhost:3000/signalk/v1/history/paths?duration=24h"

Get all paths (no time filter):

curl "http://localhost:3000/signalk/v1/history/paths"

Unit Conversion (NEW in v0.6.0)

Convert to user's preferred units:

# Speed in knots (if configured in signalk-units-preference)
curl "http://localhost:3000/signalk/v1/history/values?duration=2d&paths=navigation.speedOverGround&convertUnits=true"

# Wind speed in preferred units (knots, km/h, or mph)
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=environment.wind.speedApparent&convertUnits=true"

# Temperature in preferred units (°C or °F)
curl "http://localhost:3000/signalk/v1/history/values?duration=24h&paths=environment.outside.temperature&convertUnits=true"

Response includes conversion metadata:

{
  "values": [{"path": "navigation.speedOverGround", "method": "average"}],
  "data": [["2025-10-20T16:12:14Z", 5.2]],
  "units": {
    "converted": true,
    "conversions": [{
      "path": "navigation.speedOverGround",
      "baseUnit": "m/s",
      "targetUnit": "knots",
      "symbol": "kn"
    }]
  }
}

Timezone Conversion (NEW in v0.6.0)

Convert to server's local time:

curl "http://localhost:3000/signalk/v1/history/values?duration=2d&paths=environment.wind.speedApparent&convertTimesToLocal=true"

Convert to specific timezone:

# New York time (Eastern)
curl "http://localhost:3000/signalk/v1/history/values?duration=2d&paths=navigation.position&convertTimesToLocal=true&timezone=America/New_York"

# London time
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=environment.wind.speedApparent&convertTimesToLocal=true&timezone=Europe/London"

# Tokyo time
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=navigation.speedOverGround&convertTimesToLocal=true&timezone=Asia/Tokyo"

Response includes timezone metadata:

{
  "range": {
    "from": "2025-10-20T12:12:19-04:00",
    "to": "2025-10-20T13:12:19-04:00"
  },
  "data": [
    ["2025-10-20T12:12:14-04:00", 5.84],
    ["2025-10-20T12:12:28-04:00", 5.26]
  ],
  "timezone": {
    "converted": true,
    "targetTimezone": "America/New_York",
    "offset": "-04:00",
    "description": "Converted to user-specified timezone: America/New_York (-04:00)"
  }
}

Combine both conversions:

# Convert values to knots AND timestamps to New York time
curl "http://localhost:3000/signalk/v1/history/values?duration=2d&paths=navigation.speedOverGround,environment.wind.speedApparent&convertUnits=true&convertTimesToLocal=true&timezone=America/New_York"

Common IANA Timezone IDs:

  • America/New_York - Eastern Time (US)
  • America/Chicago - Central Time (US)
  • America/Denver - Mountain Time (US)
  • America/Los_Angeles - Pacific Time (US)
  • Europe/London - UK
  • Europe/Paris - Central European Time
  • Asia/Tokyo - Japan
  • Pacific/Auckland - New Zealand
  • Australia/Sydney - Australian Eastern Time

Duration Formats

  • 30s - 30 seconds
  • 15m - 15 minutes
  • 2h - 2 hours
  • 1d - 1 day
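
A small sketch of how a duration string in this format maps to milliseconds (an illustrative helper, not part of the plugin's public API):

// Parse duration strings like "30s", "15m", "2h", "1d" into milliseconds.
function parseDurationMs(duration: string): number {
  const match = /^(\d+)([smhd])$/.exec(duration.trim());
  if (!match) throw new Error(`Invalid duration: ${duration}`);
  const amount = Number(match[1]);
  const unitMs: Record<string, number> = { s: 1000, m: 60_000, h: 3_600_000, d: 86_400_000 };
  return amount * unitMs[match[2]];
}

// parseDurationMs('2h') === 7_200_000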

Timezone Handling (NEW)

Local time conversion (default behavior):

# 8:00 AM local time → automatically converted to UTC
curl "http://localhost:3000/signalk/v1/history/values?context=vessels.self&start=2025-08-13T08:00:00&duration=1h&paths=navigation.position"

UTC time mode:

# 8:00 AM UTC (not converted)
curl "http://localhost:3000/signalk/v1/history/values?context=vessels.self&start=2025-08-13T08:00:00&duration=1h&paths=navigation.position&useUTC=true"

Explicit timezone (always respected):

# Explicit UTC timezone
curl "http://localhost:3000/signalk/v1/history/values?context=vessels.self&start=2025-08-13T08:00:00Z&duration=1h&paths=navigation.position"

# Explicit timezone offset
curl "http://localhost:3000/signalk/v1/history/values?context=vessels.self&start=2025-08-13T08:00:00-04:00&duration=1h&paths=navigation.position"

Timezone behavior:

  • Default (useUTC=false): Datetime strings without timezone info are treated as local time and automatically converted to UTC
  • UTC mode (useUTC=true): Datetime strings without timezone info are treated as UTC time
  • Explicit timezone: Strings with Z, +HH:MM, or -HH:MM are always parsed as-is regardless of useUTC setting
  • start=now: Always uses current UTC time regardless of useUTC setting
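
A minimal sketch of these parsing rules, assuming standard JavaScript Date semantics (where an ISO string without an offset is interpreted in the server's local time); illustrative only:

// Resolve a start/from/to input to a Date following the rules above.
function resolveTimestamp(input: string, useUTC: boolean): Date {
  if (input === 'now') return new Date();                 // always the current UTC instant
  const hasExplicitZone = /(?:Z|[+-]\d{2}:\d{2})$/.test(input);
  if (hasExplicitZone) return new Date(input);            // explicit timezone is always respected
  if (useUTC) return new Date(`${input}Z`);               // naive string treated as UTC
  return new Date(input);                                 // default: naive string treated as local time
}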

Get available contexts:

curl "http://localhost:3000/signalk/v1/history/contexts"

Time Alignment and Bucketing

The History API automatically aligns data from different paths using time bucketing to solve the common problem of misaligned timestamps. This enables:

  • Plotting: Data points align properly on charts
  • Correlation: Compare values from different sensors at the same time
  • Export: Clean, aligned datasets for analysis

Key Features:

  • Smart Type Handling: Automatically handles numeric values (wind speed) and JSON objects (position)
  • Robust Aggregation: Uses proper SQL type casting to prevent type errors
  • Configurable Resolution: Time bucket size in milliseconds (default: auto-calculated based on time range)
  • Multiple Aggregation Methods: average for numeric data, first for complex objects

Parameters:

  • resolution - Time bucket size in milliseconds (default: auto-calculated)
  • Aggregation methods: average, min, max, first, last, mid, middle_index

Aggregation Methods:

  • average - Average value in time bucket (default for numeric data)
  • min - Minimum value in time bucket
  • max - Maximum value in time bucket
  • first - First value in time bucket (default for objects)
  • last - Last value in time bucket
  • mid - Median value (average of middle values for even counts)
  • middle_index - Middle value by index (first of two middle values for even counts)
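
The difference between mid and middle_index can be made concrete with a small sketch based on the definitions above (illustrative, assuming numeric values for mid):

// "mid": median of the bucket (average of the two middle values for even counts).
function mid(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const half = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[half]
    : (sorted[half - 1] + sorted[half]) / 2;
}

// "middle_index": the value at the middle position, keeping the first of the
// two middle values for even counts (also works for non-numeric values).
function middleIndex<T>(values: T[]): T {
  return values[Math.ceil(values.length / 2) - 1];
}

// mid([1, 2, 3, 4]) === 2.5, whereas middleIndex([1, 2, 3, 4]) === 2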

When to Use Each Method:

  • Numeric data (wind speed, voltage, etc.): Use average, min, max for statistics
  • Position data: Use first, last, middle_index for specific readings
  • String/object data: Avoid mid (unpredictable), prefer first, last, middle_index
  • Multiple stats: Query same path with different methods (e.g., wind:average,wind:max)

Response Format

The History API returns time-aligned data in standard SignalK format.

Default Response (without moving averages)

{
  "context": "vessels.self",
  "range": {
    "from": "2025-01-01T00:00:00Z",
    "to": "2025-01-01T06:00:00Z"
  },
  "values": [
    {
      "path": "environment.wind.speedApparent",
      "method": "average"
    },
    {
      "path": "navigation.position",
      "method": "first"
    }
  ],
  "data": [
    ["2025-01-01T00:00:00Z", 12.5, {"latitude": 37.7749, "longitude": -122.4194}],
    ["2025-01-01T00:01:00Z", 13.2, {"latitude": 37.7750, "longitude": -122.4195}],
    ["2025-01-01T00:02:00Z", 11.8, {"latitude": 37.7751, "longitude": -122.4196}]
  ]
}

With Moving Averages (includeMovingAverages=true)

{
  "context": "vessels.self",
  "range": {
    "from": "2025-01-01T00:00:00Z",
    "to": "2025-01-01T06:00:00Z"
  },
  "values": [
    {
      "path": "environment.wind.speedApparent",
      "method": "average"
    },
    {
      "path": "environment.wind.speedApparent.ema",
      "method": "ema"
    },
    {
      "path": "environment.wind.speedApparent.sma",
      "method": "sma"
    },
    {
      "path": "navigation.position",
      "method": "first"
    }
  ],
  "data": [
    ["2025-01-01T00:00:00Z", 12.5, 12.5, 12.5, {"latitude": 37.7749, "longitude": -122.4194}],
    ["2025-01-01T00:01:00Z", 13.2, 12.64, 12.85, {"latitude": 37.7750, "longitude": -122.4195}],
    ["2025-01-01T00:02:00Z", 11.8, 12.45, 12.5, {"latitude": 37.7751, "longitude": -122.4196}]
  ]
}

Notes:

  • Each data array element is [timestamp, value1, value2, ...] corresponding to the paths in the values array
  • Moving averages (EMA/SMA) are opt-in - add includeMovingAverages=true to include them
  • EMA/SMA are only calculated for numeric values; non-numeric values (objects, strings) show null for their EMA/SMA columns
  • Without includeMovingAverages, response size is ~66% smaller

Claude AI Analysis

The plugin integrates Claude AI to provide intelligent analysis of maritime data, offering insights that would be difficult to extract through traditional querying methods.

Advanced Charting and Visualization

Claude AI can generate interactive charts and visualizations directly from your data using Plotly.js specifications. Charts are automatically embedded in analysis responses when analysis would benefit from visualization.

Supported Chart Types:

  • Line Charts: Time series trends for navigation, environmental, and performance data
  • Bar Charts: Categorical analysis and frequency distributions
  • Scatter Plots: Correlation analysis between different parameters
  • Wind Roses/Radar Charts: Professional wind direction and speed frequency analysis
  • Multiple Series Charts: Compare multiple data streams on the same chart
  • Polar Charts: Wind patterns, compass headings, and directional data

Marine-Specific Chart Features:

  • Wind Analysis: Automated wind rose generation with Beaufort scale categories
  • Navigation Plots: Course over ground, speed trends, and position tracking
  • Environmental Monitoring: Temperature, pressure, and weather pattern visualization
  • Performance Analysis: Fuel efficiency, battery usage, and system performance charts
  • Multi-Vessel Comparisons: Side-by-side analysis of multiple vessels

Chart Data Integrity:

  • All chart data is sourced directly from database queries - no fabricated or estimated data
  • Charts display exact data points from query results with full traceability
  • Automatic validation ensures chart data matches query output
  • Time-aligned data from History API ensures accurate multi-parameter visualization

Example Chart Generation: When you ask Claude to "analyze wind patterns over the last 48 hours", it will:

  1. Query your wind direction and speed data
  2. Generate a wind rose chart showing frequency by compass direction
  3. Color-code by wind speed categories (calm, light breeze, strong breeze, etc.)
  4. Display the chart as interactive Plotly.js visualization in the web interface

Charts are automatically included when analysis benefits from visualization, or you can explicitly request specific chart types like "create a line chart" or "show me a wind rose".

PLANNED Analysis Templates (Not Yet Implemented)

The following are examples of possible pre-built analysis templates that would provide ready-to-use analyses for common maritime operations:

Navigation & Routing Templates

  • Navigation Summary: Comprehensive analysis of navigation patterns and route efficiency
  • Route Optimization: Identify opportunities to optimize routes for efficiency and safety
  • Anchoring Analysis: Analyze anchoring patterns, duration, and safety considerations

Weather & Environment Templates

  • Weather Impact Analysis: Analyze how weather conditions affect vessel performance
  • Wind Pattern Analysis: Detailed wind analysis for sailing optimization

Electrical System Templates

  • Battery Health Assessment: Comprehensive battery performance and charging pattern analysis
  • Power Consumption Analysis: Analyze electrical power usage patterns and efficiency

Safety & Monitoring Templates

  • Safety Anomaly Detection: Detect unusual patterns that might indicate safety concerns
  • Equipment Health Monitoring: Monitor equipment performance and predict maintenance needs

Performance & Efficiency Templates

  • Fuel Efficiency Analysis: Analyze fuel consumption patterns and identify efficiency opportunities
  • Overall Performance Trends: Comprehensive vessel performance analysis over time

Using Claude AI Analysis

Via Web Interface

  1. Navigate to the plugin's web interface
  2. Go to the 🧠 AI Analysis tab
  3. Select a data path to analyze
  4. Choose an analysis template or create custom analysis
  5. Configure time range and analysis parameters
  6. Click Analyze Data to generate insights

Via API

Test Claude Connection:

curl -X POST http://localhost:3000/plugins/signalk-parquet/api/analyze/test-connection

Get Available Templates:

curl http://localhost:3000/plugins/signalk-parquet/api/analyze/templates

Custom Analysis:

curl -X POST http://localhost:3000/plugins/signalk-parquet/api/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "dataPath": "environment.wind.speedTrue,navigation.speedOverGround",
    "analysisType": "custom",
    "customPrompt": "Analyze the relationship between wind speed and vessel speed. Identify optimal wind conditions for best performance.",
    "timeRange": {
      "start": "2025-01-01T00:00:00Z",
      "end": "2025-01-07T00:00:00Z"
    },
    "aggregationMethod": "average",
    "resolution": "3600000"
  }'

Analysis Response Format

Claude AI analysis returns structured insights:

{
  "id": "analysis_1234567890_abcdef123",
  "analysis": "Main analysis text with detailed insights",
  "insights": [
    "Key insight 1",
    "Key insight 2",
    "Key insight 3"
  ],
  "recommendations": [
    "Actionable recommendation 1",
    "Actionable recommendation 2"
  ],
  "anomalies": [
    {
      "timestamp": "2025-01-01T12:00:00Z",
      "value": 25.5,
      "expectedRange": {"min": 10.0, "max": 20.0},
      "severity": "medium",
      "description": "Wind speed higher than normal range",
      "confidence": 0.87
    }
  ],
  "confidence": 0.92,
  "dataQuality": "High quality data with 98% completeness",
  "timestamp": "2025-01-01T15:30:00Z",
  "metadata": {
    "dataPath": "environment.wind.speedTrue",
    "analysisType": "summary",
    "recordCount": 1440,
    "timeRange": {
      "start": "2025-01-01T00:00:00Z",
      "end": "2025-01-02T00:00:00Z"
    }
  }
}

Analysis History

All Claude AI analyses are automatically saved and can be retrieved:

Get Analysis History:

curl http://localhost:3000/plugins/signalk-parquet/api/analyze/history?limit=10

History files are stored in: data/analysis-history/analysis_*.json

Best Practices

  1. Data Quality: Ensure good data coverage for more reliable analysis
  2. Time Ranges: Use appropriate time ranges - longer for trends, shorter for anomalies
  3. Path Selection: Combine related paths for correlation analysis
  4. Template Usage: Start with templates then customize prompts as needed
  5. API Limits: Be mindful of Anthropic API token limits and costs
  6. Model Selection: Use Opus for complex analysis, Sonnet for general use, Haiku for quick insights

Troubleshooting Claude AI

Common Issues:

  • "Claude not enabled": Check plugin configuration and enable Claude integration
  • "API key missing": Add valid Anthropic API key in plugin settings
  • "Analysis timeout": Reduce data size or use faster model (Haiku)
  • "Token limit exceeded": Reduce time range or use data sampling

Debug Claude Integration:

# Test API connection
curl -X POST http://localhost:3000/plugins/signalk-parquet/api/analyze/test-connection

# Check plugin logs for Claude-specific messages
journalctl -u signalk -f | grep -i claude

Moving Averages (EMA & SMA)

The plugin calculates Exponential Moving Average (EMA) and Simple Moving Average (SMA) for numeric values when explicitly requested via the includeMovingAverages parameter, providing enhanced trend analysis capabilities.

How to Enable

History API:

# Add includeMovingAverages=true to any query
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=environment.wind.speedApparent&includeMovingAverages=true"

Default Behavior (v0.5.6+):

  • Moving averages are opt-in - not included by default
  • Reduces response size by ~66% when not needed
  • Better API compliance with SignalK specification

Legacy Behavior (pre-v0.5.6):

  • Moving averages were automatically included for all queries
  • To maintain old behavior, add includeMovingAverages=true to all requests

Calculation Details

Exponential Moving Average (EMA)

  • Period: equivalent to ~10 data points (α = 0.2)
  • Formula: EMA = α × currentValue + (1 - α) × previousEMA
  • Characteristic: Responds faster to recent changes, emphasizes recent data
  • Use Case: Trend detection, rapid response to data changes

Simple Moving Average (SMA)

  • Period: 10 data points
  • Formula: Average of the last 10 values
  • Characteristic: Smooths out fluctuations, equal weight to all values in window
  • Use Case: Noise reduction, general trend analysis
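
A minimal sketch of the two calculations as described above (α = 0.2, rolling 10-point window); illustrative, not the plugin's exact code:

const ALPHA = 0.2;     // EMA smoothing factor (~10-point equivalent)
const SMA_WINDOW = 10; // SMA window size in data points

// EMA = α × currentValue + (1 − α) × previousEMA; the first value seeds the EMA.
function nextEma(value: number, previousEma?: number): number {
  return previousEma === undefined ? value : ALPHA * value + (1 - ALPHA) * previousEma;
}

// SMA over the last 10 values; `window` is mutated to keep a rolling buffer.
// (The plugin rounds reported values to 3 decimal places.)
function nextSma(value: number, window: number[]): number {
  window.push(value);
  if (window.length > SMA_WINDOW) window.shift();
  return window.reduce((sum, v) => sum + v, 0) / window.length;
}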

Data Flow & Continuity

// Initial Data Load (isIncremental: false)
Point 1: Value=5.0, EMA=5.0,   SMA=5.0
Point 2: Value=6.0, EMA=5.2,   SMA=5.5
Point 3: Value=4.0, EMA=5.0,   SMA=5.0

// Incremental Updates (isIncremental: true)
Point 4: Value=7.0, EMA=5.4,   SMA=5.5  // Continues from previous EMA
Point 5: Value=5.5, EMA=5.42,  SMA=5.5  // Rolling 10-point SMA window

Key Features

  • 🎛️ Opt-In: Add includeMovingAverages=true to enable (v0.5.6+)
  • Memory Efficient: SMA maintains rolling 10-point window
  • Non-Numeric Handling: Non-numeric values (strings, objects) show null for EMA/SMA
  • Precision: Values rounded to 3 decimal places to prevent floating-point noise
  • Performance: Smaller response sizes when not needed

Real-world Applications

Marine Data Examples:

  • Wind Speed: EMA detects gusts quickly, SMA shows general wind conditions
  • Battery Voltage: EMA shows charging/discharging trends, SMA indicates overall battery health
  • Engine RPM: EMA responds to throttle changes, SMA shows average operating level
  • Water Temperature: EMA detects thermal changes, SMA provides stable baseline

Available in:

  • 📊 History API: Add includeMovingAverages=true to include EMA/SMA calculations

S3 Integration

Upload Timing

Real-time Upload: Files are uploaded immediately after creation

{
  "s3Upload": {
    "enabled": true,
    "timing": "realtime"
  }
}

Consolidation Upload: Files are uploaded after daily consolidation

{
  "s3Upload": {
    "enabled": true,
    "timing": "consolidation"
  }
}

S3 Key Structure

With prefix marine-data/:

marine-data/vessels/self/navigation/position/signalk_data_20250716_consolidated.parquet
marine-data/vessels/self/environment/wind/angleApparent/signalk_data_20250716_120000.parquet

File Consolidation

The plugin automatically consolidates files daily at midnight UTC:

  1. File Discovery: Finds all files for the previous day
  2. Merging: Combines files by SignalK path
  3. Sorting: Sorts records by timestamp
  4. Cleanup: Moves source files to processed/ directory
  5. S3 Upload: Uploads consolidated files if configured

Performance Characteristics

  • Memory Usage: Configurable buffer sizes (default 1000 records)
  • Disk I/O: Efficient batch writes with configurable intervals
  • CPU Usage: Minimal - mostly I/O bound operations
  • Network: Optional S3 uploads with retry logic

Development

Project Structure

signalk-parquet/
├── src/
│   ├── index.ts              # Main plugin entry point and lifecycle (~340 lines)
│   ├── commands.ts           # Command management system (~400 lines)
│   ├── data-handler.ts       # Data processing, subscriptions, S3 (~650 lines)
│   ├── api-routes.ts         # Web API endpoints (~600 lines)
│   ├── types.ts              # TypeScript interfaces (~360 lines)
│   ├── parquet-writer.ts     # File writing logic
│   ├── HistoryAPI.ts         # SignalK History API implementation
│   ├── HistoryAPI-types.ts   # History API type definitions
│   └── utils/
│       └── path-helpers.ts   # Path utility functions
├── dist/                     # Compiled JavaScript
├── public/
│   ├── index.html           # Web interface
│   └── parquet.png          # Plugin icon
├── tsconfig.json            # TypeScript configuration
├── package.json             # Dependencies and scripts
└── README.md               # This file

Code Architecture

The plugin uses a modular TypeScript architecture for maintainability:

  • index.ts: Plugin lifecycle, configuration, and initialization
  • commands.ts: SignalK command registration, execution, and management
  • data-handler.ts: Data subscriptions, buffering, consolidation, and S3 operations
  • api-routes.ts: REST API endpoints for web interface
  • types.ts: Comprehensive TypeScript type definitions
  • utils/: Utility functions and helpers

Adding New Features

  1. API Endpoints: Add to src/api-routes.ts
  2. Data Processing: Extend src/data-handler.ts
  3. Commands: Modify src/commands.ts
  4. Types: Add interfaces to src/types.ts
  5. Claude AI Models: Update src/claude-models.ts (see below)
  6. Update Documentation: Update README and inline comments

Updating Claude AI Models

When Anthropic releases new models, update the single source of truth in src/claude-models.ts:

export const CLAUDE_MODELS = {
  OPUS_4_1: 'claude-opus-4-1-20250805',
  OPUS_4: 'claude-opus-4-20250514',
  SONNET_4: 'claude-sonnet-4-20250514',
  SONNET_4_5: 'claude-sonnet-4-5-20250929',
  // Add new models here
} as const;

export const SUPPORTED_CLAUDE_MODELS = [
  CLAUDE_MODELS.OPUS_4_1,
  CLAUDE_MODELS.OPUS_4,
  CLAUDE_MODELS.SONNET_4,
  CLAUDE_MODELS.SONNET_4_5,
  // Add to supported list
] as const;

export const DEFAULT_CLAUDE_MODEL = CLAUDE_MODELS.SONNET_4_5; // Update default if needed

export const CLAUDE_MODEL_DESCRIPTIONS = {
  [CLAUDE_MODELS.OPUS_4_1]: 'Claude Opus 4.1 (Most Capable & Intelligent)',
  [CLAUDE_MODELS.OPUS_4]: 'Claude Opus 4 (Previous Flagship)',
  [CLAUDE_MODELS.SONNET_4]: 'Claude Sonnet 4 (Balanced Performance)',
  [CLAUDE_MODELS.SONNET_4_5]: 'Claude Sonnet 4.5 (Latest Sonnet)',
  // Add descriptions for new models
} as const;

Why this matters:

  • All model definitions are centralized in one file
  • Type safety across the entire codebase
  • Automatic migration of outdated models on plugin startup
  • Prevents form validation errors when users have old model values saved
  • No need to update multiple files when adding new models

The plugin automatically migrates old/invalid model values to the current default on startup, preventing configuration save failures.
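
The startup migration boils down to: if the saved model is not in the supported list, replace it with the default. A minimal sketch using the constants above (the function name is illustrative):

// Replace an outdated or invalid saved model with the current default.
function migrateModel(savedModel: string | undefined): string {
  const supported: readonly string[] = SUPPORTED_CLAUDE_MODELS;
  return savedModel && supported.includes(savedModel)
    ? savedModel
    : DEFAULT_CLAUDE_MODEL;
}

// migrateModel('claude-3-7-sonnet-20250219') === DEFAULT_CLAUDE_MODEL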

Type Checking

The plugin uses strict TypeScript configuration:

{
  "compilerOptions": {
    "strict": true,
    "noImplicitAny": true,
    "noImplicitReturns": true,
    "strictNullChecks": true
  }
}

Troubleshooting

Common Issues

Build Errors

# Clean and rebuild
npm run clean
npm run build

DuckDB Not Available

  • Check that @duckdb/node-api is installed
  • Verify Node.js version compatibility (>=16.0.0)

S3 Upload Failures

  • Verify AWS credentials and permissions
  • Check S3 bucket exists and is accessible
  • Test connection using web interface

No Data Collection

  • Verify path configurations are correct
  • Check if regimens are properly activated
  • Review SignalK logs for subscription errors

Debug Mode

Enable debug logging in SignalK:

{
  "settings": {
    "debug": "signalk-parquet*"
  }
}

Runtime Dependencies

  • @dsnp/parquetjs: Parquet file format support
  • @duckdb/node-api: SQL query engine
  • @aws-sdk/client-s3: S3 upload functionality
  • fs-extra: Enhanced file system operations
  • glob: File pattern matching
  • express: Web server framework

Development Dependencies

  • typescript: TypeScript compiler
  • @types/node: Node.js type definitions
  • @types/express: Express type definitions
  • @types/fs-extra: fs-extra type definitions

License

MIT License - See LICENSE file for details.

Testing

Comprehensive testing procedures are documented in TESTING.md. The testing guide covers:

  • Installation and build verification
  • Plugin configuration testing
  • Web interface functionality
  • Data collection validation
  • Regimen control testing
  • File output verification
  • S3 integration testing
  • API endpoint testing
  • Performance testing
  • Error handling validation

Quick Test

# Test plugin health
curl http://localhost:3000/plugins/signalk-parquet/api/health

# Test path configuration
curl http://localhost:3000/plugins/signalk-parquet/api/config/paths

# Test data collection
curl http://localhost:3000/plugins/signalk-parquet/api/paths

# Test History API
curl "http://localhost:3000/signalk/v1/history/contexts"

TODO

  • [x] Implement startup consolidation for missed previous days (exclude current day)
  • [x] Add history API integration
  • [ ] Incorporate user preferences from units-preference into the regimen filter system
  • [ ] Expose recorded spatial events via an API endpoint (GeoJSON)
  • [ ] Add Grafana integration

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add TypeScript types for new features
  4. Include tests and documentation
  5. Follow the testing procedures in TESTING.md
  6. Submit a pull request

Changelog

See CHANGELOG.md for complete version history.

Version 0.5.6-beta.1 (Latest)

  • 🎯 SignalK History API Compliance: Full support for all 5 standard time range patterns
  • ⏪ Backward Compatibility: Legacy start parameter supported with deprecation warnings
  • 🎛️ Optional Moving Averages: EMA/SMA now opt-in via includeMovingAverages parameter
  • 🔍 Time-Filtered Discovery: Paths and contexts endpoints accept time range parameters
  • ⚡ Performance: 4.3x faster context discovery (13s → 3s) with SQL optimization and caching

Version 0.5.5-beta.4

  • 🔧 Threshold Automation State Machine Fix: Fixed automation enable/disable transitions to properly execute state changes
    • When enabling automation (.auto = true): Command is now set to OFF, then all thresholds are immediately evaluated
    • When disabling automation (.auto = false): Threshold monitoring stops and command state remains unchanged
    • Default command state is hardcoded to OFF on server side
    • Fixed autoPutHandler in src/commands.ts to execute one-time transition operations when .auto path toggles
    • Ensures thresholds take control immediately upon automation enable instead of waiting for next SignalK delta update
    • Removed user-configurable default state dropdown from UI (both add and edit command forms)

Version 0.5.5-beta.3

  • 🧱 Front-end Modularization: Replaced the 5,000-line inline dashboard script with focused JS modules under public/js, improving readability and maintainability.
  • ⚙️ Threshold Automation Fix: Threshold monitoring now listens to raw SignalK values via getSelfStream, so saved trigger conditions reliably toggle their commands.

Version 0.5.5-beta.1

  • 🌍 NEW Spatial Analysis System: Advanced geographic analysis capabilities with DuckDB spatial extension
    • Complete spatial function integration: ST_Point, ST_Distance_Sphere, ST_Centroid, ST_ConvexHull, ST_AsText
    • Automatic spatial extension loading in all DuckDB connections for seamless geographic queries
    • Track analysis with consecutive position distance calculations using LAG() window functions
    • Movement area analysis with bounding boxes and geographic coordinate system calculations
    • Multi-vessel proximity detection for collision avoidance and traffic analysis
  • 🧠 Enhanced Claude AI Spatial Intelligence: Claude now automatically suggests spatial queries for geographic questions
    • Comprehensive spatial query patterns and examples integrated into Claude's analysis prompts
    • Automatic detection of spatial analysis opportunities (track analysis, distance calculations, movement patterns)
    • Pre-built spatial query templates for common maritime analysis scenarios
    • Advanced movement analysis with time bucketing, centroids, and area calculations
  • 📊 Standardized Query Syntax: All queries now use the read_parquet() function for better schema flexibility
    • Updated HistoryAPI, api-routes, and claude-analyzer to use read_parquet() with union_by_name=true
    • Enhanced schema compatibility and better error handling for evolving data structures
    • Consistent query patterns across all system components for maintainability
  • 📖 Comprehensive Documentation: Enhanced README with spatial analysis examples and capabilities
    • Advanced spatial query examples including distance tracking and movement analysis
    • Complete spatial function reference with practical maritime use cases
    • Multi-vessel proximity analysis templates for collision detection scenarios

Version 0.5.4-beta.1

  • 🔍 NEW Data Validation System: Comprehensive Parquet file schema validation against SignalK metadata standards
    • Real-time validation of file schemas with progress tracking and cancellation support
    • Detects incorrect data types (e.g., numeric strings, boolean strings) in existing files
    • Validates against SignalK metadata units (meters, volts, amperes) for proper type mapping
  • 🔧 NEW Automated Schema Repair: One-click repair of schema violations with safe backup operations
    • Automatic conversion of incorrectly stored data types (UTF8 → DOUBLE, UTF8 → BOOLEAN)
    • Creates backup files before modification and quarantines corrupted files
    • Processes thousands of files with real-time progress monitoring
  • ⚡ Major Performance Fix: Resolved repair hanging issue on ARM systems (Raspberry Pi, AWS)
    • Replaced problematic DuckDB command-line spawning with direct parquet library schema reading
    • Repair now works reliably across all architectures (x86, ARM64, Apple Silicon)
    • Unified schema reading approach between validation and repair for consistency
  • 📊 Storage & Query Benefits: Proper data types provide significant performance improvements
    • ~50% storage reduction for numeric data (DOUBLE vs UTF8 strings)
    • 5-10x faster query performance with native numeric operations
    • Enhanced data integrity and analytics compatibility

Version 0.5.3-beta.2

  • 🎨 Enhanced User Interface: Major improvements to path configuration and form usability
    • Intelligent SignalK path dropdowns with real-time data population
    • Radio button filters to distinguish self vessel vs other vessel paths
    • Dynamic regimen selection with checkbox interfaces replacing text inputs
    • Auto-populated regimens from defined commands API with dynamic updates
    • Improved form layout with proper label/checkbox alignment and spacing
    • Support for both pre-defined and custom regimen selection
  • 🎮 Regimen/Commands Manager Enhancements: Streamlined command management experience
    • Renamed "Command Manager" to "Regimen/Commands Manager" for clarity
    • Auto-refresh functionality when selecting the tab (eliminates manual refresh)
    • Removed redundant manual refresh button for cleaner interface
    • Dynamic regimen filtering excludes command paths from path dropdowns
  • 🧠 AI Analysis Improvements: Enhanced analysis experience and cancellation controls
    • Analysis button shows "Running Analysis (click to cancel)" during processing
    • Visual feedback with color changes during analysis execution
    • Dual cancellation options: button click or separate cancel button
    • Improved error handling distinguishing between cancelled vs failed analyses
    • Better prevention of double-clicking issues with clear visual states
  • ⚙️ Configuration Management: More flexible path and regimen configuration
    • Removed regimen requirement validation from edit forms (optional regimens)
    • Enhanced dropdown population excludes already configured command paths
    • Better alignment of UI elements with consistent styling patterns
    • Tab labels consolidated to single lines for cleaner navigation

Version 0.5.3-beta.1

  • 📊 Advanced Charting & Visualization: Comprehensive chart generation capabilities with Claude AI
    • Interactive Plotly.js chart embedding with marine-specific visualizations
    • Automated wind rose generation with Beaufort scale categories and compass sectors
    • Multiple chart types: line charts, bar charts, scatter plots, polar charts, radar charts
    • Wind analysis tools with directional frequency distributions
    • Chart data integrity validation ensuring all visualizations use real query data
    • Time-aligned multi-parameter visualization support via History API
  • 🔍 Enhanced Path Discovery: Improved SignalK path discovery and source filtering
    • StreamBundle integration for efficient path enumeration
    • Better source filtering with wildcard pattern support
    • Enhanced debug logging for path discovery troubleshooting
  • 🛡️ Parquet File Validation: Added comprehensive corruption detection and quarantine