@thalalabs/aptos-resilient-client

v0.1.4

Published

5 months ago

A resilient Aptos client with automatic failover and recovery

0xbe1

samuelqzq

lawson-thala

aptos blockchain resilient failover rpc

Aptos Resilient Client

A resilient Aptos client with automatic failover and recovery capabilities. This client maintains multiple RPC endpoint connections and automatically switches between them when failures occur, ensuring high availability for your Aptos blockchain interactions.

Features

Automatic Failover: Switches to backup endpoints when the primary fails
Auto Recovery: Automatically switches back to higher-priority endpoints when they recover
Configurable Thresholds: Customize failure tolerance and health check intervals
Request Timeout: Configurable timeout for all requests
Health Monitoring: Periodic health checks for failed endpoints
Full Aptos SDK Compatibility: Works as a drop-in replacement for the standard Aptos client
TypeScript Support: Full type definitions included

Installation

npm install @thalalabs/aptos-resilient-client @aptos-labs/ts-sdk

Quick Start

import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";

// Create a resilient client with multiple endpoints
const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://api.mainnet.aptoslabs.com/v1" }), // Primary
    new AptosConfig({ fullnode: "https://1rpc.io/aptos/v1" }),              // Backup
  ],
  unhealthyThreshold: 3,      // Mark endpoint unhealthy after 3 failures
  healthCheckInterval: 30000, // Check every 30 seconds
  requestTimeout: 10000,      // 10 second timeout per request
});

// Get the Aptos client instance
const client = resilientClient.getClient();

// Use it just like the standard Aptos client
const ledgerInfo = await client.getLedgerInfo();
const accountInfo = await client.getAccountInfo({ accountAddress: "0x1" });

// Get statistics
const stats = resilientClient.getStats();
console.log("Active endpoint:", stats.activeEndpointUrl);
console.log("Total failovers:", stats.totalFailovers);

// Clean up when done
resilientClient.destroy();

Configuration

ResilientClientConfig

| Property | Type | Default | Description |
|----------|------|---------|-------------|
| endpoints | AptosConfig[] | required | Array of AptosConfig objects in priority order (first = highest priority). Each AptosConfig can specify fullnode URL, network, and custom client configuration (headers, API keys, etc.) |
| unhealthyThreshold | number | 3 | Number of consecutive failures before marking endpoint as unhealthy. Unhealthy endpoints are skipped in future requests. |
| healthCheckInterval | number | 30000 | Interval in milliseconds between health checks for unhealthy endpoints |
| requestTimeout | number | 10000 | Timeout in milliseconds for each request |
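The defaults in the table can be read as a merge step applied to whatever the caller passes in. The sketch below is illustrative only: `SketchConfig` and `applyDefaults` are hypothetical names, endpoints are plain URL strings rather than AptosConfig objects so that it runs without the SDK, and rejecting an empty endpoint list is an assumption. The package's internals may differ.

```typescript
// Hypothetical stand-in for ResilientClientConfig; endpoints are plain URL
// strings here so the sketch runs without the Aptos SDK.
interface SketchConfig {
  endpoints: string[];
  unhealthyThreshold?: number;
  healthCheckInterval?: number;
  requestTimeout?: number;
}

// Default values as documented in the table above.
const DEFAULTS = {
  unhealthyThreshold: 3,
  healthCheckInterval: 30000,
  requestTimeout: 10000,
};

// Merge caller-supplied options over the documented defaults.
function applyDefaults(config: SketchConfig): Required<SketchConfig> {
  if (config.endpoints.length === 0) {
    // Assumption: an empty endpoint list is rejected.
    throw new Error("at least one endpoint is required");
  }
  return {
    endpoints: [...config.endpoints],
    unhealthyThreshold: config.unhealthyThreshold ?? DEFAULTS.unhealthyThreshold,
    healthCheckInterval: config.healthCheckInterval ?? DEFAULTS.healthCheckInterval,
    requestTimeout: config.requestTimeout ?? DEFAULTS.requestTimeout,
  };
}

// Only requestTimeout is overridden; the other two fall back to the defaults.
const resolved = applyDefaults({
  endpoints: ["https://api.mainnet.aptoslabs.com/v1"],
  requestTimeout: 15000,
});
```

Omitting every optional property therefore yields the configuration shown in Quick Start.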

How It Works

Failover Logic

Within a Single Request:

The client tries all healthy endpoints sequentially, starting with the highest priority
If endpoint A fails, it immediately tries endpoint B (endpoint A is not retried within the same request)
Each failure increments that endpoint's consecutive failure counter
The request succeeds as soon as any endpoint responds successfully

Across Multiple Requests:

After unhealthyThreshold consecutive failures, an endpoint is marked unhealthy
Unhealthy endpoints are skipped entirely in future requests (cost optimization)
The active endpoint becomes the first healthy endpoint in priority order

Example with 2 endpoints (threshold = 3):

Request 1: Primary fails (1/3) → Backup succeeds ✓
Request 2: Primary fails (2/3) → Backup succeeds ✓
Request 3: Primary fails (3/3, marked unhealthy) → Backup succeeds ✓
Request 4+: Skip primary entirely → Backup succeeds ✓ (saves time & money)
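The four-request trace above can be replayed with a small synchronous simulation. This is a sketch of the described behavior, not the package's code: `SimEndpoint` and `sendRequest` are hypothetical names, every thrown error is treated as a network error, and two details are assumptions (a success resets the failure counter, and a request throws when no endpoint answers).

```typescript
// Simulated endpoint: `call` stands in for a real RPC request and throws on failure.
interface SimEndpoint {
  name: string;
  call: () => string;
  healthy: boolean;
  consecutiveFailures: number;
}

const UNHEALTHY_THRESHOLD = 3;

// Try healthy endpoints in priority order; return the name of the endpoint
// that answered plus a log of what happened along the way.
function sendRequest(endpoints: SimEndpoint[]): { servedBy: string; log: string[] } {
  const log: string[] = [];
  for (const ep of endpoints) {
    if (!ep.healthy) {
      log.push(`${ep.name} skipped`);
      continue;
    }
    try {
      ep.call();
      ep.consecutiveFailures = 0; // assumption: success resets the counter
      log.push(`${ep.name} succeeded`);
      return { servedBy: ep.name, log };
    } catch {
      ep.consecutiveFailures += 1;
      if (ep.consecutiveFailures >= UNHEALTHY_THRESHOLD) ep.healthy = false;
      log.push(`${ep.name} failed (${ep.consecutiveFailures}/${UNHEALTHY_THRESHOLD})`);
    }
  }
  // Assumption: the request fails once every endpoint has been tried or skipped.
  throw new Error("all endpoints failed");
}

// A primary that always fails and a backup that always answers.
const primary: SimEndpoint = {
  name: "primary",
  call: () => { throw new Error("ECONNREFUSED"); },
  healthy: true,
  consecutiveFailures: 0,
};
const backup: SimEndpoint = { name: "backup", call: () => "ok", healthy: true, consecutiveFailures: 0 };

// Four requests in a row, one log line per request.
const trace = [1, 2, 3, 4].map(() => sendRequest([primary, backup]).log.join(" -> "));
```

In this simulation the fourth entry of `trace` records the primary as skipped, which is the point at which the failed endpoint stops costing a timeout on every request.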

Recovery Logic

A background health check runs every healthCheckInterval milliseconds
Unhealthy endpoints are tested with a lightweight request (getLedgerInfo())
If an endpoint becomes healthy again and has higher priority than the current active endpoint, the client switches back to it
This ensures you always use the highest-priority available endpoint
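One pass of the recovery steps might look like the following sketch. It is an approximation under stated assumptions: `ProbeTarget` and `runHealthCheck` are hypothetical names, the `probe` callback is a synchronous stand-in for the real `getLedgerInfo()` request, and resetting the failure counter on recovery is assumed rather than documented.

```typescript
interface ProbeTarget {
  url: string;
  healthy: boolean;
  consecutiveFailures: number;
}

// One health-check pass: probe every unhealthy endpoint, restore those that
// respond, and switch to the first healthy endpoint if it outranks the active one.
function runHealthCheck(
  endpoints: ProbeTarget[],
  probe: (url: string) => boolean, // stand-in for a getLedgerInfo() call
  activeIndex: number,
): { activeIndex: number; recovered: boolean } {
  for (const ep of endpoints) {
    if (!ep.healthy && probe(ep.url)) {
      ep.healthy = true;
      ep.consecutiveFailures = 0; // assumption: recovery clears the counter
    }
  }
  const best = endpoints.findIndex((ep) => ep.healthy);
  // Lower index means higher priority, so switch only when best < activeIndex.
  if (best !== -1 && best < activeIndex) {
    return { activeIndex: best, recovered: true };
  }
  return { activeIndex, recovered: false };
}

const pool: ProbeTarget[] = [
  { url: "https://primary.example.com", healthy: false, consecutiveFailures: 3 },
  { url: "https://backup.example.com", healthy: true, consecutiveFailures: 0 },
];

// First pass: the primary is still down, so the backup (index 1) stays active.
const stillDown = runHealthCheck(pool, () => false, 1);
// Second pass: the primary answers the probe, so the client switches back to index 0.
const backUp = runHealthCheck(pool, () => true, 1);
```

A switch reported with `recovered: true` corresponds to what `totalRecoveries` counts in the stats.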

Error Handling

The client distinguishes between:

Network errors (timeout, connection refused, etc.) → trigger failover
Application errors (invalid parameters, etc.) → thrown immediately without failover

Network errors that trigger failover include:

Timeouts
Connection refused (ECONNREFUSED)
DNS errors (ENOTFOUND)
Connection reset (ECONNRESET)
HTTP 502, 503, 504 errors
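The distinction can be expressed as a small classifier. The sketch below assumes a simplified error shape (`RequestError`, with a Node-style `code`, an HTTP `status`, and a `timedOut` flag) that is hypothetical; how the package actually inspects errors is not documented here.

```typescript
// Hypothetical error shape: Node-style `code`, an HTTP `status`, or a timeout flag.
interface RequestError {
  code?: string;
  status?: number;
  timedOut?: boolean;
}

// Codes and statuses taken from the list above.
const FAILOVER_CODES = new Set(["ECONNREFUSED", "ENOTFOUND", "ECONNRESET"]);
const FAILOVER_STATUSES = new Set([502, 503, 504]);

// True when the error looks like an endpoint problem, so the next endpoint should
// be tried; false when it looks like a caller problem that should be rethrown as-is.
function shouldFailover(err: RequestError): boolean {
  if (err.timedOut) return true;
  if (err.code !== undefined && FAILOVER_CODES.has(err.code)) return true;
  if (err.status !== undefined && FAILOVER_STATUSES.has(err.status)) return true;
  return false;
}

const onConnectionReset = shouldFailover({ code: "ECONNRESET" }); // network error
const onBadRequest = shouldFailover({ status: 400 });             // application error
```

Keeping application errors out of the failover path matters because an invalid request would fail identically on every endpoint and would wrongly count against each one.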

API Reference

AptosResilientClient

Constructor

new AptosResilientClient(config: ResilientClientConfig)

Methods

`getClient(): Aptos`

Returns the proxy Aptos client instance that should be used for all operations.

const client = resilientClient.getClient();
const ledgerInfo = await client.getLedgerInfo();

`getStats(): ClientStats`

Returns current statistics about the client.

const stats = resilientClient.getStats();
console.log(stats.activeEndpointUrl); // Currently active endpoint
console.log(stats.totalFailovers);    // Total number of failovers
console.log(stats.totalRecoveries);   // Total number of recoveries
console.log(stats.endpoints);         // Health status of all endpoints

`checkHealth(): Promise<void>`

Manually trigger a health check for all unhealthy endpoints.

await resilientClient.checkHealth();

`destroy(): void`

Stop the health check interval and clean up resources.

resilientClient.destroy();

ClientStats

interface ClientStats {
  activeEndpointIndex: number;      // Index of currently active endpoint
  activeEndpointUrl: string;        // URL of currently active endpoint
  endpoints: EndpointState[];       // State of all endpoints
  totalFailovers: number;           // Total failovers that have occurred
  totalRecoveries: number;          // Total recoveries (switches back to higher priority)
}

EndpointState

interface EndpointState {
  url: string;                      // The RPC endpoint URL
  healthy: boolean;                 // Whether this endpoint is currently healthy
  consecutiveFailures: number;      // Number of consecutive failures
  lastFailureTime?: number;         // Timestamp of last failure
  lastSuccessTime?: number;         // Timestamp of last successful request
}
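As an illustration of how these two interfaces might be consumed, the sketch below condenses a stats object into a single log line. The interfaces are copied from above so the block stands alone; `summarize` and the sample values are hypothetical, and treating any active index above 0 as "degraded" is an assumption of this example.

```typescript
interface EndpointState {
  url: string;
  healthy: boolean;
  consecutiveFailures: number;
  lastFailureTime?: number;
  lastSuccessTime?: number;
}

interface ClientStats {
  activeEndpointIndex: number;
  activeEndpointUrl: string;
  endpoints: EndpointState[];
  totalFailovers: number;
  totalRecoveries: number;
}

// Hypothetical helper: condense stats into one log-friendly line.
function summarize(stats: ClientStats): string {
  const unhealthy = stats.endpoints.filter((ep) => !ep.healthy).map((ep) => ep.url);
  // Assumption: "degraded" means the highest-priority endpoint is not the active one.
  const degraded = stats.activeEndpointIndex > 0;
  return [
    `active=${stats.activeEndpointUrl}`,
    `degraded=${degraded}`,
    `unhealthy=${unhealthy.length === 0 ? "none" : unhealthy.join(",")}`,
    `failovers=${stats.totalFailovers}`,
    `recoveries=${stats.totalRecoveries}`,
  ].join(" ");
}

// Sample values made up for the example: the primary is down, the backup is serving.
const sampleStats: ClientStats = {
  activeEndpointIndex: 1,
  activeEndpointUrl: "https://backup.example.com",
  endpoints: [
    { url: "https://primary.example.com", healthy: false, consecutiveFailures: 3 },
    { url: "https://backup.example.com", healthy: true, consecutiveFailures: 0 },
  ],
  totalFailovers: 1,
  totalRecoveries: 0,
};

const summaryLine = summarize(sampleStats);
```

With the real client, the argument would be the value returned by `resilientClient.getStats()`.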

Examples

Basic Usage

import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";

const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://api.mainnet.aptoslabs.com/v1" }),
    new AptosConfig({ fullnode: "https://1rpc.io/aptos/v1" }),
  ],
});

const client = resilientClient.getClient();

// Fetch ledger info
const ledgerInfo = await client.getLedgerInfo();
console.log("Chain ID:", ledgerInfo.chain_id);

// Get account info
const accountInfo = await client.getAccountInfo({
  accountAddress: "0x1"
});

resilientClient.destroy();

Monitoring Health

import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";

const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://endpoint1.com" }),
    new AptosConfig({ fullnode: "https://endpoint2.com" }),
  ],
  healthCheckInterval: 10000, // Check every 10 seconds
});

const client = resilientClient.getClient();

// Monitor stats periodically
setInterval(() => {
  const stats = resilientClient.getStats();
  console.log("Active:", stats.activeEndpointUrl);
  console.log("Failovers:", stats.totalFailovers);

  stats.endpoints.forEach(ep => {
    console.log(`${ep.url}: ${ep.healthy ? 'healthy' : 'unhealthy'} (${ep.consecutiveFailures} failures)`);
  });
}, 5000);

Cost Optimization Use Case

import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";

// Optimize for cost: use cheap primary, only use expensive backups when necessary
const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://cheap-primary.example.com" }),    // Cheapest, use whenever possible
    new AptosConfig({ fullnode: "https://expensive-backup.example.com" }), // More expensive, use when primary fails
  ],
  unhealthyThreshold: 3,      // Allow 3 failures before giving up on primary
  healthCheckInterval: 30000, // Check primary every 30s to switch back ASAP
  requestTimeout: 10000,      // Don't wait too long on failed endpoints
});

Custom Configuration with API Keys

import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig, Network } from "@aptos-labs/ts-sdk";

const resilientClient = new AptosResilientClient({
  endpoints: [
    // Primary: Provider with API key
    new AptosConfig({
      fullnode: "https://primary.example.com",
      network: Network.MAINNET,
      clientConfig: {
        API_KEY: "your-api-key",
        headers: {
          "X-Custom-Header": "value"
        }
      }
    }),
    // Backup: Another provider with different API key
    new AptosConfig({
      fullnode: "https://backup.example.com",
      network: Network.MAINNET,
      clientConfig: {
        API_KEY: "your-backup-api-key",
      }
    }),
    // Fallback: Free public endpoint (no API key needed)
    new AptosConfig({ fullnode: "https://api.mainnet.aptoslabs.com/v1" }),
  ],
  unhealthyThreshold: 2,       // More aggressive - mark unhealthy after 2 failures
  healthCheckInterval: 60000,  // Check every minute
  requestTimeout: 15000,       // 15 second timeout
});

Development

Build

pnpm build

Run Example

pnpm example

Best Practices

Endpoint Priority: List endpoints in order of preference (fastest/most reliable first)
Timeout Configuration: Set requestTimeout based on your network conditions and requirements
Health Check Interval: Balance between quick recovery and avoiding unnecessary requests
Cleanup: Always call destroy() when you're done to clean up the health check interval
Error Handling: Wrap operations in try-catch blocks as you would with the standard Aptos client
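The cleanup and error-handling advice can be combined with a `try`/`finally` wrapper, sketched below. `Destroyable` and `withCleanup` are hypothetical names that are not part of the package, and the fake resource exists only so the example runs without the SDK. Any object with a `destroy()` method, such as the resilient client, fits the same shape.

```typescript
// Anything exposing destroy(); the resilient client matches this shape.
interface Destroyable {
  destroy(): void;
}

// Run `work`, then always call destroy(), even if `work` throws.
async function withCleanup<T extends Destroyable, R>(
  resource: T,
  work: (resource: T) => Promise<R>,
): Promise<R> {
  try {
    return await work(resource);
  } finally {
    resource.destroy();
  }
}

// Fake resource standing in for a resilient client, used only to show the flow.
const events: string[] = [];
const fake = { destroy: () => { events.push("destroyed"); } };

// The work fails, the error still reaches the caller, and destroy() has already run.
const outcome = withCleanup(fake, async () => {
  events.push("working");
  throw new Error("simulated request failure");
}).catch((err: Error) => `caught: ${err.message}`);
```

Because `destroy()` stops the health check interval, calling it in a `finally` block should also keep a leftover timer from holding a Node.js process open after a failure.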

License

MIT