@thalalabs/aptos-resilient-client
v0.1.4
A resilient Aptos client with automatic failover and recovery
Aptos Resilient Client
A resilient Aptos client with automatic failover and recovery capabilities. This client maintains multiple RPC endpoint connections and automatically switches between them when failures occur, ensuring high availability for your Aptos blockchain interactions.
Features
- Automatic Failover: Switches to backup endpoints when the primary fails
- Auto Recovery: Automatically switches back to higher-priority endpoints when they recover
- Configurable Thresholds: Customize failure tolerance and health check intervals
- Request Timeout: Configurable timeout for all requests
- Health Monitoring: Periodic health checks for failed endpoints
- Full Aptos SDK Compatibility: Works as a drop-in replacement for the standard Aptos client
- TypeScript Support: Full type definitions included
Installation
npm install @thalalabs/aptos-resilient-client @aptos-labs/ts-sdk
Quick Start
import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";
// Create a resilient client with multiple endpoints
const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://api.mainnet.aptoslabs.com/v1" }), // Primary
    new AptosConfig({ fullnode: "https://1rpc.io/aptos/v1" }), // Backup
  ],
  unhealthyThreshold: 3, // Mark endpoint unhealthy after 3 failures
  healthCheckInterval: 30000, // Check every 30 seconds
  requestTimeout: 10000, // 10 second timeout per request
});
// Get the Aptos client instance
const client = resilientClient.getClient();
// Use it just like the standard Aptos client
const ledgerInfo = await client.getLedgerInfo();
const accountInfo = await client.getAccountInfo({ accountAddress: "0x1" });
// Get statistics
const stats = resilientClient.getStats();
console.log("Active endpoint:", stats.activeEndpointUrl);
console.log("Total failovers:", stats.totalFailovers);
// Clean up when done
resilientClient.destroy();
Configuration
ResilientClientConfig
| Property | Type | Default | Description |
|----------|------|---------|-------------|
| endpoints | AptosConfig[] | required | Array of AptosConfig objects in priority order (first = highest priority). Each AptosConfig can specify fullnode URL, network, and custom client configuration (headers, API keys, etc.) |
| unhealthyThreshold | number | 3 | Number of consecutive failures before marking endpoint as unhealthy. Unhealthy endpoints are skipped in future requests. |
| healthCheckInterval | number | 30000 | Interval in milliseconds between health checks for unhealthy endpoints |
| requestTimeout | number | 10000 | Timeout in milliseconds for each request |
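As a rough illustration of how the defaults in the table combine with user-supplied values, the sketch below merges a partial configuration with the documented defaults. `SketchConfig` and `withDefaults` are hypothetical names for this example and are not exports of the package. Endpoints are reduced to plain URL strings to keep the block self-contained, and rejecting an empty endpoint list is an assumption about how a required field might be enforced.

```typescript
// Hypothetical mirror of the documented options; endpoints are simplified to URL strings.
interface SketchConfig {
  endpoints: string[];
  unhealthyThreshold?: number;
  healthCheckInterval?: number;
  requestTimeout?: number;
}

// Defaults as listed in the table above (intervals and timeouts in milliseconds).
const SKETCH_DEFAULTS = {
  unhealthyThreshold: 3,
  healthCheckInterval: 30000,
  requestTimeout: 10000,
};

// Fill in any omitted option. Throwing on an empty list is an assumption of this sketch.
function withDefaults(config: SketchConfig): Required<SketchConfig> {
  if (config.endpoints.length === 0) {
    throw new Error("at least one endpoint is required");
  }
  return {
    endpoints: [...config.endpoints],
    unhealthyThreshold: config.unhealthyThreshold ?? SKETCH_DEFAULTS.unhealthyThreshold,
    healthCheckInterval: config.healthCheckInterval ?? SKETCH_DEFAULTS.healthCheckInterval,
    requestTimeout: config.requestTimeout ?? SKETCH_DEFAULTS.requestTimeout,
  };
}
```

Passing only `endpoints` and `requestTimeout: 15000`, for instance, would keep the 15 second timeout and fall back to the documented threshold and health check interval.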
How It Works
Failover Logic
Within a Single Request:
- The client tries all healthy endpoints sequentially, starting with the highest priority
- If endpoint A fails, it immediately tries endpoint B (no retries)
- Each failure increments that endpoint's consecutive failure counter
- The request succeeds as soon as any endpoint responds successfully
Across Multiple Requests:
- After unhealthyThreshold consecutive failures, an endpoint is marked unhealthy
- Unhealthy endpoints are skipped entirely in future requests (cost optimization)
- The active endpoint becomes the first healthy endpoint in priority order
Example with a primary and a backup endpoint (threshold = 3):
- Request 1: Primary fails (1/3) → Backup succeeds ✓
- Request 2: Primary fails (2/3) → Backup succeeds ✓
- Request 3: Primary fails (3/3, marked unhealthy) → Backup succeeds ✓
- Request 4+: Skip primary entirely → Backup succeeds ✓ (saves time & money)
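The four-request trace above can be replayed with a small in-memory simulation. This is only a sketch of the described behavior, not the package's implementation: `SketchEndpoint` and `sendWithFailover` are hypothetical names, the `attempt` callback stands in for a real RPC call, and two details are assumptions (a success resets the failure counter, and an error is thrown when no endpoint answers).

```typescript
// Hypothetical in-memory model of one endpoint's health; not the package's internal type.
interface SketchEndpoint {
  url: string;
  healthy: boolean;
  consecutiveFailures: number;
}

// Try healthy endpoints in priority order; `attempt` returns true when the endpoint answers.
// Returns the URL that served the request.
function sendWithFailover(
  endpoints: SketchEndpoint[],
  unhealthyThreshold: number,
  attempt: (url: string) => boolean,
): string {
  for (const ep of endpoints) {
    if (!ep.healthy) continue; // unhealthy endpoints are skipped, no request is sent
    if (attempt(ep.url)) {
      ep.consecutiveFailures = 0; // assumption: a success resets the counter
      return ep.url;
    }
    ep.consecutiveFailures += 1; // no retry on the same endpoint, move to the next one
    if (ep.consecutiveFailures >= unhealthyThreshold) ep.healthy = false;
  }
  throw new Error("all endpoints failed"); // assumption: behavior when nothing answers
}

// Replay the example: the primary always fails, the backup always succeeds.
const simEndpoints: SketchEndpoint[] = [
  { url: "primary", healthy: true, consecutiveFailures: 0 },
  { url: "backup", healthy: true, consecutiveFailures: 0 },
];
const simAttempts: string[][] = [];
for (let i = 0; i < 4; i++) {
  const tried: string[] = [];
  sendWithFailover(simEndpoints, 3, (url) => {
    tried.push(url);
    return url === "backup";
  });
  simAttempts.push(tried); // which endpoints this request actually contacted
}
```

In this simulation the first three requests contact both endpoints, while the fourth contacts only the backup because the primary has already reached the threshold.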
Recovery Logic
- A background health check runs every healthCheckInterval milliseconds
- Unhealthy endpoints are tested with a lightweight request (getLedgerInfo())
- If an endpoint becomes healthy again and has higher priority than the current active endpoint, the client switches back to it
- This ensures you always use the highest-priority available endpoint
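One pass of the recovery logic described above might look like the following sketch. It is not taken from the package: `ProbeTarget`, `runHealthCheck`, and `isRecovery` are hypothetical names, the `probe` callback stands in for the lightweight getLedgerInfo() request, and clearing the failure counter on recovery is an assumption.

```typescript
// Hypothetical health record for one endpoint, listed in priority order.
interface ProbeTarget {
  url: string;
  healthy: boolean;
  consecutiveFailures: number;
}

// One health-check pass. Returns the index of the active endpoint after the pass,
// or -1 when no endpoint is healthy.
function runHealthCheck(
  endpoints: ProbeTarget[],
  probe: (url: string) => boolean,
): number {
  for (const ep of endpoints) {
    if (ep.healthy) continue; // only unhealthy endpoints are probed
    if (probe(ep.url)) {
      ep.healthy = true;
      ep.consecutiveFailures = 0; // assumption: recovery clears the failure count
    }
  }
  // The active endpoint is the first healthy one in priority order.
  return endpoints.findIndex((ep) => ep.healthy);
}

// True when the pass moved traffic back to a higher-priority endpoint (lower index).
function isRecovery(previousActive: number, newActive: number): boolean {
  return newActive !== -1 && newActive < previousActive;
}
```

In a timer-driven setup, a function like `runHealthCheck` would be called every healthCheckInterval milliseconds, and a recovery counter would be incremented whenever `isRecovery` returns true.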
Error Handling
The client distinguishes between:
- Network errors (timeout, connection refused, etc.) → Triggers failover
- Application errors (invalid parameters, etc.) → Thrown immediately without failover
Network errors that trigger failover include:
- Timeouts
- Connection refused (ECONNREFUSED)
- DNS errors (ENOTFOUND)
- Connection reset (ECONNRESET)
- HTTP 502, 503, 504 errors
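The distinction between the two error classes can be sketched as a small predicate. The error shape below (`code`, `status`, `timedOut`) is an assumption made for illustration; how the package actually inspects errors is not documented here, and `SketchError` and `shouldFailover` are hypothetical names.

```typescript
// Hypothetical error shape: a Node-style `code`, an HTTP `status`, and a timeout marker.
interface SketchError {
  code?: string;
  status?: number;
  timedOut?: boolean;
}

// The network error codes and HTTP statuses listed above.
const NETWORK_CODES = new Set(["ECONNREFUSED", "ENOTFOUND", "ECONNRESET"]);
const GATEWAY_STATUSES = new Set([502, 503, 504]);

// True for errors that should move the request to the next endpoint;
// false for application errors that should be thrown to the caller as-is.
function shouldFailover(err: SketchError): boolean {
  if (err.timedOut) return true;
  if (err.code !== undefined && NETWORK_CODES.has(err.code)) return true;
  if (err.status !== undefined && GATEWAY_STATUSES.has(err.status)) return true;
  return false;
}
```

Under this sketch an HTTP 400 caused by invalid parameters is not retried elsewhere, since another endpoint would be expected to reject the same request.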
API Reference
AptosResilientClient
Constructor
new AptosResilientClient(config: ResilientClientConfig)
Methods
getClient(): Aptos
Returns the proxy Aptos client instance that should be used for all operations.
const client = resilientClient.getClient();
const ledgerInfo = await client.getLedgerInfo();
getStats(): ClientStats
Returns current statistics about the client.
const stats = resilientClient.getStats();
console.log(stats.activeEndpointUrl); // Currently active endpoint
console.log(stats.totalFailovers); // Total number of failovers
console.log(stats.totalRecoveries); // Total number of recoveries
console.log(stats.endpoints); // Health status of all endpoints
checkHealth(): Promise<void>
Manually trigger a health check for all unhealthy endpoints.
await resilientClient.checkHealth();
destroy(): void
Stop the health check interval and clean up resources.
resilientClient.destroy();
ClientStats
interface ClientStats {
  activeEndpointIndex: number; // Index of currently active endpoint
  activeEndpointUrl: string; // URL of currently active endpoint
  endpoints: EndpointState[]; // State of all endpoints
  totalFailovers: number; // Total failovers that have occurred
  totalRecoveries: number; // Total recoveries (switches back to higher priority)
}
EndpointState
interface EndpointState {
  url: string; // The RPC endpoint URL
  healthy: boolean; // Whether this endpoint is currently healthy
  consecutiveFailures: number; // Number of consecutive failures
  lastFailureTime?: number; // Timestamp of last failure
  lastSuccessTime?: number; // Timestamp of last successful request
}
Examples
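Given the ClientStats and EndpointState shapes above, a monitoring loop might boil the stats down to a few log lines and a single "degraded" flag. The helper names `formatStats` and `isDegraded` are hypothetical and not part of the package; the two interfaces are repeated locally only so the sketch compiles on its own.

```typescript
// Local copies of the documented shapes so this block is self-contained.
interface EndpointState {
  url: string;
  healthy: boolean;
  consecutiveFailures: number;
  lastFailureTime?: number;
  lastSuccessTime?: number;
}
interface ClientStats {
  activeEndpointIndex: number;
  activeEndpointUrl: string;
  endpoints: EndpointState[];
  totalFailovers: number;
  totalRecoveries: number;
}

// One line per endpoint, with "*" marking the currently active one.
function formatStats(stats: ClientStats): string[] {
  return stats.endpoints.map((ep, i) => {
    const marker = i === stats.activeEndpointIndex ? "*" : " ";
    const status = ep.healthy ? "healthy" : "unhealthy";
    return `${marker} ${ep.url} ${status} (${ep.consecutiveFailures} consecutive failures)`;
  });
}

// True when traffic is served by something other than the highest-priority endpoint.
function isDegraded(stats: ClientStats): boolean {
  return stats.activeEndpointIndex !== 0;
}
```

A flag like `isDegraded` could feed an alert, since a non-zero active index means the preferred endpoint is currently being skipped.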
Basic Usage
import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";
const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://api.mainnet.aptoslabs.com/v1" }),
    new AptosConfig({ fullnode: "https://1rpc.io/aptos/v1" }),
  ],
});
const client = resilientClient.getClient();
// Fetch ledger info
const ledgerInfo = await client.getLedgerInfo();
console.log("Chain ID:", ledgerInfo.chain_id);
// Get account info
const accountInfo = await client.getAccountInfo({
  accountAddress: "0x1",
});
resilientClient.destroy();
Monitoring Health
import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";
const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://endpoint1.com" }),
    new AptosConfig({ fullnode: "https://endpoint2.com" }),
  ],
  healthCheckInterval: 10000, // Check every 10 seconds
});
const client = resilientClient.getClient();
// Monitor stats periodically
setInterval(() => {
  const stats = resilientClient.getStats();
  console.log("Active:", stats.activeEndpointUrl);
  console.log("Failovers:", stats.totalFailovers);
  stats.endpoints.forEach(ep => {
    console.log(`${ep.url}: ${ep.healthy ? 'healthy' : 'unhealthy'} (${ep.consecutiveFailures} failures)`);
  });
}, 5000);
Cost Optimization Use Case
import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";
// Optimize for cost: use cheap primary, only use expensive backups when necessary
const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://cheap-primary.example.com" }), // Cheapest, use whenever possible
    new AptosConfig({ fullnode: "https://expensive-backup.example.com" }), // More expensive, use when primary fails
  ],
  unhealthyThreshold: 3, // Allow 3 failures before giving up on primary
  healthCheckInterval: 30000, // Check primary every 30s to switch back ASAP
  requestTimeout: 10000, // Don't wait too long on failed endpoints
});
Custom Configuration with API Keys
import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig, Network } from "@aptos-labs/ts-sdk";
const resilientClient = new AptosResilientClient({
  endpoints: [
    // Primary: Provider with API key
    new AptosConfig({
      fullnode: "https://primary.example.com",
      network: Network.MAINNET,
      clientConfig: {
        API_KEY: "your-api-key",
        // The ts-sdk client config spells this key in upper case
        HEADERS: {
          "X-Custom-Header": "value",
        },
      },
    }),
    // Backup: Another provider with different API key
    new AptosConfig({
      fullnode: "https://backup.example.com",
      network: Network.MAINNET,
      clientConfig: {
        API_KEY: "your-backup-api-key",
      },
    }),
    // Fallback: Free public endpoint (no API key needed)
    new AptosConfig({ fullnode: "https://api.mainnet.aptoslabs.com/v1" }),
  ],
  unhealthyThreshold: 2, // More aggressive - mark unhealthy after 2 failures
  healthCheckInterval: 60000, // Check every minute
  requestTimeout: 15000, // 15 second timeout
});
Development
Build
pnpm build
Run Example
pnpm example
Best Practices
- Endpoint Priority: List endpoints in order of preference (fastest/most reliable first)
- Timeout Configuration: Set requestTimeout based on your network conditions and requirements
- Health Check Interval: Balance between quick recovery and avoiding unnecessary requests
- Cleanup: Always call destroy() when you're done to clean up the health check interval
- Error Handling: Wrap operations in try-catch blocks as you would with the standard Aptos client
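The cleanup and error-handling practices above can be combined in a small wrapper that guarantees destroy() runs even when an operation throws. This is a sketch under stated assumptions: `withResilientClient` and `DestroyableClient` are hypothetical names, and the structural type only relies on the getClient() and destroy() methods listed in the API reference.

```typescript
// Minimal structural type: anything exposing getClient() and destroy().
interface DestroyableClient<C> {
  getClient(): C;
  destroy(): void;
}

// Run `work` with the proxy client and always call destroy() afterwards,
// whether `work` resolves or rejects. Errors from `work` propagate to the caller.
async function withResilientClient<C, T>(
  resilient: DestroyableClient<C>,
  work: (client: C) => Promise<T>,
): Promise<T> {
  try {
    return await work(resilient.getClient());
  } finally {
    resilient.destroy(); // stops the health check interval
  }
}
```

A wrapper like this suits short-lived scripts; a long-running service would more likely keep one client for its whole lifetime and call destroy() during shutdown.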
License
MIT
