@cheapestinference/openclaw-ratelimit-retry

v1.0.0

Published 4 months ago

Automatically retry agent conversations that fail due to provider rate limits

cheapestinference

openclaw plugin retry rate-limit 429 budget ratelimit

ratelimit-retry

An OpenClaw plugin that automatically retries agent conversations killed by provider rate limits.

Problem

When you hit your LLM provider's rate limit or budget cap (HTTP 429), every running agent task dies mid-conversation, and nothing resumes them. If you close the dashboard, those conversations are lost. You have to find and re-trigger each one manually after the budget resets.

Solution

This plugin hooks into OpenClaw's agent_end event and detects retriable errors (429s, rate limits, budget exhaustion). It then parks the failed session in a persistent on-disk queue. A background service waits for the provider's budget window to reset, then calls chat.send on the original session. This resumes the conversation with its full transcript context, as if the user had typed a message.

Installation

openclaw plugins install @cheapestinference/openclaw-ratelimit-retry

Or copy manually to your extensions directory:

cp -r openclaw-plugin-ratelimit-retry ~/.openclaw/extensions/ratelimit-retry

Enable it in the OpenClaw config:

openclaw config set plugins.ratelimit-retry.budgetWindowHours 5
openclaw config set plugins.ratelimit-retry.maxRetryAttempts 3

No npm install is needed: the plugin has zero runtime dependencies.

Complete example

# ~/.openclaw/config.yaml
plugins:
  ratelimit-retry:
    budgetWindowHours: 5
    maxRetryAttempts: 3
    checkIntervalMinutes: 5
    retryMessage: "Continue where you left off. The previous attempt failed due to a rate limit that has now reset."

How It Works

Agent run fails (429)
  |
  v
agent_end hook fires
  |-- Non-retriable error? --> ignore
  |-- Retriable error?     --> queue to disk
                                 |
                                 v
                  Background timer (every 5 min)
                    |
                    |-- Budget window not reset? --> wait
                    |-- Budget window reset?     --> chat.send to session
                                                       |
                                                       |--> Ack received: wait for result
                                                       |     |--> agent_end success: remove from queue
                                                       |     |--> agent_end 429: re-queued automatically
                                                       |--> Send failed: wait for next window

The retry uses chat.send with the original sessionKey, which means the gateway loads the complete JSONL transcript and the agent resumes with full context. This is equivalent to the user typing a message in the chat.
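The background service's loop in the diagram above can be sketched as a single tick function. This is a minimal sketch under stated assumptions, not the plugin's source. The names QueueEntry, tick, send and nextWindow are all hypothetical. The send callback stands in for the gateway's chat.send call: it resolves on ack and rejects when the gateway is unreachable.

```typescript
// Hypothetical queue entry; the plugin's real fields may differ.
interface QueueEntry {
  sessionKey: string;
  attempts: number;
  retryAfter: number; // epoch ms before which the entry must not be retried
}

// Stand-in for the gateway's chat.send: resolves on ack, rejects on connection errors.
type SendFn = (sessionKey: string, message: string) => Promise<void>;

// Guard so a slow batch and the next timer tick never overlap.
let retryInProgress = false;

async function tick(
  queue: QueueEntry[],
  send: SendFn,
  message: string,
  nextWindow: (now: number) => number, // e.g. next UTC boundary + 1 minute
  now: number = Date.now(),
): Promise<string[]> {
  if (retryInProgress) return []; // previous batch still running
  retryInProgress = true;
  const acked: string[] = [];
  try {
    for (const entry of queue) {
      if (entry.retryAfter > now) continue; // budget window has not reset yet
      try {
        await send(entry.sessionKey, message);
        acked.push(entry.sessionKey); // agent_end will later remove or re-queue it
      } catch {
        // Gateway unreachable: fall through and wait for the next window.
      }
      // Reschedule either way so the same session is not re-sent on every tick.
      entry.retryAfter = nextWindow(now);
    }
  } finally {
    retryInProgress = false;
  }
  return acked;
}
```

Rescheduling after a successful ack is a choice made in this sketch. It keeps a session whose run is still in progress from being re-sent on the next tick. The plugin may track in-flight runs differently.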

The model is fire-and-forget with re-detection: chat.send returns an immediate ack ({ ok, runId, status: "started" }), not the final result. If the retried run fails again with a 429, the agent_end hook fires again and the session is re-queued with an incremented attempt counter. This loop continues until the retry succeeds or maxRetryAttempts is reached.
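The re-detection loop comes down to one queue update per agent_end event, keyed by sessionKey. This is a minimal sketch with assumptions:

- onAgentEnd is a hypothetical helper that receives an already-classified outcome and a precomputed retryAfter.
- It also folds in the deduplication, maxRetryAttempts, and 100-entry cap rules from Edge Cases.
- Here attempts counts failures seen so far, so with maxRetryAttempts set to 3 a session gets at most three retries. Whether the plugin counts the same way is an assumption.

```typescript
// Hypothetical entry shape; the plugin's real fields may differ.
interface QueueEntry {
  sessionKey: string;
  attempts: number;   // failures seen so far for this session
  retryAfter: number; // epoch ms of the next allowed retry
  queuedAt: number;   // epoch ms when first queued, used for eviction
}

type Outcome = "success" | "retriable" | "fatal";

const MAX_QUEUE = 100;

// Apply one agent_end event to the queue and return the updated queue.
function onAgentEnd(
  queue: QueueEntry[],
  sessionKey: string,
  outcome: Outcome,
  retryAfter: number,
  maxRetryAttempts: number,
  now: number = Date.now(),
): QueueEntry[] {
  const others = queue.filter((e) => e.sessionKey !== sessionKey);
  // Success or non-retriable error: nothing left to retry for this session.
  if (outcome !== "retriable") return others;

  const existing = queue.find((e) => e.sessionKey === sessionKey);
  if (existing) {
    // Deduplicate: the same session failed again, so bump attempts or give up.
    if (existing.attempts >= maxRetryAttempts) {
      console.warn(`ratelimit-retry: abandoning ${sessionKey}, maxRetryAttempts reached`);
      return others;
    }
    existing.attempts += 1;
    existing.retryAfter = retryAfter;
    return queue;
  }

  const next = [...others, { sessionKey, attempts: 1, retryAfter, queuedAt: now }];
  next.sort((a, b) => a.queuedAt - b.queuedAt);
  return next.slice(-MAX_QUEUE); // keep the newest MAX_QUEUE entries, evicting the oldest
}
```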

Configuration

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| budgetWindowHours | number | 5 | Budget reset window in hours, aligned to UTC clock boundaries |
| maxRetryAttempts | number | 3 | Maximum retries per session before it is abandoned |
| checkIntervalMinutes | number | 5 | How often the background service checks for pending retries |
| retryMessage | string | "Continue where you left off..." | Message sent to the session to resume the conversation |
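A typed view of these options and their defaults might look like the following. This is a sketch under assumptions. RetryConfig, DEFAULTS, resolveConfig and intervalMs are hypothetical names, and falling back to the default on invalid values is an assumed behavior. The default retryMessage is shown truncated, exactly as it appears in the table.

```typescript
// Option names, types, and defaults from the table above (hypothetical type name).
interface RetryConfig {
  budgetWindowHours: number;
  maxRetryAttempts: number;
  checkIntervalMinutes: number;
  retryMessage: string;
}

const DEFAULTS: RetryConfig = {
  budgetWindowHours: 5,
  maxRetryAttempts: 3,
  checkIntervalMinutes: 5,
  retryMessage: "Continue where you left off...", // shown truncated, as in the table
};

// Merge raw plugin config over the defaults, ignoring missing or invalid values.
function resolveConfig(raw: Partial<Record<keyof RetryConfig, unknown>> = {}): RetryConfig {
  const positive = (value: unknown, fallback: number): number =>
    typeof value === "number" && Number.isFinite(value) && value > 0 ? value : fallback;
  return {
    budgetWindowHours: positive(raw.budgetWindowHours, DEFAULTS.budgetWindowHours),
    maxRetryAttempts: positive(raw.maxRetryAttempts, DEFAULTS.maxRetryAttempts),
    checkIntervalMinutes: positive(raw.checkIntervalMinutes, DEFAULTS.checkIntervalMinutes),
    // chat.send requires a non-empty message, so blank strings are replaced by the default.
    retryMessage:
      typeof raw.retryMessage === "string" && raw.retryMessage.trim() !== ""
        ? raw.retryMessage
        : DEFAULTS.retryMessage,
  };
}

// The timer period in milliseconds, e.g. for setInterval.
const intervalMs = (cfg: RetryConfig): number => cfg.checkIntervalMinutes * 60_000;
```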

How the Retry Timing Works

Many LLM providers and proxies (LiteLLM, for example) reset budget counters on fixed UTC-aligned windows. With a 5-hour window, the boundaries are:

00:00  05:00  10:00  15:00  20:00  (next day) 01:00
  |------|------|------|------|------|

When a failed session is queued, the plugin calculates the next boundary after the current time and adds a 1-minute margin. It retries at HH:01:00 instead of HH:00:00 to avoid racing the provider's reset.

If windowHours does not divide 24 evenly, the math still works. With windowHours set to 7, boundaries fall at 00:00, 07:00, 14:00, and 21:00. The next one would be hour 28, which rolls over to 04:00 the next day. The plugin handles this day overflow correctly.
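The boundary arithmetic fits in a few lines. This sketch assumes, as the windowHours = 7 example implies, that boundaries are multiples of windowHours counted from 00:00 UTC of the current day. Any overflow rolls into the next day. nextRetryTime is a hypothetical name. The rollover relies on Date.UTC, which normalizes out-of-range hours, so day, month, and year boundaries are all handled.

```typescript
// Next retry time: the next UTC window boundary after `now`, plus a 1-minute margin.
// Boundaries are multiples of windowHours counted from 00:00 UTC of the current day.
function nextRetryTime(now: Date, windowHours: number): Date {
  const hour = now.getUTCHours();
  const nextBoundary = (Math.floor(hour / windowHours) + 1) * windowHours; // may exceed 23
  // Date.UTC rolls hour 28 over to 04:00 of the next day (and across months/years).
  return new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), nextBoundary, 1, 0, 0),
  );
}
```

Take the default 5-hour window and an error at 21:30 UTC. floor(21 / 5) = 4, so the next boundary is hour 25. The retry is therefore scheduled for 01:01 UTC the next day.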

Error Classification

Non-retriable patterns are checked first. If an error matches a non-retriable pattern, it is never retried, even if it also matches a retriable pattern.

Retriable (queued for retry)

| Pattern | Catches |
|---------|---------|
| 429 | "Error code: 429 - ..." |
| rate limit, rate_limit | "RateLimitError: ..." |
| too many requests | HTTP 429 reason phrases |
| budget | "Budget exceeded for ..." |
| quota exceeded | Provider quota messages |
| resource exhausted | gRPC-style exhaustion errors |
| tokens per minute, tpm | TPM limit messages |

Non-retriable (ignored)

| Pattern | Reason |
|---------|--------|
| 401, 402, 403, 404 | HTTP client errors: won't succeed on retry |
| invalid api key, unauthorized | Auth errors: fix your credentials |
| invalid request, malformed | Bad request format: won't succeed on retry |
| model not found | Model doesn't exist |
| context length, prompt too large | Context overflow: message is too long |
| insufficient credits | Billing issue: requires user action |
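The precedence rule can be sketched as two ordered pattern lists. This is an illustrative transcription of the tables above, not the plugin's matcher, and isRetriable is a hypothetical name. The regular expressions are assumptions:

- The optional separator in rate[ _]?limit lets it match RateLimitError.
- resource[ _]exhausted also matches gRPC's RESOURCE_EXHAUSTED.

Errors that match neither list are treated as not retriable.

```typescript
// Patterns transcribed from the two tables; the exact regexes are assumptions.
const NON_RETRIABLE: RegExp[] = [
  /\b40[1-4]\b/, // 401, 402, 403, 404
  /invalid api key|unauthorized/i,
  /invalid request|malformed/i,
  /model not found/i,
  /context length|prompt too large/i,
  /insufficient credits/i,
];

const RETRIABLE: RegExp[] = [
  /\b429\b/,
  /rate[ _]?limit/i, // also matches "RateLimitError"
  /too many requests/i,
  /budget/i,
  /quota exceeded/i,
  /resource[ _]exhausted/i, // also matches gRPC's RESOURCE_EXHAUSTED
  /tokens per minute|\btpm\b/i,
];

function isRetriable(errorMessage: string): boolean {
  // Non-retriable patterns are checked first and always win.
  if (NON_RETRIABLE.some((re) => re.test(errorMessage))) return false;
  // Anything matching neither list is left alone in this sketch.
  return RETRIABLE.some((re) => re.test(errorMessage));
}
```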

Edge Cases

Server restarts: the queue is persisted to {stateDir}/ratelimit-retry/queue.json and reloaded on startup.
Same session errors multiple times: deduplicated by sessionKey. The existing entry is updated with incremented attempts and a recalculated retryAfter.
Retry fails with 429 again: agent_end fires again and re-queues the session with incremented attempts. This loops naturally until the run succeeds or maxRetryAttempts is reached.
Gateway unreachable during retry: the connection error is caught, and the entry's retryAfter is pushed to the next budget window so a down gateway is not hammered on every tick.
Max attempts exceeded: entry is removed from queue and a warning is logged.
Sub-agent sessions: handled identically; session keys of the form agent:X:subagent:Y work the same way.
Timer fires during active retry: a retryInProgress guard prevents overlapping batches.
Queue file corrupted: JSON parse errors are caught; service starts with an empty queue and logs a warning.
Queue overflow: capped at 100 entries. Oldest entries are evicted when full.
Atomic writes: queue is written to a uniquely-named .tmp file first, then renamed, to prevent corruption on crashes or concurrent writes.
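Persistence with the atomic write and the corrupted-file fallback described above can be sketched with Node's fs module. The assumptions:

- saveQueue and loadQueue are hypothetical names.
- The temp-file naming scheme is illustrative.
- A caller would pass the {stateDir}/ratelimit-retry/queue.json path.

On POSIX filesystems, renaming within the same directory replaces the target atomically, so a reader sees either the old file or the new one.

```typescript
import { existsSync, mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { randomUUID } from "node:crypto";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical entry shape; the plugin's real fields may differ.
interface QueueEntry {
  sessionKey: string;
  attempts: number;
  retryAfter: number;
}

function saveQueue(file: string, queue: QueueEntry[]): void {
  mkdirSync(dirname(file), { recursive: true });
  // A unique temp name means concurrent writers never clobber each other's temp file.
  const tmp = `${file}.${process.pid}.${randomUUID()}.tmp`;
  writeFileSync(tmp, JSON.stringify(queue, null, 2));
  // Same directory, so the rename swaps the file in one step.
  renameSync(tmp, file);
}

function loadQueue(file: string): QueueEntry[] {
  if (!existsSync(file)) return []; // first start: no queue yet
  try {
    const parsed: unknown = JSON.parse(readFileSync(file, "utf8"));
    if (Array.isArray(parsed)) return parsed as QueueEntry[];
  } catch {
    // Unreadable file or invalid JSON: handled below.
  }
  console.warn(`ratelimit-retry: ${file} is corrupted; starting with an empty queue`);
  return [];
}

// Example: round-trip through a scratch directory.
const demoFile = join(tmpdir(), `ratelimit-retry-demo-${process.pid}`, "queue.json");
saveQueue(demoFile, [{ sessionKey: "agent:main:main", attempts: 1, retryAfter: 0 }]);
console.log(loadQueue(demoFile).length); // 1
```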

Limitations

Fire-and-forget window: after chat.send returns its ack, the retried run is still in progress. If it fails with 429 again right away, there is a brief gap before the agent_end hook fires and re-queues it. This is by design: the re-detection loop handles it.
chat.send requires a non-empty message: the retry always sends the configured retryMessage. It cannot send an empty message to silently resume.
No partial-run recovery: the plugin resumes the conversation from the last completed turn. It does not replay partial streaming output that was interrupted.
Single-instance only: the queue is a local JSON file with no locking. Running multiple OpenClaw instances sharing the same ~/.openclaw/ directory is not supported.
No backpressure on the provider: the plugin retries all ready sessions in sequence. If you have many queued sessions, they all fire at the start of the next window.

License

MIT

Contributing

Contributions are welcome. Please open an issue first to discuss what you would like to change.

git clone https://github.com/cheapestinference/openclaw-plugin-ratelimit-retry
cd openclaw-plugin-ratelimit-retry
# No build step. OpenClaw loads .ts files directly via Jiti.