Retry & Failure Handling
Automations are resilient by configuration: retry transient failures with backoff, fence runaway steps with timeouts, recover partially-failed runs idempotently, and route exhausted runs to a failure handler.
Retry Policy
A retry block can be set at the automation level (whole run) or the action level (single step).
| Property | Description |
|---|---|
| maxAttempts | Maximum retry attempts (1–10). Required. |
| delayMs | Base delay between retries in ms (100–60000, default 1000). |
| strategy | fixed (constant delay, default) or exponential (growing backoff). |
automations:
  - name: sync-inventory
    trigger: { type: cron, expression: '0 * * * *' }
    retry: { maxAttempts: 5, delayMs: 2000, strategy: exponential }
    actions:
      - name: pull
        type: http
        operator: get
        props: { url: $env.INVENTORY_API, expectedStatus: [200] }
        retry: { maxAttempts: 3, strategy: fixed } # per-action override
Timeouts
Two independent timeout scopes:
| Scope | Where | Effect |
|---|---|---|
| Automation | automation.timeout (1000–900000) | Caps total run time. On expiry the run is marked timed-out. |
| Action (uniform) | action.timeout (1000–900000) | Caps one step; retry/continueOnError still apply. |
| Action (handler) | action.props.timeout | Per-type internal fence (e.g. http, code, automation/call). |
The timed-out status is distinct from failed, so operators can tell runaway duration apart from errors.
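As a rough sketch, the three timeout scopes might be combined as shown here. The placement of `timeout` next to `retry` at the automation and action levels is inferred from the names `automation.timeout` and `action.timeout`. The values are illustrative, and the unit is assumed to be milliseconds, matching `delayMs`.

```yaml
automations:
  - name: sync-inventory
    trigger: { type: cron, expression: '0 * * * *' }
    timeout: 300000              # automation scope: caps the whole run (assumed ms)
    actions:
      - name: pull
        type: http
        operator: get
        timeout: 30000           # action scope (uniform): caps this one step
        props:
          url: $env.INVENTORY_API
          expectedStatus: [200]
          timeout: 10000         # handler scope: the http handler's own internal fence
```

Keeping the handler fence shorter than the action timeout, and the action timeout shorter than the automation timeout, is a design choice for this sketch rather than a documented requirement.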
Continue on Error
Set continueOnError: true on an action so its failure doesn't abort the run. Subsequent actions still execute and the run finishes as completed-with-errors.
- name: bestEffortLog
  type: analytics
  operator: track
  continueOnError: true
  props: { event: order.created, properties: { id: '{{trigger.data.id}}' } }
Dead Letter & Exhaustion
When a run still fails after all configured maxAttempts are used up, it transitions to the exhausted status (dead-letter). Each attempt is recorded with its timestamp, error message, and stack trace. The automation-failure trigger fires after exhaustion and hands the failure handler the full attempt history, which allows centralized failure handling without per-automation notification config.
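A centralized failure handler could look roughly like the following. Only the trigger name `automation-failure` comes from this page. The `type:` shape of the trigger, the `post` operator, the `body` prop, the `ALERT_WEBHOOK_URL` variable, and the `trigger.data` payload path are all assumptions made for illustration, so check the Triggers and Actions pages for the actual schema.

```yaml
automations:
  - name: notify-on-exhaustion               # hypothetical handler automation
    trigger: { type: automation-failure }    # trigger name from the text; `type:` shape assumed
    actions:
      - name: alert
        type: http
        operator: post                       # assumed operator; only `get` appears on this page
        continueOnError: true                # a failed alert should not fail the handler run
        props:
          url: $env.ALERT_WEBHOOK_URL        # hypothetical environment variable
          body: '{{trigger.data}}'           # assumed prop and path carrying the attempt history
```

One handler of this kind can cover every automation, which is the point of routing exhausted runs through a trigger instead of configuring notifications per automation.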
Partial-Failure Recovery & Idempotency
When a run fails mid-pipeline, completed steps show completed, the failing step shows failed, and downstream steps show skipped. Replaying the run resumes from the failed step, skipping already-completed steps (idempotent resume). Replay always creates a new run, never mutating the original.
Webhook deduplication. Webhook triggers accept a deduplicationKey (a template over the payload) and deduplicationWindow (seconds) to drop duplicate runs from identical payloads. Combined with idempotent resume, this makes at-least-once delivery safe. See Triggers.
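A deduplicated webhook trigger might be configured along these lines. The property names `deduplicationKey` and `deduplicationWindow` come from the paragraph above. The `type: webhook` value, the template path used as the key, and the 300-second window are assumptions chosen for the example.

```yaml
automations:
  - name: order-created                           # hypothetical automation
    trigger:
      type: webhook                               # assumed trigger type name
      deduplicationKey: '{{trigger.data.id}}'     # template over the payload; exact path assumed
      deduplicationWindow: 300                    # seconds
    actions:
      - name: bestEffortLog
        type: analytics
        operator: track
        continueOnError: true
        props: { event: order.created, properties: { id: '{{trigger.data.id}}' } }
```

With this sketch, a second delivery carrying the same `id` inside the window would be dropped instead of starting another run.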
Related Pages
- Runs: run/step statuses, replay, and cancel.
- Triggers: automation-failure trigger and webhook dedup.
- Actions Overview: retry, timeout, continueOnError base props.