Retry & Failure Handling

Automations are resilient by configuration: retry transient failures with backoff, fence runaway steps with timeouts, recover partially-failed runs idempotently, and route exhausted runs to a failure handler.

Retry Policy

A retry block can be set at the automation level (whole run) or the action level (single step).

| Property | Description |
| --- | --- |
| maxAttempts | Maximum retry attempts (1–10). Required. |
| delayMs | Base delay between retries in ms (100–60000, default 1000). |
| strategy | fixed (constant delay, default) or exponential (growing backoff). |

automations:
  - name: sync-inventory
    trigger: { type: cron, expression: '0 * * * *' }
    retry: { maxAttempts: 5, delayMs: 2000, strategy: exponential }
    actions:
      - name: pull
        type: http
        operator: get
        props: { url: $env.INVENTORY_API, expectedStatus: [200] }
        retry: { maxAttempts: 3, strategy: fixed } # per-action override

Timeouts

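Timeouts can be set at two independent scopes, the automation and the action. An action accepts both a uniform timeout and a handler-specific one: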

| Scope | Where | Effect |
| --- | --- | --- |
| Automation | automation.timeout (1000–900000) | Caps total run time. On expiry the run is marked timed-out. |
| Action (uniform) | action.timeout (1000–900000) | Caps one step; retry/continueOnError still apply. |
| Action (handler) | action.props.timeout | Per-type internal fence (e.g. http, code, automation/call). |

A timed-out run is distinct from failed so operators can tell runaway duration apart from errors.
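
The three timeout settings from the table can be combined in one automation. The fragment below is a minimal sketch, not a reference configuration: it reuses the sync-inventory example, assumes the timeout values are milliseconds (as delayMs is), and picks the specific numbers purely for illustration. Ordering the values from narrowest (handler) to widest (automation) is a choice made here, not a documented requirement.

```yaml
automations:
  - name: sync-inventory
    trigger: { type: cron, expression: '0 * * * *' }
    timeout: 300000 # automation scope: caps the whole run (assumed ms)
    actions:
      - name: pull
        type: http
        operator: get
        timeout: 30000 # action scope (uniform): caps this one step (assumed ms)
        retry: { maxAttempts: 3, delayMs: 2000, strategy: exponential }
        continueOnError: true # still applies if the step times out
        props:
          url: $env.INVENTORY_API
          expectedStatus: [200]
          timeout: 10000 # handler scope: the http handler's internal fence (assumed ms)
```

If the automation-level timeout expires first, the run is marked timed-out rather than failed, regardless of how the individual steps are configured.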

Continue on Error

Set continueOnError: true on an action so its failure doesn't abort the run. Subsequent actions still execute and the run finishes as completed-with-errors.

- name: bestEffortLog
  type: analytics
  operator: track
  continueOnError: true
  props: { event: order.created, properties: { id: '{{trigger.data.id}}' } }

Dead Letter & Exhaustion

When a run still fails after exhausting all configured maxAttempts, it transitions to the exhausted status (dead-letter). Each attempt is recorded with its timestamp, error message, and stack trace. The automation-failure trigger fires after exhaustion and hands the failure handler the full attempt history. This allows failure handling to be centralized in one automation instead of configuring notifications per automation.
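
A centralized failure handler could look roughly like the fragment below. This is a hedged sketch: the trigger type string automation-failure is inferred from the trigger's name, and the payload fields runId and attempts are hypothetical placeholders, since this section does not list the fields the trigger provides. The analytics/track action is borrowed from the continueOnError example above.

```yaml
automations:
  - name: on-exhausted-run
    trigger: { type: automation-failure } # assumed type name, inferred from the trigger's name
    actions:
      - name: recordExhaustion
        type: analytics
        operator: track
        continueOnError: true # a failing handler step should not fail the handler run
        props:
          event: automation.exhausted # illustrative event name
          properties:
            runId: '{{trigger.data.runId}}' # hypothetical payload field
            attempts: '{{trigger.data.attempts}}' # hypothetical payload field
```

Check the Triggers page for the actual payload shape before relying on any field name used here.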

Partial-Failure Recovery & Idempotency

When a run fails mid-pipeline, completed steps show completed, the failing step shows failed, and downstream steps show skipped. Replaying the run resumes from the failed step, skipping already-completed steps (idempotent resume). Replay always creates a new run, never mutating the original.

  • Runs: run/step statuses, replay, and cancel.
  • Triggers: automation-failure trigger and webhook dedup.
  • Actions Overview: retry, timeout, continueOnError base props.