Retry & Failure Handling

Automations are resilient by configuration: retry transient failures with backoff, fence runaway steps with timeouts, recover partially-failed runs idempotently, and route exhausted runs to a failure handler.

Retry Policy

A retry block can be set at the automation level (whole run) or the action level (single step).

Property	Description
`maxAttempts`	Maximum retry attempts (1–10). Required.
`delayMs`	Base delay between retries in ms (100–60000, default 1000).
`strategy`	`fixed` (constant delay, default) or `exponential` (growing backoff).

automations:
  - name: sync-inventory
    trigger: { type: cron, expression: '0 * * * *' }
    retry: { maxAttempts: 5, delayMs: 2000, strategy: exponential } # automation-level policy (whole run)
    actions:
      - name: pull
        type: http
        operator: get
        props: { url: $env.INVENTORY_API, expectedStatus: [200] }
        retry: { maxAttempts: 3, strategy: fixed } # per-action override

Timeouts

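Timeouts apply at two independent scopes, the automation and the action. An action can be capped uniformly or through its handler's own setting: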

Scope	Where	Effect
Automation	`automation.timeout` (1000–900000)	Caps total run time. On expiry the run is marked `timed-out`.
Action (uniform)	`action.timeout` (1000–900000)	Caps one step; `retry`/`continueOnError` still apply.
Action (handler)	`action.props.timeout`	Per-type internal fence (e.g. `http`, `code`, `automation/call`).

A timed-out run is distinct from failed so operators can tell runaway duration apart from errors.
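
The three settings in the table can sit on one automation. The sketch below reuses the sync-inventory example to show where each goes. The values are illustrative. It assumes timeouts are expressed in milliseconds (the 1000–900000 range suggests this, but the unit is not stated here) and that the http handler reads props.timeout in the same unit.

```yaml
automations:
  - name: sync-inventory
    trigger: { type: cron, expression: '0 * * * *' }
    timeout: 300000 # automation scope: caps the whole run (assumed ms)
    actions:
      - name: pull
        type: http
        operator: get
        timeout: 30000 # uniform action scope: caps this step (assumed ms)
        retry: { maxAttempts: 3, strategy: fixed }
        props:
          url: $env.INVENTORY_API
          expectedStatus: [200]
          timeout: 10000 # handler scope: the http handler's internal fence (assumed ms)
```

Ordering the values from tightest (handler) to loosest (automation) is a design choice in this sketch, not a documented requirement.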

Continue on Error

Set continueOnError: true on an action so its failure doesn't abort the run. Subsequent actions still execute and the run finishes as completed-with-errors.

- name: bestEffortLog
  type: analytics
  operator: track
  continueOnError: true
  props: { event: order.created, properties: { id: '{{trigger.data.id}}' } }

Dead Letter & Exhaustion

When a run still fails after exhausting its configured maxAttempts, it transitions to the exhausted status (dead-letter). Each attempt is recorded with its timestamp, error message, and stack trace. The automation-failure trigger then fires and hands the failure handler the full attempt history, so failures can be handled centrally without per-automation notification config.
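
A central failure handler might look like the sketch below. It reuses the analytics action shape shown under Continue on Error. The trigger type name automation-failure is inferred from the trigger's name, and the event name and every trigger.data.* path are hypothetical placeholders. See Triggers for the actual payload fields.

```yaml
automations:
  - name: report-exhausted-runs # hypothetical handler name
    trigger: { type: automation-failure } # assumed type name
    actions:
      - name: recordExhaustion
        type: analytics
        operator: track
        props:
          event: automation.exhausted # hypothetical event name
          properties:
            runId: '{{trigger.data.runId}}' # hypothetical payload path
            attempts: '{{trigger.data.attempts}}' # hypothetical payload path
```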

Partial-Failure Recovery & Idempotency

When a run fails mid-pipeline, completed steps show completed, the failing step shows failed, and downstream steps show skipped. Replaying the run resumes from the failed step, skipping already-completed steps (idempotent resume). Replay always creates a new run, never mutating the original.

Webhook deduplication. Webhook triggers accept a deduplicationKey (a template over the payload) and deduplicationWindow (seconds) to drop duplicate runs from identical payloads. Combined with idempotent resume, this makes at-least-once delivery safe. See Triggers.
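
As a sketch, the two deduplication properties might be set on a webhook trigger as follows. The property names come from the paragraph above. The trigger type name webhook, the automation name, and the template path used for the key are assumptions. See Triggers for the exact schema.

```yaml
automations:
  - name: order-intake # hypothetical automation name
    trigger:
      type: webhook # assumed type name
      deduplicationKey: '{{trigger.data.id}}' # template over the payload (assumed path)
      deduplicationWindow: 300 # seconds: identical keys within 5 minutes are dropped
    actions:
      - name: bestEffortLog
        type: analytics
        operator: track
        continueOnError: true
        props: { event: order.created, properties: { id: '{{trigger.data.id}}' } }
```

A key built from a stable payload identifier, rather than the whole body, keeps a redelivered event from starting a second run even if incidental fields differ.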

Runs — run/step statuses, replay, and cancel.
Triggers — automation-failure trigger and webhook dedup.
Actions Overview — retry, timeout, continueOnError base props.
