Execution Architecture

Nembl's workflow engine is event-driven. Instead of a single monolithic orchestrator, each workflow concern (phase activation, decisions, subprocess lifecycle, audit, cancel, error) is handled by a dedicated execution function listening to its own event subscription.

This page is a reference for power users and developers debugging execution issues. You don't need to understand this to use Nembl — workflows run automatically. But if something misbehaves, the observability surfaces below are where to look.

High-Level Flow

┌────────────┐    emit    ┌──────────────────────┐    match    ┌───────────────┐
│ Nembl web  │ ────────▶  │ nembl-workflow-bus   │ ─────────▶  │ nembl-wf-*    │
│ (advance)  │            │ (event bus)          │             │ handler       │
└────────────┘            └──────────────────────┘             └───────┬───────┘
                                                                       │
                         ┌─────────────────────────────────────────────┤
                         │                                             │
                         ▼                                             ▼
                  ┌──────────────┐                              ┌──────────────┐
                  │ Database     │                              │ Next event   │
                  │ (update      │                              │ emitted on   │
                  │  instance    │                              │ completion   │
                  │  state)      │                              │              │
                  └──────────────┘                              └──────────────┘

The web app writes state changes to PostgreSQL and emits events to the workflow bus. Handler functions subscribe to relevant event types, do their work, and emit follow-up events. This decouples phase logic from the web app and keeps each handler independently deployable.
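The subscribe, work, emit loop described above can be sketched with a toy in-memory bus. This is only an illustration of the pattern, not Nembl's implementation: the `Bus` class, the glob-style matching via `fnmatchcase`, the synchronous delivery, and the `phaseId` payload field are all assumptions made for the example.

```python
from fnmatch import fnmatchcase


class Bus:
    """Toy in-memory stand-in for nembl-workflow-bus (illustrative only)."""

    def __init__(self):
        self.rules = []  # (pattern, handler) pairs, checked in order
        self.log = []    # every event type emitted, in emission order

    def subscribe(self, pattern, handler):
        self.rules.append((pattern, handler))

    def emit(self, event_type, payload):
        self.log.append(event_type)
        event = {"type": event_type, "payload": payload}
        for pattern, handler in self.rules:
            if fnmatchcase(event_type, pattern):
                handler(event, self)


audit_rows = []


def wf_start(event, bus):
    # Do the handler's own work, then emit a follow-up event.
    payload = {**event["payload"], "phaseId": "START"}  # "phaseId" is an assumed name
    bus.emit("nembl.workflow.phase.activate", payload)


def wf_audit(event, bus):
    # The audit handler sees every workflow event via a wildcard rule.
    audit_rows.append(event["type"])


bus = Bus()
bus.subscribe("nembl.workflow.start", wf_start)
bus.subscribe("nembl.workflow.*", wf_audit)
bus.emit("nembl.workflow.start", {"companyId": "co_1", "instanceId": "inst_abc123"})
```

Because the web app only emits an event and each handler only reacts to the patterns it subscribes to, adding or redeploying one handler in this sketch never requires touching another.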

Handler Functions

Twelve workflow handlers are split by concern, plus three provider-specific agent handlers. All write structured logs with 30-day retention.

Handler	Concern
`nembl-phase-executor`	Automation phases: routes each execution to one of API_CALL, SCRIPT, DATA_TRANSFORM, TIMER, WEBHOOK_CALLOUT (AI agents run in their own provider handlers, listed in the last row)
`nembl-wf-start`	Initialize a new instance, set current phase to START, emit first activation
`nembl-wf-process`	Activate a PROCESS phase: resolve responsible parties, notify, dispatch agents, create task templates
`nembl-wf-decision`	Evaluate DECISION conditions, pick outgoing transition, emit advance
`nembl-wf-parallel`	Fork / join for PARALLEL phases
`nembl-wf-data`	Route to data handler (filter / sort / aggregate / transform / enrich / validate / transpose)
`nembl-wf-subprocess`	Launch child instance, wait for completion, propagate outputs back to parent
`nembl-wf-end`	Finalize instance: set COMPLETED, record metrics, notify requester
`nembl-wf-cancel`	Handle cancel requests: route to Cancel node if present, else terminate; cascade to children on finalization
`nembl-wf-status`	Periodic status reconciliation between in-flight workflow state and the database — surfaces stuck instances and emits health metrics
`nembl-wf-audit`	Consumes all `nembl.workflow.*` events, writes audit rows, archives to long-term storage
`nembl-wf-error`	Catches handler failures, updates instance status, notifies responsible parties
`nembl-agent-anthropic`, `nembl-agent-openai`, `nembl-agent-bedrock`	Provider-specific managed-agent handlers. Invoked when an entity has an active managed agent assigned.
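As an illustration of the "routes to one of" behaviour listed for `nembl-phase-executor`, the dispatch could look roughly like the sketch below. Only the five automation type names come from the table; the `automationType` and `config` field names, the stub runners, and the error handling are hypothetical.

```python
# Automation types named in the handler table.
AUTOMATION_TYPES = ("API_CALL", "SCRIPT", "DATA_TRANSFORM", "TIMER", "WEBHOOK_CALLOUT")


def make_stub(kind):
    """Build a placeholder runner; a real runner would perform the automation."""
    def run(config):
        return {"automationType": kind, "status": "done", "config": config}
    return run


RUNNERS = {kind: make_stub(kind) for kind in AUTOMATION_TYPES}


def execute_phase(phase):
    """Route an automation phase to its runner, rejecting unknown types."""
    kind = phase.get("automationType")  # assumed field name
    runner = RUNNERS.get(kind)
    if runner is None:
        raise ValueError(f"unsupported automation type: {kind!r}")
    return runner(phase.get("config", {}))


result = execute_phase({"automationType": "TIMER", "config": {"seconds": 5}})
```

Failing loudly on an unknown type fits the architecture described here, since a raised error is what would let the error handler and audit trail record the problem.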

Event Routing

11 routing rules on the nembl-workflow-bus, each pattern-matched to the relevant handler. Common event types:

Event type	Emitted by	Consumed by
`nembl.workflow.start`	Web app (request accepted / manual run)	`nembl-wf-start`
`nembl.workflow.phase.activate`	Various handlers on advance	`nembl-wf-process` / `nembl-wf-data` / `nembl-phase-executor`
`nembl.workflow.phase.advance`	Web app, `nembl-phase-executor` on completion	Next phase handler by phaseType
`nembl.workflow.decision.evaluate`	`nembl-wf-process`	`nembl-wf-decision`
`nembl.workflow.subprocess.start`	`nembl-wf-subprocess`	`nembl-wf-start` (for the child)
`nembl.workflow.subprocess.complete`	Child's `nembl-wf-end`	Parent's `nembl-wf-subprocess`
`nembl.workflow.cancel`	Web app	`nembl-wf-cancel`
`nembl.workflow.cancel.finalize`	`nembl-wf-cancel` when cleanup tasks complete	`nembl-wf-cancel`, `nembl-wf-audit`
`nembl.workflow.end`	`nembl-wf-end`	`nembl-wf-audit`
`nembl.workflow.error`	Any handler on failure	`nembl-wf-error`, `nembl-wf-audit`
`nembl.notification.send`	Responsibility resolver / request intake	Notification dispatcher

All events carry at minimum companyId, instanceId, and a phase identifier in their payload.
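A handler can guard against malformed events by checking these minimum fields before doing any work. The sketch below is one way to express that check; the `WorkflowEvent` shape and the field name `phaseId` are assumptions, since the text above only says "a phase identifier".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowEvent:
    type: str
    payload: dict


# "phaseId" is an assumed name for the phase identifier.
REQUIRED_PAYLOAD_FIELDS = ("companyId", "instanceId", "phaseId")


def missing_fields(event):
    """Return the required payload fields that are absent or empty."""
    return [name for name in REQUIRED_PAYLOAD_FIELDS if not event.payload.get(name)]


ok = WorkflowEvent(
    "nembl.workflow.phase.advance",
    {"companyId": "co_1", "instanceId": "inst_abc123", "phaseId": "ph_review"},
)
bad = WorkflowEvent("nembl.workflow.cancel", {"instanceId": "inst_abc123"})
```

Here `missing_fields(ok)` returns an empty list, while `missing_fields(bad)` reports `companyId` and `phaseId`.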

Dead-Letter Queue

Event deliveries that still fail after retries are exhausted land on the dead-letter queue nembl-wf-dlq:

Retention: 14 days
Visibility timeout: 5 minutes
Common causes: a handler was throttled, timed out, or threw an unhandled exception after its retry budget was exceeded

Recovery pattern:

Inspect the DLQ — each message has the original event payload plus failure reason in message attributes
Fix the underlying handler issue (deploy a fix, raise concurrency limit, etc.)
Redrive DLQ messages back to the event bus
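The inspect, fix, redrive pattern above can be sketched against an in-memory queue. This is a simplified model rather than the real tooling: the `DlqMessage` shape, the `failureReason` attribute name, and the `emit` callback are hypothetical stand-ins for the actual queue and event bus.

```python
from dataclasses import dataclass, field


@dataclass
class DlqMessage:
    body: dict                                      # original event payload
    attributes: dict = field(default_factory=dict)  # carries the failure reason


def redrive(dlq, emit, should_retry=lambda msg: True):
    """Re-emit selected messages and drop them from the DLQ; return the count."""
    kept, moved = [], 0
    for msg in dlq:
        if should_retry(msg):
            emit(msg.body)    # send the original event back to the bus
            moved += 1
        else:
            kept.append(msg)  # leave untouched for later inspection
    dlq[:] = kept
    return moved


dlq = [
    DlqMessage({"type": "nembl.workflow.phase.activate", "instanceId": "inst_abc123"},
               {"failureReason": "timeout"}),
    DlqMessage({"type": "nembl.workflow.end", "instanceId": "inst_def456"},
               {"failureReason": "throttled"}),
]
re_emitted = []
moved = redrive(dlq, re_emitted.append,
                lambda msg: msg.attributes["failureReason"] == "timeout")
```

Filtering by failure reason lets you redrive only the messages whose root cause has actually been fixed, leaving the rest in the queue.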

Long-Term Event Archive

Every instance's event stream is archived to object storage for forensics and replay:

workflow-archive/
  <companyId>/
    <instanceId>/
      graph.json          ← snapshot of the workflow graph at instance start
      variables.json      ← final instance variables snapshot
    audit/
      <instanceId>/
        <eventId>.json    ← one object per emitted event

Lifecycle: objects transition to cold storage at 90 days, expire at 365 days
Access: server-side via Plan Admin ops only; customers don't read directly. The customer-facing slice is the Audit Trail.
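To make the layout and lifecycle concrete, the helpers below build object keys that follow the tree above and classify an object by age. The function names, the state labels, and the treatment of the 90-day and 365-day thresholds as inclusive are assumptions for illustration.

```python
from datetime import date, timedelta

ARCHIVE_ROOT = "workflow-archive"


def graph_key(company_id, instance_id):
    """Key of the workflow graph snapshot taken at instance start."""
    return f"{ARCHIVE_ROOT}/{company_id}/{instance_id}/graph.json"


def event_key(company_id, instance_id, event_id):
    """Key of a single archived event under the audit/ prefix."""
    return f"{ARCHIVE_ROOT}/{company_id}/audit/{instance_id}/{event_id}.json"


def storage_state(created, today):
    """Classify an object by age: standard, cold (90+ days), expired (365+ days)."""
    age_days = (today - created).days
    if age_days >= 365:
        return "expired"
    if age_days >= 90:
        return "cold"
    return "standard"


created = date(2024, 1, 1)
state_after_100_days = storage_state(created, created + timedelta(days=100))
```

Keying every object by `companyId` first keeps each tenant's archive under a single prefix, which is convenient for scoped access and cleanup.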

Application & Handler Logs

Structured logs are captured per handler and per app server, with 30-day retention:

Log stream	Content
`nembl-web`	Next.js app server (web + API)
`nembl-phase-executor`	Automation phase executions
`nembl-wf-*`	Each workflow handler
`nembl-agent-*`	Provider-specific managed-agent handlers

Searching: every log entry carries instanceId in a structured field, so log queries like:

filter instanceId = "inst_abc123"
sort @timestamp desc
limit 100

return the full lifecycle of a single instance across all handlers.
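If log entries have been exported as structured records, the same filter, sort, and limit can be reproduced offline. This sketch assumes each entry is a dict with `instanceId` and an ISO 8601 `@timestamp` string (so string order matches time order); the `stream` and `message` fields are invented for the example.

```python
def instance_lifecycle(entries, instance_id, limit=100):
    """Return up to `limit` entries for one instance, newest first."""
    matching = [e for e in entries if e.get("instanceId") == instance_id]
    matching.sort(key=lambda e: e["@timestamp"], reverse=True)
    return matching[:limit]


entries = [
    {"@timestamp": "2024-05-01T10:00:00Z", "stream": "nembl-wf-start",
     "instanceId": "inst_abc123", "message": "instance initialized"},
    {"@timestamp": "2024-05-01T10:00:02Z", "stream": "nembl-wf-process",
     "instanceId": "inst_abc123", "message": "phase activated"},
    {"@timestamp": "2024-05-01T10:00:01Z", "stream": "nembl-wf-start",
     "instanceId": "inst_def456", "message": "instance initialized"},
]
recent = instance_lifecycle(entries, "inst_abc123")
```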

Debugging Checklist

When an instance misbehaves:

Find the instance ID from the URL or instance viewer
Check the audit trail (Admin → Audit Trail, filter by target=instanceId) — this is the authoritative event log
Inspect the DLQ if the instance stalled unexpectedly; an event whose handler kept failing will have landed there once retries were exhausted
Search application logs across the relevant nembl-wf-* streams, filtering on instanceId, to see per-handler execution detail
Consult the long-term archive for point-in-time state reconstruction when the live instance record has already been cleaned up

Why Event-Driven?

Scale — each concern scales independently; heavy DATA workloads don't starve PROCESS activations
Isolation — a bug in the subprocess handler doesn't take down phase activation
Auditability — every state change is an event, captured by the audit handler and archived for forensics
Replayability — DLQ redrive lets you fix and retry without losing work

The trade-off is more moving parts than a monolithic engine. The observability surfaces described above are deliberate: debugging event-driven systems requires tool support, and Nembl invests in it.

Related

Audit Trail: the customer-facing slice of the event stream
Workflow Execution: user-facing description of how phases run
Subprocess Workflows: events involved in parent/child lifecycle
