Execution Architecture
Nembl's workflow engine is event-driven. Instead of a single monolithic orchestrator, each workflow concern (phase activation, decisions, subprocess lifecycle, audit, cancel, error) is handled by a dedicated execution function listening to its own event subscription.
This page is a reference for power users and developers debugging execution issues. You don't need to understand this to use Nembl — workflows run automatically. But if something misbehaves, the observability surfaces below are where to look.
High-Level Flow
┌────────────┐   emit    ┌──────────────────────┐   match    ┌───────────────┐
│ Nembl web  │ ────────▶ │  nembl-workflow-bus  │ ─────────▶ │  nembl-wf-*   │
│ (advance)  │           │     (event bus)      │            │    handler    │
└────────────┘           └──────────────────────┘            └───────┬───────┘
                                                                     │
                       ┌─────────────────────────────────────────────┤
                       │                                             │
                       ▼                                             ▼
                ┌──────────────┐                              ┌──────────────┐
                │   Database   │                              │  Next event  │
                │   (update    │                              │  emitted on  │
                │   instance   │                              │  completion  │
                │   state)     │                              │              │
                └──────────────┘                              └──────────────┘

The web app writes state changes to PostgreSQL and emits events to the workflow bus. Handler functions subscribe to relevant event types, do their work, and emit follow-up events. This decouples phase logic from the web app and keeps each handler independently deployable.
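The emit, match, handle, emit-again loop can be sketched with a small in-memory bus. This is an illustration only, not Nembl's implementation: the `EventBus` class, the handler functions, and the `phaseId` payload key are hypothetical, and routing `phase.advance` straight to an end handler is a simplification. Only the event type names come from this page.

```python
from collections import defaultdict, deque
from typing import Callable

Event = dict  # {"type": str, "payload": dict}
Handler = Callable[[Event], list]  # a handler returns its follow-up events


class EventBus:
    """Hypothetical in-memory stand-in for nembl-workflow-bus."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)
        self.delivered = []  # event types, in delivery order

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def emit(self, event: Event) -> None:
        # Deliver events in FIFO order until no handler emits anything new.
        queue = deque([event])
        while queue:
            current = queue.popleft()
            self.delivered.append(current["type"])
            for handler in self._subscribers[current["type"]]:
                queue.extend(handler(current))


audit_rows = []


def wf_start(event: Event) -> list:
    # Initialize the instance, then request activation of the first phase.
    return [{"type": "nembl.workflow.phase.activate", "payload": event["payload"]}]


def wf_process(event: Event) -> list:
    # A real handler would notify people and wait; this one advances at once.
    return [{"type": "nembl.workflow.phase.advance", "payload": event["payload"]}]


def wf_end(event: Event) -> list:
    # Simplification: treat the advance as reaching the END phase.
    return [{"type": "nembl.workflow.end", "payload": event["payload"]}]


def wf_audit(event: Event) -> list:
    audit_rows.append(event["type"])
    return []  # terminal: nothing further to emit


bus = EventBus()
bus.subscribe("nembl.workflow.start", wf_start)
bus.subscribe("nembl.workflow.phase.activate", wf_process)
bus.subscribe("nembl.workflow.phase.advance", wf_end)
bus.subscribe("nembl.workflow.end", wf_audit)
bus.emit({
    "type": "nembl.workflow.start",
    "payload": {"companyId": "co_1", "instanceId": "inst_abc123", "phaseId": "START"},
})
print(bus.delivered)
```

Each handler only knows which event it consumes and which events it emits, which is what lets the handlers be deployed and scaled separately.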
Handler Functions
Twelve workflow handlers, split by concern, plus three provider-specific agent handlers. All write structured logs with 30-day retention.
| Handler | Concern |
|---|---|
| nembl-phase-executor | Automation handlers: routes to one of API_CALL, SCRIPT, DATA_TRANSFORM, TIMER, WEBHOOK_CALLOUT (AI agents run in their own provider handlers; see below) |
| nembl-wf-start | Initialize a new instance, set current phase to START, emit first activation |
| nembl-wf-process | Activate a PROCESS phase: resolve responsible parties, notify, dispatch agents, create task templates |
| nembl-wf-decision | Evaluate DECISION conditions, pick outgoing transition, emit advance |
| nembl-wf-parallel | Fork / join for PARALLEL phases |
| nembl-wf-data | Route to data handler (filter / sort / aggregate / transform / enrich / validate / transpose) |
| nembl-wf-subprocess | Launch child instance, wait for completion, propagate outputs back to parent |
| nembl-wf-end | Finalize instance: set COMPLETED, record metrics, notify requester |
| nembl-wf-cancel | Handle cancel requests: route to Cancel node if present, else terminate; cascade to children on finalization |
| nembl-wf-status | Periodic status reconciliation between in-flight workflow state and the database; surfaces stuck instances and emits health metrics |
| nembl-wf-audit | Consumes all nembl.workflow.* events, writes audit rows, archives to long-term storage |
| nembl-wf-error | Catches handler failures, updates instance status, notifies responsible parties |
| nembl-agent-anthropic, nembl-agent-openai, nembl-agent-bedrock | Provider-specific managed-agent handlers. Invoked when an entity has an active managed agent assigned. |
Event Routing
11 routing rules on the nembl-workflow-bus, each pattern-matched to the relevant handler. Common event types:
| Event type | Emitted by | Consumed by |
|---|---|---|
| nembl.workflow.start | Web app (request accepted / manual run) | nembl-wf-start |
| nembl.workflow.phase.activate | Various handlers on advance | nembl-wf-process / nembl-wf-data / nembl-phase-executor |
| nembl.workflow.phase.advance | Web app, nembl-phase-executor on completion | Next phase handler by phaseType |
| nembl.workflow.decision.evaluate | nembl-wf-process | nembl-wf-decision |
| nembl.workflow.subprocess.start | nembl-wf-subprocess | nembl-wf-start (for the child) |
| nembl.workflow.subprocess.complete | Child's nembl-wf-end | Parent's nembl-wf-subprocess |
| nembl.workflow.cancel | Web app | nembl-wf-cancel |
| nembl.workflow.cancel.finalize | nembl-wf-cancel when cleanup tasks complete | nembl-wf-cancel, nembl-wf-audit |
| nembl.workflow.end | nembl-wf-end | nembl-wf-audit |
| nembl.workflow.error | Any handler on failure | nembl-wf-error, nembl-wf-audit |
| nembl.notification.send | Responsibility resolver / request intake | Notification dispatcher |
All events carry at minimum companyId, instanceId, and a phase identifier in their payload.
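As a rough sketch of how pattern-matched routing and the minimum-payload rule could be expressed, the snippet below matches event types against glob-style patterns with Python's standard `fnmatch` module. The rule list is a hand-picked subset of the table above, the glob syntax is an assumption about how the bus matches patterns, and `phaseId` is an assumed name for the phase identifier field.

```python
from fnmatch import fnmatchcase

# Hypothetical subset of routing rules: (event-type pattern, handler name).
ROUTING_RULES = [
    ("nembl.workflow.start", "nembl-wf-start"),
    ("nembl.workflow.cancel", "nembl-wf-cancel"),
    ("nembl.workflow.cancel.finalize", "nembl-wf-cancel"),
    ("nembl.workflow.error", "nembl-wf-error"),
    ("nembl.workflow.*", "nembl-wf-audit"),  # audit consumes every workflow event
]

# "phaseId" is an assumed key name for the phase identifier.
REQUIRED_KEYS = ("companyId", "instanceId", "phaseId")


def route(event_type: str) -> list:
    """Return every handler whose pattern matches the event type."""
    return [handler for pattern, handler in ROUTING_RULES
            if fnmatchcase(event_type, pattern)]


def missing_keys(payload: dict) -> list:
    """List the minimum payload fields that are absent."""
    return [key for key in REQUIRED_KEYS if key not in payload]


print(route("nembl.workflow.error"))
print(missing_keys({"companyId": "co_1", "instanceId": "inst_abc123"}))
```

One event can fan out to several handlers: here an error event reaches both the error handler and the audit handler, while `nembl.notification.send` matches none of the workflow rules.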
Dead-Letter Queue
Event deliveries that still fail after retries are exhausted land on the dead-letter queue nembl-wf-dlq:
- Retention: 14 days
- Visibility timeout: 5 minutes
- Common causes: a handler was throttled, timed out, or threw an unhandled exception, and its retry budget ran out
Recovery pattern:
- Inspect the DLQ — each message has the original event payload plus failure reason in message attributes
- Fix the underlying handler issue (deploy a fix, raise concurrency limit, etc.)
- Redrive DLQ messages back to the event bus
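The inspect, fix, redrive sequence above can be modelled in a few lines. This is a sketch under stated assumptions: `DeadLetter` and `Redriver` are hypothetical names, and plain Python lists stand in for the real queue and event bus. It illustrates one design choice worth considering, which is redriving only the messages whose failure cause has actually been fixed.

```python
from dataclasses import dataclass, field


@dataclass
class DeadLetter:
    event: dict  # original event payload
    reason: str  # failure reason (a message attribute on the real queue)


@dataclass
class Redriver:
    """Hypothetical helper; dlq and bus are lists standing in for the real services."""
    dlq: list = field(default_factory=list)
    bus: list = field(default_factory=list)

    def inspect(self) -> dict:
        # Step 1: count dead letters per failure reason.
        counts = {}
        for message in self.dlq:
            counts[message.reason] = counts.get(message.reason, 0) + 1
        return counts

    def redrive(self, fixed_reasons: set) -> int:
        # Step 3: re-emit messages whose cause was fixed in step 2,
        # and leave everything else on the DLQ.
        remaining, moved = [], 0
        for message in self.dlq:
            if message.reason in fixed_reasons:
                self.bus.append(message.event)
                moved += 1
            else:
                remaining.append(message)
        self.dlq = remaining
        return moved


redriver = Redriver(dlq=[
    DeadLetter({"type": "nembl.workflow.phase.activate", "instanceId": "inst_1"}, "timeout"),
    DeadLetter({"type": "nembl.workflow.decision.evaluate", "instanceId": "inst_2"}, "throttled"),
    DeadLetter({"type": "nembl.workflow.phase.activate", "instanceId": "inst_3"}, "timeout"),
])
summary = redriver.inspect()
moved = redriver.redrive({"timeout"})
print(summary, moved, len(redriver.dlq))
```

Redriving selectively avoids sending messages straight back into a failure that has not been addressed yet, which would only return them to the DLQ.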
Long-Term Event Archive
Every instance's event stream is archived to object storage for forensics and replay:
workflow-archive/
<companyId>/
<instanceId>/
graph.json ← snapshot of the workflow graph at instance start
variables.json ← final instance variables snapshot
audit/
<instanceId>/
<eventId>.json ← one object per emitted event

- Lifecycle: objects transition to cold storage at 90 days and expire at 365 days
- Access: server-side via Plan Admin ops only; customers don't read directly. The customer-facing slice is the Audit Trail.
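When deciding whether an archived instance can still be reconstructed, it helps to know which lifecycle stage its objects are in. The helper below is a minimal sketch of the 90-day and 365-day thresholds. The function name and state labels are hypothetical, and treating the boundary days as inclusive is an assumption rather than documented behaviour.

```python
from datetime import date

COLD_AFTER_DAYS = 90     # transition to cold storage
EXPIRE_AFTER_DAYS = 365  # object expiry


def storage_state(archived_on: date, today: date) -> str:
    """Classify an archived object as 'standard', 'cold', or 'expired'."""
    age_days = (today - archived_on).days
    if age_days >= EXPIRE_AFTER_DAYS:
        return "expired"
    if age_days >= COLD_AFTER_DAYS:
        return "cold"
    return "standard"


print(storage_state(date(2024, 1, 1), date(2024, 3, 1)))
```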
Application & Handler Logs
Structured logs are captured per handler and per app server, with 30-day retention:
| Log stream | Content |
|---|---|
nembl-web | Next.js app server (web + API) |
nembl-phase-executor | Automation phase executions |
nembl-wf-* | Each workflow handler |
nembl-agent-* | Provider-specific managed-agent handlers |
Searching: every log entry carries instanceId in a structured field, so log queries like:
filter instanceId = "inst_abc123"
sort @timestamp desc
limit 100

return the full lifecycle of a single instance across all handlers.
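The same filter, sort, and limit steps can be reproduced offline over exported logs. The sketch below assumes the logs are exported as JSON lines and that each entry has a `timestamp` field holding an ISO-8601 string in a single timezone, so that string order matches time order. Both the export format and the `timestamp` and `stream` field names are assumptions; only `instanceId` is taken from this page.

```python
import json


def instance_timeline(log_lines, instance_id, limit=100):
    """Return the newest-first entries for one instance from JSON-lines logs."""
    entries = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not structured JSON
        if isinstance(entry, dict) and entry.get("instanceId") == instance_id:
            entries.append(entry)
    # ISO-8601 strings in one timezone sort chronologically as plain text.
    entries.sort(key=lambda e: e.get("timestamp", ""), reverse=True)
    return entries[:limit]


lines = [
    '{"timestamp": "2024-05-01T10:00:00Z", "instanceId": "inst_abc123", "stream": "nembl-wf-start"}',
    '{"timestamp": "2024-05-01T10:00:02Z", "instanceId": "inst_other", "stream": "nembl-wf-process"}',
    "not json",
    '{"timestamp": "2024-05-01T10:00:05Z", "instanceId": "inst_abc123", "stream": "nembl-wf-end"}',
]
timeline = instance_timeline(lines, "inst_abc123")
print([entry["stream"] for entry in timeline])
```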
Debugging Checklist
When an instance misbehaves:
- Find the instance ID from the URL or instance viewer
- Check the audit trail (Admin → Audit Trail, filter by target=instanceId); this is the authoritative event log
- Inspect the DLQ if the instance stalled unexpectedly; a failed event delivery will have landed there
- Search application logs across the relevant nembl-wf-* streams, filtering on instanceId, to see per-handler execution detail
- Use the long-term archive for point-in-time state reconstruction when the live instance record has already been cleaned up
Why Event-Driven?
- Scale — each concern scales independently; heavy DATA workloads don't starve PROCESS activations
- Isolation — a bug in the subprocess handler doesn't take down phase activation
- Auditability — every state change is an event, captured by the audit handler and archived for forensics
- Replayability — DLQ redrive lets you fix and retry without losing work
The trade-off is more moving parts than a monolithic engine. The observability surfaces described above are deliberate: debugging event-driven systems requires tool support, and Nembl invests in it.
Related
- Audit Trail — the customer-facing slice of the event stream
- Workflow Execution — user-facing description of how phases run
- Subprocess Workflows — events involved in parent/child lifecycle