Execution Architecture

Nembl's workflow engine is event-driven. Instead of a single monolithic orchestrator, each workflow concern (phase activation, decisions, subprocess lifecycle, audit, cancel, error) is handled by a dedicated execution function listening to its own event subscription.

This page is a reference for power users and developers debugging execution issues. You don't need to understand this to use Nembl — workflows run automatically. But if something misbehaves, the observability surfaces below are where to look.

High-Level Flow

┌────────────┐    emit    ┌──────────────────────┐    match    ┌───────────────┐
│ Nembl web  │ ────────▶  │ nembl-workflow-bus   │ ─────────▶  │ nembl-wf-*    │
│ (advance)  │            │ (event bus)          │             │ handler       │
└────────────┘            └──────────────────────┘             └───────┬───────┘
                                                                       │
                         ┌─────────────────────────────────────────────┤
                         │                                             │
                         ▼                                             ▼
                  ┌──────────────┐                              ┌──────────────┐
                  │ Database     │                              │ Next event   │
                  │ (update      │                              │ emitted on   │
                  │  instance    │                              │ completion   │
                  │  state)      │                              │              │
                  └──────────────┘                              └──────────────┘

The web app writes state changes to PostgreSQL and emits events to the workflow bus. Handler functions subscribe to relevant event types, do their work, and emit follow-up events. This decouples phase logic from the web app and keeps each handler independently deployable.
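The emit, match, and handle loop described above can be sketched with a toy in-memory bus. This is an illustration only: `InMemoryBus`, `subscribe`, and `emit` are hypothetical names rather than Nembl's API, and the sketch delivers events synchronously with no retries, which the real bus presumably does not.

```python
from collections import defaultdict, deque
from typing import Callable

Event = dict  # {"type": str, "payload": dict}
Handler = Callable[[Event], list]  # a handler returns its follow-up events


class InMemoryBus:
    """Toy stand-in for the workflow bus (hypothetical, synchronous)."""

    def __init__(self) -> None:
        self._subs = defaultdict(list)
        self.delivered = []  # event types, in delivery order

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subs[event_type].append(handler)

    def emit(self, event: Event) -> None:
        queue = deque([event])
        while queue:
            current = queue.popleft()
            self.delivered.append(current["type"])
            for handler in self._subs[current["type"]]:
                # Follow-up events go back onto the bus.
                queue.extend(handler(current))


def wf_start(event: Event) -> list:
    # Mirrors nembl-wf-start: initialize, then emit the first activation.
    return [{"type": "nembl.workflow.phase.activate", "payload": event["payload"]}]


def wf_process(event: Event) -> list:
    # Mirrors nembl-wf-process; a real handler would notify parties here.
    return []


bus = InMemoryBus()
bus.subscribe("nembl.workflow.start", wf_start)
bus.subscribe("nembl.workflow.phase.activate", wf_process)
bus.emit({"type": "nembl.workflow.start",
          "payload": {"companyId": "co_1", "instanceId": "inst_abc123"}})
print(bus.delivered)
```

Because each handler only sees an event and returns events, a handler can be replaced or redeployed without touching the others, which is the decoupling the paragraph above describes.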

Handler Functions

Twelve core handlers, split by concern, plus three provider-specific agent handlers. All write structured logs with 30-day retention.

| Handler | Concern |
| --- | --- |
| nembl-phase-executor | Automation handlers: routes to one of API_CALL, SCRIPT, DATA_TRANSFORM, TIMER, WEBHOOK_CALLOUT (AI agents run in their own provider handlers; see below) |
| nembl-wf-start | Initialize a new instance, set current phase to START, emit first activation |
| nembl-wf-process | Activate a PROCESS phase: resolve responsible parties, notify, dispatch agents, create task templates |
| nembl-wf-decision | Evaluate DECISION conditions, pick outgoing transition, emit advance |
| nembl-wf-parallel | Fork / join for PARALLEL phases |
| nembl-wf-data | Route to data handler (filter / sort / aggregate / transform / enrich / validate / transpose) |
| nembl-wf-subprocess | Launch child instance, wait for completion, propagate outputs back to parent |
| nembl-wf-end | Finalize instance: set COMPLETED, record metrics, notify requester |
| nembl-wf-cancel | Handle cancel requests: route to Cancel node if present, else terminate; cascade to children on finalization |
| nembl-wf-status | Periodic status reconciliation between in-flight workflow state and the database; surfaces stuck instances and emits health metrics |
| nembl-wf-audit | Consumes all nembl.workflow.* events, writes audit rows, archives to long-term storage |
| nembl-wf-error | Catches handler failures, updates instance status, notifies responsible parties |
| nembl-agent-anthropic, nembl-agent-openai, nembl-agent-bedrock | Provider-specific managed-agent handlers. Invoked when an entity has an active managed agent assigned. |
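As a rough illustration of how nembl-phase-executor might route an automation phase to one of its five types, the sketch below uses a dispatch table. The field names `automationType` and `phaseId` and the stub runners are assumptions made for the example; the source does not describe the executor's internals.

```python
from typing import Callable

# The five automation types listed for nembl-phase-executor.
AUTOMATION_TYPES = ("API_CALL", "SCRIPT", "DATA_TRANSFORM", "TIMER", "WEBHOOK_CALLOUT")


def _stub(name: str) -> Callable[[dict], dict]:
    """Build a placeholder runner; a real one would do the actual work."""
    def run(phase: dict) -> dict:
        return {"handledBy": name, "phaseId": phase["phaseId"]}
    return run


DISPATCH = {name: _stub(name) for name in AUTOMATION_TYPES}


def execute_phase(phase: dict) -> dict:
    # "automationType" is a hypothetical field name.
    automation_type = phase.get("automationType")
    runner = DISPATCH.get(automation_type)
    if runner is None:
        raise ValueError(f"unsupported automation type: {automation_type!r}")
    return runner(phase)


result = execute_phase({"phaseId": "ph_1", "automationType": "TIMER"})
print(result)
```

Raising on an unknown type, instead of silently skipping it, gives the error path something concrete to report.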

Event Routing

The nembl-workflow-bus has 11 routing rules, each pattern-matching an event type to the relevant handler. Common event types:

| Event type | Emitted by | Consumed by |
| --- | --- | --- |
| nembl.workflow.start | Web app (request accepted / manual run) | nembl-wf-start |
| nembl.workflow.phase.activate | Various handlers on advance | nembl-wf-process / nembl-wf-data / nembl-phase-executor |
| nembl.workflow.phase.advance | Web app, nembl-phase-executor on completion | Next phase handler by phaseType |
| nembl.workflow.decision.evaluate | nembl-wf-process | nembl-wf-decision |
| nembl.workflow.subprocess.start | nembl-wf-subprocess | nembl-wf-start (for the child) |
| nembl.workflow.subprocess.complete | Child's nembl-wf-end | Parent's nembl-wf-subprocess |
| nembl.workflow.cancel | Web app | nembl-wf-cancel |
| nembl.workflow.cancel.finalize | nembl-wf-cancel when cleanup tasks complete | nembl-wf-cancel, nembl-wf-audit |
| nembl.workflow.end | nembl-wf-end | nembl-wf-audit |
| nembl.workflow.error | Any handler on failure | nembl-wf-error, nembl-wf-audit |
| nembl.notification.send | Responsibility resolver / request intake | Notification dispatcher |

All events carry at minimum companyId, instanceId, and a phase identifier in their payload.
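To make the routing and payload contract concrete, here is a minimal sketch that checks the required payload fields and then matches an event type against a few rules. It rests on several assumptions: the field name `phaseId` (the source only says "a phase identifier"), glob-style patterns, and the rule list itself, which is an illustrative subset and not the real 11 rules.

```python
from fnmatch import fnmatchcase

# "phaseId" is an assumed name for the phase identifier.
REQUIRED_FIELDS = ("companyId", "instanceId", "phaseId")

# Illustrative subset; glob-style matching is an assumption.
RULES = [
    ("nembl.workflow.start", "nembl-wf-start"),
    ("nembl.workflow.cancel", "nembl-wf-cancel"),
    ("nembl.workflow.cancel.finalize", "nembl-wf-cancel"),
    ("nembl.workflow.*", "nembl-wf-audit"),
]


def missing_fields(payload: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not payload.get(name)]


def route(event: dict) -> list:
    """Validate the payload, then list every handler whose rule matches."""
    missing = missing_fields(event.get("payload", {}))
    if missing:
        raise ValueError(f"event payload missing: {', '.join(missing)}")
    return [handler for pattern, handler in RULES
            if fnmatchcase(event["type"], pattern)]


payload = {"companyId": "co_1", "instanceId": "inst_abc123", "phaseId": "ph_1"}
targets = route({"type": "nembl.workflow.cancel.finalize", "payload": payload})
print(targets)
```

One event can match several rules, which is how a single cancel.finalize event reaches both the cancel handler and the audit handler in the table above.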

Dead-Letter Queue

Failed event deliveries (after retries exhaust) land on the dead-letter queue nembl-wf-dlq:

  • Retention: 14 days
  • Visibility timeout: 5 minutes
  • Common causes: a handler was throttled, timed out, or threw an unhandled exception after its retry budget was exceeded

Recovery pattern:

  1. Inspect the DLQ — each message has the original event payload plus failure reason in message attributes
  2. Fix the underlying handler issue (deploy a fix, raise concurrency limit, etc.)
  3. Redrive DLQ messages back to the event bus
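The recovery pattern above can be sketched against an in-memory list standing in for nembl-wf-dlq. Everything here is hypothetical: the `DeadLetter` shape, the `failureReason` attribute name, and the `redrive` helper are assumptions for illustration, not the platform's tooling.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DeadLetter:
    event: dict  # the original event payload
    attributes: dict = field(default_factory=dict)  # e.g. failure reason


def redrive(dlq: list, emit: Callable[[dict], None],
            should_retry: Callable[[DeadLetter], bool] = lambda m: True) -> int:
    """Re-emit selected messages to the bus; leave the rest on the queue."""
    kept, count = [], 0
    for message in dlq:
        if should_retry(message):
            emit(message.event)
            count += 1
        else:
            kept.append(message)
    dlq[:] = kept  # mutate in place so callers see the drained queue
    return count


dlq = [
    DeadLetter({"type": "nembl.workflow.phase.activate", "instanceId": "inst_1"},
               {"failureReason": "timeout"}),
    DeadLetter({"type": "nembl.workflow.end", "instanceId": "inst_2"},
               {"failureReason": "unhandled_exception"}),
]
emitted = []
# Step 1: inspect; step 3: redrive only what the deployed fix addresses.
count = redrive(dlq, emitted.append,
                lambda m: m.attributes.get("failureReason") == "timeout")
print(count, len(dlq))
```

Filtering on the failure reason keeps messages for still-broken handlers on the queue, so they do not burn through their retry budget again.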

Long-Term Event Archive

Every instance's event stream is archived to object storage for forensics and replay:

workflow-archive/
  <companyId>/
    <instanceId>/
      graph.json          ← snapshot of the workflow graph at instance start
      variables.json      ← final instance variables snapshot
    audit/
      <instanceId>/
        <eventId>.json    ← one object per emitted event

  • Lifecycle: objects transition to cold storage at 90 days, expire at 365 days
  • Access: server-side via Plan Admin ops only; customers don't read directly. The customer-facing slice is the Audit Trail.
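The layout and lifecycle above translate into a few small helpers. This is a sketch under assumptions: it treats `workflow-archive` as a plain path prefix (the source does not say whether it is a bucket or a prefix), and the tier name "standard" is invented for the example.

```python
ARCHIVE_ROOT = "workflow-archive"  # assumed to be a path prefix


def graph_key(company_id: str, instance_id: str) -> str:
    """Snapshot of the workflow graph at instance start."""
    return f"{ARCHIVE_ROOT}/{company_id}/{instance_id}/graph.json"


def variables_key(company_id: str, instance_id: str) -> str:
    """Final instance variables snapshot."""
    return f"{ARCHIVE_ROOT}/{company_id}/{instance_id}/variables.json"


def audit_event_key(company_id: str, instance_id: str, event_id: str) -> str:
    """One object per emitted event, under the company's audit/ folder."""
    return f"{ARCHIVE_ROOT}/{company_id}/audit/{instance_id}/{event_id}.json"


def storage_state(age_days: int) -> str:
    """Map object age to the lifecycle stage described above."""
    if age_days >= 365:
        return "expired"
    if age_days >= 90:
        return "cold"
    return "standard"  # hypothetical name for the pre-transition tier


print(audit_event_key("co_1", "inst_abc123", "evt_9"))
print(storage_state(120))
```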

Application & Handler Logs

Structured logs are captured per handler and per app server, with 30-day retention:

| Log stream | Content |
| --- | --- |
| nembl-web | Next.js app server (web + API) |
| nembl-phase-executor | Automation phase executions |
| nembl-wf-* | Each workflow handler |
| nembl-agent-* | Provider-specific managed-agent handlers |

Searching: every log entry carries instanceId in a structured field, so log queries like:

filter instanceId = "inst_abc123"
sort @timestamp desc
limit 100

return the 100 most recent entries for a single instance across all handlers; raise the limit to cover its full lifecycle.
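When working from exported logs, the same lookup can be approximated offline. The sketch below assumes each log line is a JSON object with `instanceId` and an ISO-8601 `@timestamp` string in a single time zone; the export format and the `handler` field are assumptions, not something the source specifies.

```python
import json


def instance_timeline(lines, instance_id: str, limit: int = 100) -> list:
    """Return up to `limit` entries for one instance, newest first."""
    entries = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not structured JSON
        if isinstance(entry, dict) and entry.get("instanceId") == instance_id:
            entries.append(entry)
    # ISO-8601 strings in one time zone sort correctly as plain text.
    entries.sort(key=lambda e: e.get("@timestamp", ""), reverse=True)
    return entries[:limit]


sample = [
    '{"@timestamp": "2024-05-01T10:00:00Z", "instanceId": "inst_abc123", "handler": "nembl-wf-start"}',
    '{"@timestamp": "2024-05-01T10:00:02Z", "instanceId": "inst_other", "handler": "nembl-wf-start"}',
    'plain text line that is not JSON',
    '{"@timestamp": "2024-05-01T10:00:05Z", "instanceId": "inst_abc123", "handler": "nembl-wf-process"}',
]
timeline = instance_timeline(sample, "inst_abc123")
print([entry["handler"] for entry in timeline])
```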

Debugging Checklist

When an instance misbehaves:

  1. Find the instance ID from the URL or instance viewer
  2. Check the audit trail (Admin → Audit Trail, filter by target=instanceId) — this is the authoritative event log
  3. Inspect the DLQ if the instance stalled unexpectedly; a failed handler will have landed there
  4. Search application logs across the relevant nembl-wf-* streams, filtering on instanceId, to see per-handler execution detail
  5. Consult the long-term archive for point-in-time state reconstruction when the live instance record has already been cleaned up

Why Event-Driven?

  • Scale — each concern scales independently; heavy DATA workloads don't starve PROCESS activations
  • Isolation — a bug in the subprocess handler doesn't take down phase activation
  • Auditability — every state change is an event, captured by the audit handler and archived for forensics
  • Replayability — DLQ redrive lets you fix and retry without losing work

The trade-off is more moving parts than a monolithic engine. The observability surfaces described above are deliberate: debugging event-driven systems requires tool support, and Nembl invests in it.
