Durability model
This essay is understanding-oriented. It answers one question: how does NetScript make long-running, message-driven work survive a process restart? It builds the mental model — what a saga is, how its state is persisted, where that state physically lives, and how compensation and correlation fit — so you can reason about the running system. It is not a step-by-step guide. When you want to build a durable workflow with your own hands, follow the durable workflow tutorial; for the headline API and ports see
Durable sagas; for the exact exported symbols see sagas.
What "durable" actually means here
A plain request handler is ephemeral: it runs, returns, and forgets. If the process restarts halfway through a multi-step business process — provision an account, charge a card, send a welcome email — everything in memory is gone and there is no record of how far you got. Durability is the property that lets a logical workflow outlive any single execution: its progress is written down somewhere external to the process, so a restart, a retry, or a later message can pick up exactly where the last one left off.
NetScript draws the durability boundary at the saga. A saga is a small, named state machine whose state is persisted and whose transitions are driven by messages rather than by a callable returning to its caller. Because the state lives outside the process and the inputs arrive as messages over time, a saga can span minutes, hours, or many process lifetimes — that is what the word durable buys you. This is doctrine axiom A12 in practice: durable workflows are state machines, not long-lived call stacks. See Architecture for how that axiom shapes the whole framework.
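To make "state machine, not call stack" concrete, here is a minimal, self-contained sketch of the idea. It does not use the NetScript API: `externalStore`, `transition`, `applyMessage`, and the three message names are hypothetical, and a plain `Map` stands in for a store that outlives the process.

```typescript
// Sketch only: every name here is hypothetical, not part of NetScript.
type OnboardingState = Readonly<{
  status: 'pending' | 'provisioned' | 'charged' | 'completed';
}>;
type OnboardingMessage = {
  type: 'AccountProvisioned' | 'CardCharged' | 'WelcomeEmailSent';
};

// Stands in for an external store (Deno KV, a database) that survives restarts.
const externalStore = new Map<string, string>();

// A transition depends only on the persisted position and the incoming message.
function transition(state: OnboardingState, message: OnboardingMessage): OnboardingState {
  if (state.status === 'pending' && message.type === 'AccountProvisioned') {
    return { status: 'provisioned' };
  }
  if (state.status === 'provisioned' && message.type === 'CardCharged') {
    return { status: 'charged' };
  }
  if (state.status === 'charged' && message.type === 'WelcomeEmailSent') {
    return { status: 'completed' };
  }
  return state; // a message that does not fit the current position changes nothing
}

// Load, apply, write back. Nothing is held in memory between calls.
function applyMessage(instanceKey: string, message: OnboardingMessage): OnboardingState {
  const raw = externalStore.get(instanceKey);
  const current: OnboardingState = raw ? JSON.parse(raw) : { status: 'pending' };
  const next = transition(current, message);
  externalStore.set(instanceKey, JSON.stringify(next));
  return next;
}

// "Process 1" handles the first message, then goes away.
applyMessage('user-onboarding/u-1', { type: 'AccountProvisioned' });
// "Process 2" starts with empty memory but the same store, and carries on.
const afterRestart = applyMessage('user-onboarding/u-1', { type: 'CardCharged' });
```

The second call succeeds only because the position written by the first call was read back from the store, which is the property the rest of this essay builds on.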
The saga builder: a state machine you declare, not wire
A saga is authored with a fluent builder. You declare an id, a durability tier, the shape and
initial value of its state, and one or more message handlers, then call .build() to freeze it into
a definition the runtime can register and run.
    // plugins/sagas/<saga>.ts: the scaffolded shape, verbatim core
    import { defineSaga, sagaComplete } from '@netscript/plugin-sagas-core';

    type State = Readonly<{ status: string; processedAt?: string }>;
    type Message = Readonly<{ type: 'UserSettingsCreated'; payload: { userId: string } }>;

    export const userOnboardingSaga = defineSaga('user-onboarding')
      .durability('t1')
      .state<State>({ status: 'pending' })
      .on<Message['type'], Message['payload']>(
        'UserSettingsCreated',
        (saga, message, context) => {
          saga.state = {
            ...saga.state,
            status: 'completed',
            processedAt: context.now.toISOString(),
          };
          return [sagaComplete({ messageType: message.type, processedAt: context.now.toISOString() })];
        },
      )
      .build();

    export default userOnboardingSaga;
Each call in the chain has a precise job:
| Name | Type | Description |
|---|---|---|
| defineSaga(id) | (id: string) => SagaBuilder | Opens the builder and names the saga. The id is the registry key and shows up at GET /api/v1/sagas/sagas. |
| .durability(tier) | (tier: 't1' \| 't2' \| 't3') => SagaBuilder | Declares the saga-definition durability TIER: how aggressively the runtime should persist this saga. Defaults to t1. This is a property of the definition, distinct from which physical store (kv or prisma) the runtime writes to. |
| .state | | Declares the persisted state shape and its initial value. This object is what survives across messages and restarts. Must precede any handler. |
| .on | (type, (saga, msg, ctx) => Effect[]) => SagaBuilder | Registers a transition for one message type. The handler mutates saga.state and returns an array of effects (for example sagaComplete(...)). |
| .compensate(type, handler) | (type, (saga, msg, ctx) => Effect[]) => SagaBuilder | Registers a compensation handler for a FAILED event type, the first-class rollback hook (see below). Same handler shape as .on(...). |
| .build() | () => SagaDefinition | Freezes the chain into a SagaDefinition after at least one handler exists. Nothing runs until build() and registration. |
The builder surface is wider than the scaffold sample shows. Alongside the core methods above it
also exposes .correlate(...) (a custom correlation extractor), .concurrency(...) (bounded,
optionally per-message-key concurrency), and .schedule(cron) (a cron schedule on the definition),
plus the two reserved hooks .onSignal(...) and .onQuery(...). Reach for the reference unit when
you need the full surface — this essay sticks to the load-bearing concepts.
The mental model to hold: defineSaga(...).durability(...).state(...).on(...).build() is a
declaration of a persisted state machine. You are not writing imperative control flow that falls
off the end of a callable — you are describing which messages move the workflow and what state
each move leaves behind.
Durability tiers: how hard to persist
.durability(tier) takes one of three tiers — t1, t2, t3 — and defaults to t1. The tier is
the saga definition's contract for how aggressively the runtime should persist it; it is a
separate axis from which physical store the writes land in. Hold the two apart: the tier travels
with the definition, the store backend is chosen once for the whole runtime. The same t1 saga runs
unchanged whether its state is written to Deno KV or to Postgres — you never re-author a saga to
change where it persists.
Where the state physically lives: the durable store
The builder describes what to persist. The durable saga store is where it goes. NetScript ships two interchangeable backends, and the runtime persists every transition to exactly one of them, chosen at startup:
| Name | Type | Description |
|---|---|---|
| kv | KvSagaStore | Persists saga runtime state to Deno KV. The default scaffold backend: zero external dependencies, ideal for local development and KV-native deployments. |
| prisma | PrismaSagaStore | Persists saga runtime state through a host-owned Prisma client into your scaffolded relational database (Postgres by default; mysql / mssql / sqlite all work, because the store follows your Prisma client and is not Postgres-specific), across three saga_runtime_* tables. Choose this when you want saga state in your relational store alongside the rest of your data. Requires a Prisma client at construction. |
You select the backend explicitly — it is mandatory, and the runtime throws at startup if
neither source provides it (Saga store backend is required. Set NETSCRIPT_SAGA_STORE=kv|prisma …).
Two equivalent switches:
- Environment: NETSCRIPT_SAGA_STORE=kv or NETSCRIPT_SAGA_STORE=prisma.
- App settings: sagas.store.backend: "kv" | "prisma".
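The "mandatory, from one of two sources" rule can be pictured as a small resolver. This is a sketch under stated assumptions, not NetScript source: the function name `resolveSagaStoreBackend`, the `AppSettings` shape, and the choice to consult the environment before app settings are all hypothetical (the text above does not say which source wins), and the thrown message reproduces only the first sentence of the real error.

```typescript
// Hypothetical resolver: name, settings shape, and env-first precedence are assumptions.
type SagaStoreBackend = 'kv' | 'prisma';
type AppSettings = { sagas?: { store?: { backend?: string } } };

// Narrow an arbitrary string to one of the two supported backends.
function isBackend(value: string | undefined): value is SagaStoreBackend {
  return value === 'kv' || value === 'prisma';
}

function resolveSagaStoreBackend(
  env: Record<string, string | undefined>,
  settings: AppSettings,
): SagaStoreBackend {
  const fromEnv = env['NETSCRIPT_SAGA_STORE'];
  if (isBackend(fromEnv)) return fromEnv;

  const fromSettings = settings.sagas?.store?.backend;
  if (isBackend(fromSettings)) return fromSettings;

  // Neither source supplied a valid backend: fail loudly at startup.
  throw new Error('Saga store backend is required.');
}
```

Failing at startup rather than silently defaulting fits the essay's framing: where state persists is a deployment decision someone must make explicitly.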
The composition root is createDurableSagaRuntime(...), exported from the
@netscript/plugin-sagas/runtime subpath. When you ask for the Prisma backend you must hand it a
client; the KV backend can take a Deno.Kv (or open the default):
    // composition root: pick the backend once, at startup
    import { createDurableSagaRuntime } from '@netscript/plugin-sagas/runtime';

    // prismaClient is your host-owned Prisma client, constructed elsewhere.
    const { runtime, store, dispose } = await createDurableSagaRuntime({
      backend: 'prisma', // or 'kv'
      prisma: prismaClient, // pass when backend === 'prisma'
    });
    // `runtime` registers SagaDefinitions and applies messages;
    // `store` is the KvSagaStore or PrismaSagaStore instance; `dispose` releases it.
Choosing kv vs prisma
Both backends implement the same SagaStorePort, so the choice is operational, not behavioural:
- kv (default). Zero external dependencies: state lives in Deno KV. Lowest-friction for local development and for deployments that are already KV-native. This is what the scaffold uses out of the box, so a fresh project is durable with nothing else running.
- prisma. Routes the durable write path into your relational database: Postgres by default, or mysql / mssql / sqlite if that is what you scaffolded with --db (the store writes through your Prisma client, so it is not tied to Postgres). Pick it when you already operate that database (the Database & Prisma stack and Aspire bring one up) and want saga state queryable alongside the rest of your relational data, at the cost of a running database and a Prisma client to hand the store at construction.
How the Prisma store maps the runtime
The PrismaSagaStore is a thin delegate over a host-owned Prisma client. The durable write path
spans three saga_runtime_* tables, each capturing one facet of the persisted machine:
| Name | Type | Description |
|---|---|---|
| SagaRuntimeState | saga_runtime_state | The current persisted state object for an instance: the durable 'position' the runtime rehydrates before the next message. |
| SagaRuntimeTransition | saga_runtime_transition | The applied transitions, keyed by instance and version, so the durable workflow's history is recorded, not just its latest snapshot. |
| SagaRuntimeCorrelation | saga_runtime_correlation | The correlation index that maps a saga id plus correlation key back to its instance. This is how the runtime finds the right state to load. |
This is deliberately distinct from the read-model SagaInstance table (saga_instances) that backs
the listing API. The saga_runtime_* tables are the durability mechanism; SagaInstance is a
projection used to display instances. Choosing the prisma backend opts the durable write path
into your relational database (Postgres by default); the kv backend keeps the same logical structure in Deno KV instead.
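The three facets are easier to hold apart with a toy model. The sketch below is an in-memory illustration only: `InMemoryRuntimeTables`, its method names, the row fields, and the `inst-N` id scheme are hypothetical and are not the schema of the real saga_runtime_* tables.

```typescript
// Hypothetical in-memory model of the three facets; field names are illustrative.
type StateRow = { instanceId: string; version: number; state: unknown };
type TransitionRow = { instanceId: string; version: number; messageType: string };

class InMemoryRuntimeTables {
  private states = new Map<string, StateRow>(); // plays the role of saga_runtime_state
  private transitions: TransitionRow[] = []; // plays the role of saga_runtime_transition
  private correlations = new Map<string, string>(); // plays the role of saga_runtime_correlation

  // Find (or create) the instance for a saga id plus correlation key.
  resolveInstance(sagaId: string, correlationKey: string): string {
    const indexKey = `${sagaId}:${correlationKey}`;
    let instanceId = this.correlations.get(indexKey);
    if (instanceId === undefined) {
      instanceId = `inst-${this.correlations.size + 1}`;
      this.correlations.set(indexKey, instanceId);
    }
    return instanceId;
  }

  // Append one transition to the history and overwrite the latest snapshot.
  commit(instanceId: string, messageType: string, state: unknown): number {
    const version = (this.states.get(instanceId)?.version ?? 0) + 1;
    this.transitions.push({ instanceId, version, messageType });
    this.states.set(instanceId, { instanceId, version, state });
    return version;
  }

  latest(instanceId: string): StateRow | undefined {
    return this.states.get(instanceId);
  }

  historyOf(instanceId: string): TransitionRow[] {
    return this.transitions.filter((row) => row.instanceId === instanceId);
  }
}

const tables = new InMemoryRuntimeTables();
const onboardingInstance = tables.resolveInstance('user-onboarding', 'u-1');
tables.commit(onboardingInstance, 'UserSettingsCreated', { status: 'completed' });
```

The snapshot answers "where is this workflow now", the transition list answers "how did it get there", and the correlation index answers "which instance does this message belong to".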
Compensation: two shapes
If you have used other orchestration frameworks, you may expect a saga to be a list of forward steps each paired with a rollback. NetScript supports compensation in two shapes, and it is worth knowing which one you are looking at.
Shape 1 — compensation as an effect (what the scaffold ships). A saga handler returns an array
of effects. sagaComplete({...}) is one such effect — it signals the workflow reached a terminal,
successful state. In this shape, corrective actions are expressed the same way: as additional
effects a message handler returns, interpreted by the runtime, rather than as a separate rollback
chain registered up front. The scaffolded sample uses exactly this model, which is why its handler's
last line is return [sagaComplete({...})].
Shape 2 — compensation as a first-class builder hook. The SagaBuilder also exposes a
.compensate(eventType, handler) method — "register a compensation handler for a failed event
type". It takes the same (saga, message, context) => Effect[] handler shape as .on(...), but the
runtime invokes it on the failure path for that event rather than the forward path. So if you want
an explicit, named rollback per event type, the hook is there.
Correlation: how one saga instance finds its messages
A saga definition is a template; a running workflow is a saga instance. Many onboarding flows can be in flight at once, so the runtime keys instances by a correlation id and exposes them through the registry API. The registry stores saga metadata in Deno KV, and the API service lists definitions and live instances:
| Name | Type | Description |
|---|---|---|
| GET /api/v1/sagas/sagas | list definitions | Every registered saga definition and its handled message types. This is where your built saga shows up. |
| GET /api/v1/sagas/instances | list instances | Live saga instances. Each instance carries its own persisted state. |
| GET /api/v1/sagas/instances/{sagaName}/{correlationId} | single instance | One instance addressed by saga name plus correlation id: the durable position of one workflow. |
| POST /api/v1/sagas/publish | publish a message | Hand a message to the saga runtime, which routes it to the matching handler on the correct instance. |
| GET /health/live | liveness | The sagas service liveness probe. |
The thing to internalize: a saga's identity (id, set via the builder) plus a correlation id is what
lets the durable state survive being put down and picked back up. The runtime does not keep your
workflow in memory; it looks the instance up by correlation, loads its state from the configured
store, applies the message, and writes the new state back. By default the correlation key comes from
the message; .correlate(...) lets a definition extract it differently.
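A small sketch of correlation-based routing may help. It is not the NetScript runtime: `SagaMessage`, `Extractor`, `route`, and the assumption that the default key is a `correlationId` field on the message are all hypothetical; the custom extractor only mirrors the spirit of .correlate(...).

```typescript
// Hypothetical shapes: SagaMessage, Extractor and route() are illustrative only.
type SagaMessage = { type: string; correlationId?: string; payload: Record<string, string> };
type Extractor = (message: SagaMessage) => string | undefined;

// Assumed default: read the key the message already carries.
const defaultExtractor: Extractor = (message) => message.correlationId;
// A custom extractor derives the key from the payload instead.
const byUserId: Extractor = (message) => message.payload['userId'];

// Instances keyed by saga id plus correlation id.
const instances = new Map<string, { seen: string[] }>();

function route(
  sagaId: string,
  message: SagaMessage,
  extract: Extractor = defaultExtractor,
): string {
  const correlationId = extract(message);
  if (correlationId === undefined) {
    throw new Error(`No correlation id for message type ${message.type}`);
  }
  const key = `${sagaId}/${correlationId}`;
  const instance = instances.get(key) ?? { seen: [] }; // load, or start a new instance
  instance.seen.push(message.type); // apply
  instances.set(key, instance); // write back
  return key;
}

const keyA = route(
  'user-onboarding',
  { type: 'UserSettingsCreated', payload: { userId: 'u-1' } },
  byUserId,
);
const keyB = route(
  'user-onboarding',
  { type: 'UserSettingsCreated', payload: { userId: 'u-2' } },
  byUserId,
);
```

Two onboarding flows in flight at once land on two separate instances because their correlation ids differ, even though the saga id and message type are identical.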
The worked example: three capabilities, one durable workflow
Durability becomes interesting when capabilities compose. The scaffold ships exactly this choreography, and it is the same continuous app the tutorials build rung by rung. Follow one user-onboarding flow through three plugins:
    Inbound HTTP                Background job              Durable saga
    (triggers :8093)            (workers :8091)             (sagas :8092)
    ────────────────            ────────────────            ────────────────
    POST /api/v1/webhooks ───▶  create-user-settings ───▶   user-onboarding
      /inbound/generic          job runs:                   saga handles:
            │                   publishes                   'UserSettingsCreated'
    enqueueJob(jobRef)          UserSettingsCreated                 │
            │                           │                        returns
            ▼                           ▼                   [ sagaComplete({...}) ]
    worker job enqueued         saga message published      workflow terminal
                                                            (state persisted to
                                                             kv | prisma store)
Step by step, in the real code:
- A trigger turns an inbound webhook into a job. The triggers plugin exposes raw Hono routes (not oRPC). The webhook handler returns an array of enqueueJob(jobRef, { payload, priority }) effects, so a POST to :8093/api/v1/webhooks/inbound/generic enqueues a worker job: the ingress of the durable flow. (enqueueJob is the one live trigger action; the defer action is defined but unsupported. It throws and routes to the DLQ, so do not build on deferred replay.) See Triggers & ingress.
- A worker job publishes the saga message. The workers plugin's create-user-settings sample is an ordinary defineJobHandler(async (ctx) => ...) that, on success, publishes the UserSettingsCreated message via a saga publisher. The job is the unit of work; publishing the message is how it hands control to the durable layer. See Background jobs.
- A saga consumes the message and emits sagaComplete. The sagas plugin registers userOnboardingSaga, whose .on('UserSettingsCreated', ...) handler mutates saga.state and returns [ sagaComplete({...}) ]. That terminal effect is the durable workflow finishing, and the new state is written to whichever store (kv or prisma) the runtime was configured with.
    // workers/jobs/create-user-settings.ts: the publish step, verbatim core
    import { createSagaPublisher } from '@netscript/plugin-sagas/runtime';
    import { createSuccessResult, defineJobHandler } from '@netscript/plugin-workers-core';
    import { z } from 'zod';

    type UserRegistrationMessage = {
      type: 'UserSettingsCreated';
      payload: { userId: string };
    };

    const CreateUserSettingsPayloadSchema = z.object({ userId: z.string().min(1) });
    const sagaPublisher = createSagaPublisher<UserRegistrationMessage>();

    const handler = defineJobHandler(async (ctx) => {
      const { userId } = CreateUserSettingsPayloadSchema.parse(ctx.payload ?? {});
      await sagaPublisher.publish({ type: 'UserSettingsCreated', payload: { userId } });
      return createSuccessResult({ userId, settingsCreated: true, source: 'scaffold-sample' });
    });

    export default Object.assign(handler, { id: 'create-user-settings' });
What a crash actually preserves
Concretely, when the process dies mid-workflow:
- Persisted (survives): the instance's saga.state (its SagaRuntimeState / KV equivalent), the applied transitions, and the correlation index. On restart the runtime can find the instance by correlation and rehydrate its exact position.
- Not preserved (by design): anything held only in handler-local variables or in-process memory. That is the whole point of moving state onto saga.state: local closures are expected to be lost, and the durable store is what stands in for them.
- Idempotency: the runtime carries an applied-key boundary (SagaAppliedKeyStore) so that a redelivered message does not double-apply a transition. This matters because at-least-once delivery means the same message can arrive twice across a restart.
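The applied-key idea from the last bullet can be sketched in a few lines. This is not the SagaAppliedKeyStore interface: `applyOnce`, the `Delivery` shape, and the use of a message id as the dedupe key are assumptions chosen for illustration, and a `Set` stands in for durable storage.

```typescript
// Hypothetical sketch of an applied-key boundary; not the real SagaAppliedKeyStore.
type Delivery = { messageId: string; amount: number };

const appliedKeys = new Set<string>(); // would live in the durable store
let chargedTotal = 0; // stands in for persisted saga state

// Returns true if the transition ran, false if the delivery was a duplicate.
function applyOnce(instanceKey: string, delivery: Delivery): boolean {
  const appliedKey = `${instanceKey}:${delivery.messageId}`;
  if (appliedKeys.has(appliedKey)) {
    return false; // already applied: skip the transition entirely
  }
  chargedTotal += delivery.amount; // the transition itself
  appliedKeys.add(appliedKey); // remember that this delivery was applied
  return true;
}

const firstDelivery = applyOnce('user-onboarding/u-1', { messageId: 'm-1', amount: 10 });
// The same message arrives again, for example after a restart.
const redelivery = applyOnce('user-onboarding/u-1', { messageId: 'm-1', amount: 10 });
```

In a real store the state write and the applied-key write would need to commit together; otherwise a crash between the two could reintroduce the double-apply this boundary exists to prevent.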
Limitations (alpha)
The durability story is real but young, and a few edges are worth naming so you do not over-trust it:
- Reserved builder hooks. .onSignal(...) and .onQuery(...) compile and register, but their runtime dispatch is explicitly deferred. Treat them as forward-declared surface, not live features.
- Trigger defer is unsupported. The defer trigger action throws and routes to the DLQ; only enqueueJob is a live ingress for durable flows today.
- Two stores, same port, different operational maturity. kv is the path the scaffold and tests exercise most; prisma is real and tested but assumes you bring and manage the relational database (Postgres by default) and the Prisma client yourself.
None of these undermine the core guarantee — a built defineSaga(...) survives restarts on either
store — but they shape what you should and should not design around right now.
Why the model looks like this — the design trade-offs
The message-and-state shape is a deliberate set of trade-offs:
- Persisted state over in-memory closures. Putting the workflow's position on saga.state and persisting it to a durable store is what survives restarts. The cost is that you think in transitions, not in straight-line code; the benefit is correctness across process lifetimes.
- A pluggable store over a hard-wired backend. Pulling the persistence behind a single KvSagaStore / PrismaSagaStore seam, selected by one explicit setting, lets the same saga run on Deno KV in development and on Postgres in production without touching the workflow. The cost is the one mandatory choice at startup; the benefit is that durability is a deployment decision, not a code rewrite.
- Effects over imperative side-effects. Returning sagaComplete({...}) (and other effects) instead of calling out directly keeps handlers replayable. The runtime decides when and how effects apply, which is what makes retries and compensation tractable.
- Composition over a workflow monolith. Trigger → job → saga are three small capabilities wired by messages. Each can be added, tested, and scaled independently, and the plugin model is what lets a host assemble them without editing host code.
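The "effects over imperative side-effects" trade-off is the general pattern of returning effect descriptions as data and letting a separate interpreter perform them. The sketch below illustrates that pattern only: the `Effect` union, `handleSettingsCreated`, and `interpret` are hypothetical, and unlike the scaffold handler (which mutates saga.state) this version returns the next state to stay self-contained.

```typescript
// Hypothetical effect shapes; they are not NetScript's actual effect types.
type Effect =
  | { kind: 'complete'; result: Record<string, string> }
  | { kind: 'publish'; messageType: string };

type HandlerState = { status: string };

// The handler computes the next state and describes effects; it performs none of them.
function handleSettingsCreated(
  state: HandlerState,
): { state: HandlerState; effects: Effect[] } {
  return {
    state: { ...state, status: 'completed' },
    effects: [{ kind: 'complete', result: { messageType: 'UserSettingsCreated' } }],
  };
}

// The interpreter is the only place where anything observable happens.
function interpret(effects: Effect[], log: string[]): void {
  for (const effect of effects) {
    switch (effect.kind) {
      case 'complete':
        log.push(`complete:${effect.result['messageType'] ?? ''}`);
        break;
      case 'publish':
        log.push(`publish:${effect.messageType}`);
        break;
    }
  }
}

const effectLog: string[] = [];
const outcome = handleSettingsCreated({ status: 'pending' });
interpret(outcome.effects, effectLog);
```

Because the handler is a pure function of its input, running it again on the same state yields the same state and effects, which is the property that makes replay safe to reason about.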
How this connects to the rest of NetScript
A durable workflow is not an island. It rides on top of the same primitives you have already met or will meet:
- It is delivered as plugins (workers, sagas, triggers) through the plugin model's contribution/registry mechanism.
- Its messages and effects are observable: job dispatch and execution emit real OpenTelemetry spans that show up in Aspire automatically. See observability for the full picture, including the one known gap (the scaffold createJobTools(ctx) handler helpers are no-op stubs, a tracked limitation; call @netscript/telemetry helpers directly for custom spans).
- Its API surfaces (:8091, :8092, :8093) and the Postgres/Redis backing store, including the Postgres tables behind the prisma saga store, are brought up by Aspire. Remember the ordering: cd aspire && aspire start is what makes the durable infrastructure available before any netscript db command. See KV, queues & cron for the KV/queue primitives the runtime leans on, and Durable streams for the streaming counterpart to message-driven sagas.
Where to go next
- Do it: Build a durable workflow, the hands-on tutorial that adds the saga and consumes UserSettingsCreated end to end.
- See the capability: Durable sagas, covering the headline defineSaga API, the :8092 endpoints, the kv | prisma store switch, and the Learn / Do / Reference triplet.
- Look it up: sagas for the full generated API surface, and queue for the queue layer underneath message delivery.