The Saga Blueprint — Orchestrator, State Machine, and Compensation Logic

We build the orchestrator that the stuck order needed. A state machine, a compensation registry, and the outbox pattern holding it all together.

Mar 12, 2026

public enum OrderSagaState
{
    NotStarted,
    InventoryReserved,
    PaymentCharged,
    Completed,
    Compensating,
    Failed
}

Six states. That's all it took to describe the entire lifecycle of a distributed order. When I first wrote this enum, it felt too simple — six values to govern a process that touches three services, handles money, manages inventory, and needs to undo itself when things go wrong.

It wasn't too simple. It was exactly right. And the reason has almost nothing to do with software.

The state machine underneath everything

In 1943, Warren McCulloch and Walter Pitts published a paper on neural networks that accidentally gave us formal automata theory. Their work, refined by Edward Moore and George Mealy in the 1950s, formalized an idea that every engineer uses intuitively: a system can be described as a finite set of states, a set of inputs, and a function that maps each (state, input) pair to a next state.

That's a finite state machine. And it's exactly what a saga orchestrator is.

Think about our order saga. The system starts in NotStarted. It receives an input — "inventory reserved successfully" — and transitions to InventoryReserved. Another input — "payment charged" — transitions it to PaymentCharged. One more — "order confirmed" — and it reaches Completed.

The formal representation matters because it makes the impossible states visible. In automata theory, a well-defined state machine has no undefined transitions. For every state and every possible input, there's a defined next state. If you can't define what happens when state X receives input Y, you've found a bug before you've written any code.

The stuck order from the previous chapter was exactly this: a state machine with an undefined transition. The choreographed system didn't have a mapping for (AwaitingInventory, PaymentConfirmed). The input arrived, no transition existed, and the system froze.

A saga orchestrator is a finite state machine where every transition is explicit, every failure has a compensating path, and the current state is always persisted.

An orchestrator eliminates the undefined-transition class of bug by construction. You define every state, every valid input per state, and every transition, including the compensating ones. If you miss a case, the gap is visible in one place in the code instead of being hidden across three services' event handlers.
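To make "seeing the gap" concrete, here is a minimal sketch that writes the transition function as data and lists every (state, input) pair that has no entry. It is illustrative only: the OrderSagaInput enum, its member names, and the specific failure transitions are assumptions invented for this example, not part of the orchestrator built in this chapter.

```csharp
using System;
using System.Collections.Generic;

public enum OrderSagaState
{
    NotStarted, InventoryReserved, PaymentCharged, Completed, Compensating, Failed
}

// Hypothetical input alphabet, invented for this sketch.
public enum OrderSagaInput
{
    InventoryReserved, PaymentCharged, OrderConfirmed, StepFailed, CompensationDone
}

public static class OrderTransitions
{
    // The transition function as data: (state, input) -> next state.
    // The failure rows are assumptions, not the chapter's final design.
    public static readonly Dictionary<(OrderSagaState, OrderSagaInput), OrderSagaState> Map = new()
    {
        [(OrderSagaState.NotStarted, OrderSagaInput.InventoryReserved)] = OrderSagaState.InventoryReserved,
        [(OrderSagaState.InventoryReserved, OrderSagaInput.PaymentCharged)] = OrderSagaState.PaymentCharged,
        [(OrderSagaState.PaymentCharged, OrderSagaInput.OrderConfirmed)] = OrderSagaState.Completed,
        [(OrderSagaState.NotStarted, OrderSagaInput.StepFailed)] = OrderSagaState.Failed,
        [(OrderSagaState.InventoryReserved, OrderSagaInput.StepFailed)] = OrderSagaState.Compensating,
        [(OrderSagaState.PaymentCharged, OrderSagaInput.StepFailed)] = OrderSagaState.Compensating,
        [(OrderSagaState.Compensating, OrderSagaInput.CompensationDone)] = OrderSagaState.Failed,
    };

    // Lists every (state, input) pair that has no defined transition.
    public static List<(OrderSagaState State, OrderSagaInput Input)> FindGaps()
    {
        var gaps = new List<(OrderSagaState State, OrderSagaInput Input)>();
        foreach (var state in Enum.GetValues<OrderSagaState>())
        {
            foreach (var input in Enum.GetValues<OrderSagaInput>())
            {
                if (!Map.ContainsKey((state, input)))
                    gaps.Add((state, input));
            }
        }
        return gaps;
    }
}
```

The map above is deliberately incomplete, so FindGaps returns pairs such as (Compensating, StepFailed). Each one is a decision that still has to be made: ignore the input, reject it, or transition somewhere.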

The pieces we need to build

Before writing any implementation code, let me lay out the components. I've seen more saga implementations go wrong at the design stage than at runtime — the code is usually fine, it's the missing responsibilities that bite you. A saga orchestrator has four responsibilities:

1. State persistence. The orchestrator's current state must survive process restarts. If the service crashes between sending "charge payment" and receiving "payment charged," it needs to recover and pick up where it left off.

2. Step definition. Each saga step is a pair: an action and its compensating action. Reserve inventory / release inventory. Charge payment / refund payment. The orchestrator walks forward through actions and, on failure, walks backward through compensations.

3. Message correlation. When a response arrives from the Payment service, the orchestrator needs to find the correct saga instance. This is the CorrelationId — a unique identifier that threads through every message in the saga's lifecycle.

4. Outbox integration. State changes and outgoing messages must be atomic. If we save "state = InventoryReserved" and send "ChargePayment" as two separate operations, a crash between them means the state says we've moved forward but the command was never sent. The outbox pattern from earlier in the series solves this — we write both the state change and the outgoing message in a single database transaction.
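Responsibilities 1 and 4 can be sketched without a database. The interface members below are inferred from how the orchestrator code in this chapter calls them (Load, Save, ExecuteInTransaction, Send); the return types and the InMemorySagaStore class are assumptions made for illustration. The store stages writes and only makes them visible if the whole action succeeds, which is the behavior a real database transaction would provide.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Trimmed-down saga row; the chapter's full model has more fields.
public class SagaInstance
{
    public Guid CorrelationId { get; set; }
    public string CurrentState { get; set; } = "";
}

public interface ISagaRepository
{
    Task<SagaInstance?> Load(Guid correlationId);
    Task Save(SagaInstance saga);
    Task ExecuteInTransaction(Func<Task> action);
}

public interface IOutbox
{
    Task Send(object message);
}

// Hypothetical in-memory stand-in for "saga table + outbox table in one database".
public class InMemorySagaStore : ISagaRepository, IOutbox
{
    public Dictionary<Guid, SagaInstance> Sagas { get; } = new();
    public List<object> OutboxRows { get; } = new();

    private readonly List<SagaInstance> _stagedSagas = new();
    private readonly List<object> _stagedMessages = new();

    public Task<SagaInstance?> Load(Guid correlationId) =>
        Task.FromResult<SagaInstance?>(
            Sagas.TryGetValue(correlationId, out var saga) ? saga : null);

    // Save and Send only stage the write; nothing is visible until commit.
    public Task Save(SagaInstance saga)
    {
        _stagedSagas.Add(saga);
        return Task.CompletedTask;
    }

    public Task Send(object message)
    {
        _stagedMessages.Add(message);
        return Task.CompletedTask;
    }

    public async Task ExecuteInTransaction(Func<Task> action)
    {
        try
        {
            await action();
            // Commit: both kinds of write become visible together.
            foreach (var saga in _stagedSagas) Sagas[saga.CorrelationId] = saga;
            OutboxRows.AddRange(_stagedMessages);
        }
        finally
        {
            // On failure the staged writes are simply discarded (rollback).
            _stagedSagas.Clear();
            _stagedMessages.Clear();
        }
    }
}
```

If the action throws after Save and Send, neither the saga nor the message appears in the store. That is the property the rest of the design leans on.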

Defining a saga step

Let's start with the abstraction. A saga step needs to know two things: what command to send when moving forward, and what command to send when compensating.

public interface ISagaStep<TSagaState> where TSagaState : Enum
{
    TSagaState SourceState { get; }
    TSagaState TargetState { get; }
    TSagaState CompensationTargetState { get; }
    object CreateCommand(SagaContext context);
    object CreateCompensation(SagaContext context);
}

The SagaContext carries the data the step needs — the order ID, the items, the payment details. The step doesn't know about the database or the message bus. It's a pure definition: "From this state, send this command, and if it succeeds, move to that state."
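The chapter uses SagaContext and the command types without listing them, so here is one possible shape. Everything in this sketch is an assumption inferred from how the steps construct their commands: the property types, the OrderItem record, and the choice of records for commands are mine, not a definitive model.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical line item; the real model may carry more than SKU and quantity.
public record OrderItem(string Sku, int Quantity);

// Data the steps need. Kept serializable because the orchestrator
// stores it as JSON alongside the saga state.
public class SagaContext
{
    public Guid CorrelationId { get; init; }
    public Guid OrderId { get; init; }
    public IReadOnlyList<OrderItem> Items { get; init; } = Array.Empty<OrderItem>();
    public decimal TotalAmount { get; init; }
}

// Commands are plain immutable messages; each carries the CorrelationId
// so the response can be matched back to the saga instance.
public record ReserveInventory(Guid CorrelationId, IReadOnlyList<OrderItem> Items);
public record ReleaseInventory(Guid CorrelationId);
public record ChargePayment(Guid CorrelationId, Guid OrderId, decimal TotalAmount);
public record RefundPayment(Guid CorrelationId, Guid OrderId);
```

Keeping the context free of behavior matters here: it has to survive a round trip through JsonSerializer every time the orchestrator loads the saga.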

Here's the inventory reservation step:

public class ReserveInventoryStep : ISagaStep<OrderSagaState>
{
    public OrderSagaState SourceState => OrderSagaState.NotStarted;
    public OrderSagaState TargetState => OrderSagaState.InventoryReserved;
    public OrderSagaState CompensationTargetState => OrderSagaState.NotStarted;

    public object CreateCommand(SagaContext context) =>
        new ReserveInventory(context.CorrelationId, context.Items);

    public object CreateCompensation(SagaContext context) =>
        new ReleaseInventory(context.CorrelationId);
}

And payment:

public class ChargePaymentStep : ISagaStep<OrderSagaState>
{
    public OrderSagaState SourceState => OrderSagaState.InventoryReserved;
    public OrderSagaState TargetState => OrderSagaState.PaymentCharged;
    public OrderSagaState CompensationTargetState => OrderSagaState.InventoryReserved;

    public object CreateCommand(SagaContext context) =>
        new ChargePayment(context.CorrelationId, context.OrderId, context.TotalAmount);

    public object CreateCompensation(SagaContext context) =>
        new RefundPayment(context.CorrelationId, context.OrderId);
}

Notice the CompensationTargetState. When we refund a payment, the saga doesn't go back to NotStarted — it goes back to InventoryReserved, because the inventory is still reserved and needs its own compensation. The compensation chain runs in reverse order: undo payment, then undo inventory.
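The reverse walk can be traced with a few lines of code. This is a simplified sketch, not the orchestrator: StepDefinition is a hypothetical stand-in for ISagaStep that keeps only the target state, the compensation target state, and a command name as a string.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum OrderSagaState { NotStarted, InventoryReserved, PaymentCharged }

// Simplified stand-in for ISagaStep: only the fields the walk needs.
public record StepDefinition(
    OrderSagaState TargetState,
    OrderSagaState CompensationTargetState,
    string CompensationName);

public static class CompensationChain
{
    // Starting from the current state, repeatedly find the step that
    // produced it, emit that step's compensation, and move to the state
    // the compensation leads back to.
    public static List<string> Walk(
        IReadOnlyList<StepDefinition> steps, OrderSagaState current)
    {
        var commands = new List<string>();
        var remaining = steps.Count; // guards against a cyclic definition
        while (remaining-- > 0)
        {
            var step = steps.FirstOrDefault(s => s.TargetState == current);
            if (step is null) break; // nothing produced this state: we are done
            commands.Add(step.CompensationName);
            current = step.CompensationTargetState;
        }
        return commands;
    }
}
```

With the two steps defined above, walking back from PaymentCharged yields RefundPayment followed by ReleaseInventory, and walking back from InventoryReserved yields ReleaseInventory alone.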

The orchestrator engine

The orchestrator itself is the machine that walks through steps. It loads the saga's current state, determines the next step, dispatches the command, and persists the transition — all within a single transaction that includes the outbox.

public class SagaOrchestrator<TState> where TState : Enum
{
    private readonly List<ISagaStep<TState>> _steps;
    private readonly ISagaRepository _repository;
    private readonly IOutbox _outbox;

    public SagaOrchestrator(
        IEnumerable<ISagaStep<TState>> steps,
        ISagaRepository repository,
        IOutbox outbox)
    {
        _steps = steps.OrderBy(s => s.SourceState).ToList();
        _repository = repository;
        _outbox = outbox;
    }

    public async Task Start(SagaContext context)
    {
        var saga = new SagaInstance
        {
            CorrelationId = context.CorrelationId,
            CurrentState = default(TState)!.ToString(),
            StartedAt = DateTime.UtcNow,
            Context = JsonSerializer.Serialize(context)
        };

        var firstStep = _steps.First(s =>
            s.SourceState.Equals(default(TState)));

        await _repository.ExecuteInTransaction(async () =>
        {
            await _repository.Save(saga);
            await _outbox.Send(firstStep.CreateCommand(context));
        });
    }
}

The critical line is ExecuteInTransaction. The saga creation and the first command go into the database in a single transaction. If the process crashes after the transaction commits, the outbox publisher will deliver the command. If it crashes before the commit, neither the saga nor the command exists: a clean rollback.

Handling responses

When a service completes its work, it publishes a response event. The orchestrator needs a handler that:

1. Loads the saga by CorrelationId
2. Validates that the response matches the expected step
3. Transitions to the next state
4. Sends the next command (or completes the saga)

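public async Task HandleStepCompleted(
    Guid correlationId, TState completedState)
{
    var saga = await _repository.Load(correlationId)
        ?? throw new InvalidOperationException(
            $"No saga found for correlation id {correlationId}.");
    var context = JsonSerializer
        .Deserialize<SagaContext>(saga.Context)!;

    // Validate: the step that produces completedState must start from
    // the state the saga is currently in. Otherwise the response is
    // stale, duplicated, or out of order.
    var completedStep = _steps.FirstOrDefault(s =>
        s.TargetState.Equals(completedState));
    if (completedStep is null ||
        completedStep.SourceState.ToString() != saga.CurrentState)
    {
        throw new InvalidOperationException(
            $"Unexpected '{completedState}' while saga is in '{saga.CurrentState}'.");
    }

    // The next step is the one that starts where this one ended.
    var nextStep = _steps.FirstOrDefault(s =>
        s.SourceState.Equals(completedState));

    await _repository.ExecuteInTransaction(async () =>
    {
        if (nextStep is not null)
        {
            // Record the state we actually reached, not the next step's
            // target: that step has only been requested, not completed.
            saga.CurrentState = completedState.ToString();
            await _repository.Save(saga);
            await _outbox.Send(nextStep.CreateCommand(context));
        }
        else
        {
            // No step starts from this state, so the saga is done.
            saga.CurrentState = "Completed";
            saga.CompletedAt = DateTime.UtcNow;
            await _repository.Save(saga);
        }
    });
}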

And the failure handler — the one that triggers compensation:

public async Task HandleStepFailed(
    Guid correlationId, TState failedAtState, string reason)
{
    // failedAtState is the last state the saga reached before the
    // failure, e.g. InventoryReserved when ChargePayment fails.
    var saga = await _repository.Load(correlationId);
    var context = JsonSerializer
        .Deserialize<SagaContext>(saga.Context)!;

    saga.CurrentState = "Compensating";
    saga.FailureReason = reason;

    // A step has completed if its target state is at or before the
    // state we failed in. This relies on the enum declaring states in
    // forward order. The first step is included: its work (the
    // inventory reservation) needs undoing too. Reverse the list so
    // the most recent step is compensated first.
    var completedSteps = _steps
        .Where(s => s.TargetState.CompareTo(failedAtState) <= 0)
        .Reverse()
        .ToList();

    await _repository.ExecuteInTransaction(async () =>
    {
        await _repository.Save(saga);

        // Queue all compensations through the outbox
        foreach (var step in completedSteps)
        {
            await _outbox.Send(step.CreateCompensation(context));
        }
    });
}

Two design decisions worth pausing on.

First, all compensating commands go into the outbox in a single transaction. This means either all compensations are queued or none are. We don't risk sending "refund payment" without also sending "release inventory."

Second, the compensations are queued, not executed synchronously. The outbox publisher will deliver them. Each participating service handles its compensation independently. If the Inventory service is temporarily down when the compensation arrives, the outbox will retry — exactly the same guarantee we built for the outbox pattern.
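Retries have a consequence for the receiving side: the same compensation can arrive more than once, so the handler should be safe to run twice. Here is a minimal sketch of an idempotent ReleaseInventory handler. The class, its in-memory bookkeeping, and the method names are all hypothetical; a real Inventory service would keep this state in its own database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory inventory for a single product.
public class InventoryReleaseHandler
{
    // correlationId -> quantity currently reserved for that saga.
    private readonly Dictionary<Guid, int> _reserved = new();

    public int Available { get; private set; }

    public InventoryReleaseHandler(int available) => Available = available;

    // Returns false if stock is insufficient or this saga already holds a reservation.
    public bool Reserve(Guid correlationId, int quantity)
    {
        if (quantity > Available || _reserved.ContainsKey(correlationId))
            return false;
        Available -= quantity;
        _reserved[correlationId] = quantity;
        return true;
    }

    // Idempotent compensation: the reservation record is removed as part
    // of the release, so a redelivered message finds nothing to undo.
    // Returns true only when stock was actually released.
    public bool HandleReleaseInventory(Guid correlationId)
    {
        if (!_reserved.Remove(correlationId, out var quantity))
            return false; // duplicate or unknown saga: no effect
        Available += quantity;
        return true;
    }
}
```

The key is that the check and the effect are tied to the CorrelationId. Delivering ReleaseInventory twice for the same saga returns stock exactly once.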

The persistence model

The saga instance table is simple:

public class SagaInstance
{
    public Guid CorrelationId { get; set; }
    public string CurrentState { get; set; } = null!;
    public DateTime StartedAt { get; set; }
    public DateTime? CompletedAt { get; set; }
    public string? FailureReason { get; set; }
    public string Context { get; set; } = null!;
    public int Version { get; set; }
}

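The Version field is for optimistic concurrency. If two messages arrive simultaneously for the same saga (a race condition we'll stress-test in a later chapter), the second write fails with a concurrency exception (DbUpdateConcurrencyException in EF Core). The message goes back to the queue and retries, this time finding the saga in its updated state. One caveat: a concurrency token on a plain int is only compared, never advanced automatically, so the repository has to increment Version on every save for the check to catch anything.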
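The mechanism is easier to see stripped of the ORM. The sketch below is an in-memory illustration under my own assumptions (the VersionedSagaStore class and its methods are hypothetical); TryUpdate plays the role of an UPDATE whose WHERE clause includes the version the writer originally read.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory table with a version column per row.
public class VersionedSagaStore
{
    private readonly Dictionary<Guid, (string State, int Version)> _rows = new();
    private readonly object _gate = new();

    public void Insert(Guid id, string state)
    {
        lock (_gate) { _rows[id] = (state, 1); }
    }

    public (string State, int Version) Read(Guid id)
    {
        lock (_gate) { return _rows[id]; }
    }

    // Equivalent in spirit to:
    //   UPDATE Sagas SET State = @new, Version = Version + 1
    //   WHERE Id = @id AND Version = @expected
    // Returns false when another writer got there first.
    public bool TryUpdate(Guid id, string newState, int expectedVersion)
    {
        lock (_gate)
        {
            if (_rows[id].Version != expectedVersion)
                return false;
            _rows[id] = (newState, expectedVersion + 1);
            return true;
        }
    }
}
```

Two handlers that both read version 1 will not both succeed: the first update moves the row to version 2, the second sees a mismatch and has to reload before trying again.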

The EF Core mapping:

public class SagaInstanceConfiguration
    : IEntityTypeConfiguration<SagaInstance>
{
    public void Configure(EntityTypeBuilder<SagaInstance> builder)
    {
        builder.HasKey(s => s.CorrelationId);
        builder.Property(s => s.CurrentState).HasMaxLength(50);
        builder.Property(s => s.Context).HasColumnType("jsonb");
        builder.Property(s => s.Version).IsConcurrencyToken();

        builder.HasIndex(s => s.CurrentState)
            .HasFilter("\"CompletedAt\" IS NULL");
    }
}

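The filtered index is worth noting. We only need to query active sagas, the ones that haven't finished. Once a saga reaches Completed or Failed, it's historical data. Because the filter keys on CompletedAt, a saga that ends in Failed needs CompletedAt set as well, or it stays in the index. The filtered index keeps queries fast as the table grows.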

Where the outbox connects

Here's the piece that ties the outbox pattern to the saga. The repository's ExecuteInTransaction method, which the orchestrator calls for every transition, writes to both the saga table and the outbox table in the same database transaction:

public async Task ExecuteInTransaction(Func<Task> action)
{
    await using var transaction = await _dbContext.Database
        .BeginTransactionAsync(IsolationLevel.ReadCommitted);

    try
    {
        await action();
        await _dbContext.SaveChangesAsync();
        await transaction.CommitAsync();
    }
    catch
    {
        await transaction.RollbackAsync();
        throw;
    }
}

The outbox Send method doesn't actually send anything — it writes a row to the OutboxMessages table. The outbox publisher (the polling background service we built for the outbox) picks up these messages and delivers them to RabbitMQ.

This means the entire saga transition — state change plus outgoing command — is a single atomic database operation. The "save state then send message" problem that plagues naive implementations doesn't exist. The outbox is the bridge.
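As a reminder of the two halves of that bridge, here is a compact in-memory sketch. It is not the publisher from the earlier chapter: the OutboxMessage columns, the InMemoryOutbox class, and the publish callback are assumptions chosen to show the shape of the idea, with a list standing in for the OutboxMessages table and a delegate standing in for the broker.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical outbox row; column names are assumptions.
public class OutboxMessage
{
    public Guid Id { get; init; } = Guid.NewGuid();
    public string Type { get; init; } = "";
    public string Payload { get; init; } = "";
    public DateTime CreatedAt { get; init; } = DateTime.UtcNow;
    public DateTime? ProcessedAt { get; set; }
}

public class InMemoryOutbox
{
    public List<OutboxMessage> Rows { get; } = new();

    // "Send" only records the message; nothing leaves the process here.
    public void Send(object message) =>
        Rows.Add(new OutboxMessage
        {
            Type = message.GetType().Name,
            Payload = JsonSerializer.Serialize(message, message.GetType())
        });

    // One polling pass. A row is marked processed only after publish
    // returns, so a failed publish leaves it pending for the next pass.
    // Returns the number of rows published in this pass.
    public int PublishPending(Action<OutboxMessage> publish)
    {
        var published = 0;
        var pending = Rows
            .Where(r => r.ProcessedAt is null)
            .OrderBy(r => r.CreatedAt)
            .ToList();

        foreach (var row in pending)
        {
            try
            {
                publish(row);
            }
            catch (Exception)
            {
                break; // broker unavailable: stop and retry on the next pass
            }
            row.ProcessedAt = DateTime.UtcNow;
            published++;
        }
        return published;
    }
}
```

Marking the row after the publish, rather than before, is what makes delivery at-least-once: a crash between the two steps causes a redelivery, never a lost message.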

The transition map

Let me draw the complete state machine for our order saga. Every arrow is a defined transition. Every state has at least one exit path.

[Figure: saga state machine. NotStarted → InventoryReserved → PaymentCharged → Completed, with compensation paths on failure.]

Six states. Four forward transitions. Two backward paths. Every input at every state has a defined outcome. In Moore's formal model, this is a complete machine — no undefined transitions, no dead states.

That completeness is the point. The choreographed version from the previous chapter's stuck order had gaps in its transition function. This version doesn't, because the orchestrator forces you to define every path.

What this doesn't handle yet

I want to be honest about the edges we haven't covered. This blueprint handles the happy path and the simple failure path — one step fails, we compensate everything in reverse.

But what happens when the compensation itself fails? What if RefundPayment times out? What if ReleaseInventory succeeds but we can't verify it? What about timeouts — how long should the orchestrator wait for a response before assuming failure?

Those are problems for the next chapter. We're going to break this orchestrator in every way I've seen go wrong in production. Compensation failures, message reordering, timeout ambiguity, and the particularly nasty case where a step partially succeeds.

For now, here's something to try with the code above. Take the HandleStepFailed method and trace what happens if the saga receives a PaymentFailed event while it's already in the Compensating state — because the inventory compensation was already triggered by a timeout. What should the orchestrator do? Should it ignore the duplicate failure? Should it log it? Should it track which compensations have completed?

Write the transition for that case. The gap you find is the first thing we'll fix in the next chapter.

Smartly Academy
