Temporal.io: Durable Workflow Orchestration for Microservices

Infrastructure · 2026-02-15 · 8 min read · temporal · workflow · orchestration · microservices · distributed-systems

Distributed systems fail. Networks drop, services crash, deployments restart processes, and databases hit timeouts. The question isn't whether your multi-step business process will encounter a failure -- it's what happens when it does.

Most teams cobble together retry logic, dead letter queues, cron-based recovery jobs, and state machines stored in databases. It works until it doesn't. Temporal.io takes a different approach: it lets you write workflows as plain code -- loops, conditionals, function calls -- and guarantees that a workflow's progress survives infrastructure failures, so execution resumes where it stopped instead of starting over.

What Temporal Actually Does

Temporal is a workflow engine. You write a workflow function that describes a business process step by step. Temporal runs that function and records every completed step in an event history. If the process hosting your workflow crashes, a worker replays the workflow against that history and picks up exactly where it left off.

This is fundamentally different from a message queue or a state machine. You're not defining transitions between states -- you're writing normal code that happens to be durable.

Here's a simple order processing workflow in TypeScript:

import { proxyActivities, sleep } from '@temporalio/workflow';
import type * as activities from './activities';

const { chargePayment, reserveInventory, sendConfirmation, shipOrder } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: '30 seconds',
    retry: {
      maximumAttempts: 5,
      backoffCoefficient: 2,
    },
  });

export async function orderWorkflow(order: Order): Promise<OrderResult> {
  // Step 1: Reserve inventory
  const reservation = await reserveInventory(order.items);

  // Step 2: Charge payment
  const payment = await chargePayment(order.paymentMethod, order.total);

  // Step 3: Send confirmation email
  await sendConfirmation(order.customerEmail, reservation, payment);

  // Step 4: Wait for warehouse to process (could be hours or days)
  await sleep('2 hours');

  // Step 5: Ship the order
  const tracking = await shipOrder(reservation.id);

  return { status: 'completed', trackingNumber: tracking.number };
}

That looks like a normal async function. But if the process crashes after step 2 completes and before step 3 starts, Temporal will restart the workflow, replay steps 1 and 2 from history (without re-executing them), and then continue from step 3. The sleep('2 hours') is durable too -- Temporal persists the timer and wakes the workflow up later, even if the worker process restarts in the meantime.
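The replay behaviour described above can be sketched with a small in-memory model. This is a simplified illustration, not how the Temporal SDK is implemented: ToyRuntime, ToyHistoryEvent, and toyOrderWorkflow are hypothetical names, and the steps are synchronous to keep the sketch short.

```typescript
// Toy model of history-based replay. All names are hypothetical and this is
// not the Temporal SDK's internal implementation.
type ToyHistoryEvent = { step: string; result: string };

class ToyRuntime {
  history: ToyHistoryEvent[];
  executed: string[] = []; // steps that actually ran in this process
  private cursor = 0;

  constructor(history: ToyHistoryEvent[]) {
    this.history = history;
  }

  // Return the recorded result if history already has one for this position;
  // otherwise run the step and append its result to history.
  step(name: string, run: () => string): string {
    const recorded = this.history[this.cursor];
    this.cursor += 1;
    if (recorded !== undefined) {
      return recorded.result;
    }
    const result = run();
    this.history.push({ step: name, result });
    this.executed.push(name);
    return result;
  }
}

function toyOrderWorkflow(rt: ToyRuntime): string {
  const reservation = rt.step('reserveInventory', () => 'reservation-1');
  const payment = rt.step('chargePayment', () => 'payment-1');
  rt.step('sendConfirmation', () => 'sent');
  return `${reservation}/${payment}`;
}

// History as it would look after a crash between steps 2 and 3.
const historyAfterCrash: ToyHistoryEvent[] = [
  { step: 'reserveInventory', result: 'reservation-1' },
  { step: 'chargePayment', result: 'payment-1' },
];

const recovered = new ToyRuntime(historyAfterCrash);
const outcome = toyOrderWorkflow(recovered);
// recovered.executed is ['sendConfirmation']: steps 1 and 2 came from history
```

The workflow function runs from the top again, but only the step missing from history does real work. That is the sense in which the earlier steps are replayed without being re-executed.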

Core Concepts

Understanding Temporal requires four concepts: workflows, activities, workers, and the Temporal server.

Workflows

A workflow is a function that orchestrates your business logic. Workflows must be deterministic -- given the same inputs and history, they must produce the same sequence of commands.

Constraints that apply inside workflow code: no network calls, database queries, or file I/O (those belong in activities); no unseeded random numbers; and no reading the system clock directly. The TypeScript SDK enforces part of this by running workflow code in a sandbox where Math.random() and Date are replaced with deterministic versions.

This sounds restrictive, but it's the key to durability. Because the function is deterministic, Temporal can rebuild its state at any time by replaying it against the event history.
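To see why determinism matters, here is a minimal sketch of the kind of check a replay implies: the commands produced during replay must match the commands recorded the first time. The names discountWorkflow and replayMatches are hypothetical, and the workflow is modelled as a plain function that returns its command list.

```typescript
// Toy illustration of a replay determinism check; hypothetical names, not SDK code.
type Command = string;

// A workflow modelled as a function that emits commands. `now` stands in for
// a clock reading taken inside workflow code.
function discountWorkflow(now: number): Command[] {
  const commands: Command[] = ['reserveInventory'];
  // Branching on a clock reading makes the command sequence depend on when
  // the code happens to run.
  if (now % 2 === 0) {
    commands.push('applyDiscount');
  }
  commands.push('chargePayment');
  return commands;
}

// Replay succeeds only if the code emits exactly the recorded commands.
function replayMatches(recordedCommands: Command[], replayedCommands: Command[]): boolean {
  return (
    recordedCommands.length === replayedCommands.length &&
    recordedCommands.every((command, i) => command === replayedCommands[i])
  );
}

const originalRun = discountWorkflow(1000); // original run, clock read 1000
const replayWithFreshClock = discountWorkflow(1001); // replay re-reads the clock
const replayWithRecordedClock = discountWorkflow(1000); // replay reuses the recorded value

const freshClockOk = replayMatches(originalRun, replayWithFreshClock); // false
const recordedClockOk = replayMatches(originalRun, replayWithRecordedClock); // true
```

Re-reading the clock during replay changes the branch taken, so the replayed commands no longer line up with history. Feeding the recorded value back in keeps the two sequences identical.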

Activities

Activities are where the real work happens. They're normal functions that can do anything -- call APIs, read databases, send emails, process files. Activities run outside the deterministic workflow sandbox.

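// activities.ts -- these are normal functions, no restrictions
import { Context } from '@temporalio/activity';
import { stripe } from './stripe-client';
import { db } from './database';

export async function chargePayment(
  method: PaymentMethod,
  amount: number
): Promise<PaymentResult> {
  // Activities can be retried, so derive an idempotency key from the workflow ID
  // to keep a retry from charging the customer twice.
  const { workflowId } = Context.current().info.workflowExecution;
  const charge = await stripe.charges.create(
    {
      amount: Math.round(amount * 100), // Stripe expects the amount in cents
      currency: 'usd',
      source: method.token,
    },
    { idempotencyKey: `${workflowId}-charge` }
  );
  await db.payments.insert({
    chargeId: charge.id,
    amount,
    status: 'completed',
  });
  return { chargeId: charge.id, amount };
}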

export async function reserveInventory(
  items: OrderItem[]
): Promise<Reservation> {
  // Real database operations, API calls, etc.
  const reservation = await db.inventory.reserve(items);
  return reservation;
}

Activities have configurable timeouts and retry policies. Temporal retries failed activities automatically with exponential backoff, so activity code should be safe to run more than once.
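As a rough sketch of what exponential backoff means for timing, the function below computes the wait before each retry. The field names echo the retry options used earlier, but retryDelaysMs and BackoffPolicy are hypothetical, and the 1-second initial interval and 5-second cap are illustrative values rather than anything taken from this article's configuration.

```typescript
// Sketch of an exponential backoff schedule. The function and interface are
// hypothetical; only the field names echo the retry options shown earlier.
interface BackoffPolicy {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number;
}

// One delay per retry, so a policy with N attempts yields N - 1 delays.
function retryDelaysMs(policy: BackoffPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maximumAttempts - 1; retry++) {
    const uncapped =
      policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, retry);
    delays.push(Math.min(uncapped, policy.maximumIntervalMs));
  }
  return delays;
}

const delays = retryDelaysMs({
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maximumIntervalMs: 5000,
  maximumAttempts: 5,
});
// delays is [1000, 2000, 4000, 5000]
```

With a coefficient of 2 the wait doubles after each failure until it reaches the cap, which keeps a struggling downstream service from being hammered by immediate retries.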

Workers

Workers are the processes that actually execute your workflows and activities. You run workers on your own infrastructure -- they poll the Temporal server for tasks, execute workflow/activity code, and report results back.

// worker.ts
import { Worker } from '@temporalio/worker';
import * as activities from './activities';

async function run() {
  const worker = await Worker.create({
    workflowsPath: require.resolve('./workflows'),
    activities,
    taskQueue: 'order-processing',
  });
  await worker.run();
}

run().catch(console.error);

You can scale workers horizontally -- run as many as you need. Temporal distributes work across them automatically.

Temporal Server

The Temporal server (or Temporal Cloud, the managed offering) is the orchestration backend. It stores workflow event histories, manages task queues, handles timers, and coordinates workers. You can self-host it or use Temporal Cloud.

When to Use Temporal (and When Not To)

Temporal solves specific problems well. It's not the right tool for everything.

Good fits:

| Use Case | Why Temporal Works |
| --- | --- |
| Order processing pipelines | Multi-step, needs exactly-once semantics, spans minutes to days |
| User onboarding flows | Multi-service orchestration (email, CRM, billing, notifications) |
| Data pipelines with retries | ETL jobs that need to retry individual steps, not restart from scratch |
| Subscription billing | Monthly recurring processes with complex retry and dunning logic |
| Long-running approvals | Workflows that wait days or weeks for human input |
| Saga pattern | Distributed transactions with compensation (rollback) logic |

Poor fits:

| Use Case | Better Alternative |
| --- | --- |
| Simple async job queue | Redis + BullMQ, or SQS |
| Real-time event streaming | Kafka, Pulsar, or NATS |
| Cron jobs with no state | systemd timers or Kubernetes CronJobs |
| Sub-millisecond latency requirements | Direct service-to-service calls |
| Simple request/response APIs | Standard HTTP services |

The rule of thumb: if your process has multiple steps, needs to survive failures, and the cost of losing progress is high, Temporal is worth evaluating.

Setting Up Temporal Locally

The fastest way to start is with the Temporal CLI development server:

# Install the Temporal CLI
brew install temporal        # macOS
curl -sSf https://temporal.download/cli.sh | sh  # Linux

# Start the development server (includes UI)
temporal server start-dev

# The server runs at localhost:7233
# The UI is at localhost:8233

For a TypeScript project:

# Initialize a new project
mkdir temporal-project && cd temporal-project
npm init -y
npm install @temporalio/client @temporalio/worker @temporalio/workflow @temporalio/activity

# Project structure
# src/
#   workflows.ts    -- Workflow definitions
#   activities.ts   -- Activity implementations
#   worker.ts       -- Worker process
#   client.ts       -- Client to start workflows

Start a workflow from a client:

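// client.ts
import { Client } from '@temporalio/client';
import { orderWorkflow } from './workflows';

async function main() {
  // With no options, the client connects to a local server at localhost:7233
  const client = new Client();

  // orderId and orderData are placeholders for values from your application.
  // execute() starts the workflow and waits for its result; use start() to
  // return as soon as the workflow has been accepted.
  const result = await client.workflow.execute(orderWorkflow, {
    taskQueue: 'order-processing',
    workflowId: `order-${orderId}`,
    args: [orderData],
  });

  console.log('Order completed:', result);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});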

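The workflowId is important -- it acts as an idempotency key. If you try to start a workflow with the same ID while one is already running, Temporal rejects the duplicate, which prevents double-processing. Whether the ID can be reused after that run has closed depends on the workflow ID reuse policy.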
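The uniqueness rule can be pictured with a small in-memory model. ToyWorkflowRegistry is a hypothetical name and this is not how the server stores executions. The model also assumes the ID may be reused once the earlier run has closed, which in a real deployment is governed by the reuse policy.

```typescript
// Toy model of workflow ID uniqueness; hypothetical names, not the server's
// implementation. Assumes an ID may be reused after the earlier run closes.
type RunStatus = 'running' | 'closed';

class ToyWorkflowRegistry {
  private runs = new Map<string, RunStatus>();

  // Start a run unless one with the same ID is still open.
  start(workflowId: string): boolean {
    if (this.runs.get(workflowId) === 'running') {
      return false; // duplicate rejected
    }
    this.runs.set(workflowId, 'running');
    return true;
  }

  // Mark the run for this ID as finished.
  close(workflowId: string): void {
    this.runs.set(workflowId, 'closed');
  }
}

const registry = new ToyWorkflowRegistry();
const firstStart = registry.start('order-42'); // true
const duplicateStart = registry.start('order-42'); // false, still running
registry.close('order-42');
const startAfterClose = registry.start('order-42'); // true in this model
```

Deriving the ID from a business key such as the order number is what turns this rule into protection against double-processing: a retried request maps to the same ID and is rejected while the first run is in flight.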

Practical Patterns

Saga Pattern (Compensating Transactions)

When a multi-step process fails partway through, you need to undo the completed steps. Temporal makes this straightforward:

export async function bookingWorkflow(trip: TripRequest): Promise<BookingResult> {
  const compensations: Array<() => Promise<void>> = [];

  try {
    const flight = await bookFlight(trip.flight);
    compensations.push(() => cancelFlight(flight.id));

    const hotel = await bookHotel(trip.hotel);
    compensations.push(() => cancelHotel(hotel.id));

    const car = await bookCar(trip.car);
    compensations.push(() => cancelCar(car.id));

    return { flight, hotel, car, status: 'confirmed' };
  } catch (err) {
    // Something failed -- run compensations in reverse order
    for (const compensate of compensations.reverse()) {
      await compensate();
    }
    throw err;
  }
}

If bookCar fails after exhausting its retry policy, the catch block runs the compensations in reverse order, cancelling the hotel and then the flight.
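The compensation order can be checked with an in-memory version of the same pattern. This sketch is synchronous and uses hypothetical stand-ins (book, runBooking, bookingLog) in place of real activities. The car booking is hard-coded to fail.

```typescript
// In-memory check of compensation order. The booking function is a
// hypothetical stand-in for activities; the car booking always fails.
const bookingLog: string[] = [];

function book(item: string, fail: boolean): string {
  if (fail) {
    throw new Error(`${item} booking failed`);
  }
  bookingLog.push(`book ${item}`);
  return item;
}

function runBooking(): string {
  const compensations: Array<() => void> = [];
  try {
    const flight = book('flight', false);
    compensations.push(() => {
      bookingLog.push(`cancel ${flight}`);
    });

    const hotel = book('hotel', false);
    compensations.push(() => {
      bookingLog.push(`cancel ${hotel}`);
    });

    book('car', true); // throws before any car compensation is registered
    return 'confirmed';
  } catch (err) {
    // Undo completed steps, most recent first.
    for (const compensate of compensations.reverse()) {
      compensate();
    }
    return 'rolled back';
  }
}

const bookingStatus = runBooking();
// bookingLog is ['book flight', 'book hotel', 'cancel hotel', 'cancel flight']
```

Registering each compensation only after its step succeeds means a failed step never gets a cancel call for something that was never booked.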

Waiting for External Signals

Workflows can pause and wait for external events using signals:

import { defineSignal, setHandler, condition } from '@temporalio/workflow';

const approvalSignal = defineSignal<[{ approved: boolean; approver: string }]>('approval');

export async function expenseWorkflow(expense: Expense): Promise<ExpenseResult> {
  let approval: { approved: boolean; approver: string } | undefined;

  setHandler(approvalSignal, (result) => {
    approval = result;
  });

  // Submit for approval
  await notifyApprover(expense);

  // Wait up to 7 days for approval
  const gotApproval = await condition(() => approval !== undefined, '7 days');

  if (!gotApproval) {
    return { status: 'expired' };
  }

  if (approval!.approved) {
    await processReimbursement(expense);
    return { status: 'reimbursed', approver: approval!.approver };
  }

  return { status: 'rejected', approver: approval!.approver };
}

An external service sends the signal when the manager clicks "Approve" or "Reject":

const handle = client.workflow.getHandle('expense-12345');
await handle.signal(approvalSignal, { approved: true, approver: 'approver@example.com' });

Scheduled and Recurring Workflows

Temporal supports cron-like schedules natively:

// Start a workflow that runs every day at 9am UTC
await client.workflow.start(dailyReportWorkflow, {
  taskQueue: 'reports',
  workflowId: 'daily-report',
  cronSchedule: '0 9 * * *',
  args: [reportConfig],
});

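Each cron execution is a separate workflow run with its own event history, so a failure in one run doesn't affect the next. Temporal also has a dedicated Schedules feature, which its documentation recommends over cronSchedule for new work.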
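For readers less familiar with cron syntax, '0 9 * * *' means minute 0 of hour 9 on every day. The helper below is a hypothetical sketch that computes the next firing time for that one fixed-time daily case in UTC. It is not a general cron parser and is not part of the Temporal SDK.

```typescript
// Sketch: next firing time for a fixed daily schedule such as '0 9 * * *', in UTC.
// Handles only a fixed hour and minute, not general cron syntax.
function nextDailyRunUtc(now: Date, hour: number, minute: number): Date {
  const next = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), hour, minute, 0, 0)
  );
  // If today's slot has already passed, move to the same time tomorrow.
  if (next.getTime() <= now.getTime()) {
    next.setUTCDate(next.getUTCDate() + 1);
  }
  return next;
}

const beforeNine = nextDailyRunUtc(new Date('2026-02-15T08:30:00Z'), 9, 0);
const afterNine = nextDailyRunUtc(new Date('2026-02-15T09:30:00Z'), 9, 0);
// beforeNine.toISOString() is '2026-02-15T09:00:00.000Z'
// afterNine.toISOString() is '2026-02-16T09:00:00.000Z'
```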

Temporal vs. Alternatives

| Feature | Temporal | Step Functions | Celery | BullMQ |
| --- | --- | --- | --- | --- |
| Workflow as code | Yes | JSON/YAML (ASL) | Limited | No |
| Language support | TS, Go, Java, Python, .NET | Any (via Lambda) | Python | Node.js |
| Self-hostable | Yes | No (AWS only) | Yes | Yes |
| Durable timers | Yes | Yes | No | Limited |
| Saga support | Native | Manual | Manual | No |
| Managed offering | Temporal Cloud | Built-in | No | No |
| Complexity | Medium-high | Medium | Medium | Low |
| Best for | Complex multi-step workflows | AWS-native workflows | Python task queues | Simple job queues |

Step Functions is the closest competitor, but it requires defining workflows in Amazon States Language (JSON), which gets unwieldy for complex logic. Temporal lets you use your language's native control flow.

Production Considerations

Versioning Workflows

Running workflows can last days, weeks, or longer. When you deploy new code, you need to handle in-flight workflows that started with the old code. Temporal provides workflow versioning:

import { patched } from '@temporalio/workflow';

export async function orderWorkflow(order: Order): Promise<OrderResult> {
  if (patched('add-fraud-check')) {
    // New code path -- runs for new executions; replays of older histories skip it
    await checkForFraud(order);
  }

  await chargePayment(order.paymentMethod, order.total);
  // ... rest of workflow
}
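The decision that patched() makes can be sketched as a small model. This is a simplified illustration under the assumption that a patch marker is recorded in history the first time the new code path runs. ToyExecution and toyPatched are hypothetical names, not SDK APIs.

```typescript
// Toy model of patch-marker semantics; hypothetical names, not the SDK's implementation.
interface ToyExecution {
  replaying: boolean; // true while re-running code against recorded history
  markers: Set<string>; // patch IDs recorded in this execution's history
}

function toyPatched(execution: ToyExecution, patchId: string): boolean {
  if (execution.replaying) {
    // During replay, follow whatever the original run did.
    return execution.markers.has(patchId);
  }
  // First time through this code: record the marker and take the new path.
  execution.markers.add(patchId);
  return true;
}

const newRun: ToyExecution = { replaying: false, markers: new Set<string>() };
const oldRunReplay: ToyExecution = { replaying: true, markers: new Set<string>() };
const newRunReplay: ToyExecution = {
  replaying: true,
  markers: new Set<string>(['add-fraud-check']),
};

const takesNewPath = toyPatched(newRun, 'add-fraud-check'); // true
const oldStaysOnOldPath = toyPatched(oldRunReplay, 'add-fraud-check'); // false
const replayFollowsMarker = toyPatched(newRunReplay, 'add-fraud-check'); // true
```

The point of the marker is that a replay always takes the same branch as the original run, so histories written by the old code keep replaying cleanly after the deploy.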

Observability

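Temporal's web UI shows running workflows, their event histories, and current state. For production monitoring, Temporal exposes Prometheus metrics that you can pipe into Grafana.

Key metrics to watch include the schedule-to-start latency of workflow and activity tasks, which climbs when workers cannot keep up with their task queue.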

Scaling

Temporal scales horizontally at both the server and worker level. The server shards workflow histories across database partitions. Workers are stateless -- add more to handle increased load.

For most teams, Temporal Cloud is the easier path. Self-hosting requires a persistence store such as PostgreSQL, MySQL, or Cassandra, typically Elasticsearch for advanced visibility queries, and operational expertise to manage the cluster.

Getting Started Checklist

  1. Install the Temporal CLI and start the dev server
  2. Work through the official TypeScript tutorial (temporal.io/docs)
  3. Build a simple workflow with 2-3 activities
  4. Add retry policies and timeout handling
  5. Experiment with signals and queries
  6. Set up the Temporal web UI for visibility
  7. Evaluate Temporal Cloud vs. self-hosting for production

Temporal has a steep-ish learning curve because the deterministic workflow model requires a mental shift. But once it clicks, you stop writing fragile retry logic and ad-hoc state machines. The workflow just runs, and if something breaks, it picks up where it left off.