
Background Job and Workflow Tools: BullMQ, Temporal, and Celery

Backend · 2026-02-09 · 10 min read · background-jobs · bullmq · temporal · celery · queues · redis · workflows


Every non-trivial application eventually needs to do work outside the request-response cycle. Send an email after signup. Generate a PDF invoice. Process a video upload. Sync data with a third-party API. The question is not whether you need background jobs -- it is which tool fits the shape of your work.

The landscape splits into two categories: job queues and workflow engines. Job queues (BullMQ, Celery, Sidekiq) execute discrete tasks asynchronously. Workflow engines (Temporal, Inngest, Step Functions) orchestrate multi-step processes with durable state. Picking the wrong category is a bigger mistake than picking the wrong tool within a category.

This guide covers the three most popular open-source options across languages, plus guidance on when you should skip all of them and use cron.

When Simple Cron Is Enough

Before reaching for a job queue, ask yourself: does this work need to be triggered by an event, or does it just need to run on a schedule?

If the answer is "on a schedule," cron (or systemd timers) might be all you need. A nightly database cleanup, a weekly report email, an hourly cache warm -- these do not require a message broker, worker processes, or retry logic. A cron job that runs a script is the simplest possible solution, and simplicity is a feature.

# /etc/cron.d/nightly-cleanup
0 3 * * * appuser /opt/app/scripts/cleanup.sh >> /var/log/app/cleanup.log 2>&1

Cron fails when you need:

- Retries with backoff when a task fails
- Event-driven triggering (run this when a user signs up, not at a fixed time)
- Visibility into whether a job ran, failed, or is still pending
- Work distributed across multiple machines
- Rate limiting against a downstream service

If you need any of these, keep reading.

Job Queues vs. Workflow Engines

Job queues process individual tasks. You enqueue a message ("send email to user 42"), a worker picks it up, executes it, and acknowledges completion. Each job is independent. If job B depends on the result of job A, you have to wire that up yourself -- typically by having job A enqueue job B upon completion.
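For example, with BullMQ (covered below) the manual wiring looks something like this -- a minimal sketch where the queue names and the resizeImage helper are illustrative, not from any particular codebase:

import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };
const publishQueue = new Queue("publish-image", { connection });

// Job A's worker enqueues job B itself -- the dependency lives in
// application code, not in the queue.
new Worker(
  "resize-image",
  async (job) => {
    const resizedUrl = await resizeImage(job.data.imageUrl); // hypothetical helper
    await publishQueue.add("publish", { resizedUrl });
  },
  { connection }
);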

Workflow engines orchestrate sequences of steps as a single logical unit. A workflow might be "charge credit card, then provision account, then send welcome email, then wait 3 days, then send onboarding email." The engine tracks where you are in the sequence, handles retries at each step, and can resume from the exact point of failure.

The rule of thumb: if your background work is "do this one thing," use a job queue. If it is "do these five things in order, with branching logic and compensation on failure," use a workflow engine.

BullMQ: The Node.js Standard

BullMQ is the dominant job queue for Node.js applications. It uses Redis as its message broker, which means you probably already have the infrastructure. It replaced the older Bull library with better TypeScript support, improved performance, and a cleaner API.

Setup and Basic Usage

bun add bullmq
# or: npm install bullmq

import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };

// Producer: add jobs to the queue
const emailQueue = new Queue("email", { connection });

await emailQueue.add("welcome", {
  userId: "user-42",
  template: "welcome-email",
});

// Consumer: process jobs from the queue
const worker = new Worker(
  "email",
  async (job) => {
    const { userId, template } = job.data;
    await sendEmail(userId, template);
    return { sent: true };
  },
  { connection, concurrency: 5 }
);

worker.on("completed", (job, result) => {
  console.log(`Job ${job.id} completed:`, result);
});

worker.on("failed", (job, err) => {
  console.error(`Job ${job?.id} failed:`, err.message);
});

Retry Strategies

BullMQ supports exponential backoff out of the box. Configure it per-queue or per-job:

await emailQueue.add(
  "welcome",
  { userId: "user-42" },
  {
    attempts: 5,
    backoff: {
      type: "exponential",
      delay: 1000, // 1s, 2s, 4s, 8s, 16s
    },
  }
);

For custom retry logic, use a custom backoff strategy:

const webhookQueue = new Queue("webhooks", { connection });

await webhookQueue.add(
  "webhook-delivery",
  { url: "https://example.com/hook", payload: data },
  {
    attempts: 8,
    backoff: {
      type: "custom",
    },
  }
);

// In worker options:
const worker = new Worker("webhooks", processor, {
  connection,
  settings: {
    backoffStrategy: (attemptsMade) => {
      // 1s, 5s, 30s, 2min, 10min, 30min, 1hr, 2hr
      const delays = [1000, 5000, 30000, 120000, 600000, 1800000, 3600000, 7200000];
      return delays[attemptsMade - 1] || delays[delays.length - 1];
    },
  },
});

Dead Letter Queues

Jobs that exhaust all retries need somewhere to go. BullMQ does not have a built-in dead letter queue, but the pattern is straightforward:

const dlq = new Queue("email-dlq", { connection });

worker.on("failed", async (job, err) => {
  if (job && job.attemptsMade >= (job.opts.attempts || 1)) {
    await dlq.add("failed-email", {
      originalJob: job.data,
      error: err.message,
      failedAt: new Date().toISOString(),
    });
  }
});

Scheduling and Rate Limiting

BullMQ handles delayed jobs and repeatable jobs natively:

// Delayed: run 30 minutes from now
await emailQueue.add("reminder", { userId: "user-42" }, { delay: 30 * 60 * 1000 });

// Repeatable: run every hour
await emailQueue.add("digest", { type: "hourly" }, { repeat: { every: 3600000 } });

// Cron pattern: run at 9am daily
await emailQueue.add("daily-report", {}, { repeat: { pattern: "0 9 * * *" } });

Rate limiting is built in:

const apiQueue = new Queue("third-party-api", { connection });

const worker = new Worker("third-party-api", processor, {
  connection,
  limiter: {
    max: 100,
    duration: 60000, // 100 jobs per minute
  },
});

Monitoring with BullBoard

BullBoard provides a web UI for monitoring queues:

import { createBullBoard } from "@bull-board/api";
import { BullMQAdapter } from "@bull-board/api/bullMQAdapter";
import { ExpressAdapter } from "@bull-board/express";

const serverAdapter = new ExpressAdapter();
serverAdapter.setBasePath("/admin/queues"); // keeps the UI's internal links working

createBullBoard({
  queues: [new BullMQAdapter(emailQueue), new BullMQAdapter(apiQueue)],
  serverAdapter,
});

// Mount on an existing Express app
app.use("/admin/queues", serverAdapter.getRouter());

BullMQ Strengths and Weaknesses

Strengths: Redis is fast and widely deployed. The API is clean and well-typed. Job priorities, rate limiting, and repeatable jobs work out of the box. The community is large and active.

Weaknesses: Redis is a single point of failure (Redis Sentinel or Cluster adds complexity). No built-in workflow orchestration -- chaining jobs requires manual wiring. Redis persistence (RDB/AOF) means jobs can be lost in a crash unless you configure it carefully. Memory-bound -- large payloads in Redis get expensive.

Celery: The Python Workhorse

Celery has been the default background job system for Python applications since 2009. It supports multiple message brokers (Redis, RabbitMQ, Amazon SQS) and has deep integration with Django and Flask.

Setup

pip install celery[redis]

# celery_app.py
from smtplib import SMTPException

from celery import Celery

app = Celery(
    "myapp",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)

app.conf.update(
    task_serializer="json",
    accept_content=["json"],
    result_serializer="json",
    timezone="UTC",
    task_track_started=True,
    task_acks_late=True,  # acknowledge after completion, not before
)

@app.task(bind=True, max_retries=5, default_retry_delay=60)
def send_email(self, user_id: int, template: str):
    try:
        do_send_email(user_id, template)
    except SMTPException as exc:
        # Exponential backoff: 60s, 120s, 240s, ...
        raise self.retry(exc=exc, countdown=2 ** self.request.retries * 60)

# Start a worker
celery -A celery_app worker --loglevel=info --concurrency=4

Chaining and Workflows

Celery has primitives for composing tasks into workflows -- chains, groups, and chords:

from celery import chain, group, chord

# Sequential: process -> validate -> notify
pipeline = chain(
    process_upload.s(file_id),
    validate_result.s(),
    notify_user.s(user_id),
)
pipeline.apply_async()

# Parallel: process multiple files at once
batch = group(process_file.s(f) for f in file_ids)
batch.apply_async()

# Fan-out/fan-in: process files in parallel, then aggregate
workflow = chord(
    [process_file.s(f) for f in file_ids],
    aggregate_results.s()
)
workflow.apply_async()

These primitives are useful but brittle. Error handling in chords is notoriously tricky, and debugging a failed chain requires digging through multiple task results. If your workflows are complex, Celery's composition primitives will fight you.

Monitoring with Flower

Flower is the standard monitoring tool for Celery:

pip install flower
celery -A celery_app flower --port=5555

It provides real-time worker status, task history, rate controls, and the ability to revoke tasks from a web UI.

Celery Strengths and Weaknesses

Strengths: Battle-tested at massive scale (Instagram ran on Celery for years). Flexible broker support -- RabbitMQ for reliability, Redis for simplicity, SQS for AWS-native. Deep Django integration with django-celery-beat for periodic tasks. Large ecosystem of extensions.

Weaknesses: Configuration is sprawling and poorly documented -- there are hundreds of settings. The codebase is old and the abstractions leak. Celery 5.x improved things but broke backward compatibility. Canvas (chains/chords/groups) is powerful but fragile for complex workflows. Python's GIL means CPU-bound tasks need the prefork pool, which uses more memory. Cold start times can be slow.

Temporal: Durable Workflow Engine

Temporal is not a job queue. It is a workflow engine that guarantees your code will run to completion, even across failures, restarts, and deployments. If BullMQ and Celery are "fire and forget with retries," Temporal is "I will finish this workflow no matter what."

The Mental Model

In Temporal, you write workflows as ordinary functions. The Temporal server records every step. If a worker crashes mid-workflow, another worker picks up exactly where it left off -- not from the beginning, but from the last completed step.

// workflows.ts
import { proxyActivities, sleep } from "@temporalio/workflow";
import type * as activities from "./activities";

const { chargeCard, provisionAccount, sendEmail } = proxyActivities<typeof activities>({
  startToCloseTimeout: "30 seconds",
  retry: { maximumAttempts: 5 },
});

export async function onboardUser(userId: string, plan: string): Promise<void> {
  // Step 1: Charge the card
  const paymentId = await chargeCard(userId, plan);

  // Step 2: Provision the account
  await provisionAccount(userId, plan, paymentId);

  // Step 3: Send welcome email
  await sendEmail(userId, "welcome");

  // Step 4: Wait 3 days, then send onboarding tips
  await sleep("3 days");
  await sendEmail(userId, "onboarding-tips");
}

// activities.ts -- these are the side-effecting functions
export async function chargeCard(userId: string, plan: string): Promise<string> {
  const result = await stripe.charges.create({ /* ... */ });
  return result.id;
}

export async function provisionAccount(
  userId: string, plan: string, paymentId: string
): Promise<void> {
  await db.accounts.create({ userId, plan, paymentId });
}

export async function sendEmail(userId: string, template: string): Promise<void> {
  await emailService.send(userId, template);
}

The key insight: the workflow function looks like synchronous code, but each await on an activity is a checkpoint. If the worker crashes after chargeCard but before provisionAccount, Temporal replays the workflow, skips chargeCard (it already completed), and resumes at provisionAccount.
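Starting the workflow happens from application code via the Temporal client. A minimal sketch, assuming a worker polls a task queue named "onboarding":

// client.ts -- kick off the workflow from your application
import { Client, Connection } from "@temporalio/client";
import { onboardUser } from "./workflows";

const connection = await Connection.connect(); // defaults to localhost:7233
const client = new Client({ connection });

await client.workflow.start(onboardUser, {
  taskQueue: "onboarding",        // must match the worker's task queue
  workflowId: "onboard-user-42",  // unique ID; duplicate starts are rejected
  args: ["user-42", "pro"],
});

start returns as soon as the server has durably recorded the request; the workflow itself can keep running for days.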

When Temporal Shines

Temporal is the right choice when:

- The work spans multiple steps that must all complete, with compensation when one fails
- Processes run for days or weeks (the onboarding example above sleeps for three days)
- Workers may crash or deploy mid-process and the work must resume, not restart
- You need visibility into exactly where a long-running process currently stands

Temporal Strengths and Weaknesses

Strengths: Durable execution -- workflows survive any failure. The programming model is natural (just write functions). Built-in versioning for deploying workflow changes without breaking running workflows. Excellent visibility into workflow state. Scales to millions of concurrent workflows.

Weaknesses: Operational complexity -- you need to run the Temporal server (or pay for Temporal Cloud). The replay model requires understanding deterministic constraints (no random numbers, no direct I/O in workflows). Steeper learning curve than a job queue. Overkill for simple fire-and-forget jobs. SDK support is strong for Go, TypeScript, Java, and Python, but thinner for other languages.

Comparison Table

| Feature | BullMQ | Celery | Temporal |
| --- | --- | --- | --- |
| Language | TypeScript/JS | Python | Go, TS, Java, Python |
| Broker | Redis | Redis, RabbitMQ, SQS | Temporal Server (Cassandra/PostgreSQL) |
| Job type | Independent tasks | Independent tasks + basic composition | Durable workflows |
| Retry logic | Configurable backoff | Configurable backoff | Per-activity retry policies |
| Scheduling | Cron + delayed jobs | celery-beat periodic tasks | Cron schedules + workflow timers |
| Dead letter queue | Manual (pattern) | Built-in | N/A (workflows retry to completion) |
| Monitoring | BullBoard | Flower | Temporal Web UI |
| Hosted option | None (Redis managed) | None (broker managed) | Temporal Cloud |
| Best for | Node.js apps, moderate complexity | Python apps, Django/Flask | Multi-step workflows, long-running processes |

Scaling Patterns

Scaling BullMQ

BullMQ workers are stateless -- scale them horizontally by running more worker processes. Use named queues to isolate workloads:

// Separate queues for different priorities
const criticalQueue = new Queue("critical", { connection });
const bulkQueue = new Queue("bulk", { connection });

// Dedicated workers with different concurrency
new Worker("critical", processor, { connection, concurrency: 10 });
new Worker("bulk", processor, { connection, concurrency: 2 });

Redis becomes the bottleneck at scale. Redis Cluster distributes load but adds complexity. For most applications, a single Redis instance handles thousands of jobs per second.

Scaling Celery

Celery scales by adding workers. Use separate queues for different task types:

@app.task(queue="critical")
def process_payment(order_id):
    pass

@app.task(queue="bulk")
def generate_report(user_id):
    pass

# One worker per queue, sized to the workload
celery -A myapp worker -Q critical --concurrency=8
celery -A myapp worker -Q bulk --concurrency=2

RabbitMQ handles backpressure better than Redis for Celery workloads. If you are processing more than a few thousand tasks per second, RabbitMQ is the better broker choice.

Scaling Temporal

Temporal separates the workflow engine (server) from workflow execution (workers). Scale workers independently -- they are stateless. The Temporal server itself scales horizontally with Cassandra or PostgreSQL as its persistence layer.
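A worker bootstrap is only a few lines; run as many copies as you need. A sketch in TypeScript, where the "onboarding" task queue name is an assumption that must match whatever the client uses:

// worker.ts -- a stateless Temporal worker; run N copies to scale out
import { Worker } from "@temporalio/worker";
import * as activities from "./activities";

const worker = await Worker.create({
  workflowsPath: require.resolve("./workflows"), // workflow code runs in a deterministic sandbox
  activities,
  taskQueue: "onboarding", // assumed name; must match the client's task queue
});
await worker.run(); // polls for tasks until shutdown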

For most teams, Temporal Cloud eliminates the operational burden of running the server.

Making the Choice

Use cron if your work is purely scheduled and does not need retries, status tracking, or event-driven triggering.

Use BullMQ if you are building a Node.js application and need a reliable job queue with Redis you already have. It covers 80% of background job use cases with minimal complexity.

Use Celery if you are building a Python application, especially with Django. It is the established choice with the largest ecosystem, despite its rough edges.

Use Temporal if your background work involves multi-step processes, long-running workflows, or complex failure handling. The operational overhead is real, but the programming model is unmatched for workflow orchestration.

The most common mistake is reaching for a workflow engine when a job queue would suffice, or reaching for a job queue when cron would do. Start with the simplest tool that handles your requirements, and migrate up when you hit its limits -- not before.