Container Orchestration: Docker Compose vs Kubernetes vs Nomad vs Swarm
Container orchestration is the most over-engineered decision in modern software. Teams running three containers reach for Kubernetes. Startups with one server set up Helm charts. The right tool depends on your scale, and most projects need far less than they think. This guide covers the spectrum from Docker Compose to Kubernetes, with honest advice about where each tool makes sense.
The Orchestration Spectrum
| Tool | Complexity | Best Scale | Learning Curve | Operational Overhead |
|---|---|---|---|---|
| Docker Compose | Low | 1 server, 1-20 containers | Hours | Minimal |
| Docker Swarm | Low-Medium | 2-10 servers | Days | Low |
| HashiCorp Nomad | Medium | 5-500 servers | Weeks | Medium |
| Kubernetes (K8s) | High | 10-10,000+ servers | Months | High |
The key insight: you can go very far with Docker Compose. Most startups and small-to-medium applications don't need anything more. Orchestration tools solve problems of scale -- if you don't have scale problems, the tool adds complexity without giving you anything in return.
Docker Compose: The Practical Default
Docker Compose manages multi-container applications on a single host. It's the right choice for most development environments and many production deployments.
Production-Ready Docker Compose
# docker-compose.yml
services:
app:
build:
context: .
dockerfile: Dockerfile
target: production
ports:
- "3000:3000"
environment:
- DATABASE_URL=postgresql://app:secret@db:5432/myapp
- REDIS_URL=redis://redis:6379
- NODE_ENV=production
depends_on:
db:
condition: service_healthy
redis:
condition: service_healthy
restart: unless-stopped
deploy:
resources:
limits:
cpus: "2.0"
memory: 1G
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
worker:
build:
context: .
dockerfile: Dockerfile
target: production
command: ["bun", "run", "worker.ts"]
environment:
- DATABASE_URL=postgresql://app:secret@db:5432/myapp
- REDIS_URL=redis://redis:6379
depends_on:
db:
condition: service_healthy
redis:
condition: service_healthy
restart: unless-stopped
deploy:
replicas: 2
resources:
limits:
cpus: "1.0"
memory: 512M
db:
image: postgres:16-alpine
volumes:
- postgres_data:/var/lib/postgresql/data
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
environment:
- POSTGRES_USER=app
- POSTGRES_PASSWORD=secret
- POSTGRES_DB=myapp
restart: unless-stopped
healthcheck:
test: ["CMD-SHELL", "pg_isready -U app -d myapp"]
interval: 10s
timeout: 5s
retries: 5
deploy:
resources:
limits:
cpus: "2.0"
memory: 2G
redis:
image: redis:7-alpine
volumes:
- redis_data:/data
command: redis-server --appendonly yes --maxmemory 256mb --maxmemory-policy allkeys-lru
restart: unless-stopped
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
caddy:
image: caddy:2-alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./Caddyfile:/etc/caddy/Caddyfile
- caddy_data:/data
- caddy_config:/config
depends_on:
- app
restart: unless-stopped
volumes:
postgres_data:
redis_data:
caddy_data:
caddy_config:
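One caveat: the credentials above are hardcoded for readability. In practice you'd move them into a git-ignored .env file, which Compose interpolates into the compose file automatically; a minimal sketch:
# .env -- lives next to docker-compose.yml, kept out of version control
POSTGRES_PASSWORD=change-me
# then reference it in docker-compose.yml instead of the literal value:
#   - DATABASE_URL=postgresql://app:${POSTGRES_PASSWORD}@db:5432/myapp
#   - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}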
Multi-Stage Dockerfile
# Dockerfile
FROM oven/bun:1 AS base
WORKDIR /app
# Install dependencies
FROM base AS deps
COPY package.json bun.lockb ./
RUN bun install --frozen-lockfile --production
# Build
FROM base AS build
COPY package.json bun.lockb ./
RUN bun install --frozen-lockfile
COPY . .
RUN bun run build
# Production
FROM base AS production
COPY --from=deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package.json ./
USER bun
EXPOSE 3000
CMD ["bun", "run", "dist/server.js"]
Zero-Downtime Deploys with Docker Compose
#!/bin/bash
# deploy.sh -- near-zero-downtime deploy with Docker Compose
# Assumes the app service has a healthcheck and exactly one replica running.
set -euo pipefail
echo "Building new image..."
docker compose build app
echo "Starting a second app container on the new image..."
OLD_CONTAINER=$(docker compose ps -q app | head -n 1)
docker compose up -d --no-deps --no-recreate --scale app=2 app
echo "Waiting for the new container to report healthy..."
NEW_CONTAINER=$(docker compose ps -q app | grep -v "$OLD_CONTAINER")
until [ "$(docker inspect -f '{{.State.Health.Status}}' "$NEW_CONTAINER")" = "healthy" ]; do
  sleep 2
done
echo "Removing the old container..."
docker stop "$OLD_CONTAINER" && docker rm "$OLD_CONTAINER"
echo "Deploy complete."
When Docker Compose Is Enough
- Single server with 1-20 containers
- Startups and small teams (under 10 engineers)
- Applications with < 10,000 concurrent users (depending on workload)
- Development and staging environments for any size project
- Side projects and internal tools
When You've Outgrown Docker Compose
- You need to run across multiple servers for high availability
- A single server can't handle your traffic
- You need automatic failover when a server dies
- You need to scale specific services independently across machines
Docker Swarm: Multi-Host Without the Complexity
Docker Swarm is Docker's built-in orchestration. It extends Docker Compose syntax to work across multiple servers. If you've outgrown a single server but aren't ready for Kubernetes, Swarm is the gentlest next step.
Setting Up a Swarm
# On the manager node
docker swarm init --advertise-addr 10.0.1.1
# On worker nodes (using the token from the init output)
docker swarm join --token SWMTKN-1-xxx 10.0.1.1:2377
# Check cluster status
docker node ls
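Two day-two commands worth knowing once nodes have joined (the node name is illustrative):
# Drain a node before maintenance -- Swarm reschedules its tasks elsewhere
docker node update --availability drain worker-1
# Return it to service afterwards
docker node update --availability active worker-1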
Deploying a Stack
# stack.yml (Swarm-compatible Compose file)
services:
app:
image: registry.example.com/myapp:latest
deploy:
replicas: 3
update_config:
parallelism: 1
delay: 30s
order: start-first # New container starts before old one stops
failure_action: rollback
rollback_config:
parallelism: 1
restart_policy:
condition: on-failure
max_attempts: 3
placement:
constraints:
- node.role == worker
resources:
limits:
cpus: "1.0"
memory: 512M
ports:
- "3000:3000"
networks:
- app-network
db:
image: postgres:16-alpine
deploy:
replicas: 1
placement:
constraints:
- node.labels.db == true # Pin to specific node
volumes:
- postgres_data:/var/lib/postgresql/data
networks:
- app-network
networks:
app-network:
driver: overlay
volumes:
postgres_data:
# Deploy the stack
docker stack deploy -c stack.yml myapp
# Check service status
docker service ls
docker service ps myapp_app
# Scale a service
docker service scale myapp_app=5
# Rolling update
docker service update --image registry.example.com/myapp:v2 myapp_app
# Rollback
docker service update --rollback myapp_app
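Note that the db service above is pinned with node.labels.db == true, and that label has to be set by hand before the stack will schedule it; a sketch (node name is illustrative):
# Label the node that should host Postgres
docker node update --label-add db=true worker-1
# Verify the label took
docker node inspect worker-1 --format '{{ .Spec.Labels }}'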
Swarm Strengths
- Uses Docker Compose syntax -- minimal learning curve
- Built into Docker -- no additional software to install
- Built-in load balancing via routing mesh
- Rolling updates and rollbacks out of the box
- Service discovery via DNS
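The routing mesh and DNS discovery above are easy to see in practice; a quick sketch (node IPs match the earlier swarm init example and are illustrative):
# The routing mesh publishes port 3000 on every node in the swarm, so any
# node's IP reaches a healthy replica -- even nodes not running the task
curl http://10.0.1.1:3000/health
curl http://10.0.1.2:3000/health
# On the overlay network, services resolve each other by name via Swarm's
# built-in DNS, so the app reaches Postgres at just "db:5432"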
Swarm Limitations
- Limited ecosystem -- fewer tools and integrations than Kubernetes
- No autoscaling -- must scale manually or script it
- Declining community -- Docker has shifted focus to Desktop and Hub
- Simpler scheduling -- no pod affinity, taints, or tolerations
- No native ingress -- need to add Traefik or similar
When to Choose Swarm
- You know Docker Compose and need multi-host deployment
- 2-10 servers is your target scale
- You want the simplest possible multi-host orchestration
- Your team doesn't have Kubernetes expertise and doesn't want to invest in it
HashiCorp Nomad: The Middle Ground
Nomad occupies the space between Docker Swarm and Kubernetes. It's simpler than K8s but more capable than Swarm. It's also not Docker-specific -- it can orchestrate containers, VMs, Java apps, and raw binaries.
Nomad Job Specification
# web-app.nomad.hcl
job "web-app" {
datacenters = ["dc1"]
type = "service"
group "app" {
count = 3
network {
port "http" {
to = 3000
}
}
service {
name = "web-app"
port = "http"
tags = ["urlprefix-/"]
check {
type = "http"
path = "/health"
interval = "10s"
timeout = "5s"
}
}
task "server" {
driver = "docker"
config {
image = "registry.example.com/myapp:latest"
ports = ["http"]
}
resources {
cpu = 500 # MHz
memory = 512 # MB
}
env {
NODE_ENV = "production"
DATABASE_URL = "postgresql://app:[email protected]:5432/myapp"
}
template {
# Pull secrets from Vault
data = <<EOF
{{ with secret "secret/data/myapp" }}
JWT_SECRET={{ .Data.data.jwt_secret }}
STRIPE_KEY={{ .Data.data.stripe_key }}
{{ end }}
EOF
destination = "secrets/env"
env = true
}
}
update {
max_parallel = 1
min_healthy_time = "30s"
healthy_deadline = "5m"
auto_revert = true
canary = 1
}
scaling {
enabled = true
min = 2
max = 10
policy {
# Auto-scale based on CPU
check "cpu" {
source = "nomad-apm"
query = "avg_cpu"
strategy "target-value" {
target = 70
}
}
}
}
}
}
Running Nomad
# Deploy a job
nomad job run web-app.nomad.hcl
# Check status
nomad job status web-app
# View allocations (running instances)
nomad alloc status <alloc-id>
# View logs
nomad alloc logs <alloc-id>
# Scale manually
nomad job scale web-app app 5
# Rolling update (just run with updated image)
nomad job run web-app.nomad.hcl
# Plan (dry run)
nomad job plan web-app.nomad.hcl
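One gap in the commands above: because the update block sets canary = 1, a new version pauses after the canary starts until you promote it. A sketch (the deployment ID is illustrative):
# Find the deployment that is waiting on the canary
nomad deployment list
# Promote the canary so the rolling update continues
nomad deployment promote 4a2c6bcd
# If the canary never becomes healthy, auto_revert rolls back instead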
Nomad Strengths
- Simpler than Kubernetes -- one binary, less configuration
- Multi-workload -- containers, VMs, Java JARs, raw binaries
- HashiCorp ecosystem -- integrates with Consul (service mesh) and Vault (secrets)
- Autoscaling -- scaling policies are first-class in the job spec (horizontal scaling is enforced by the official Nomad Autoscaler agent)
- Canary deployments -- first-class support
- Single binary -- easy to deploy, no etcd or API server dependencies
Nomad Limitations
- Smaller ecosystem than Kubernetes (fewer operators, tools, integrations)
- Less community support -- fewer Stack Overflow answers, fewer blog posts
- No built-in ingress controller -- need Consul Connect, Traefik, or similar
- HashiCorp licensing -- moved to the Business Source License (BSL) in 2023, so it's no longer open-source by the OSI definition
When to Choose Nomad
- 5-100+ servers, and you want simpler operations than K8s
- You run mixed workloads (not just containers)
- You're already in the HashiCorp ecosystem (Vault, Consul, Terraform)
- You want canary deployments and autoscaling without K8s complexity
Kubernetes: The Industry Standard
Kubernetes is the most powerful container orchestration system. It handles any scale, has a massive ecosystem, and every major cloud provider offers a managed version. The trade-off is significant complexity -- in configuration, operations, and mental overhead.
Basic Kubernetes Resources
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-app
labels:
app: web-app
spec:
replicas: 3
selector:
matchLabels:
app: web-app
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
template:
metadata:
labels:
app: web-app
spec:
containers:
- name: app
image: registry.example.com/myapp:v1.2.3
ports:
- containerPort: 3000
env:
- name: NODE_ENV
value: "production"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: app-secrets
key: database-url
resources:
requests:
cpu: 250m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
readinessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 10
periodSeconds: 5
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 30
periodSeconds: 10
# service.yaml
apiVersion: v1
kind: Service
metadata:
name: web-app
spec:
selector:
app: web-app
ports:
- port: 80
targetPort: 3000
type: ClusterIP
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: web-app
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
ingressClassName: nginx
tls:
- hosts:
- app.example.com
secretName: app-tls
rules:
- host: app.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: web-app
port:
number: 80
# hpa.yaml -- Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web-app
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web-app
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
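The Deployment above pulls DATABASE_URL from a Secret named app-secrets, which must exist before the pods can start; a sketch of creating it (the value is a placeholder):
# Create the Secret the Deployment references
kubectl create secret generic app-secrets \
  --from-literal=database-url='postgresql://app:secret@db:5432/myapp'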
Essential kubectl Commands
# View resources
kubectl get pods
kubectl get services
kubectl get deployments
kubectl get ingress
# Describe a resource (detailed info + events)
kubectl describe pod web-app-7d8f6c9b4-x2k4l
# View logs
kubectl logs web-app-7d8f6c9b4-x2k4l
kubectl logs -f web-app-7d8f6c9b4-x2k4l # Follow
kubectl logs --previous web-app-7d8f6c9b4-x2k4l # Previous crash
# Execute a command in a pod
kubectl exec -it web-app-7d8f6c9b4-x2k4l -- /bin/sh
# Apply configuration
kubectl apply -f deployment.yaml
# Scale
kubectl scale deployment web-app --replicas=5
# Rollback
kubectl rollout undo deployment web-app
kubectl rollout status deployment web-app
kubectl rollout history deployment web-app
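One routine operation missing above is rolling out a new image version; a sketch (the tag is illustrative):
# Point the deployment at a new image and watch the rolling update
kubectl set image deployment/web-app app=registry.example.com/myapp:v1.2.4
kubectl rollout status deployment web-app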
Managed Kubernetes Services
| Service | Provider | Strengths |
|---|---|---|
| EKS | AWS | Deep AWS integration, most popular |
| GKE | Google Cloud | Best managed K8s experience, Autopilot mode |
| AKS | Azure | Good for Microsoft-heavy shops |
| DigitalOcean K8s | DigitalOcean | Simplest managed K8s, lower cost |
| Linode K8s | Akamai | Simple, affordable |
Strong recommendation: Use managed Kubernetes. Running your own control plane is a full-time job. GKE Autopilot or EKS with Fargate removes even more operational burden by managing the worker nodes.
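For a sense of how little setup managed Kubernetes takes, creating a GKE Autopilot cluster is roughly two commands (cluster name and region are placeholders):
# Create an Autopilot cluster -- Google manages control plane and nodes
gcloud container clusters create-auto my-cluster --region=us-central1
# Wire up kubectl
gcloud container clusters get-credentials my-cluster --region=us-central1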
Kubernetes Strengths
- Ecosystem: Thousands of operators, tools, and integrations
- Cloud-native standard: Every cloud provider, every CI/CD tool, every monitoring tool supports K8s
- Self-healing: Automatically restarts crashed containers, reschedules on node failure
- Advanced scheduling: Affinity, anti-affinity, taints, tolerations, topology spread
- Extensibility: Custom Resource Definitions (CRDs) extend the API for anything
Kubernetes Weaknesses
- Complexity: YAML verbosity, multiple resource types, networking model
- Learning curve: Months to become proficient, years to master
- Operational overhead: Even managed K8s requires significant expertise
- Resource consumption: The control plane and system pods consume resources
- Overkill for small workloads: Running 3 pods on a 3-node cluster is wasteful
Decision Framework
How many servers do you need?
1 server:
→ Docker Compose (done)
2-5 servers:
→ Docker Swarm (if you want simplicity)
→ Nomad (if you want more features)
5-50 servers:
→ Nomad (if you want simpler ops)
→ Managed Kubernetes (if you want ecosystem/hiring)
50+ servers:
→ Kubernetes (managed, preferably)
Special cases:
Mixed workloads (not just containers): → Nomad
Already in HashiCorp ecosystem: → Nomad
Team has K8s experience: → Kubernetes at any scale
Startup with 3 engineers: → Docker Compose until it hurts
The "Until It Hurts" Philosophy
Start with the simplest tool that works:
- Start with Docker Compose on a single server
- When you need high availability or more capacity, move to Docker Swarm or Nomad
- When you need advanced scheduling, ecosystem tooling, or your team grows, move to Kubernetes
Each migration is a step function in complexity. Don't jump to step 3 because "we might need it someday." The cost of running Kubernetes when you don't need it is real: more YAML, more debugging, more specialized knowledge required, and more things that can break.
Lightweight Kubernetes Distributions
If you do need Kubernetes but want lower overhead, these distributions are simpler to run:
| Distribution | Best For | Notable Feature |
|---|---|---|
| k3s | Edge, small clusters | Single binary under 100 MB |
| k0s | Production-ready minimal K8s | Zero friction install |
| MicroK8s | Developer workstations | Snap-based, easy add-ons |
| kind | CI/CD testing | Runs K8s in Docker containers |
| minikube | Local development | Multiple driver options |
# Install k3s (production-ready K8s in 30 seconds)
curl -sfL https://get.k3s.io | sh -
# Check it's running
kubectl get nodes
# k3s includes:
# - Traefik (ingress)
# - CoreDNS (service discovery)
# - Flannel (networking)
# - Local storage provisioner
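If you'd rather bring your own ingress or other components, the bundled ones can be switched off at install time; a sketch:
# Install k3s without the bundled Traefik ingress
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--disable=traefik" sh -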
Summary
Container orchestration exists on a spectrum, and the right tool depends on your scale, not your ambition. Docker Compose handles far more than most teams realize -- a single well-provisioned server with Compose can serve thousands of concurrent users. Docker Swarm and Nomad occupy the middle ground for teams that need multi-host without Kubernetes complexity. Kubernetes is the right choice at genuine scale or when you need its ecosystem, but it's the wrong choice if you're three engineers running five containers. Start simple, and add complexity only when the pain of the current tool exceeds the cost of migration.