Task Queues & Background Jobs: Lecture Notes


1. What is a Background Task?

Definition: Any piece of code or logic that runs outside of the request-response lifecycle.

Client ──── request ────► Server ──── response ────► Client
                         ↕
                   Background Task
                (separate process, no response needed)

Key properties:

  • Not mission-critical to the immediate response
  • Not synchronous: doesn’t block the API call
  • Can be deferred, retried, and executed independently

2. Why Background Tasks? (The Problem They Solve)

Example: User sign-up with email verification

Synchronous approach (bad):

Client → POST /signup → Server validates → calls Email API → waits → returns 200

Problems:

  • If the email service is down → the signup API either fails or falsely tells the user the email was sent
  • The API response is held until the email provider responds, so latency is poor
  • A slow or unresponsive external service directly degrades your API’s responsiveness

Async approach with background task (good):

Client → POST /signup → Server validates → pushes task to queue → returns 200 immediately
                                                      ↓
                                            Worker picks up task
                                                      ↓
                                            Worker calls Email API
                                                      ↓
                                     (retry with backoff if it fails)

Benefits:

  • API responds quickly: the user sees “verification email sent” right away
  • Email service downtime doesn’t affect signup success
  • Worker retries automatically (up to a configured maximum) if sending the email fails
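
The async flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production setup: the in-memory `task_queue`, the `handle_signup` function, and the `send_verification_email` task name are all hypothetical, and a real system would push to a broker such as RabbitMQ, Redis, or SQS.

```python
import json
import queue

# Hypothetical in-memory stand-in for a real broker.
task_queue = queue.Queue()


def handle_signup(email):
    """Validate the request, enqueue the email task, and return immediately."""
    if "@" not in email:
        return {"status": 400, "error": "invalid email"}

    # The task carries everything the worker needs; no email API call happens here.
    task = {"type": "send_verification_email", "to": email}
    task_queue.put(json.dumps(task))

    return {"status": 200, "message": "verification email sent"}


response = handle_signup("ada@example.com")
print(response["status"], task_queue.qsize())
```

The handler never talks to the email provider, so its latency and success no longer depend on that provider.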

3. Common Use Cases for Background Tasks

Task                       Why offload?
Sending emails             Depends on external email providers (Resend, Mailgun, Brevo) that may be slow or down
Processing images/videos   CPU-intensive: resizing, encoding to multiple resolutions
Generating reports         Heavy DB queries + PDF generation; scheduled (daily/weekly/monthly)
Push notifications         External service call to Apple/Google push notification services
Account deletion           Large-scale DB operations across multiple tables/shards; can’t fit in one request
Cleanup/maintenance        Deleting orphan sessions, expired tokens, stale data periodically

4. How Task Queues Work

Core Components

Producer (your app) ──► Queue / Broker ──► Consumer / Worker (separate process)
    │                        │                       │
  Creates task          Stores tasks          Picks up & executes tasks
  Serializes to JSON    until consumed        Deserializes → runs handler
  Pushes to queue       Manages ordering      Sends acknowledgement

Producer:

  • Application code (Node.js, Python, Go, Rust, or any other language)
  • Creates the task with all data the worker needs
  • Serializes to JSON (or another format)
  • Enqueues (pushes) the task into the broker

Broker (Queue):

  • Temporary holding area for tasks
  • Technologies: RabbitMQ, Redis (lists or streams, as used by BullMQ; plain Redis Pub/Sub does not store messages for later delivery), Amazon SQS
  • Stores tasks until a worker is ready
  • Manages ordering, retries, visibility timeouts

Consumer / Worker:

  • Runs in a separate process from the main backend
  • Constantly monitors the queue for new tasks
  • Dequeues (picks up) a task when available
  • Deserializes the JSON → native format → runs the registered handler
  • Sends acknowledgement back to the queue on success or failure
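
The three components above can be put together in a small sketch. Assumptions: Python's standard `queue.Queue` stands in for the broker, a thread stands in for the separate worker process, and the `resize_image` task, the `HANDLERS` registry, and the `enqueue` helper are hypothetical names chosen for illustration.

```python
import json
import queue
import threading

broker = queue.Queue()   # stand-in for RabbitMQ/Redis/SQS
results = []             # filled by the handler, for demonstration only


def resize_image(payload):
    results.append(f"resized {payload['file']} to {payload['width']}px")


HANDLERS = {"resize_image": resize_image}   # registry: task name -> handler


def enqueue(name, payload):
    """Producer side: serialize the task to JSON and push it to the broker."""
    broker.put(json.dumps({"name": name, "payload": payload}))


def worker():
    """Consumer side: dequeue, deserialize, run the handler, acknowledge."""
    while True:
        raw = broker.get()
        if raw is None:            # sentinel value: shut the worker down
            broker.task_done()
            break
        task = json.loads(raw)
        try:
            HANDLERS[task["name"]](task["payload"])
        finally:
            broker.task_done()     # in-memory analogue of an acknowledgement


thread = threading.Thread(target=worker)
thread.start()
enqueue("resize_image", {"file": "cat.png", "width": 128})
broker.put(None)
thread.join()
print(results)
```

Because the task crosses the broker as a JSON string, the producer and the worker only need to agree on the task name and payload shape, not on a shared language or runtime.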

Libraries by Language

Language   Popular Libraries
Python     Celery
Node.js    BullMQ
Go         asynq

5. Reliability Mechanisms

Acknowledgements

After a worker processes a task, it sends an acknowledgement (ack) back to the queue:

  • Success ack → queue removes the task permanently
  • Failure ack → queue schedules a retry
  • No ack (timeout) → queue makes task available to another worker

Visibility Timeout

The period during which a task is considered “in progress” by a worker. If the worker crashes or hangs and doesn’t ack within the timeout, the queue releases the task again for another worker to pick up.

Worker picks task → visibility timeout starts
                        ↓
Worker acks within timeout → task removed ✓
Worker doesn't ack in time → task requeued for another worker ✓

Prevents tasks from being lost when a worker crashes mid-execution.
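
The ack and visibility-timeout behaviour can be modelled with a toy in-memory queue. This is a sketch under assumptions, not how any particular broker implements it: the `VisibilityQueue` class and its method names are hypothetical, and time is passed in explicitly as `now` (in seconds) so the example is deterministic.

```python
import itertools


class VisibilityQueue:
    """Toy queue: a received task is hidden until acked or its timeout lapses."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._ids = itertools.count(1)
        self._tasks = {}             # task_id -> payload
        self._invisible_until = {}   # task_id -> time at which it reappears

    def send(self, payload):
        task_id = next(self._ids)
        self._tasks[task_id] = payload
        self._invisible_until[task_id] = 0.0   # visible right away
        return task_id

    def receive(self, now):
        """Return (task_id, payload) for the first visible task, or None."""
        for task_id, payload in self._tasks.items():
            if self._invisible_until[task_id] <= now:
                # Hide the task from other workers while this one processes it.
                self._invisible_until[task_id] = now + self.visibility_timeout
                return task_id, payload
        return None

    def ack(self, task_id):
        """Success ack: remove the task permanently."""
        self._tasks.pop(task_id, None)
        self._invisible_until.pop(task_id, None)


q = VisibilityQueue(visibility_timeout=30)
q.send("send email to ada@example.com")

first = q.receive(now=0)     # worker A takes the task
hidden = q.receive(now=10)   # still inside the timeout: nothing to receive
again = q.receive(now=31)    # worker A never acked: the task is visible again
q.ack(again[0])
empty = q.receive(now=100)   # acked tasks never come back
```

Note that the same task was delivered twice here, which is why handlers need to tolerate redelivery.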

Retry with Exponential Backoff

When a task fails, it’s retried with increasing delays:

Attempt 1 fails → retry after 1 min
Attempt 2 fails → retry after 2 min
Attempt 3 fails → retry after 4 min
Attempt 4 fails → retry after 8 min
...up to max retries (e.g., 5)

Most external service disruptions are short-lived (on the order of milliseconds to seconds), so the task usually succeeds within 1–2 retries.
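
The doubling schedule corresponds to delay = base × 2^(attempt − 1). A minimal sketch follows; the function names, the 60-second base, and the one-hour cap are assumptions for illustration. The jitter variant is an addition not covered in these notes: it randomizes the delay so that many failed tasks do not all retry at the same moment.

```python
import random


def backoff_delay(attempt, base_seconds=60, cap_seconds=3600):
    """Delay in seconds before the retry that follows failed attempt `attempt` (1-based)."""
    return min(cap_seconds, base_seconds * 2 ** (attempt - 1))


def backoff_delay_with_jitter(attempt, base_seconds=60, cap_seconds=3600):
    """Pick a random delay between 0 and the plain exponential delay."""
    return random.uniform(0, backoff_delay(attempt, base_seconds, cap_seconds))


delays = [backoff_delay(n) for n in range(1, 5)]
print(delays)  # [60, 120, 240, 480], i.e. 1, 2, 4, 8 minutes
```

The cap keeps late retries from being pushed arbitrarily far into the future when the maximum retry count is large.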


6. Types of Background Tasks

6.1 One-Off Tasks

Triggered by a specific event; executed once.

Examples:

  • Send verification email after signup
  • Send welcome email after successful verification
  • Send password reset email
  • Send in-app or push notification when a user receives a message

6.2 Recurring Tasks (Cron Jobs)

Executed repeatedly at fixed intervals. Most task frameworks support scheduled/periodic tasks.

Examples:

  • Send daily/weekly/monthly reports to users
  • Clean up expired/orphan sessions from the database
  • Delete soft-deleted records after a grace period
  • Refresh external API data (exchange rates, weather)
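
A recurring task boils down to "enqueue this task whenever its interval has elapsed". The sketch below shows that idea with simulated time; `PeriodicTask` and `tick` are hypothetical names, and real frameworks ship their own schedulers, so this is an illustration of the mechanism rather than any library's API.

```python
from dataclasses import dataclass


@dataclass
class PeriodicTask:
    name: str
    interval_seconds: int
    next_run: float = 0.0   # due immediately on the first tick


def tick(tasks, now, enqueue):
    """Enqueue every task whose next_run has passed, then reschedule it."""
    for task in tasks:
        if now >= task.next_run:
            enqueue(task.name)
            task.next_run = now + task.interval_seconds


enqueued = []
tasks = [
    PeriodicTask("cleanup_expired_sessions", interval_seconds=3600),
    PeriodicTask("refresh_exchange_rates", interval_seconds=600),
]

# Simulated clock readings in seconds; a real scheduler would loop on the wall clock.
for now in (0, 600, 1200, 3600):
    tick(tasks, now, enqueued.append)
```

The scheduler only enqueues; the actual work is still done by ordinary workers, so recurring tasks get the same retry and ack handling as one-off tasks.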

6.3 Chain Tasks (Parent-Child Relationship)

Task B can only start after Task A completes successfully. Multiple independent tasks can run in parallel if they share the same parent.

Example: Video upload pipeline on an LMS platform

Video uploaded
       ↓
Task 1: Encode video to multiple resolutions
       ↓
  ┌────────────────────────┐
  │                        │
Task 2a: Generate          Task 2b: Generate audio
thumbnails                 transcription (subtitles)
  │
  ↓
Task 3: Process thumbnails to multiple resolutions

  • Task 2a and 2b can run in parallel (neither depends on the other)
  • Task 3 depends on Task 2a completing first
  • Task 1 must complete before 2a or 2b can start
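
The dependency rules of this pipeline can be expressed as a graph and resolved with Python's standard `graphlib` module (Python 3.9+). This is a sketch: the task names are hypothetical labels for the steps in the diagram, and grouping tasks into "waves" is a simplification, since a real pipeline could start Task 3 as soon as Task 2a finishes without waiting for Task 2b.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of tasks it depends on (its parents).
dependencies = {
    "encode_video": set(),
    "generate_thumbnails": {"encode_video"},
    "generate_transcription": {"encode_video"},
    "process_thumbnails": {"generate_thumbnails"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())   # every task whose parents are all done
    waves.append(ready)                  # tasks in one wave could run in parallel
    sorter.done(*ready)                  # mark them as completed

for wave in waves:
    print(wave)
```

The second wave contains both `generate_thumbnails` and `generate_transcription`, matching the observation that 2a and 2b are independent of each other.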

6.4 Batch Tasks

A single trigger spawns many tasks, or a single task processes a large volume of data.

Example 1: Account deletion

DELETE /account API call → returns 200 immediately
                      ↓
Worker: delete all user projects
        delete all user assets (images, files)
        delete user profile
        delete user record
        send confirmation email

The entire deletion happens in the background; the user is logged out and sees “account deleted” without waiting.

Example 2: Scheduled report delivery

Midnight cron job → triggers 10,000 report generation tasks
                    (one per user, all at once)
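
The fan-out in Example 2 can be sketched as a single function that turns one trigger into many queued tasks. Assumptions: `fan_out_reports` and the `generate_reports` task type are hypothetical, `queue.Queue` stands in for the broker, and the optional `batch_size` parameter is an addition for illustration (the default of 1 reproduces "one task per user").

```python
import json
import queue


def fan_out_reports(user_ids, broker, batch_size=1):
    """Enqueue one report task per batch of users; return the number of tasks."""
    count = 0
    for start in range(0, len(user_ids), batch_size):
        batch = user_ids[start:start + batch_size]
        broker.put(json.dumps({"type": "generate_reports", "user_ids": batch}))
        count += 1
    return count


broker = queue.Queue()
task_count = fan_out_reports(list(range(10_000)), broker)
print(task_count, broker.qsize())
```

Small tasks keep retries cheap: if one user's report fails, only that task is retried, not the whole nightly run.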

7. Design Considerations at Scale

Consideration    Detail
Idempotency      Design tasks so they can be safely retried from scratch without duplicating side effects. For example, if a delete task fails midway, rolling back via a DB transaction ensures the next retry starts clean.
Error handling   Robust error handling is critical: tasks run in a separate process, so unhandled exceptions can cause silent failures.
Monitoring       Track queue length, task success/failure rates, and error types. Tools: Prometheus + Grafana.
Scalability      Design consumers to scale horizontally: add more worker nodes as the user base grows.
Ordering         If task execution order matters, ensure your queue/library supports ordered delivery.
Rate limiting    If tasks call external APIs, implement rate limiting on the worker side to avoid hitting provider rate limits.
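
One common way to get idempotency is to record which task IDs have already completed and skip repeats. The sketch below assumes each task carries a unique ID; the `processed` set stands in for a durable store (for example, a database table with a unique key), and `send_welcome_email` is a hypothetical handler.

```python
processed = set()   # stand-in for a durable store of completed task IDs
sent = []           # stand-in for the external side effect


def send_welcome_email(task_id, to):
    """Run the side effect at most once per task_id, even if the task is retried."""
    if task_id in processed:
        return "skipped"
    sent.append(to)          # the side effect
    processed.add(task_id)   # record completion only after the side effect succeeds
    return "sent"


first = send_welcome_email("task-42", "ada@example.com")
retry = send_welcome_email("task-42", "ada@example.com")
print(first, retry, sent)
```

A crash between the side effect and the bookkeeping line can still produce a duplicate in this sketch; closing that gap needs the two steps to be atomic, for instance inside one DB transaction.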

8. Best Practices

Keep tasks small and focused
One task = one processing unit. Don’t bundle unrelated operations. If one step fails, only that step retries, not the entire pipeline.

Avoid long-running tasks
If a task takes too long, it’s a signal to break it into smaller chained or parallel tasks. Long tasks are harder to retry, harder to monitor, and hold up worker resources.

Robust error handling and logging
Log every failure with enough context to debug (task ID, user ID, error message, stack trace). Good logging = faster debugging when tasks fail in production.

Monitor queue length and worker health
Set up alerting: if queue length spikes unexpectedly, workers may have crashed or external services may be down. React before users notice.


9. Summary

Concept               Definition
Background Task       Code running outside the request-response lifecycle
Producer              Application code that creates and enqueues tasks
Broker / Queue        Temporary storage for tasks (RabbitMQ, Redis, SQS)
Consumer / Worker     Separate process that dequeues and executes tasks
Enqueue               Pushing a task into the queue
Dequeue               Pulling a task out of the queue
Acknowledgement       Signal from worker to queue confirming task success/failure
Visibility Timeout    Time window in which a task is “reserved” by a worker; re-released if no ack
Exponential Backoff   Retry delay that doubles after each failure
Idempotency           Task can safely be retried without causing unintended side effects

Quick Revision Checklist

  • Background task = logic outside the request-response lifecycle; not blocking, not synchronous
  • Main reasons to offload: external service dependency, heavy computation, long-running operations
  • Components: Producer (creates task) → Broker/Queue (stores it) → Consumer/Worker (executes it)
  • Worker runs in a separate process; monitors queue constantly
  • Acknowledgement mechanism: success ack removes task; no ack within visibility timeout → task requeued
  • Retry with exponential backoff: doubles wait time after each failure; configurable max retries
  • One-off tasks: single trigger → single execution (emails, notifications)
  • Recurring tasks: cron-like intervals (reports, cleanup jobs)
  • Chain tasks: parent-child dependency (video encode → thumbnail generation → thumbnail processing; transcription runs in parallel with thumbnail generation)
  • Batch tasks: one trigger → many tasks, or large-volume processing (account deletion, bulk reports)
  • Design for idempotency: tasks must be safely retryable from scratch
  • Monitor: queue length, success/failure rates, worker health (Prometheus + Grafana)
  • Scale by adding more consumer instances horizontally
  • Keep tasks small and focused; avoid long-running monolithic tasks
  • Libraries: Celery (Python), BullMQ (Node.js), asynq (Go); brokers: RabbitMQ, Redis, Amazon SQS