Jay Patel

Solving the Double-Booking Problem: Distributed Locking, Idempotency, and Redis

How I built a ticket booking system that handles concurrent seat reservations using Redis distributed locks, PostgreSQL pessimistic locking, idempotency keys, and compensating transactions.

Posted Apr 11, 2026 · 9 min read · Backend, Distributed Systems

The Problem: Two People, One Seat

Imagine 500 people trying to book the same concert seat at the same moment. Without proper concurrency control, multiple users can pass the "is this seat available?" check before any of them actually reserve it. The result: duplicate bookings, double charges, and a customer support nightmare.

This isn't a theoretical concern. I've seen race conditions in production booking systems that caused exactly this. So I built a ticket booking system from scratch to explore three different approaches to solving it — from intentionally broken to production-ready.

Three Approaches, One Problem

The system implements three booking strategies side by side, so you can see exactly where each one breaks or holds:

| Approach    | Mechanism         | Correctness | Avg Response | Scales Horizontally? |
|-------------|-------------------|-------------|--------------|----------------------|
| Naive       | No locking        | Broken      | ~100ms       | N/A                  |
| Pessimistic | SELECT FOR UPDATE | Correct     | ~500ms       | No                   |
| Distributed | Redis SET NX      | Correct     | ~150ms       | Yes                  |

Let's walk through each one.

The Naive Approach: How Race Conditions Happen

The naive implementation checks if a seat is available, then reserves it. The problem is the gap between check and update:

Request A: Checks seat → Available
Request B: Checks seat → Available    (A hasn't written yet!)
Request C: Checks seat → Available    (Neither A nor B have written!)
Request A: Updates seat → Reserved
Request B: Updates seat → Reserved    (Overwrites A!)
Request C: Updates seat → Reserved    (Overwrites B!)

Three users all get confirmation. One seat exists. The test script fires 10 concurrent requests at the same seat and consistently produces race conditions:

RACE CONDITION DETECTED!
3 users successfully "reserved" the SAME seat!

This is the baseline — the thing every other approach exists to prevent.
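The gap is easy to reproduce without a database. Here's a minimal TypeScript sketch, with an in-memory seat standing in for the seats row and an awaited sleep standing in for the round-trip between check and update (the names are illustrative, not from the real codebase):

```typescript
type Seat = { status: 'AVAILABLE' | 'RESERVED'; reservedBy?: string };

const seat: Seat = { status: 'AVAILABLE' };
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Naive check-then-update: the await between the check and the write
// is exactly where concurrent requests slip past each other.
async function naiveReserve(userId: string): Promise<boolean> {
  if (seat.status !== 'AVAILABLE') return false; // check
  await sleep(10);                               // simulated DB round-trip
  seat.status = 'RESERVED';                      // update (blindly overwrites)
  seat.reservedBy = userId;
  return true;
}

async function runNaiveDemo(): Promise<number> {
  const results = await Promise.all(
    ['A', 'B', 'C'].map((u) => naiveReserve(u))
  );
  return results.filter(Boolean).length; // how many users "won" the seat
}
```

All three requests pass the check before any of them writes, so all three get a confirmation for the same seat.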

Pessimistic Locking: Let the Database Handle It

The first correct approach uses PostgreSQL's SELECT ... FOR UPDATE. When a transaction locks a row, every other transaction trying to lock the same row blocks until the first one commits or rolls back:

SELECT * FROM "seats"
WHERE "id" = $1
AND "eventId" = $2
FOR UPDATE

The timeline becomes sequential:

Request A: SELECT FOR UPDATE → Gets lock, proceeds
Request B: SELECT FOR UPDATE → BLOCKS (waiting for A)
Request A: UPDATE, COMMIT → Releases lock
Request B: Gets lock, checks status → ALREADY RESERVED → Returns error

One critical detail: lock ordering. If User A locks seats [1, 2] and User B locks seats [2, 1], they deadlock. The fix is sorting seat IDs before acquiring locks:

const sortedSeatIds = [...seatIds].sort();

This works, but it has a fundamental limitation. Every request queues behind the previous one. Under load, response times climb because requests are blocking, not failing fast. And since the locks live in PostgreSQL, you can't scale horizontally across multiple database replicas.
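The blocking behavior can be sketched with a per-row FIFO mutex as a stand-in for PostgreSQL's row lock (a simulation only; the real blocking happens inside the database):

```typescript
// A minimal FIFO mutex standing in for a row lock on one seat.
class RowLock {
  private tail: Promise<void> = Promise.resolve();
  async run<T>(fn: () => Promise<T>): Promise<T> {
    const prev = this.tail;
    let release!: () => void;
    this.tail = new Promise<void>((r) => (release = r));
    await prev;                         // block until the previous holder commits
    try { return await fn(); } finally { release(); }
  }
}

const lock = new RowLock();
const seat = { status: 'AVAILABLE' as 'AVAILABLE' | 'RESERVED' };
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function pessimisticReserve(userId: string): Promise<boolean> {
  return lock.run(async () => {                    // SELECT ... FOR UPDATE
    if (seat.status !== 'AVAILABLE') return false; // ALREADY RESERVED
    await sleep(10);                               // work inside the transaction
    seat.status = 'RESERVED';                      // UPDATE; COMMIT releases lock
    return true;
  });
}

async function runPessimisticDemo(): Promise<number> {
  const results = await Promise.all(
    ['A', 'B', 'C'].map((u) => pessimisticReserve(u))
  );
  return results.filter(Boolean).length;
}
```

Exactly one request wins; the others queue, then see the seat is taken. That queuing is the cost the distributed approach avoids.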

Distributed Locking with Redis: The Production Approach

Redis distributed locks solve both problems. They're non-blocking (fail fast if the lock is taken) and work across multiple application servers.

Acquiring a Lock

The core primitive is Redis SET key value NX PX ttl — set a key only if it doesn't exist, with a millisecond expiry:

import Redis from 'ioredis';
import { v4 as uuidv4 } from 'uuid';

const redis = new Redis(); // connects to localhost:6379 by default
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

export async function acquireLock(
  key: string,
  options: LockOptions = {}
): Promise<AcquiredLock | null> {
  const { ttlSeconds = 10, retryCount = 0, retryDelayMs = 100 } = options;
  const value = uuidv4(); // Unique owner identifier
  const ttlMs = ttlSeconds * 1000;

  for (let attempt = 0; attempt <= retryCount; attempt++) {
    const result = await redis.set(key, value, 'PX', ttlMs, 'NX');
    if (result === 'OK') {
      return { key, value }; // Lock acquired
    }
    if (attempt < retryCount) {
      await sleep(retryDelayMs);
    }
  }
  return null; // Lock not acquired
}

The value is a UUID — the lock owner's identity. This matters when releasing.

Releasing a Lock Safely

You can't just DEL the key. Consider this scenario:

  1. Process A acquires lock with value "A123"
  2. Process A takes too long — lock expires via TTL
  3. Process B acquires lock with value "B456"
  4. Process A finishes and calls DEL
  5. Process A just deleted Process B's lock

The fix is a Lua script that atomically checks ownership before deleting:

if redis.call('get', KEYS[1]) == ARGV[1] then
  return redis.call('del', KEYS[1])
else
  return 0
end

This runs atomically on the Redis server — no race condition between the check and the delete.
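To see the ownership check pay off without standing up a Redis server, here's an in-memory model of SET NX PX plus the guarded delete (the Map and the fake* helpers are illustrative stand-ins, not the real code):

```typescript
// In-memory stand-in for Redis: key -> { value, expiresAt }.
const store = new Map<string, { value: string; expiresAt: number }>();

function isLive(key: string): boolean {
  const e = store.get(key);
  if (!e) return false;
  if (e.expiresAt <= Date.now()) { store.delete(key); return false; } // TTL lapsed
  return true;
}

// SET key value NX PX ttl: set only if the key doesn't (still) exist.
function fakeAcquire(key: string, value: string, ttlMs: number): boolean {
  if (isLive(key)) return false;
  store.set(key, { value, expiresAt: Date.now() + ttlMs });
  return true;
}

// The Lua script's semantics: delete only if we still own the lock.
function fakeSafeRelease(key: string, value: string): boolean {
  if (isLive(key) && store.get(key)!.value === value) {
    store.delete(key);
    return true;
  }
  return false; // someone else holds it (or it expired) -- don't touch it
}

// A acquires, its lock expires, B acquires; A's late release is a no-op.
fakeAcquire('seat:1', 'A123', 5);                   // A gets the lock
store.get('seat:1')!.expiresAt = Date.now() - 1;    // force TTL expiry
fakeAcquire('seat:1', 'B456', 10_000);              // B gets the lock
const aReleased = fakeSafeRelease('seat:1', 'A123'); // false: B is protected
const bStillHolds = isLive('seat:1');                // true
```

A bare DEL in place of fakeSafeRelease would have silently destroyed B's lock, reopening the race.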

Multi-Seat Locking

When a user reserves multiple seats, all locks must be acquired atomically. If any lock fails, all previously acquired locks are released:

export async function acquireMultipleLocks(
  keys: string[],
  options: LockOptions = {}
): Promise<AcquiredLock[] | null> {
  const sortedKeys = [...keys].sort(); // Prevent deadlocks
  const acquired: AcquiredLock[] = [];
 
  for (const key of sortedKeys) {
    const lock = await acquireLock(key, options);
    if (!lock) {
      // Release all previously acquired locks
      await Promise.all(
        acquired.map(l => releaseLock(l.key, l.value))
      );
      return null;
    }
    acquired.push(lock);
  }
  return acquired;
}

Same deadlock prevention as the pessimistic approach — sort the keys first.
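The rollback behavior can be sketched with an in-memory lock table standing in for the Redis keys (illustrative names, synchronous for clarity):

```typescript
const locks = new Map<string, string>(); // key -> owner

function tryLock(key: string, owner: string): boolean {
  if (locks.has(key)) return false;
  locks.set(key, owner);
  return true;
}

function unlock(key: string, owner: string): void {
  if (locks.get(key) === owner) locks.delete(key);
}

// All-or-nothing acquisition: sort for deadlock safety, roll back on failure.
function acquireAll(keys: string[], owner: string): boolean {
  const acquired: string[] = [];
  for (const key of [...keys].sort()) {
    if (!tryLock(key, owner)) {
      acquired.forEach((k) => unlock(k, owner)); // release everything we got
      return false;
    }
    acquired.push(key);
  }
  return true;
}

const aOk = acquireAll(['seat:2', 'seat:1'], 'A'); // true: A holds both
const bOk = acquireAll(['seat:0', 'seat:2'], 'B'); // false: seat:2 is taken
const seat0Free = !locks.has('seat:0');            // true: B rolled back seat:0
```

B grabbed seat:0, hit A's lock on seat:2, and handed seat:0 back, so no seat is left stranded behind a half-failed request.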

The Full Reservation Flow

The distributed booking service combines Redis locks with database transactions for defense in depth:

  1. Acquire distributed locks for all requested seats
  2. Verify availability inside a database transaction (the lock prevents races, but the DB is the source of truth)
  3. Update seat status to RESERVED with a version increment
  4. Schedule cleanup via BullMQ (in case the user abandons the reservation)
  5. Release locks in a finally block — always, even on error

const SEAT_LOCK_OPTIONS: LockOptions = {
  ttlSeconds: 30,
  retryCount: 3,
  retryDelayMs: 100,
};

The lock TTL is 30 seconds — long enough for the database transaction, short enough to recover from a crashed process.

Idempotency: Handling Duplicate Requests

Network failures cause retries. A user's browser might send the same booking request twice. Without idempotency, you'd create two bookings and charge twice.

The solution: the client generates a unique idempotencyKey per booking attempt and sends it with every retry of that same attempt:

// Client generates once, sends on every retry
const idempotencyKey = crypto.randomUUID();
 
POST /api/bookings
{
  "reservationIds": ["..."],
  "idempotencyKey": "abc-123"
}

On the server, the check happens before acquiring any locks:

const existingBooking = await prisma.booking.findUnique({
  where: { idempotencyKey },
});
 
if (existingBooking) {
  return existingBooking; // Safe retry — no duplicate created
}

The idempotencyKey column has a unique index, so even if two identical requests race past the check, the database constraint prevents a duplicate insert. The second request gets a conflict error and can retry, hitting the findUnique path.
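Both layers can be modeled with an in-memory table plus a simulated unique constraint (illustrative names; the real code goes through Prisma):

```typescript
// In-memory stand-in for the bookings table, keyed by idempotencyKey.
const bookings = new Map<string, { id: number }>();
let nextId = 1;

function insertBooking(idempotencyKey: string): { id: number } {
  if (bookings.has(idempotencyKey)) {
    throw new Error('unique constraint violation'); // the DB-level backstop
  }
  const row = { id: nextId++ };
  bookings.set(idempotencyKey, row);
  return row;
}

function createBooking(idempotencyKey: string): { id: number } {
  const existing = bookings.get(idempotencyKey); // fast-path check (findUnique)
  if (existing) return existing;                 // safe retry, no duplicate
  try {
    return insertBooking(idempotencyKey);
  } catch {
    // Raced past the check; the constraint caught it. Re-read the winner.
    return bookings.get(idempotencyKey)!;
  }
}

const first = createBooking('abc-123');
const retry = createBooking('abc-123'); // network retry with the same key
```

The retry gets the original booking back, and the table still contains exactly one row for the key.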

Reservation Expiry: Cleaning Up Abandoned Seats

When a user reserves a seat but never completes payment, the seat needs to go back to the pool. Two mechanisms handle this:

Scheduled Jobs (Primary)

When a reservation is created, a BullMQ job is scheduled to fire at the expiry time (default: 10 minutes):

export async function scheduleReservationCleanup(
  data: ReservationCleanupJobData
): Promise<Job> {
  const delay = Math.max(0, expiresAt.getTime() - Date.now());
 
  return reservationCleanupQueue.add('cleanup', data, {
    delay,
    jobId: `cleanup:${data.reservationId}`, // Prevents duplicate jobs
  });
}

If the user completes the booking, the cleanup job is cancelled before it fires.

Periodic Sweep (Fallback)

If the worker was down when a job was scheduled, or Redis lost the job, a periodic sweep catches anything that slipped through:

// Runs every 60 seconds
const expiredReservations = await prisma.reservation.findMany({
  where: {
    status: 'ACTIVE',
    expiresAt: { lt: new Date() },
  },
  take: 100, // Limit per run
});

Both cleanup paths acquire a distributed lock before modifying the seat and double-check the reservation status — the seat might have been confirmed while waiting for the lock.
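A sweep pass boils down to: find lapsed ACTIVE holds, re-check, and expire them. An in-memory sketch (illustrative data; the real sweep queries PostgreSQL and takes the distributed lock before each write):

```typescript
type Reservation = {
  id: string;
  status: 'ACTIVE' | 'EXPIRED' | 'CONFIRMED';
  expiresAt: Date;
};

// In-memory stand-in for the reservations table.
const reservations: Reservation[] = [
  { id: 'r1', status: 'ACTIVE', expiresAt: new Date(Date.now() - 60_000) },    // abandoned
  { id: 'r2', status: 'ACTIVE', expiresAt: new Date(Date.now() + 60_000) },    // still held
  { id: 'r3', status: 'CONFIRMED', expiresAt: new Date(Date.now() - 60_000) }, // paid in time
];

// One sweep pass: expire ACTIVE reservations whose hold has lapsed.
function sweepExpired(now: Date, limit = 100): string[] {
  const expired = reservations
    .filter((r) => r.status === 'ACTIVE' && r.expiresAt < now)
    .slice(0, limit); // bounded per run, like `take: 100`
  for (const r of expired) {
    r.status = 'EXPIRED'; // and the seat goes back to AVAILABLE
  }
  return expired.map((r) => r.id);
}

const swept = sweepExpired(new Date());
```

Only the lapsed, unconfirmed hold (r1) is reclaimed; live holds and confirmed bookings are untouched.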

Compensating Transactions: When Payment Succeeds but Booking Fails

In distributed systems, you can't always wrap everything in a single ACID transaction. What if Stripe charges the card but the database update fails?

The system handles this with compensating transactions:

1. Create booking with status PENDING
2. Create Stripe PaymentIntent
3. User pays → Stripe webhook fires
4. Update booking to CONFIRMED
   └── If this fails:
       → Refund payment automatically
       → Release reserved seats
       → Mark booking as FAILED
       → Create audit log entry

The customer is never charged for a failed booking. The audit log captures the full timeline for debugging.
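The compensation path can be sketched as a webhook handler that undoes everything when the confirmation write fails (the deps names here are hypothetical, not the real service's API):

```typescript
type BookingStatus = 'PENDING' | 'CONFIRMED' | 'FAILED';

async function handlePaymentSucceeded(
  bookingId: string,
  deps: {
    confirmBooking: (id: string) => Promise<void>; // step 4
    refundPayment: (id: string) => Promise<void>;  // compensations below
    releaseSeats: (id: string) => Promise<void>;
    markFailed: (id: string) => Promise<void>;
    audit: (id: string, event: string) => Promise<void>;
  }
): Promise<BookingStatus> {
  try {
    await deps.confirmBooking(bookingId);
    return 'CONFIRMED';
  } catch {
    // Payment went through but confirmation failed: undo, don't keep the money.
    await deps.refundPayment(bookingId);
    await deps.releaseSeats(bookingId);
    await deps.markFailed(bookingId);
    await deps.audit(bookingId, 'booking_failed_refunded');
    return 'FAILED';
  }
}
```

Each compensation step should itself be idempotent, since the webhook can be redelivered and rerun the whole handler.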

The Data Model

The schema separates the booking lifecycle into distinct entities:

  • Seats have a version field for optimistic locking and a status enum (AVAILABLE, RESERVED, BOOKED, BLOCKED)
  • Reservations are temporary holds with an expiresAt timestamp
  • Bookings are confirmed purchases with an idempotencyKey (unique index) and payment tracking
  • AuditLog records every state transition with before/after values

model Seat {
  status     SeatStatus @default(AVAILABLE)
  version    Int        @default(0)
  reservedBy    String?
  reservedUntil DateTime?
 
  @@index([eventId, status])
  @@index([reservedUntil])
}
 
model Booking {
  idempotencyKey String @unique
  status         BookingStatus
  paymentStatus  PaymentStatus
 
  @@index([idempotencyKey])
}

The @@index([reservedUntil]) index is specifically for the periodic cleanup query — without it, the sweep would table-scan on every run.

What I Learned

Fail fast beats blocking

The distributed approach returns errors in ~150ms when a seat is taken. The pessimistic approach blocks for ~500ms waiting for the lock. Users get faster feedback, and the system handles higher throughput.

Defense in depth isn't optional

Redis locks prevent race conditions, but Redis can fail. The database transaction inside the lock provides a second layer of protection. Belt and suspenders.

Idempotency keys belong on the client

The server can't generate idempotency keys — it doesn't know if two requests are retries of the same operation or two intentional operations. The client generates the key once and reuses it across retries.

Background cleanup needs a fallback

Scheduled jobs handle the happy path. But workers go down, Redis loses data, jobs get stuck. The periodic sweep is a safety net that catches everything the scheduled jobs miss.

Lua scripts are essential for Redis correctness

Any Redis operation that needs to check-then-act must use a Lua script. Without atomicity, you get the same race conditions you're trying to prevent.

Try It Out

The full source code is on GitHub: jay-1799/seat-masters

Clone the repo, run docker-compose up -d to start PostgreSQL and Redis, then npm run dev for the app and npm run workers for background processing. The test scripts demonstrate the race conditions and prove the distributed locking works:

npm run test:race          # Watch the naive approach break
npm run test:distributed   # 5, 20, 50 concurrent users — exactly 1 wins

#redis #distributed-locking #idempotency #concurrency #postgresql #nextjs #bullmq #typescript

Licensed under CC BY 4.0
