DDIA Notes — Chapter 7: Transactions

Chapter 7 is the longest chapter in the book, and it's the one where I most needed to slow down. The reason it's hard isn't any single idea — it's that the chapter is really a catalogue: a list of subtle race conditions, and for each one, which isolation levels prevent it and which don't. Miss one row of that table and the whole thing feels like fog.

So these notes are deliberately comprehensive. I've tried to cover everything, in an order where each race condition motivates the next isolation level. The single most useful artifact is the summary table near the end — if I only reread one thing, it's that.

Here's the shape of the whole chapter up front:

A transaction is a tool for hiding concurrency and partial-failure problems from the application. ACID names what it promises. Isolation levels are the menu of how much of that promise you actually pay for — from weak (fast, leaky) to serializable (strong, expensive). Most of the chapter is a tour of the leaks.

What a Transaction Actually Is

A transaction groups several reads and writes into one logical unit. The whole unit either commits (all of it succeeds) or aborts/rolls back (none of it takes effect, and it's safe to retry).

The point is to simplify the programming model. Without transactions, the application has to reason about every possible way a write could half-complete: the process crashes mid-update, the network drops, two clients touch the same data at once. Transactions take that whole class of problems and let the application pretend they don't exist.

Crucial framing the book hammers: transactions are not a law of nature. They were invented to make life easier for application developers, and they come with real costs. Plenty of systems deliberately weaken or drop them to gain performance or availability. The question is never "transactions: yes or no" — it's "which guarantees do I need, and what do they cost?"

ACID, Demystified

ACID stands for Atomicity, Consistency, Isolation, Durability. The book is blunt that the term has become mostly a marketing word — implementations vary so wildly that "we're ACID" tells you almost nothing. (The vague opposite, BASE — Basically Available, Soft state, Eventual consistency — is even less precise.) So it's worth nailing down what each letter really means.

Atomicity — all or nothing. This is not about concurrency. It means that if a transaction makes several writes and then a fault happens partway through, the transaction is aborted and any writes already made are discarded. The application can safely retry without worrying about half-applied changes. The book suggests "abortability" would have been a clearer name.

Consistency — the odd one out. This means the database preserves application-specific invariants (e.g., credits and debits across accounts always balance). But here's the catch: that's mostly the application's responsibility, not the database's. The database can enforce some constraints (foreign keys, uniqueness), but it can't stop you from writing data that violates your business rules. The book notes the C was essentially "tossed in to make the acronym work" — it's really a property of the app, not a property the database guarantees.

Isolation — concurrent transactions don't step on each other. The textbook ideal is serializability: the result is as if transactions ran one at a time, in some serial order, even though they actually ran concurrently. In practice serializable isolation is expensive and often not used — which is why most of this chapter is about the weaker levels people use instead.

Durability — once committed, data survives. On a single node, that means written to nonvolatile storage (typically via a write-ahead log). In a replicated system, it means copied to some number of other nodes. The book is careful: there is no such thing as perfect durability — every mechanism has failure modes (disks corrupt, all replicas in one datacenter die). It's about reducing risk, not eliminating it.

Single-Object vs Multi-Object

A transaction usually spans multiple objects — debit one account, credit another; update a record and its secondary index. For this, the database needs to know which operations belong together, usually tracked by the client's TCP connection or an explicit transaction ID.

Why you need multi-object transactions:

Foreign-key references must stay valid across rows.
Denormalized data and materialized views must update together with the source.
Secondary indexes must update in lockstep with the row they index — otherwise a row exists in one index but not another.

Single-object operations are a smaller case: a single increment, or a compare-and-set on one row. Storage engines almost universally provide atomicity (via a log for crash recovery) and isolation (via a lock) for one object. But that's not what people usually mean by a transaction — those guarantees stop at the object boundary.

Aborts and retries. A defining feature of transactions is that a failed one can be thrown away and retried. The book flags that this is more subtle than it looks, and lists the gotchas:

If the transaction actually succeeded but the network dropped the acknowledgment, blindly retrying creates a duplicate.
If the error is from overload, retrying makes the overload worse.
Retrying only makes sense for transient errors, not permanent ones (e.g., a constraint violation).
If the transaction had side effects outside the database (sending an email, charging a card), a retry repeats them — you need idempotency or a two-phase mechanism.
If the client process dies mid-retry, any data it was holding is lost.

Many leaderless (Dynamo-style) stores don't do aborts at all — they're "best effort," leaving cleanup to the application.

Weak Isolation Levels: The Tour Begins

Serializable isolation is the gold standard but it has a real performance cost, so databases ship weaker levels that prevent some concurrency bugs but not others. These weak levels cause real, expensive bugs in production — the book cites cases of financial loss and data corruption. The trouble is they only show up under specific timing, so they slip through testing and bite at scale. The rest of the chapter is essentially: here's each race condition, and here's the weakest level that stops it.

Read Committed

The most basic useful level. It makes two guarantees:

No dirty reads — you only see data that has been committed.
No dirty writes — you only overwrite data that has been committed.

Dirty read: transaction A writes but hasn't committed; transaction B reads A's uncommitted value. Bad because B might act on data that gets rolled back and never existed. It also lets B see some of A's writes but not others — a partially-updated state.

Dirty write: transaction A writes a value; before A commits, transaction B overwrites it. If two real-world updates interleave this way, you can get a corrupted mix. The classic example: two people buy the same car; the sale (one table) and the invoice (another table) end up assigned to different buyers.

Dirty read:
   A: write x=2 (uncommitted) ────┐
   B: read x  ───────────────────▶ sees 2  ✗  (A might roll back!)

Dirty write:
   A: write x=A's value ──┐
   B: write x=B's value ──┴──▶ which one wins, and is it consistent
                                with the other rows each also wrote?  ✗

Implementation: dirty writes are prevented with row-level locks — a transaction holds the lock from the moment it writes a row until it commits or aborts, so no one else can write that row meanwhile. Dirty reads could use read locks too, but a long-running write would then block all readers. So databases instead remember both the old committed value and the new uncommitted value, and serve the old one to everybody until the writer commits.

Snapshot Isolation: Fixing Read Skew

Read committed still allows a nasty anomaly. Picture transferring $100 between two accounts. You read account 1 after the debit but account 2 before the credit lands — total appears to be $100 short. A moment later it's fine again. This is read skew (a.k.a. a nonrepeatable read): a transaction sees different parts of the database at different points in time.

   Transfer: move 100 from acct1 to acct2
   Observer transaction reads:
        read acct1  → 400   (after debit)
        read acct2  → 500   (before credit)
        total = 900, but it should be 1000   ✗

For many things this brief inconsistency is tolerable. For three things it absolutely is not:

Backups — copying the DB takes time; read skew bakes inconsistency permanently into the backup.
Analytic queries and integrity checks — scanning the whole DB while it changes underneath returns nonsense.

The fix is snapshot isolation: each transaction reads from a consistent snapshot — it sees all the data as it was committed at the moment the transaction started. Even if other transactions change things meanwhile, this transaction keeps seeing its frozen view. Readers never block writers and writers never block readers.

How It Works: MVCC

Snapshot isolation is implemented with multi-version concurrency control (MVCC). The database keeps several committed versions of each object side by side, because different in-flight transactions may need to see the database as it was at different times.

The mechanism:

Every transaction gets a unique, always-increasing transaction ID (txid).
Every write is tagged with the txid of the transaction that made it. Each row keeps a created_by and (when deleted/overwritten) a deleted_by txid.
Visibility rules: when a transaction reads, it sees a row only if (a) the row was created by a transaction that had already committed when this transaction started, and (b) the row is not marked for deletion by such a transaction. Writes by transactions that started later, or that were still in progress / aborted, are invisible.

The result: each transaction gets a coherent point-in-time view without locking anything for reads.

Object "balance", with versions:
   created_by=13, deleted_by=17, value=500
   created_by=17,                value=400
A transaction that started at txid=15 sees the value=500 version
   (created by 13 which committed; the deletion by 17 isn't visible
    because 17 started after txid 15).

Indexes under MVCC: one approach has the index point to all versions and filter at read time; another uses append-only / copy-on-write B-trees where each write creates a new root, so each snapshot just references the root that was current when it started.

A naming mess to be aware of: snapshot isolation is called "serializable" in Oracle, and "repeatable read" in PostgreSQL and MySQL. The SQL standard's definition of "repeatable read" is so ambiguous that these names don't reliably tell you what you're getting. Don't trust the label; check what the engine actually does.

Preventing Lost Updates

Snapshot isolation handles readers vs writers. But it doesn't, on its own, handle two writers doing the read-modify-write cycle concurrently. That's the lost update problem.

Both transactions read a value, modify it in application code, and write it back. The second write clobbers the first without incorporating its change. Examples: incrementing a counter, two users editing a wiki page, adding an element to a JSON list.

   A: read counter (42) ──── add 1 ──── write 43 ─┐
   B: read counter (42) ──── add 1 ──── write 43 ─┴── final = 43, should be 44  ✗

Five ways to deal with it:

Atomic write operations. Let the database do the read-modify-write itself: UPDATE counters SET value = value + 1 WHERE .... The best option when your update fits this shape. Implemented with an exclusive lock on the row (sometimes called cursor stability) or by forcing the operation onto a single thread.
Explicit locking. The application locks the rows it's about to modify with SELECT ... FOR UPDATE, forcing other transactions to wait. Necessary when the logic is too complex for an atomic op (e.g., validating a move in a game).
Automatic lost-update detection. Let the transactions run in parallel; the database detects that a lost update happened and aborts the loser to retry. This pairs well with snapshot isolation — PostgreSQL's repeatable read, MySQL/InnoDB, and others detect it automatically. Nice because the application doesn't have to remember to lock anything.
Compare-and-set (CAS). UPDATE ... SET value = new WHERE value = old — only writes if the value hasn't changed since you read it. Watch out: if the WHERE reads from an old snapshot, the check can be fooled.
Conflict resolution under replication. With multiple leaders or leaderless replication, there isn't one copy to lock, so locks and CAS don't work. Instead you allow concurrent writes to create siblings (multiple versions) and merge them afterward, or use commutative atomic operations. Last-write-wins is the easy default here and is prone to lost updates — fine only when losing data is acceptable.

Write Skew and Phantoms

Now the subtlest race in the chapter. Write skew is the generalization of lost update: two transactions read the same set of objects, then update different objects based on what they read.

The canonical example: hospital on-call scheduling. Rule: at least one doctor must be on call. Alice and Bob are both on call and both feel sick. Each transaction checks "are there ≥2 doctors on call?" — both see 2 (each other), both conclude it's safe to drop off, both update their own row. Result: zero doctors on call. No dirty write (different rows), no lost update (different objects) — yet a broken invariant.

   Alice's txn: count on-call doctors → 2, ok → set Alice off-call ─┐
   Bob's txn:   count on-call doctors → 2, ok → set Bob off-call  ──┴─▶ 0 on call ✗

More real-world write-skew situations the book lists: booking a meeting room (two bookings for the same slot), claiming a username, multiplayer-game moves, and double-spending money.

Phantoms. Write skew is usually caused by a phantom: a transaction reads objects matching some condition, and another transaction inserts a new object matching that condition, changing the answer to the first transaction's query. The on-call check, the room-booking check, the username check — all are "is there a row matching X?" queries whose answer a concurrent insert can invalidate.

The trouble is you can't lock a row that doesn't exist yet. One ugly workaround is materializing conflicts: pre-create the rows you'd want to lock (e.g., a row for every room-timeslot combination) so there's a concrete object to lock on. The book calls this a last resort — it's hard to get right and it leaks concurrency-control concerns into your data model.

The key fact: snapshot isolation prevents read skew and (with detection) lost updates, but it does not prevent write skew or phantoms. For those you need full serializability.

Serializability: The Three Implementations

Serializable isolation is the strongest level — it guarantees the result is as if transactions ran one at a time, which by definition rules out every race condition above, including ones nobody anticipated. There are exactly three ways databases actually achieve it.

1. Actual Serial Execution

The simplest idea: literally run transactions one at a time, on a single thread. For decades this was considered absurd, but two changes made it viable: RAM got cheap enough to fit entire datasets in memory (so transactions don't wait on disk), and OLTP transactions are typically short and touch little data. Used by VoltDB/H-Store, Redis, and Datomic.

The catch is that an interactive transaction — where the app sends a query, thinks, sends another — would leave the single thread idle waiting on the network, killing throughput. So serial execution requires submitting each transaction as a stored procedure: the entire transaction logic is sent ahead of time as one packaged unit, with no round-trips mid-transaction.

Interactive (bad for serial):     Stored procedure (good):
   BEGIN                              CALL place_order(args)
   SELECT ...   ← network wait          └─ entire logic runs server-side,
   (app thinks)                            no waiting on the client
   UPDATE ...   ← network wait
   COMMIT

Trade-offs: stored procedures historically meant clunky vendor-specific languages (modern systems use general-purpose languages instead), and they're harder to manage and debug. The fundamental ceiling: throughput is limited to a single CPU core. You can partition data across cores, but any transaction spanning multiple partitions needs cross-partition coordination and is dramatically slower.

2. Two-Phase Locking (2PL)

For about 30 years this was the way to do serializability. (Note: 2PL is not 2PC — two-phase locking is about isolation; two-phase commit is about atomic distributed commit. Unrelated despite the names.)

The rule is stricter than read-committed locking: readers and writers block each other. Several transactions may hold a shared lock to read the same object concurrently, but a writer needs an exclusive lock, which is incompatible with any other lock. So a writer waits for all readers to finish, and readers wait for any writer. Locks are held until the transaction ends — the "two phases" are acquiring locks (growing) and releasing them all at commit/abort (shrinking).

   shared (read) lock:    many transactions can hold it together
   exclusive (write) lock: only one, blocks all shared and exclusive locks
   → "readers block writers, and writers block readers"

To prevent phantoms, 2PL uses predicate locks — locks on all objects matching a search condition, including ones that don't exist yet. Because true predicate locks are expensive, databases approximate them with index-range locks (next-key locking): lock a whole range in an index rather than the exact predicate. Coarser, but cheaper and still safe.

The problem with 2PL is performance. All that locking adds overhead and slashes concurrency. Deadlocks are frequent (two transactions each waiting on a lock the other holds); the database detects them and aborts a victim, which then retries — wasting work. Latency is unstable and the high percentiles (p99) can be very slow. Correct, but often too slow.

3. Serializable Snapshot Isolation (SSI)

The new and promising one — Michael Cahill's 2008 PhD work, used by PostgreSQL's SERIALIZABLE level and FoundationDB. It delivers full serializability with only a small performance penalty over snapshot isolation.

The big difference is that SSI is optimistic rather than pessimistic. 2PL and serial execution are pessimistic — they assume conflict and prevent it up front (by blocking). SSI lets transactions run without blocking, on the bet that conflicts are rare; then, at commit time, it checks whether isolation was actually violated, and if so, aborts and retries the offending transaction.

It's built on top of snapshot isolation (so reads come from a consistent snapshot, MVCC and all), plus an algorithm that detects when a transaction's decisions were based on a premise that's no longer true. The book breaks the detection into two cases:

Detecting stale MVCC reads. A transaction reads from its snapshot, ignoring writes that weren't yet committed when it started. At commit, the database checks whether any of those ignored writes have since committed. If so, the read was based on outdated information — abort.
Detecting writes that affect prior reads. When a transaction writes, the database checks whether that write affects data another transaction has already read (it tracks reads using the indexes). If so, it notifies the affected transaction that its premise has been invalidated — and when that transaction tries to commit, it aborts.

   T1: reads "doctors on call = 2", decides to go off-call, writes
   T2: reads "doctors on call = 2", decides to go off-call, writes
   At commit, SSI sees each transaction's read was invalidated by
   the other's write → aborts one → it retries and now reads "= 1" → safe

Performance: because nothing blocks, SSI shines on read-heavy workloads and avoids 2PL's deadlock/latency issues. Its cost is the abort rate — if conflicts are frequent, lots of retries hurt. And unlike serial execution, it isn't pinned to a single core or partition, so it scales better.

The Summary Table

The whole chapter compresses into this. Rows are race conditions, columns are isolation levels; ✓ means prevented.

Race condition	Read Committed	Snapshot Isolation	Serializable
Dirty read (see uncommitted data)	✓	✓	✓
Dirty write (overwrite uncommitted data)	✓	✓	✓
Read skew (nonrepeatable read)	✗	✓	✓
Lost update (read-modify-write clobbered)	✗	✗ (some engines detect it)	✓
Write skew (read same, write different)	✗	✗	✓
Phantoms (a write changes a query's result)	✗	✗ (mostly)	✓

And the three serializable implementations:

Implementation	Style	Strength	Weakness
Serial execution	One thread, stored procedures	Simple, no locking overhead	Single-core throughput; needs in-memory data
Two-phase locking (2PL)	Pessimistic locking	Battle-tested, correct	Slow; deadlocks; bad tail latency
Serializable snapshot isolation (SSI)	Optimistic + snapshot	Little blocking, scales	Abort rate hurts under high contention

Key Takeaways

A transaction is a tool for hiding concurrency and partial failure. It's not a law of nature — it's a convenience with a cost, and you choose how much of it to buy.

ACID is four separate promises, and "C" barely belongs. Atomicity = all-or-nothing (better called abortability). Isolation = concurrency safety. Durability = survives crashes. Consistency = your app's invariants, mostly your job, not the database's.

Weak isolation levels are defined by which race conditions they leak. Read committed stops dirty reads/writes. Snapshot isolation (MVCC) adds protection against read skew. Neither stops lost updates, write skew, or phantoms on its own.

Lost update, write skew, and phantoms are the three that catch people. Lost update has many fixes (atomic ops, locking, detection, CAS). Write skew and phantoms generally require full serializability.

Serializability is the only level that's safe by default, and there are exactly three ways to get it: run transactions serially, lock pessimistically (2PL), or detect conflicts optimistically (SSI). SSI is the modern sweet spot — serializable guarantees without 2PL's blocking.

Don't trust isolation-level names. Oracle's "serializable" is snapshot isolation; PostgreSQL/MySQL "repeatable read" is snapshot isolation. Check what the engine actually does, not what it's called.