Sections
Section · Overview
Designing consistent hashing
Horizontal scaling means handing requests and data across a fleet of servers that grows and shrinks over time. Naïve hashing makes adding or removing one server redistribute almost everything — cache hit rates collapse and the cluster thrashes. Consistent hashing changes the math so only a small, bounded fraction of keys move.
The shape of the chapter
- 01The rehashing problem with naïve hash(key) % N
- 02The hash ring, server placement, key placement
- 03Server lookup — clockwise sweep from the key
- 04Add & remove a server — only the affected arc redistributes
- 05Two issues with the basic approach (uniformity)
- 06Virtual nodes — replicas per server smooth distribution
- 07Wrap-up — Dynamo, Cassandra, Discord, Akamai, Maglev
Live Lab is in this same module
After the reference walkthrough, a Live Lab section lets you manipulate the ring directly — add and remove servers, dial vnode count, drop arbitrary keys, and run a backend stress test that reproduces the chapter's “5% std-dev at 200 vnodes” claim on demand.
Section · The problem
The rehashing problem
With naïve hash-mod-N, “which server owns this key?” depends on N. Change N — by adding or removing a server — and the answer changes for almost every key. The animation below shows both grids side-by-side: every cell outlined in red is a key that crossed servers.
Drag N below — every red-outlined cell crossed servers.
The book's point: with naïve hash-mod-N, changing N by even one number flushes ~(N − 1)/N of the cache. Consistent hashing rebinds only the affected arc — bounded redistribution regardless of fleet size.
⚠ Why this matters in production
In a cache cluster, a key's “owner server” is where its value lives. If most keys reassign on a server change, most subsequent lookups miss the cache and stampede the backing store. The chapter calls this a “storm of cache misses”. Consistent hashing fixes it by changing the math: only a bounded fraction of keys reassign on any single change — and that ratio is the green readout under the grids.
Section · Foundations
Hash space & the ring
Pick a hash function with a fixed-size output — anything from MD5 to SHA-1 to FNV-1a. Its output range is the hash space: a contiguous integer interval, e.g. [0, 2³²). To make the math wrap, connect the two ends and you get a hash ring — the animation below walks through how the line becomes a circle.
Why this matters
Same range. Same uniformity. But now there's no edge — every point has a clockwise neighbor. That single geometric property is what the rest of the algorithm relies on: server lookup, add/remove, virtual nodes — every operation expresses as “walk clockwise from this point.”
A note on the hash function
The chapter uses one hash function for placement on the ring and a differentone for the rehashing-problem demo. Inside the ring, cryptographic strength doesn't matter — uniform distribution does. This module uses FNV-1a 32-bitbecause it's deterministic, runs synchronously, and ports to the backend trivially.
Section · Lookup
Server & key lookup
Hash each server's identifier to put it on the ring. Hash each key to put it on the same ring. To find which server owns a key, walk clockwise from the key's position until you hit a server — that's the owner. Below, the demo cycles through keys; click any key chip to inspect it.
Section · Add a server
Adding a server
When server s4 joins, it lands at its hashed position. Only keys in the arc between s4 and its previous counter-clockwise neighbor change owner — the rest of the ring is untouched.
Section · Remove a server
Removing a server
When server s1 leaves the ring, the keys it owned redistribute to the next clockwise server. The arc that s1 used to own is the affected range — everything else is untouched.
Section · The catch
Two issues with the basic approach
Karger et al.'s original consistent-hashing paper from MIT handles the rehashing problem perfectly, but two distribution issues remain when each server has only one position on the ring.
Issue 01 · Non-uniform partitions
Hashing identifiers onto the ring scatters servers randomly. Random points on a circle don't produce equal-sized arcs — some servers end up with a tiny slice while others own a huge one. The chapter notes a partition can be twice the size of another in a 4-server setup.
Issue 02 · Non-uniform key distribution
Even with equal arcs, keys themselves hash unevenly. The pile-up from arc imbalance + key clumping means some servers get dramatically more traffic. A “hot” server in a cluster is exactly this failure mode in production.
Worked illustration · 2 servers, 100 keys, 1 vnode each
Two servers share the ring. With 100 random keys, the histogram below shows the imbalance: equal expected counts are not what we get.
Keys per server
Solution preview · the next section introduces virtual nodes — multiple positions per server — which slashes this imbalance dramatically.
Section · The fix
Virtual nodes
Instead of one position per server, hash N replicas — virtual nodes — onto the ring per physical server. The arcs each server owns become the union of N small arcs spread around the ring; with N ≥ ~100, the partition imbalance and the key-clumping both average out toward uniformity.
1 virtual node per server
4 positions on the ringKeys per server
50 virtual nodes per server
200 positions on the ringKeys per server
What just happened
- ▸Std-dev ratio dropped from 32.5% to 102.0% just by adding more positions per server.
- ▸The chapter cites “5% std-dev at 200 vnodes, 10% at 100” — empirical numbers from production deployments. The Live Lab section reproduces the curve on demand.
- ▸Cost is more memory: each vnode is one extra ring entry. Real systems run with 100–200 vnodes per physical server because the smoothing pays off and the memory is trivial.
Section · Wrap-up
Why this is everywhere
Consistent hashing is a quiet workhorse — once you know the shape, you start spotting it in every distributed-systems paper. The chapter closes by listing where it shows up in production.
Minimized key redistribution
Adding or removing one server only redistributes a fraction of keys (the affected arc), not all of them.
Easy horizontal scaling
Add a node, redistribute the affected arc, done. No global rebalance, no downtime, no hash-mod-N storm.
Hotspot mitigation
Virtual nodes spread one server's load across many small arcs. A single 'hot' celebrity key still hammers one server, but the cluster as a whole doesn't pile up on one node.
Bounded blast radius
Even a node failure only affects the arc that node owned. Recovery is local, not a cluster-wide event.
Production systems that use it
from the chapter's referencesAmazon Dynamo
Problem · Partition data across a fleet of nodes that scales up and down dynamically.
→ Each node owns a contiguous arc of the ring; adding a node only redistributes keys in that arc. Foundational for Dynamo-style stores (Riak, Cassandra, ScyllaDB).
↗ https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfApache Cassandra
Problem · Distribute a tokenized key-value workload across a peer-to-peer cluster.
→ Cassandra hashes each row's partition key onto the ring; vnodes per physical node smooth the distribution. Scaling out is a member-add operation that touches only adjacent ranges.
↗ http://www.cs.cornell.edu/Projects/ladis2009/papers/Lakshman-ladis2009.PDFDiscord — Elixir cluster
Problem · Route 5M concurrent users across a fleet of stateful Elixir nodes.
→ User-id mapped onto a ring of session-handler nodes; rebalancing on node-add costs only the affected arc. Documented as the route to 5M concurrent connections.
↗ https://blog.discord.com/scaling-elixir-f9b8e1e7c29bAkamai CDN
Problem · Pin requests for the same URL to the same edge cache reliably across a global fleet.
→ URLs are hashed onto the edge ring so identical requests land on the same edge with high probability — minimizing origin fetches as the edge fleet changes.
Google Maglev
Problem · Stick TCP connections to the same backend across a fleet of L4 load balancers.
→ Maglev's fast lookup table is built on consistent-hashing principles; same-flow packets reach the same backend even as the LB fleet scales.
↗ https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdfNext up
The Live Lab section in this same module lets you manipulate the ring directly — add and remove servers, dial the vnode count, drop arbitrary keys — and run a backend stress test that empirically reproduces the chapter's “5% std-dev at 200 vnodes” claim on your chosen configuration.
Chapter 6 →
Consistent hashing is the partition scheme the next chapter assumes from page one. Design a key-value store builds replication, quorum, vector clocks, and Merkle anti-entropy on top of the ring you just walked through.
Section · Live Lab
Interactive ring & stress test
backend · celery · sseTwo panels. The interactive ring lives entirely in the browser — servers, keys, and vnodes are yours to dial. The stress test runs on the backend: a Celery task sweeps every vnode count from 1 to your maximum, computes the std-dev ratio at each, and streams the curve back live. Reference lines on the chart match the chapter's 5%/10% claim so you can verify the result against the book.
Panel 1 · Interactive ring
Manipulate the ring directly — every change recomputes locally.
Servers
4 / 12Keys
- key0→ s1
- key1→ s1
- key2→ s1
- key3→ s1
- key4→ s1
- key5→ s1
- key6→ s1
- key7→ s1
- key8→ s1
- key9→ s1
- key10→ s2
- key11→ s2
- key12→ s2
- key13→ s2
- key14→ s2
- key15→ s0
- key16→ s0
- key17→ s0
- key18→ s1
- key19→ s1
- key20→ s2
- key21→ s1
- key22→ s2
- key23→ s2
- key24→ s1
- key25→ s1
- key26→ s1
- key27→ s1
- key28→ s2
- key29→ s2
- key30→ s3
- key31→ s3
Keys per server
Panel 2 · Distribution stress test
Backend Celery + SSE — sweeps every vnode count and streams the curve.
Run sweeps vnode_min..vnode_max
What to expect
chapter curve: 1000 / vnode_countSweep points
~200
Final stddev
~5.0%
Cross 10%
v ≈ 100
Cross 5%
v ≈ 200
Hits the chapter's '5% at 200 vnodes' production-grade claim. Curve should cross 10% around v=100 and 5% around v=200, ending near ~5.0%.
Awaiting curve data…
Sections
Section · Overview
Designing consistent hashing
Horizontal scaling means handing requests and data across a fleet of servers that grows and shrinks over time. Naïve hashing makes adding or removing one server redistribute almost everything — cache hit rates collapse and the cluster thrashes. Consistent hashing changes the math so only a small, bounded fraction of keys move.
The shape of the chapter
- 01The rehashing problem with naïve hash(key) % N
- 02The hash ring, server placement, key placement
- 03Server lookup — clockwise sweep from the key
- 04Add & remove a server — only the affected arc redistributes
- 05Two issues with the basic approach (uniformity)
- 06Virtual nodes — replicas per server smooth distribution
- 07Wrap-up — Dynamo, Cassandra, Discord, Akamai, Maglev
Live Lab is in this same module
After the reference walkthrough, a Live Lab section lets you manipulate the ring directly — add and remove servers, dial vnode count, drop arbitrary keys, and run a backend stress test that reproduces the chapter's “5% std-dev at 200 vnodes” claim on demand.
Section · The problem
The rehashing problem
With naïve hash-mod-N, “which server owns this key?” depends on N. Change N — by adding or removing a server — and the answer changes for almost every key. The animation below shows both grids side-by-side: every cell outlined in red is a key that crossed servers.
Drag N below — every red-outlined cell crossed servers.
The book's point: with naïve hash-mod-N, changing N by even one number flushes ~(N − 1)/N of the cache. Consistent hashing rebinds only the affected arc — bounded redistribution regardless of fleet size.
⚠ Why this matters in production
In a cache cluster, a key's “owner server” is where its value lives. If most keys reassign on a server change, most subsequent lookups miss the cache and stampede the backing store. The chapter calls this a “storm of cache misses”. Consistent hashing fixes it by changing the math: only a bounded fraction of keys reassign on any single change — and that ratio is the green readout under the grids.
Section · Foundations
Hash space & the ring
Pick a hash function with a fixed-size output — anything from MD5 to SHA-1 to FNV-1a. Its output range is the hash space: a contiguous integer interval, e.g. [0, 2³²). To make the math wrap, connect the two ends and you get a hash ring — the animation below walks through how the line becomes a circle.
Why this matters
Same range. Same uniformity. But now there's no edge — every point has a clockwise neighbor. That single geometric property is what the rest of the algorithm relies on: server lookup, add/remove, virtual nodes — every operation expresses as “walk clockwise from this point.”
A note on the hash function
The chapter uses one hash function for placement on the ring and a differentone for the rehashing-problem demo. Inside the ring, cryptographic strength doesn't matter — uniform distribution does. This module uses FNV-1a 32-bitbecause it's deterministic, runs synchronously, and ports to the backend trivially.
Section · Lookup
Server & key lookup
Hash each server's identifier to put it on the ring. Hash each key to put it on the same ring. To find which server owns a key, walk clockwise from the key's position until you hit a server — that's the owner. Below, the demo cycles through keys; click any key chip to inspect it.
Section · Add a server
Adding a server
When server s4 joins, it lands at its hashed position. Only keys in the arc between s4 and its previous counter-clockwise neighbor change owner — the rest of the ring is untouched.
Section · Remove a server
Removing a server
When server s1 leaves the ring, the keys it owned redistribute to the next clockwise server. The arc that s1 used to own is the affected range — everything else is untouched.
Section · The catch
Two issues with the basic approach
Karger et al.'s original consistent-hashing paper from MIT handles the rehashing problem perfectly, but two distribution issues remain when each server has only one position on the ring.
Issue 01 · Non-uniform partitions
Hashing identifiers onto the ring scatters servers randomly. Random points on a circle don't produce equal-sized arcs — some servers end up with a tiny slice while others own a huge one. The chapter notes a partition can be twice the size of another in a 4-server setup.
Issue 02 · Non-uniform key distribution
Even with equal arcs, keys themselves hash unevenly. The pile-up from arc imbalance + key clumping means some servers get dramatically more traffic. A “hot” server in a cluster is exactly this failure mode in production.
Worked illustration · 2 servers, 100 keys, 1 vnode each
Two servers share the ring. With 100 random keys, the histogram below shows the imbalance: equal expected counts are not what we get.
Keys per server
Solution preview · the next section introduces virtual nodes — multiple positions per server — which slashes this imbalance dramatically.
Section · The fix
Virtual nodes
Instead of one position per server, hash N replicas — virtual nodes — onto the ring per physical server. The arcs each server owns become the union of N small arcs spread around the ring; with N ≥ ~100, the partition imbalance and the key-clumping both average out toward uniformity.
1 virtual node per server
4 positions on the ringKeys per server
50 virtual nodes per server
200 positions on the ringKeys per server
What just happened
- ▸Std-dev ratio dropped from 32.5% to 102.0% just by adding more positions per server.
- ▸The chapter cites “5% std-dev at 200 vnodes, 10% at 100” — empirical numbers from production deployments. The Live Lab section reproduces the curve on demand.
- ▸Cost is more memory: each vnode is one extra ring entry. Real systems run with 100–200 vnodes per physical server because the smoothing pays off and the memory is trivial.
Section · Wrap-up
Why this is everywhere
Consistent hashing is a quiet workhorse — once you know the shape, you start spotting it in every distributed-systems paper. The chapter closes by listing where it shows up in production.
Minimized key redistribution
Adding or removing one server only redistributes a fraction of keys (the affected arc), not all of them.
Easy horizontal scaling
Add a node, redistribute the affected arc, done. No global rebalance, no downtime, no hash-mod-N storm.
Hotspot mitigation
Virtual nodes spread one server's load across many small arcs. A single 'hot' celebrity key still hammers one server, but the cluster as a whole doesn't pile up on one node.
Bounded blast radius
Even a node failure only affects the arc that node owned. Recovery is local, not a cluster-wide event.
Production systems that use it
from the chapter's referencesAmazon Dynamo
Problem · Partition data across a fleet of nodes that scales up and down dynamically.
→ Each node owns a contiguous arc of the ring; adding a node only redistributes keys in that arc. Foundational for Dynamo-style stores (Riak, Cassandra, ScyllaDB).
↗ https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfApache Cassandra
Problem · Distribute a tokenized key-value workload across a peer-to-peer cluster.
→ Cassandra hashes each row's partition key onto the ring; vnodes per physical node smooth the distribution. Scaling out is a member-add operation that touches only adjacent ranges.
↗ http://www.cs.cornell.edu/Projects/ladis2009/papers/Lakshman-ladis2009.PDFDiscord — Elixir cluster
Problem · Route 5M concurrent users across a fleet of stateful Elixir nodes.
→ User-id mapped onto a ring of session-handler nodes; rebalancing on node-add costs only the affected arc. Documented as the route to 5M concurrent connections.
↗ https://blog.discord.com/scaling-elixir-f9b8e1e7c29bAkamai CDN
Problem · Pin requests for the same URL to the same edge cache reliably across a global fleet.
→ URLs are hashed onto the edge ring so identical requests land on the same edge with high probability — minimizing origin fetches as the edge fleet changes.
Google Maglev
Problem · Stick TCP connections to the same backend across a fleet of L4 load balancers.
→ Maglev's fast lookup table is built on consistent-hashing principles; same-flow packets reach the same backend even as the LB fleet scales.
↗ https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdfNext up
The Live Lab section in this same module lets you manipulate the ring directly — add and remove servers, dial the vnode count, drop arbitrary keys — and run a backend stress test that empirically reproduces the chapter's “5% std-dev at 200 vnodes” claim on your chosen configuration.
Chapter 6 →
Consistent hashing is the partition scheme the next chapter assumes from page one. Design a key-value store builds replication, quorum, vector clocks, and Merkle anti-entropy on top of the ring you just walked through.
Section · Live Lab
Interactive ring & stress test
backend · celery · sseTwo panels. The interactive ring lives entirely in the browser — servers, keys, and vnodes are yours to dial. The stress test runs on the backend: a Celery task sweeps every vnode count from 1 to your maximum, computes the std-dev ratio at each, and streams the curve back live. Reference lines on the chart match the chapter's 5%/10% claim so you can verify the result against the book.
Panel 1 · Interactive ring
Manipulate the ring directly — every change recomputes locally.
Servers
4 / 12Keys
- key0→ s1
- key1→ s1
- key2→ s1
- key3→ s1
- key4→ s1
- key5→ s1
- key6→ s1
- key7→ s1
- key8→ s1
- key9→ s1
- key10→ s2
- key11→ s2
- key12→ s2
- key13→ s2
- key14→ s2
- key15→ s0
- key16→ s0
- key17→ s0
- key18→ s1
- key19→ s1
- key20→ s2
- key21→ s1
- key22→ s2
- key23→ s2
- key24→ s1
- key25→ s1
- key26→ s1
- key27→ s1
- key28→ s2
- key29→ s2
- key30→ s3
- key31→ s3
Keys per server
Panel 2 · Distribution stress test
Backend Celery + SSE — sweeps every vnode count and streams the curve.
Run sweeps vnode_min..vnode_max
What to expect
chapter curve: 1000 / vnode_countSweep points
~200
Final stddev
~5.0%
Cross 10%
v ≈ 100
Cross 5%
v ≈ 200
Hits the chapter's '5% at 200 vnodes' production-grade claim. Curve should cross 10% around v=100 and 5% around v=200, ending near ~5.0%.
Awaiting curve data…