SYSTEMSIM
CH 04·Rate Limiter
Overview

Designing a rate limiter

A rate limiter caps how many requests a client or service can issue in a given window. It protects you from accidental overuse, hostile traffic, and the cost spiral that happens when neither is checked. This chapter walks through five algorithms, the architecture behind them, and the distributed-correctness traps you hit at scale.

Why rate limit

  • Prevent resource starvation from DoS attacks. Most public APIs publish a default ceiling for exactly this reason — Twitter caps writes at 300 / 3h, Google Docs at 300 reads/min/user.
  • Reduce cost. If third-party services bill you per call, a rate limiter is often what stands between a buggy retry loop and a six-figure surprise.
  • Prevent server overload. Bots and misbehaving clients can fill your queues; throttling at the edge keeps the rest of the system within capacity.

Where it goes

Client-side

Insecure

A client-side limiter is trivially bypassed by a hostile or modified client. Useful for UX hints only — never as a safety control.

Server-side

In your code

Embed the limiter in the API server. Full control, but every service implements its own, and rules don't compose across teams.

Middleware / Gateway

API gateway

Cloud microservices favor a dedicated gateway tier (Envoy, Kong, AWS API Gateway). Cross-cutting policy, one configuration to maintain.
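As a rough illustration of the middleware placement, the sketch below wraps a request handler so that rejected requests never reach it. Everything here is hypothetical: `rate_limited`, the `(client_id, request)` handler signature, and the `allow` callback are invented for the example and are not the API of Envoy, Kong, or AWS API Gateway. The 429 status code and the `Retry-After` header are standard HTTP.

```python
from http import HTTPStatus


def rate_limited(handler, allow, retry_after_seconds=1):
    """Wrap `handler` so that requests rejected by `allow` never reach it.

    `allow(client_id)` is a hypothetical callback returning True or False.
    """
    def wrapped(client_id, request):
        if not allow(client_id):
            headers = {"Retry-After": str(retry_after_seconds)}
            return HTTPStatus.TOO_MANY_REQUESTS.value, headers, "rate limit exceeded"
        return handler(client_id, request)
    return wrapped


# Toy policy for the demo only: each client may make three calls in total.
calls = {}

def allow_three(client_id):
    calls[client_id] = calls.get(client_id, 0) + 1
    return calls[client_id] <= 3

def hello(client_id, request):
    return 200, {}, f"hello {client_id}"

guarded = rate_limited(hello, allow_three)
statuses = [guarded("alice", None)[0] for _ in range(4)]
print(statuses)  # [200, 200, 200, 429]
```

The handler itself knows nothing about the policy, which is what lets a gateway tier apply one configuration across many services.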

Five algorithms

click any to read

The chapter compares five strategies. They split along two axes: how they store state (a per-request log, a counter, or a bucket) and how they handle a burst (allow it, smooth it, or reject it strictly).

Worked scenarios

algorithm choice

Twitter — 300 posts per 3 hours

A user can publish at most 300 posts in any rolling three-hour window. The cap protects fanout infrastructure and discourages spam.

Without a limit, a single compromised account could flood every follower's feed and the message queue. The cap is per-user, sustained over hours.

Recommended · Sliding Window Counter

With a three-hour window, a fixed-window counter could admit up to twice the cap around a window boundary, so edge effects matter. A per-user sliding window counter smooths enforcement across that boundary without needing a timestamp for every post.
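A minimal sketch of a sliding window counter for one user, assuming the common formulation that weights the previous fixed window's count by how much of it still overlaps the rolling window. The class name, the injectable `clock` parameter, and the in-memory state are assumptions made for illustration; a real deployment would keep one such counter per user in shared storage.

```python
import time


class SlidingWindowCounter:
    """Approximates a rolling window from two fixed-window counters."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock            # injectable so the logic can be tested
        self.current_index = None     # which fixed window is being counted
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        now = self.clock()
        index = int(now // self.window)
        if index != self.current_index:
            # Roll over. The old count only carries forward if the new
            # window directly follows the one we were counting.
            consecutive = (self.current_index is not None
                           and index == self.current_index + 1)
            self.previous_count = self.current_count if consecutive else 0
            self.current_count = 0
            self.current_index = index
        # Fraction of the current fixed window that has already elapsed.
        elapsed = (now % self.window) / self.window
        estimated = self.previous_count * (1 - elapsed) + self.current_count
        if estimated + 1 > self.limit:
            return False
        self.current_count += 1
        return True


# 300 posts per 3 hours, as in the scenario above.
posts = SlidingWindowCounter(limit=300, window_seconds=3 * 60 * 60)
print(posts.allow())  # True for the first post
```

The estimate assumes requests in the previous window were evenly spread, so it is an approximation rather than an exact rolling count. In exchange, the state is two integers per user instead of one timestamp per post.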

Google Docs APIs — read requests per minute per user

The default read quota is 300 requests per 60 seconds per user and can be raised. The limit is short-window and high-frequency.

Read fan-out from a single client (e.g. a custom integration) can saturate downstream services. A short rolling window catches misbehaving clients quickly.

Recommended · Token Bucket

Token bucket tolerates legitimate short bursts (a UI loading several documents at once) while still enforcing the steady-state rate.
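A minimal token bucket sketch for this scenario. The refill rate follows from the quota (300 per 60 seconds is 5 tokens per second); the burst capacity of 20 is purely an illustrative assumption, as are the class and parameter names.

```python
import time


class TokenBucket:
    """Tokens refill continuously; each request spends one."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock                 # injectable so the logic can be tested
        self.tokens = float(capacity)      # start with a full bucket
        self.last_refill = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        earned = (now - self.last_refill) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# 300 reads per 60 s is 5 tokens per second; capacity 20 is an assumed burst size.
reads = TokenBucket(rate_per_second=300 / 60, capacity=20)
print(reads.allow())  # True while tokens remain
```

Capacity is the tuning knob: it sets how large a burst is admitted at once. A client that drains a full bucket and keeps consuming the refill can make roughly capacity plus 300 requests in a single 60-second span, so a smaller capacity keeps the limiter closer to the published quota.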

Marketing SMS — 5 per day per user

A messaging service caps marketing SMS to 5 per user per day, hard limit. Compliance, not capacity.

The cap exists for user-experience and regulatory reasons. Going over is a real failure, not a soft hint.

Recommended · Sliding Window Log

Five per day is small enough that the per-request memory cost is trivial, and absolute precision matters: a burst of up to twice the limit around a window edge is not acceptable.
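A minimal sliding window log sketch for one user. It assumes the variant that records only admitted requests, so the log never holds more than `limit` timestamps; some descriptions of the algorithm also log rejected requests. The class name and the injectable `clock` are assumptions for illustration.

```python
import time
from collections import deque


class SlidingWindowLog:
    """Keeps one timestamp per admitted request inside the rolling window."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.timestamps = deque()   # oldest admitted request on the left

    def allow(self):
        now = self.clock()
        cutoff = now - self.window
        # Drop entries that have aged out of the rolling window.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True


# 5 marketing SMS per rolling 24 hours, as in the scenario above.
sms = SlidingWindowLog(limit=5, window_seconds=24 * 60 * 60)
print([sms.allow() for _ in range(6)])  # [True, True, True, True, True, False]
```

Because every decision is made against exact timestamps, a sixth message is only admitted once the oldest of the previous five is a full 24 hours old.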

Auth — 5 login attempts per minute per IP

Login endpoint capped at 5 attempts per minute per IP to slow credential stuffing without locking out users on retry.

Brute-force defence. The limit must be strict — letting through a 6th attempt on the boundary defeats the point.

Recommended · Sliding Window Log

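Strict rolling-window precision. If only admitted attempts are logged, each IP holds at most five timestamps and they expire after a minute, so per-IP memory stays small.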
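To make the boundary problem concrete, the sketch below uses a fixed-window counter, which is deliberately not the recommended algorithm here, and shows it admitting ten login attempts within about a second when they straddle a minute boundary. The class name and the explicit `now` argument are assumptions for the demonstration.

```python
class FixedWindowCounter:
    """Counts requests per fixed window and resets at each boundary."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.window_index = None
        self.count = 0

    def allow(self, now):
        index = int(now // self.window)
        if index != self.window_index:
            # A new fixed window has started, so the counter resets.
            self.window_index = index
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


limiter = FixedWindowCounter(limit=5, window_seconds=60)
# Five attempts at the end of one minute, five more at the start of the next.
attempt_times = [59.5] * 5 + [60.5] * 5
admitted = sum(limiter.allow(t) for t in attempt_times)
print(admitted)  # 10, twice the intended limit
```

A limiter that checks exact timestamps over the trailing 60 seconds would reject the sixth through tenth attempts in this sequence.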