Imagine doing your homework. Every time you need the calculator you could walk all the way to the cupboard down the hall and back. Slow. So instead you grab it once and leave it on your desk. The next time you need it, it's already there. That little spot on your desk is a cache: a small, fast shelf for the things you use a lot. Computers do exactly this. The first time you ask for something it's a slow trip; the computer keeps a copy nearby, and every time after that it's instant. Because a few things get asked for over and over, that tiny shelf catches most of the requests. Fire some requests in the simulator and watch the hit rate climb.
Most people think a cache makes the slow storage itself run faster. In fact the slow store stays exactly as slow as ever; the cache just keeps a nearby copy so you almost never have to visit the slow store at all.
What's actually happening
The instinct is to think a cache makes the slow store faster, the way a shortcut makes a road quicker. It does nothing of the kind. The slow store is exactly as slow as it ever was. What a cache does is avoid going there at all, by keeping a copy of recently-wanted data somewhere much closer and faster. A request that finds its answer in the cache is a hit and comes back almost instantly; one that doesn't is a miss, and pays the full slow-store price before a copy is tucked into the cache for next time.
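The hit-and-miss behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular system's implementation: `ReadThroughCache`, `slow_store`, and `slow_calls` are hypothetical names invented for the example, and the "slow store" is just a function that records each visit.

```python
class ReadThroughCache:
    """Keeps a nearby copy of every value fetched from a slow store."""

    def __init__(self, slow_fetch):
        self.slow_fetch = slow_fetch  # hypothetical function that reads the slow store
        self.copies = {}              # the small, fast shelf
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.copies:
            # Hit: answer from the nearby copy, no trip to the slow store.
            self.hits += 1
            return self.copies[key]
        # Miss: pay the full slow-store price, then keep a copy for next time.
        self.misses += 1
        value = self.slow_fetch(key)
        self.copies[key] = value
        return value


slow_calls = []  # every visit to the slow store is recorded here


def slow_store(key):
    """Stand-in for the slow store (assumed for illustration)."""
    slow_calls.append(key)
    return f"value-for-{key}"


cache = ReadThroughCache(slow_store)
for key in ["a", "b", "a", "a", "b"]:
    cache.get(key)

print("hits:", cache.hits, "misses:", cache.misses, "slow trips:", len(slow_calls))
```

Five requests arrive, but the slow store is visited only once per distinct key. Nothing about `slow_store` got faster; it was simply called less often.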
This only pays off because of a lopsided fact about how data gets used: a small slice of it gets asked for constantly while the rest is barely touched. It is the 80/20 rule again. A handful of web pages, database rows, or memory addresses account for the bulk of all requests. So even a cache that holds a tiny fraction of everything will catch most of the traffic, because most of the traffic is for the same few things. When the cache fills up, it has to throw something out to make room. A common rule, least-recently-used, throws out whatever was used longest ago, on the bet that the recently-used stuff is what you'll want again.
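To see the lopsided-access effect for yourself, here is a small simulation sketch. Everything in it is an assumption made for illustration: the `LRUCache` class, the `hit_rate` helper, the 1,000-item store, and the choice of a Zipf-like popularity curve (item ranked n is requested in proportion to 1/n) as a stand-in for "a few things get asked for constantly". Real workloads vary.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least-recently-used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest use first, newest use last

    def get(self, key, fetch):
        """Return (value, was_hit); call fetch(key) only on a miss."""
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key], True
        value = fetch(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # throw out the one used longest ago
        return value, False


def hit_rate(capacity, num_keys=1000, requests=20000, seed=0):
    """Fraction of requests served from the cache under a skewed workload."""
    rng = random.Random(seed)
    keys = list(range(num_keys))
    # Assumed Zipf-like popularity: key 0 is most popular, key 999 least.
    weights = [1 / (rank + 1) for rank in keys]
    cache = LRUCache(capacity)
    hits = 0
    for key in rng.choices(keys, weights=weights, k=requests):
        _, was_hit = cache.get(key, fetch=lambda k: k)
        hits += was_hit
    return hits / requests


for capacity in (5, 50, 200):
    share = capacity / 1000
    print(f"cache holds {share:.1%} of items -> hit rate {hit_rate(capacity):.1%}")
```

If requests were spread evenly, a cache holding 5% of the items would catch about 5% of requests. Under the skewed workload assumed here, the same cache does far better, which is the whole point of the paragraph above.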
The numbers are what make caching one of the load-bearing tricks of all computing. A CPU's level-1 cache is only a few dozen kilobytes, yet it typically answers well over 90% of memory requests, and it does so roughly a hundred times faster than reaching out to main memory. Your browser caches images so a second visit to a site loads in a blink. Content networks cache videos in cities near you so the bits travel across town instead of across oceans. None of it makes the slow part faster. It just makes sure you almost never have to wait for the slow part.
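The arithmetic behind "you almost never have to wait" can be worked through with a simple weighted average. The figures below (1 ns for a cache hit, 100 ns for a trip to the slow store) are round numbers assumed purely for illustration, not measurements of any real chip, and `average_access_time` is a hypothetical helper. The model is deliberately simple: a hit costs the cache time, a miss costs the slow-store time.

```python
def average_access_time(hit_rate, cache_time_ns, slow_time_ns):
    """Expected time per request: hits pay the cache price, misses the slow price."""
    return hit_rate * cache_time_ns + (1 - hit_rate) * slow_time_ns


CACHE_NS = 1.0    # assumed cost of a hit
SLOW_NS = 100.0   # assumed cost of a miss; this number never changes below

for rate in (0.0, 0.5, 0.9, 0.95, 0.99):
    avg = average_access_time(rate, CACHE_NS, SLOW_NS)
    print(f"hit rate {rate:.0%}: average {avg:.2f} ns per request")
```

The slow store costs 100 ns in every row. Only the hit rate changes, and the average wait drops with it.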
A tiny cache catches most requests because real access is lopsided, so a small fast copy beats speeding up the slow store.
1. For one day, leave the three things you reach for most (a mug, a spoon, the kettle) out on the counter instead of in cupboards.
2. Count how many cupboard-trips you save. A few favourite items will account for most of your reaching, even though the cupboards hold far more.
3. That counter is a cache, and the ratio you notice (a small set serving most of the demand) is exactly why a tiny computer cache works so well.
Common questions
Why does a small cache catch most requests?
Because real access patterns are lopsided: a small slice of data gets asked for constantly while the rest is barely touched. So even a cache holding a tiny fraction of everything still catches most of the requests.
What happens when the cache is full?
It evicts something to make room, using a policy such as least-recently-used: it throws out whatever was used longest ago, betting that recently-used data is what you will want next.
Does a cache make the slow store faster?
No. The slow store stays exactly as slow as ever. A cache simply avoids going there by serving repeat requests from a copy kept somewhere much closer and faster.