Imagine a huge shelf and you need one book. Reading every spine until you find it would take forever. A hash table is smarter: it does a quick sum on the book's name to get a shelf number, then walks straight to that shelf. No searching. Sometimes two books get the same number and share a spot, so you check the few that landed together. Insert some keys in the simulator and watch them jump to their buckets.
Most people think finding one item among millions must mean searching through them. In fact a hash table computes the location from the key and goes straight there, touching only one bucket instead of scanning the whole pile.
What's actually happening
Finding one item among a million sounds like it has to be slow. If your data is just a long list, it is: you check the first, then the second, and on average you read through half the pile before you find what you want, which gets painful as the pile grows. The natural assumption is that bigger means slower to search. A hash table breaks that assumption completely, and it does so with a trick that feels almost like cheating.
The trick is to compute the location instead of searching for it. A hash function takes the key, say the word "fox", and crunches it into a number, a bucket index. The same word always produces the same number, so you don't store things wherever there's room, you store each item in the bucket its key computes to. Later, when you want "fox" back, you run the exact same crunch, get the same bucket, and look only there. You never touch the other buckets at all. The simulator shows it directly: each key you insert runs through the hash and drops straight into its slot, and the readout compares the one or two checks a hash table needs against the dozen a list scan would cost.
There's one wrinkle that makes the design honest: two different keys can crunch to the same number, a collision. You can't avoid them, because you're squeezing many possible keys into few buckets. The fix shown here is chaining: each bucket holds a tiny list, so colliding keys just hang in a short chain you scan through. As long as the table has enough buckets that chains stay short, lookups stay nearly instant no matter how big the data grows. That constant-time behaviour is why hash tables sit underneath dictionaries, database indexes, caches, and the variable names in nearly every program you've ever run.
Hash tables compute where an item must be rather than searching, so lookups stay near-instant even as the data grows huge.
- 1Set out a few trays and label them A-E, F-J, K-O, and so on. These are your buckets.
- 2For each letter, drop it in the tray its first letter belongs to. That rule, first letter to tray, is your hash function.
- 3Now find a letter from "Khan": you go straight to the K-O tray and check only the few there, never the other trays. That jump-straight-to-it move is what a hash table does for a computer.
Common questions
That is a collision, and it is unavoidable when you squeeze many possible keys into few buckets. Chaining handles it by storing each bucket as a short list, so colliding keys just hang in a chain you scan through.
Barely. A good hash table finds an item in about the same time whether it holds a hundred keys or a hundred million, as long as it has enough buckets to keep the chains short.
They sit underneath dictionaries, sets, database indexes, caches, and the way a programming language stores your variables by name. The same quiet trick runs beneath nearly every program.