Hash Tables: Near-Instant Lookup

The coat-check insight

At a coat check, you don't search every hook for your coat — your ticket number *is* the hook number. You go straight there. A hash table does the same for data: it computes, from your key, exactly which slot to look in. No scanning the whole collection.

The heart of it is the hash function: a small calculation that turns a key (a string, a number, anything) into an array index. Same key in, same index out, every time.

key "cat"  --hash-->  37  --% 8-->  bucket 5
key "dog"  --hash-->  12  --% 8-->  bucket 4

  buckets (an array):
  [ 0 | 1 | 2 | 3 |dog| cat| 6 | 7 ]
                    4    5

look up "cat": hash it -> go straight to bucket 5

A hash function maps each key to a bucket index in an underlying array.

Key to bucket, in C++

Under the hood sits a plain array of "buckets". To store a key, you hash it to a number, then take that number modulo the array size to land on a valid index. A tiny string hasher makes the idea concrete:

// A simple (illustrative) hash for a string:
size_t hashKey(const std::string& s, size_t nBuckets) {
    size_t h = 0;
    for (char c : s)
        h = h * 31 + c;        // mix each character in
    return h % nBuckets;        // fold into a valid index
}
// hashKey("cat", 8) always returns the same bucket.

Mix every character, then % nBuckets to get an index in range.

If the hash spreads keys evenly across buckets, then storing (put) and retrieving (get) each touch about one bucket — that's the famous average O(1). It doesn't matter whether you have 100 keys or 100 million.

Collisions: when two keys want the same slot

Here's the leak. Two different keys can hash to the same bucket — that's a collision. With a finite number of buckets and unlimited keys, collisions are inevitable. The most common fix is chaining: each bucket holds a small linked list of all the entries that landed there.

bucket 5:  cat -> hat -> bat -> /     (all hashed to 5)
bucket 4:  dog -> /

look up "hat": hash -> bucket 5, then
walk the short list comparing keys until found.

Chaining: each bucket is a tiny linked list of colliding entries.

std::unordered_map in practice

You almost never hand-roll a hash table. C++ gives you `std::unordered_map<Key, Value>` — a hash table with average O(1) insert, lookup, and erase. (Its cousin `std::map` keeps keys sorted but is O(log n); reach for `unordered_map` when you only need fast lookup, not order.)

#include <unordered_map>
#include <string>

std::unordered_map<std::string, int> age;
age["alice"] = 30;          // put
age["bob"]   = 25;

if (age.count("alice"))     // membership test, ~O(1)
    std::cout << age["alice"];   // get -> 30

age.erase("bob");           // remove, ~O(1)

std::unordered_map: insert, look up, and erase by key in average constant time.