Advanced Structures: Tries, Union-Find & More

The trie: searching by prefix

Imagine typing into a search box and seeing suggestions appear after each keystroke. Behind that is often a [[trie|trie]] (say "try"), a tree where each edge is a single character and each path from the root spells out a prefix. Words that share a prefix share the same path until they diverge. To check whether a word is present, you walk down one character at a time — so a lookup costs O(L) where L is the length of the word, *no matter how many words the trie holds*.

Trie holding {"cat", "car", "can", "dog"}

        (root)
        /     \
       c       d
       |       |
       a       o
     / | \     |
    t  r  n    g*
    *  *  *

(* marks the end of a real word)
Lookup "car":  root -> c -> a -> r , and r is marked  => found, O(3).

A trie. Shared prefixes share nodes; an end-of-word flag distinguishes "car" from a mere prefix "ca".

struct TrieNode {
    TrieNode* child[26] = {nullptr};   // one slot per lowercase letter
    bool isEnd = false;                // true if a word ends here
};

void insert(TrieNode* root, const string& w) {
    TrieNode* cur = root;
    for (char ch : w) {
        int i = ch - 'a';
        if (!cur->child[i]) cur->child[i] = new TrieNode();
        cur = cur->child[i];
    }
    cur->isEnd = true;                 // mark the final node
}

Inserting a word: walk or create one node per character, then flag the last node. Lookup is the same walk without creating.

The payoff is prefix queries, which a hash table cannot do well. To autocomplete "ca", walk to the "ca" node, then collect every word in the subtree below it. A hash table can tell you if "ca" is *exactly* a key, but it has no idea which keys *start with* "ca". That is why tries power autocomplete, spell-checkers, and IP routing tables.

Union-Find: are these connected?

[[union-find|Union-find]] (also called a disjoint-set structure) maintains a collection of groups and answers one question astonishingly fast: "are x and y in the same group?" It supports two operations — union(x, y) merges two groups, and find(x) returns a representative for x's group. Two elements are connected exactly when they share a representative. The idea: store each group as a tree of parent pointers, with the root acting as the group's name.

Elements 0..5.  parent[i] points up; a root points to itself.

Start: each its own group        After union(0,1), union(2,3), union(1,3):
  0  1  2  3  4  5                         3
  ^  ^  ^  ^  ^  ^                       / | \
 (each is its own root)                1  2  (and 0 hangs under 1)
                                        |
                                        0
find(0) walks 0 -> 1 -> 3 ; find(2) walks 2 -> 3 ; same root => connected.

Each group is a tree of parent pointers. find() walks to the root; two elements are connected iff their roots match.

Two small optimizations make this nearly free. Union by rank always hangs the shorter tree under the taller one, so trees stay shallow. Path compression re-points every node you touch during a find() straight at the root, flattening the tree as a side effect. Together they make each operation run in near-O(1) time (formally, the inverse Ackermann function α(n), which is below 5 for any n you will ever meet).

int parent[N], rnk[N];
void init(int n){ for(int i=0;i<n;i++){ parent[i]=i; rnk[i]=0; } }

int find(int x){                       // path compression
    if (parent[x] != x) parent[x] = find(parent[x]);
    return parent[x];
}
void unite(int a, int b){              // union by rank
    a = find(a); b = find(b);
    if (a == b) return;
    if (rnk[a] < rnk[b]) swap(a, b);
    parent[b] = a;
    if (rnk[a] == rnk[b]) rnk[a]++;
}

Union-find with both optimizations — about 15 lines, near-O(1) per call.

What to learn next

Two structures into a big field. A few more are worth knowing by name so you recognize them when a problem calls for them, and can look up the details. They all answer some version of "summarize or update a range of values quickly."

Segment tree — supports both range queries (sum/min/max over a[i..j]) and point updates in O(log n); a workhorse for problems that mix reads and writes over intervals.
Fenwick tree (binary indexed tree) — a lighter, smaller cousin for prefix sums with point updates, also O(log n); fewer features than a segment tree but tiny to code.
Balanced-tree libraries — in real code you rarely hand-roll a balanced tree. Reach for std::map / std::set (ordered, O(log n)) or std::unordered_map / std::unordered_set (hashed, average O(1)); they are the balanced tree and hash table you already understand, battle-tested.