Binary Search Trees & Balance

One rule that changes everything

A plain binary tree is just a shape; it says nothing about *where* a value should live. A [[binary-search-tree|binary search tree]] (BST) adds a single, strict ordering rule, and that rule is the whole magic: for every node, everything in its left subtree is smaller, and everything in its right subtree is larger. That's it. The rule holds not just for a node's immediate children but for the entire subtree beneath it.

              (8)
             /   \
          (3)     (10)
         /   \        \
      (1)    (6)       (14)

  left of 8  -> {3,1,6}  all < 8
  right of 8 -> {10,14}  all > 8
  ...and the same holds at 3, at 10, everywhere.

A binary search tree. At every node, the whole left subtree is smaller and the whole right subtree is larger.

Search: halving at every step

Searching a BST is the tree version of [[binary-search|binary search]]. Start at the root and compare. Equal? Found it. Smaller than the node? The rule guarantees your target can only be on the left, so go left and forget the entire right subtree. Larger? Go right. Each comparison throws away a whole side of what remains — exactly the halving that made binary search on a sorted array so fast, now expressed as walking down a tree. Insertion follows the identical path: search for the value, and where you fall off the bottom (`nullptr`), that's where the new node hangs.

// search: returns the node holding key, or nullptr
Node* find(Node* n, int key) {
    while (n != nullptr && n->value != key)
        n = (key < n->value) ? n->left : n->right;
    return n;   // each step drops one whole subtree
}

// insert: walk to the empty slot, hang the new node there
Node* insert(Node* n, int key) {
    if (n == nullptr) return new Node{key};
    if (key < n->value) n->left  = insert(n->left,  key);
    else if (key > n->value) n->right = insert(n->right, key);
    return n;   // key already present -> no change
}

BST search and insert. Both walk a single root-to-leaf path, so they cost about one comparison per level.

How many levels are there? If the tree is balanced — roughly the same number of nodes left and right at each step — then n nodes pack into about log₂ n levels. A search touches one node per level, so search, insert, and delete all cost O(log n). A million items, found in about twenty comparisons. That is the whole reason BSTs exist.

When a tree rots into a list

That O(log n) came with three words you must not skip: *when it's balanced*. The BST has no built-in defense against a bad order of insertions. Insert 1, then 2, then 3, then 4, then 5 into an empty BST: each value is larger than everything before it, so it always goes right. You get no branching at all — a tree that's really a one-sided chain.

insert 1,2,3,4,5 in order:

  (1)
     \
      (2)
         \
          (3)
             \
              (4)
                 \
                  (5)   <- height = n-1, not log n!

  searching for 5 now visits all 5 nodes -> O(n)

A degenerate BST. Sorted input makes the tree as tall as a linked list, and search degrades to O(n).

Self-balancing, and the STL gift

The fix is a [[balanced-tree|self-balancing tree]]: after each insert or delete it does a little local restructuring (called a *rotation*) to keep the left and right sides roughly even, so the height stays near log n no matter what order the data arrives in. You don't need the machinery yet, just the idea and two famous names. An AVL tree keeps the two sides' heights within 1 of each other — very strictly balanced, so lookups are fast. A red–black tree colors nodes red or black and obeys coloring rules that keep it *good enough* — a touch less balanced than AVL, but cheaper to maintain.

Here is the practical headline: you almost never hand-write a balanced tree. The C++ standard library ships them. `std::map` and `std::set` are balanced binary search trees (red–black in practice), so they keep their keys sorted and guarantee O(log n) lookup, insert, and erase — automatically, with no degenerate case to fear. When you need a sorted, searchable collection, reach for these and let the library do the balancing.

#include <map>
std::map<std::string, int> ages;   // a balanced BST under the hood
ages["Ana"] = 30;                  // O(log n) insert
ages["Bo"]  = 25;
if (ages.count("Ana")) /* O(log n) lookup */ ;
for (auto& [name, age] : ages)     // iterates in SORTED key order
    std::cout << name << ' ' << age << '\n';
// prints Ana then Bo, alphabetically -- for free, like in-order.

std::map is a ready-made balanced BST: O(log n) operations, and iteration comes out sorted by key.