JOVANA
Library Glossary Getting Started Three Levels Fields How it works Mission
Join the mission
All guides

Searching: Linear vs Binary

Now we cash in everything so far. Finding a value by checking every slot is **linear search**, O(n). But if the data is *sorted*, you can halve the search each step — **binary search**, O(log n). We'll build a correct binary search in C++ and meet the little invariant that proves it works.

Linear search: check them all, O(n)

The most obvious way to find a value in an array is to look at each slot in turn, from the first to the last, stopping the moment you find it. That's [[linear-search|linear search]], and it always works — even on a jumbled, unsorted list. The cost: in the worst case (the value is last, or absent) you check all *n* items, so it's O(n). Double the data, double the work. It's simple and honest, and for small or unsorted data it's exactly the right tool.

#include <vector>

// Return the index of target in v, or -1 if not present.
int linearSearch(const std::vector<int>& v, int target) {
    for (int i = 0; i < v.size(); i++) {  // check each slot in turn
        if (v[i] == target) return i;     // found it -> done
    }
    return -1;                            // walked the whole array, absent
}
Linear search makes one pass and stops early on a hit. Worst case it touches all n elements — O(n). It needs no sorting at all.

But linear search throws away a gift. If the data happens to be sorted, every comparison tells you more than just "not this one" — it tells you which *direction* the target must lie. Linear search ignores that. The next idea pounces on it.

Binary search: halve it every time, O(log n)

Here's the picture: you're looking up a word in a real paper dictionary. You don't start at page 1. You flip to the *middle*, see whether your word comes before or after, and throw away half the book in one move. Then you do it again on the half that's left, and again — each flip cutting the problem in two. That's [[binary-search|binary search]], and it only works because a dictionary is sorted. Sorting is the price of admission; halving is the reward.

search for 23 in a SORTED array (lo, hi mark the live range):

  index:   0    1    2    3    4    5    6
         [  3 |  8 | 15 | 23 | 42 | 56 | 91 ]
           lo            mid              hi      mid=3 -> v[3]=23 ?

  step 1: lo=0, hi=6, mid=3, v[mid]=23 == 23  -> FOUND at index 3!

  (if we'd wanted 8: 23>8 so drop the right half -> hi=mid-1=2,
   then mid=1, v[1]=8 -> found.  each step throws away half.)
Look at the middle; one comparison discards half the array. n -> n/2 -> n/4 -> ... reaches 1 in about log₂(n) steps — that's the O(log n).

Why is halving so powerful? Starting from *n*, you go *n → n/2 → n/4 → …* until only one item is left. The number of halvings it takes is log₂(n) — and that grows astonishingly slowly. A million items? About 20 steps. A billion? About 30. Compare that to linear search's million or billion steps. *This* is the payoff table from the Big-O guide made real.

A correct binary search, and why it works

Binary search is famously easy to get *almost* right. We keep two markers, `lo` and `hi`, fencing in the range that *could still contain* the target. We look at the middle, then move whichever marker shrinks the range toward the answer. Note the small but vital detail: when we look past the middle, we move to `mid + 1` or `mid - 1`, never back to `mid` — otherwise the range might never shrink and the loop could spin forever.

#include <vector>

// v MUST be sorted ascending. Returns an index of target, or -1.
int binarySearch(const std::vector<int>& v, int target) {
    int lo = 0, hi = (int)v.size() - 1;
    while (lo <= hi) {                 // while the range is non-empty
        int mid = lo + (hi - lo) / 2;  // midpoint (avoids overflow)
        if (v[mid] == target)      return mid;     // hit
        else if (v[mid] < target)  lo = mid + 1;   // target is to the right
        else                       hi = mid - 1;   // target is to the left
    }
    return -1;                         // range emptied -> not present
}
Use lo + (hi - lo) / 2, not (lo + hi) / 2, so the sum can't overflow on huge arrays. lo <= hi keeps the empty range honest. Moving to mid±1 guarantees the range shrinks every step.

How do we *know* it's correct, not just plausible? With a [[loop-invariant|loop invariant]] — a statement that's true before the loop and stays true after every pass. Here it is: *if the target is in the array at all, its index lies within `[lo, hi]`.* It's true at the start (the range is the whole array). Each step only discards a half we've *proven* can't hold the target, so it stays true. When the loop ends with `lo > hi`, the range is empty — and since the invariant held, the target truly wasn't there. The invariant is the bridge from "seems right" to "is right."