Graphs: Modeling Connections

Dots and lines

Strip away every detail and a [[graph|graph]] is just *dots and lines*. The dots are vertices (also called nodes) — the things — and the lines are edges — the relationships between them. That's the entire definition, and its emptiness is its power: whenever you have *things* and *connections between things*, you have a graph. Cities and roads. People and friendships. Web pages and links. Tasks and "must-happen-before" arrows. All the same abstract object.

You already know a special case: a [[tree|tree]] is just a graph that happens to have a single root, no cycles, and a parent for every node but one. A graph throws all three rules out. It can have cycles (A→B→C→A), nodes with many "parents," disconnected islands, even a vertex with an edge to itself. The freedom is exactly why graphs model the messy real world so well.

  (0)---(1)        an undirected graph, 5 vertices:
   |   / |         edges = { 0-1, 0-2, 1-2, 1-3, 3-4 }
   | /   |
  (2)   (3)---(4)

  note the cycle 0-1-2-0: trees can't do this, graphs can.

A small undirected graph. Vertices are circles, edges are lines; the loop 0-1-2-0 is a cycle.

Three flavors of edge

Edges come in a few flavors, and choosing the right one is half of modeling a problem correctly.

Undirected — the edge goes both ways. Friendship: if you know me, I know you. Drawn as a plain line.
Directed — the edge has a direction, drawn as an arrow. Twitter follows: you can follow someone who doesn't follow back. A→B does not imply B→A.
Weighted — each edge carries a number: a distance, a cost, a travel time. A road map's edges are weighted by kilometers.

These combine freely. A road network is usually undirected and weighted (the road runs both ways, and it's 5 km either direction). A project schedule is directed and unweighted ("pour foundation" must come before "build walls," full stop). A flight network is directed and weighted (a one-way ticket, costing a fare). Naming the flavors before you code saves you from modeling the wrong thing.

Storing a graph: list vs matrix

Dots and lines are an idea; a computer needs them as data. There are two standard representations. The adjacency matrix is an n×n grid where cell `[i][j]` is 1 (or the weight) if there's an edge from i to j, else 0. Simple, and "is there an edge i→j?" is an instant O(1) lookup — but it always uses n² space, even if the graph has only a handful of edges.

The [[adjacency-list|adjacency list]] instead keeps, for each vertex, a list of just its neighbors. It uses only as much space as there are edges, which is why it's the default for the large, sparse graphs you meet in the real world (a city has millions of intersections but each touches only a few roads). In C++ it's wonderfully natural: a `vector` of `vector`s.

graph:  0---1      adjacency LIST (a vector per vertex):
        |  /|        adj[0] = {1, 2}
        | / |        adj[1] = {0, 2, 3}
        2   3        adj[2] = {0, 1}
                     adj[3] = {1}

  same graph as a MATRIX (m[i][j]=1 if edge i-j):
        0 1 2 3        adjacency list:  ~edges of space
      0[0 1 1 0]       matrix:          always n*n space
      1[1 0 1 1]
      2[1 1 0 0]       sparse, big graph -> use the LIST
      3[0 1 0 0]

The same graph two ways. The list stores only real edges; the matrix reserves a cell for every possible pair.

#include <vector>
int n = 4;
std::vector<std::vector<int>> adj(n);   // adjacency list

auto addEdge = [&](int u, int v) {       // UNDIRECTED:
    adj[u].push_back(v);                 //   add both directions
    adj[v].push_back(u);                 //   (drop one line if directed)
};
addEdge(0,1); addEdge(0,2); addEdge(1,2); addEdge(1,3);

An adjacency list in C++. For an undirected graph add the edge both ways; for directed, just one.