@shmVirus

Graphs

Vertices, edges, adjacency representations, DFS, BFS, cycle detection, topological sort, and shortest paths — the data structures behind every network problem.

Arrays store sequences. Trees store hierarchies. Graphs store arbitrary relationships — and it turns out that an enormous fraction of the interesting structure in the world is, underneath, a collection of things and a collection of connections between them: cities and roads, web pages and links, tasks and dependencies, people and friendships, molecules and bonds. Every other data structure you study is, in a precise sense, a special case of a graph: an array is a path graph, a tree is a connected acyclic graph, a linked list is a graph where every vertex has degree at most 1 in each direction. Understanding graphs means understanding the most general shape of structured data, and the algorithms that work on everything else.

Vocabulary

A graph G = (V, E) consists of a set of vertices (nodes) V and a set of edges E, where each edge connects a pair of vertices. Key distinctions that affect which algorithms apply:

  • Directed vs undirected — in a directed graph (digraph), edge (u, v) goes from u to v only; in undirected, it connects both ways
  • Weighted vs unweighted — edges may carry a numeric cost (distance, time, capacity) or simply exist / not exist
  • Cyclic vs acyclic — a graph without any cycle is called acyclic; a directed acyclic graph is a DAG
  • Connected — an undirected graph is connected if a path exists between every pair of vertices; a directed graph is strongly connected if a directed path exists in both directions between every pair
  • Degree — the number of edges incident to a vertex; in directed graphs, split into in-degree (edges arriving) and out-degree (edges leaving)

Representations

How a graph is stored determines the cost of every operation. Two standard representations:

Adjacency Matrix

A V × V array where matrix[u][v] holds the edge weight (or 1 / 0 for unweighted). Edge lookup is O(1); listing all neighbours of a vertex requires scanning a whole row — O(V) regardless of how many neighbours actually exist.

#include <stdbool.h>
#include <stdio.h>

#define MAX_V 6

typedef struct {
    int  weight[MAX_V][MAX_V];   /* 0 means no edge */
    int  num_vertices;
} AdjMatrix;

void am_init(AdjMatrix *g, int n) {
    g->num_vertices = n;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            g->weight[i][j] = 0;
}

void am_add_edge(AdjMatrix *g, int u, int v, int w) {
    g->weight[u][v] = w;
    g->weight[v][u] = w;     /* omit for directed graphs */
}

bool am_has_edge(const AdjMatrix *g, int u, int v) {
    return g->weight[u][v] != 0;
}

Space: O(V²). For a graph with 1000 vertices and 2000 edges (very sparse), an adjacency matrix wastes 99.8% of its memory. For a graph with 1000 vertices and 499,500 edges (complete), it uses every cell efficiently.

Adjacency List

Each vertex stores only the neighbours it actually has — a list (or array) of (vertex, weight) pairs. Space is O(V + E). Listing all neighbours of vertex u costs O(degree(u)) — proportional to how many there actually are, not to the total number of vertices.

#include <stdlib.h>

#define MAX_NEIGHBORS 20

typedef struct {
    int to;
    int weight;
} Edge;

typedef struct {
    Edge  edges[MAX_NEIGHBORS];
    int   count;
} AdjList;

typedef struct {
    AdjList adj[MAX_V];
    int     num_vertices;
} Graph;

void graph_init(Graph *g, int n) {
    g->num_vertices = n;
    for (int i = 0; i < n; i++) g->adj[i].count = 0;
}

void graph_add_edge(Graph *g, int u, int v, int weight) {
    g->adj[u].edges[g->adj[u].count++] = (Edge){ v, weight };
    g->adj[v].edges[g->adj[v].count++] = (Edge){ u, weight };  /* undirected */
}
QueryAdjacency matrixAdjacency list
Does edge (u, v) exist?O(1)O(degree(u))
All neighbours of uO(V)O(degree(u))
All edges in the graphO(V²)O(V + E)
SpaceO(V²)O(V + E)

Real-world graphs are almost always sparse — a social network of a billion people averages a few hundred friends each, not a billion. The adjacency list’s O(V + E) space and degree-proportional traversal make it the default choice for nearly every practical application.

DFS commits to one path as deep as possible, backtracks when stuck, and tries the next alternative — exactly the shape of recursive exploration. The call stack is the DFS stack:

void dfs_recursive(const Graph *g, int u, bool visited[]) {
    visited[u] = true;
    printf("Visit %d\n", u);
    for (int i = 0; i < g->adj[u].count; i++) {
        int v = g->adj[u].edges[i].to;
        if (!visited[v])
            dfs_recursive(g, v, visited);
    }
}

void dfs(const Graph *g, int start) {
    bool visited[MAX_V] = { false };
    dfs_recursive(g, start, visited);
}

For graphs with many vertices, deep recursion risks stack overflow. The iterative version replaces the call stack with an explicit stack:

void dfs_iterative(const Graph *g, int start) {
    bool visited[MAX_V] = { false };
    int  stack[MAX_V];
    int  top = 0;
    stack[top++] = start;

    while (top > 0) {
        int u = stack[--top];
        if (visited[u]) continue;
        visited[u] = true;
        printf("Visit %d\n", u);
        for (int i = 0; i < g->adj[u].count; i++) {
            int v = g->adj[u].edges[i].to;
            if (!visited[v]) stack[top++] = v;
        }
    }
}

Both visit every reachable vertex and every edge exactly once: Θ(V + E) time.

Cycle Detection

In a directed graph, a cycle exists if DFS ever reaches a vertex that is currently being explored (on the recursion stack, not just previously visited). Track this with a three-colour scheme:

#define WHITE 0   /* undiscovered */
#define GRAY  1   /* on the current path — in-progress */
#define BLACK 2   /* fully finished */

bool has_cycle_util(const Graph *g, int u, int color[]) {
    color[u] = GRAY;
    for (int i = 0; i < g->adj[u].count; i++) {
        int v = g->adj[u].edges[i].to;
        if (color[v] == GRAY)  return true;    /* back edge → cycle */
        if (color[v] == WHITE && has_cycle_util(g, v, color)) return true;
    }
    color[u] = BLACK;
    return false;
}

bool has_cycle(const Graph *g) {
    int color[MAX_V];
    for (int i = 0; i < g->num_vertices; i++) color[i] = WHITE;
    for (int i = 0; i < g->num_vertices; i++)
        if (color[i] == WHITE && has_cycle_util(g, i, color)) return true;
    return false;
}

Pause and think: Why does “visited” need to be a three-state colour rather than a simple boolean for directed-graph cycle detection? Consider a graph A → B → C and A → C. When DFS explores from A, it visits B, then C through B, marks C black, returns to B, marks B black, returns to A — then tries A → C again and finds C already BLACK. No cycle. Now try A → B → C → A. When DFS is exploring the path A → B → C, vertex A is still GRAY (on the current path). Finding a GRAY vertex is exactly the signal that identifies a back edge — and a back edge means a cycle. A two-colour approach (visited/not-visited) would confuse “was already fully explored via a different path” with “is currently on this very path” — the first is harmless, the second means a cycle.

BFS explores in widening rings from the source: all vertices at distance 1 first, then all at distance 2, and so on. A queue enforces this discipline — the data structure that guarantees first-discovered, first-explored:

void bfs(const Graph *g, int source, int distance[]) {
    bool visited[MAX_V] = { false };
    int  queue[MAX_V];
    int  front = 0, back = 0;

    for (int i = 0; i < g->num_vertices; i++) distance[i] = -1;

    visited[source]  = true;
    distance[source] = 0;
    queue[back++]    = source;

    while (front < back) {
        int u = queue[front++];
        for (int i = 0; i < g->adj[u].count; i++) {
            int v = g->adj[u].edges[i].to;
            if (!visited[v]) {
                visited[v]   = true;
                distance[v]  = distance[u] + 1;   /* one step further than u */
                queue[back++] = v;
            }
        }
    }
}

The key guarantee: in an unweighted graph, distance[v] after BFS is the true minimum number of edges on any path from source to v. This follows from the order of processing — BFS never visits a vertex at distance k until all vertices at distance k-1 are fully processed, so the first time BFS reaches any vertex, it has taken the shortest possible route.

Time: Θ(V + E) — same as DFS, same code structure, but the queue vs. stack is the entire difference in which vertices get visited in which order.

Try it yourself: On a small graph with 5 vertices and edges {0-1, 0-2, 1-3, 2-3, 3-4}, trace BFS from vertex 0 step by step — write out the queue contents and distance[] array after each dequeue. Then trace DFS from vertex 0 recursively. Compared to BFS, in what order does DFS visit the vertices? Is the DFS visit order a shortest-path order? Why or why not?

Topological Sort: Ordering Dependencies

A topological ordering of a DAG lists vertices such that for every directed edge (u → v), u appears before v. This is the algorithmic foundation of build systems, course prerequisites, package installers, and spreadsheet recalculation.

Kahn’s algorithm uses BFS starting from all vertices with in-degree 0 (nothing they depend on), repeatedly removing each processed vertex and decrementing the in-degree of its neighbours — adding any that reach 0 to the queue:

#include <string.h>

int topological_sort_kahn(const Graph *g, int result[]) {
    int in_degree[MAX_V];
    int queue[MAX_V];
    int front = 0, back = 0;
    int count = 0;

    memset(in_degree, 0, sizeof(in_degree));
    for (int u = 0; u < g->num_vertices; u++)
        for (int i = 0; i < g->adj[u].count; i++)
            in_degree[g->adj[u].edges[i].to]++;

    for (int u = 0; u < g->num_vertices; u++)
        if (in_degree[u] == 0) queue[back++] = u;

    while (front < back) {
        int u = queue[front++];
        result[count++] = u;
        for (int i = 0; i < g->adj[u].count; i++) {
            int v = g->adj[u].edges[i].to;
            if (--in_degree[v] == 0) queue[back++] = v;
        }
    }

    /* if count < num_vertices, a cycle exists — no valid ordering */
    return count;
}

If the result contains fewer than V vertices, the graph has a cycle — some vertices could never reach in-degree 0 because they are part of a loop. Topological sort and cycle detection are two sides of the same question: “is this a DAG?”

Shortest Paths in Weighted Graphs

BFS finds shortest paths when all edges cost 1. For weighted graphs, the edge weight matters.

Dijkstra’s algorithm (non-negative weights) — greedily finalize the closest unvisited vertex and relax its neighbours. The simple array-scan version runs in O(V²); a min-heap version runs in O((V + E) log V). Covered in full in the Algorithms course’s Graph Algorithms chapter, including the correctness proof.

Bellman-Ford (negative weights allowed) — relax every edge V − 1 times. Slower at O(V · E) but handles negative weights and detects negative cycles.

Choosing a Representation

ScenarioReach for
Dense graph (E close to V²), frequent edge-existence queriesAdjacency matrix
Sparse graph — the common caseAdjacency list
”Is v reachable from u?”, connected components, cycle detectionDFS
Shortest path (unweighted), level-order processing, bipartitenessBFS
Valid ordering of tasks with dependenciesTopological sort
Shortest path (weighted, non-negative)Dijkstra
Shortest path (negative weights possible)Bellman-Ford

Bringing It Together: Practice as a Problem-Solver

  1. Representation trade-off. You’re building a social network with 100 million users where the average person has 300 friends. Compute the space used by (a) an adjacency matrix and (b) an adjacency list (assume 8 bytes per pointer/integer). Which is feasible? What does this tell you about when “the simple approach” becomes physically impossible regardless of algorithmic cleverness?

  2. Trace cycle detection. Draw a directed graph with vertices {A, B, C, D} and edges A→B, B→C, C→D, D→B. Trace has_cycle_util starting at A, writing the color[] array at every step. Identify exactly which recursive call sets the flag and returns true — and confirm that the vertex it finds in GRAY state is indeed part of the cycle.

  3. Implement and verify Kahn’s algorithm. Build the dependency graph for compiling five source files: main.c depends on utils.h and io.h; io.c depends on io.h; utils.c depends on utils.h. Encode this as a directed graph, run topological_sort_kahn, and verify the result is a valid build order. Now add an impossible dependency (utils.h depends on main.c) and re-run — confirm the function detects the cycle by returning a count less than the vertex count.

  4. BFS vs. DFS selection. For each of the following, decide whether BFS or DFS is the better tool and explain precisely why the structure of the problem maps to that traversal’s characteristic behaviour: (a) finding the shortest sequence of Wikipedia “see also” links connecting two articles; (b) finding whether a maze has any solution at all; (c) finding all courses a student must complete (directly or indirectly) before taking “Advanced Algorithms.”