Dynamic Programming

Memoisation and tabulation for overlapping sub-problems: from Fibonacci to the 0-1 knapsack, longest common subsequence, and Kadane's algorithm.

Return, for a moment, to a feeling you were asked to sit with in the Recursion and Divide and Conquer chapter: the mild outrage of watching fib(3) get computed twice, fib(2) get computed three times, and the whole naive recursive Fibonacci function spiral into O(2ⁿ) territory — not because the problem is hard, but because the solution keeps redoing work it has already finished. Dynamic programming (DP) is the discipline built entirely around refusing to let that happen. It is not a specific algorithm, or even a family of algorithms in the way “sorting algorithms” is — it’s a design technique: a way of looking at a problem, recognizing a particular kind of redundancy hiding inside its recursive structure, and restructuring the computation so that redundancy simply cannot occur. Once you can reliably see that structure, an entire universe of exponential-looking problems quietly becomes polynomial.

The Two Hallmarks of a DP Problem

A problem is a good candidate for dynamic programming when it has both of these properties:

Optimal substructure — an optimal solution to the problem can be built from optimal solutions to its sub-problems. (Compare: the shortest path from A to C through B is the shortest path from A to B, plus the shortest path from B to C — you’d never want to use a non-shortest sub-path inside an optimal whole.)
Overlapping sub-problems — a recursive solution would solve the same sub-problems repeatedly, rather than encountering only fresh ones at each step. (Compare: merge sort’s recursive calls — mergeSort([1..4]) and mergeSort([5..8]) never overlap; they’re solving genuinely different problems. That’s why merge sort is divide-and-conquer, not DP — the absence of overlap is precisely what’s missing.)

This second property is the one that unlocks the technique: if sub-problems overlap, then solving each distinct sub-problem exactly once — and reusing that answer every subsequent time it’s needed — turns repeated exponential work into a bounded, polynomial amount of work. The number of distinct sub-problems becomes a hard ceiling on the total work, no matter how many times the recursive structure would have revisited them.

Pause and think: Test this lens on a few problems you already understand well. Does binary search have overlapping sub-problems? Does quicksort’s partitioning step? Does the divide-and-conquer maximum-subarray algorithm from the Recursion chapter? In each case, the answer is no — and noticing why not (the sub-problems are always genuinely disjoint pieces of the input) is just as instructive as recognizing when the answer is yes. Being able to say “this recursive structure does NOT call for dynamic programming, and here’s the specific reason” is just as valuable a skill as spotting when it does — it’s what stops you from reaching for a sledgehammer every time you see a recursive function.

Two Ways to Remember: Memoisation and Tabulation

Once you’ve recognized overlapping sub-problems, there are two complementary ways to eliminate the redundancy — and the choice between them is itself a small design decision worth understanding on its own terms.

Memoisation (top-down) keeps the natural recursive structure, but wraps it in a cache: before computing a sub-problem’s answer, check whether it’s already been solved; if so, return the cached value instead of recursing.

#include <stdlib.h>

#define UNSOLVED -1

long long fib_memo(int n, long long memo[]) {
    if (n < 2) {
        return n;
    }
    if (memo[n] != UNSOLVED) {
        return memo[n];                          /* already solved: reuse it */
    }
    memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo);
    return memo[n];
}

long long fibonacci_top_down(int n) {
    long long *memo = malloc((n + 1) * sizeof(long long));
    for (int i = 0; i <= n; i++) {
        memo[i] = UNSOLVED;
    }
    long long result = fib_memo(n, memo);
    free(memo);
    return result;
}

Tabulation (bottom-up) inverts the direction entirely: instead of starting from the big problem and recursing down to small ones, start from the smallest sub-problems, solve them directly, and build upward — filling in a table that the larger problems will read from, by the time they need to.

long long fibonacci_bottom_up(int n) {
    if (n < 2) {
        return n;
    }
    long long *table = malloc((n + 1) * sizeof(long long));
    table[0] = 0;
    table[1] = 1;
    for (int i = 2; i <= n; i++) {
        table[i] = table[i - 1] + table[i - 2];   /* every value needed already sits in the table */
    }
    long long result = table[n];
    free(table);
    return result;
}

Both eliminate the exponential blowup completely: each distinct sub-problem fib(k) for k from 0 to n is now solved exactly once, giving Θ(n) time — an astronomical improvement over Θ(2ⁿ), achieved without changing what is being computed, only how many times.

Try it yourself: Look at fibonacci_bottom_up and ask: at any point in the loop, how much of the table is actually still needed? The answer — only the two most recent entries — is the seed of an important optimization called rolling variables: replace the whole array with two scalar variables that you update each iteration, dropping the space cost from O(n) to O(1). Rewrite the function this way. Then ask the harder follow-up question: under what circumstances would this space optimization be impossible — that is, when does a DP solution genuinely need to keep its entire table around, not just the most recent slice? (Hold onto that question; the longest common subsequence problem, just ahead, is a clean example of “needs the whole table,” and contrasting it with Fibonacci’s “needs only the last two entries” will sharpen your sense for when each shape applies.)

Which approach should you reach for? Memoisation mirrors the recursive definition almost verbatim — often the easiest first version to write correctly, and it has a pleasant bonus: it naturally solves only the sub-problems actually needed for this particular input, which matters when the full table would be enormous but any one input only touches a sliver of it. Tabulation avoids recursion’s call-stack overhead and depth limits entirely, often runs faster in practice due to better memory access patterns, and makes space optimizations like rolling variables far more natural to spot and apply. A common, productive workflow: write the memoised version first — it’s usually the fastest path to “correct” — and then, once you understand the dependency structure between sub-problems (which ones need which others), convert to tabulation if performance or stack depth demands it. The conversion is mechanical once you see the pattern: tabulation is just memoisation with the recursion replaced by a carefully ordered set of loops.

Worked Example: The 0-1 Knapsack Problem

Here is the problem in its classic phrasing: you have a knapsack that can carry total weight W, and n items, each with a weight and a value. You may take each item whole or not at all (hence “0-1” — no fractional items, which is what separates this from the greedy fractional version you’ll meet in the next chapter). Which items should you take to maximize total value without exceeding the weight limit?

The path to a DP solution always starts with the same question: what is the smallest set of facts that completely determines the rest of a sub-problem’s solution? Here, if you’re deciding what to do about item i, the only things that matter are which item you’re considering and how much capacity remains. That pair — (i, remaining capacity) — is the state. Once you’ve identified the state, the recurrence almost writes itself: for item i and remaining capacity w, you have exactly two choices — leave item i out (and solve the rest of the problem with the same capacity), or take it (if it fits — and then solve the rest of the problem with weight[i] less capacity, banking value[i] immediately). Take whichever choice yields more:

#include <stdlib.h>

int knapsack(int capacity, const int weights[], const int values[], int n) {
    int **dp = malloc((n + 1) * sizeof(int *));
    for (int i = 0; i <= n; i++) {
        dp[i] = calloc(capacity + 1, sizeof(int));     /* dp[i][w]: best value using items 1..i, capacity w */
    }

    for (int i = 1; i <= n; i++) {
        for (int w = 0; w <= capacity; w++) {
            dp[i][w] = dp[i - 1][w];                    /* option 1: don't take item i */
            if (weights[i - 1] <= w) {
                int with_item = dp[i - 1][w - weights[i - 1]] + values[i - 1];
                if (with_item > dp[i][w]) {
                    dp[i][w] = with_item;               /* option 2: take it, if that's better */
                }
            }
        }
    }

    int result = dp[n][capacity];
    for (int i = 0; i <= n; i++) {
        free(dp[i]);
    }
    free(dp);
    return result;
}

This fills an (n + 1) × (capacity + 1) table, each cell costing O(1) to compute from cells already filled — giving Θ(n · W) time and space. Stop and notice something slightly uncomfortable about that bound: it depends on W, the numeric value of the capacity, not just n, the number of items. An input where W is a billion makes this table impossibly large, even though n might be tiny. This is called pseudo-polynomial time — polynomial in the magnitude of the numbers involved, but not in the size of the input as written down (which, for the number W, is only O(log W) digits). This distinction will return with real teeth in the chapter on NP-completeness — file it away as another instance of “the bound that looks reassuring hides a detail that matters enormously in the right circumstances,” a theme that should feel familiar by now.

Try it yourself: knapsack as written returns only the best value — not which items achieve it. Extend it to reconstruct the actual list of chosen items. (Hint: once the table is filled, start at dp[n][capacity] and walk backward — at each step, ask “did including item i actually change the answer compared to dp[i-1][w], or is this cell identical to the one above it?” That single comparison, repeated as you walk from dp[n][capacity] back toward dp[0][0], reconstructs the whole choice sequence. This “compute the optimal value first, then walk back through the table to recover the optimal path” pattern is not specific to knapsack — it is the standard way nearly every DP solution answers “and what does the optimal solution actually look like?”, and it’s worth drilling until it’s automatic.)

Worked Example: Longest Common Subsequence

Given two sequences (think: two strings, or two DNA sequences, or two versions of a document), a subsequence is what remains after deleting zero or more elements without changing the order of what’s left ("ACE" is a subsequence of "ABCDE", but not a substring — the letters aren’t contiguous). The longest common subsequence (LCS) problem asks: what’s the longest sequence that is a subsequence of both inputs? This is the algorithmic heart of every “diff” tool you’ve ever used to compare two versions of a file.

Apply the same question as before — what state determines everything else? Here, it’s “how far have I progressed through each of the two sequences” — a pair of positions (i, j). The recurrence falls out by considering the last characters of the two prefixes you’re currently comparing: if they match, that character must be part of an optimal common subsequence (a short proof-by-contradiction confirms this — if you had a longest common subsequence that didn’t use this matching pair, you could always extend it by one, contradicting “longest”), so the answer is 1 + the answer for both sequences with that character removed. If they don’t match, at least one of the two characters cannot be part of the LCS — so the answer is the better of “drop the last character of the first sequence” or “drop the last character of the second”:

#include <string.h>

int lcs_length(const char *a, const char *b) {
    int m = (int) strlen(a);
    int n = (int) strlen(b);

    int **dp = malloc((m + 1) * sizeof(int *));
    for (int i = 0; i <= m; i++) {
        dp[i] = calloc(n + 1, sizeof(int));    /* dp[i][j]: LCS length of a[0..i) and b[0..j) */
    }

    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (a[i - 1] == b[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            } else {
                dp[i][j] = (dp[i - 1][j] > dp[i][j - 1]) ? dp[i - 1][j] : dp[i][j - 1];
            }
        }
    }

    int result = dp[m][n];
    for (int i = 0; i <= m; i++) {
        free(dp[i]);
    }
    free(dp);
    return result;
}

This runs in Θ(m · n) — and unlike knapsack’s pseudo-polynomial bound, this is genuinely polynomial in the input size, because string lengths (unlike capacity values) directly are the size of the input. Trace through lcs_length("ABCBDAB", "BDCABA") by hand, filling in the table cell by cell — watching the diagonal-propagation pattern emerge from the dp[i-1][j-1] + 1 rule is one of the clearest “aha” moments dynamic programming has to offer, because it makes the optimal substructure property visible as an actual, physical pattern in a grid of numbers in front of you, rather than an abstract claim you’re asked to take on faith.

Pause and think: Earlier, you were asked when a DP solution can be compressed down to O(1) space (as Fibonacci can, via rolling variables) versus when it genuinely needs its whole table. Look at lcs_length’s recurrence: computing row i needs all of row i - 1 (because j ranges over every column, and dp[i-1][j] for arbitrary j might be needed). That means you can compress to two rows — a much smaller saving than Fibonacci’s “two scalars,” but a real one (O(min(m, n)) space instead of O(m · n)). But notice what you’d lose: the table-walking reconstruction technique from the knapsack exercise needs the entire table to trace a path backward through it. There is no free lunch here — the space optimization and the “recover the actual answer, not just its length” feature are in direct tension, and choosing between them is a real design decision that depends entirely on which one your situation actually needs. (Algorithms that need both — full reconstruction and compressed space — exist, but require substantially cleverer techniques, such as Hirschberg’s algorithm, which combines this exact DP with the divide-and-conquer ideas from two chapters ago. That convergence is not a coincidence — the deeper your toolkit gets, the more its pieces start combining in unexpected ways.)

Coming Full Circle: Kadane’s Algorithm

Recall the maximum-subarray problem from the Recursion and Divide and Conquer chapter — find the contiguous run of an array with the largest sum — and the O(n log n) divide-and-conquer solution we built for it there, alongside a promise that an even better, O(n) solution exists. It’s time to deliver on that promise, and the route there is a textbook example of finding the right DP state.

Define best_ending_here[i] = the maximum sum of any subarray that ends exactly at index i (not “somewhere in the first i elements” — exactly at i; that precision is the entire trick). Here’s the key recursive insight: a subarray ending at index i is either just the single element arr[i] (if extending any subarray ending at i - 1 would only drag the sum down), or it’s “the best subarray ending at i - 1” with arr[i] tacked onto the end. There is no third option — and that’s exactly what makes this a clean recurrence:

best_ending_here[i] = max(arr[i], best_ending_here[i - 1] + arr[i])

The overall answer is simply the largest value across all best_ending_here[i]. And now watch the rolling-variable optimization from the Fibonacci discussion reappear, in a more consequential setting: computing best_ending_here[i] only ever needs best_ending_here[i - 1] — never anything further back — so the entire array collapses into a single running variable:

int max_subarray_kadane(const int arr[], int n) {
    int best_ending_here = arr[0];
    int best_overall = arr[0];

    for (int i = 1; i < n; i++) {
        /* extend the previous best subarray, or start fresh at arr[i] — whichever is larger */
        best_ending_here = (best_ending_here + arr[i] > arr[i]) ? best_ending_here + arr[i] : arr[i];
        if (best_ending_here > best_overall) {
            best_overall = best_ending_here;
        }
    }
    return best_overall;
}

A single linear scan — O(n) time, O(1) space — solving a problem whose brute-force version was O(n²) or O(n³), and whose cleverest divide-and-conquer version was O(n log n). Sit with the fact that this nine-line function outperforms the more “sophisticated-looking” recursive solution from two chapters ago. The lesson is not “divide and conquer was a waste of time” — that solution taught you a generalizable design recipe that solves a vast number of other problems beautifully. The lesson is sharper and more useful than that: the most powerful technique you know is not always the right one for the problem in front of you, and the discipline of asking “is there a simpler characterization of the answer that I’m missing?” — even after you already have a solution that works — is what separates a good algorithm from the best one. That discipline, more than any single algorithm in this chapter, is the actual subject of this course.

A Recipe for Designing Your Own DP Solutions

Faced with a new problem that smells like dynamic programming — usually because it’s asking you to optimize something (longest, shortest, most, fewest, best) over a space of choices that nest inside each other — work through these questions, roughly in order:

What does a sub-problem look like, and what’s the smallest set of parameters that fully describes one? This is the state. Get it wrong (too few parameters) and your recurrence will be incorrect because two genuinely-different sub-problems collapse into one cell; get it wrong the other way (too many parameters) and your solution will be correct but needlessly expensive. Finding the minimal sufficient state is, far more often than not, the entire problem.
How does the answer to a sub-problem relate to the answers of smaller sub-problems? This is the recurrence — and it usually emerges from asking “what’s the last decision made, and what does each possible last decision imply about the rest?”
What are the base cases — the smallest sub-problems, simple enough to answer directly without any recursion?
In what order can I fill in a table so that, by the time I need any cell’s value, the cells it depends on are already filled? This is the bridge from “I have a correct recurrence” to “I have a working tabulated algorithm” — and it’s also where you’ll notice, often immediately, whether a space optimization like rolling variables or rolling rows is available.
Do I need just the optimal value, or the optimal solution itself? If the latter, you’ll need to either keep auxiliary “what choice did I make here?” information alongside the table, or walk back through the completed table reconstructing the choices — and, as the LCS discussion just showed you, that requirement may be in direct tension with any space optimization you were hoping to apply.

Worked Example: Longest Increasing Subsequence

Given an array, find the length of the longest strictly increasing subsequence — a subset of elements (not necessarily contiguous) that are in strictly ascending order. For [10, 9, 2, 5, 3, 7, 101, 18] the answer is 4 (e.g. [2, 3, 7, 101]).

State: lis_ending_at[i] = length of the longest increasing subsequence that ends exactly at index i. This is the Kadane-style trick again — pinning the endpoint converts a global problem into a family of local ones.

Recurrence: to extend a subsequence to include arr[i], we can append arr[i] to any subsequence ending at some j < i where arr[j] < arr[i]. So:

lis_ending_at[i] = 1 + max(lis_ending_at[j]) for all j < i where arr[j] < arr[i]

Base case: every single element is a length-1 increasing subsequence.

int lis_length(const int arr[], int n) {
    int *dp = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++) dp[i] = 1;        /* each element alone is length 1 */

    for (int i = 1; i < n; i++) {
        for (int j = 0; j < i; j++) {
            if (arr[j] < arr[i] && dp[j] + 1 > dp[i]) {
                dp[i] = dp[j] + 1;                 /* extend the subsequence ending at j */
            }
        }
    }

    int best = 0;
    for (int i = 0; i < n; i++)
        if (dp[i] > best) best = dp[i];

    free(dp);
    return best;
}

This runs in O(n²) — two nested loops, O(1) work per pair. A cleverer O(n log n) solution exists using binary search to maintain a “patience sorting” structure, but the O(n²) version makes the DP structure completely transparent and is the right starting point.

Try it yourself: Extend lis_length to also reconstruct the actual subsequence (not just its length) by keeping a parent[] array that records, for each i, which j produced the best extension. Then trace lis_length on [3, 10, 2, 1, 20] by hand — fill in the full dp[] array cell by cell and verify the answer is 3 ([3, 10, 20] or [2, 20] extended — which one, and why?).

Worked Example: Matrix-Chain Multiplication

Given matrices M₀, M₁, …, M_{n-1} where matrix Mᵢ has dimensions dims[i] × dims[i+1], find the order of multiplication that minimises the total number of scalar multiplications. (Matrix multiplication is associative — the result is the same regardless of order — but the cost can differ wildly. For three matrices A(10×100), B(100×5), C(5×50): (AB)C costs 10·100·5 + 10·5·50 = 7500 multiplications; A(BC) costs 100·5·50 + 10·100·50 = 75000.)

This is a different shape of DP from everything else in this chapter — interval DP, where sub-problems are contiguous ranges of the chain, not prefixes. The state dp[i][j] = minimum cost to multiply the chain from matrix i to matrix j.

Recurrence: to compute dp[i][j], try every possible split point k between i and j — multiply the left sub-chain [i..k] and the right sub-chain [k+1..j] separately (costs dp[i][k] and dp[k+1][j]), then multiply the two results together (cost dims[i] × dims[k+1] × dims[j+1]):

int matrix_chain_order(const int dims[], int n) {
    /* dp[i][j]: min cost to multiply matrices i through j */
    int dp[n][n];
    for (int i = 0; i < n; i++) dp[i][i] = 0;     /* single matrix costs nothing */

    /* fill by increasing chain length */
    for (int len = 2; len <= n; len++) {
        for (int i = 0; i <= n - len; i++) {
            int j = i + len - 1;
            dp[i][j] = INT_MAX;
            for (int k = i; k < j; k++) {
                int cost = dp[i][k] + dp[k+1][j]
                         + dims[i] * dims[k+1] * dims[j+1];
                if (cost < dp[i][j]) dp[i][j] = cost;
            }
        }
    }
    return dp[0][n-1];
}

The fill order — by increasing chain length — is the critical design decision. A chain of length len always depends on chains of smaller lengths, so filling in that order guarantees every required sub-problem is ready before it’s needed. This O(n³) solution (three nested loops) is dramatically better than brute-force enumeration of all parenthesizations, which grows as the Catalan number Ω(4ⁿ / n^{3/2}).

Pause and think: Notice that the recurrence “try every split point k” means the answer to dp[i][j] comes not from a single smaller sub-problem (as in Fibonacci or LCS) but from choosing the best among j - i different sub-problem pairs. This is interval DP’s characteristic shape, and once you recognise it you’ll see it recurring in problems like: optimal binary search trees, minimum-cost polygon triangulation, and burst-balloons. The pattern — “split a range at every possible position, take the best” — is always the recurrence; the challenge is always defining the right cost function for a specific split.

Bringing It Together: Practice as a Problem-Solver

Diagnose before you design. For each of the following, decide — and justify in one or two sentences — whether dynamic programming is the right tool, and if so, what the state would be: (a) counting the number of distinct ways to climb a staircase of n steps, taking either 1 or 2 steps at a time; (b) finding the shortest path between two intersections in a city’s road network, where every road has the same length; (c) finding the maximum value of a[i] + b[j] where i < j, given two arrays a and b; (d) sorting an array of a million integers.
Trace it by hand. Fill in the complete dp table for lcs_length("AGGTAB", "GXTXAYB") cell by cell (it’s a 7 × 8 grid — tedious, but illuminating). Then, using only the completed table (no re-derivation), reconstruct one valid longest common subsequence by walking backward from dp[6][7]. What rule did you use, at each step, to decide whether to move diagonally, up, or left?
Convert and compare. Take the memoised fibonacci_top_down and convert it, step by step, into the tabulated fibonacci_bottom_up — but do it as a deliberate exercise in translation: at each step, ask “what is the recursive version implicitly assuming has already been computed, and how do I guarantee that’s true before I need it?” That question, answered carefully and explicitly, is the conversion procedure — and once you’ve done it once by hand, you’ll start seeing the same translation step, almost mechanically, every time you write a memoised solution first and later decide you need a tabulated one.
Reflection. Across this chapter, the single most expensive mistake a beginner makes is reaching for “make a table and fill it in” before nailing down precisely what each cell means. Take any one of the solutions above — knapsack, LCS, or Kadane’s algorithm — cover up the code, and try to state, in one precise sentence, exactly what dp[i][j] (or best_ending_here[i]) represents. If you can state it crisply, the recurrence, the base cases, and the fill order all but write themselves. If you find yourself unable to state it crisply — that, far more than any syntax error, is the signal that you don’t yet have a DP solution; you have only the hope of one. Make “can I state, in one sentence, what each cell of my table means?” the very first question you ask yourself on every DP problem you ever face — it will save you more wasted hours than any other single habit this chapter can offer you.