Greedy Algorithms

Making locally optimal choices and proving when that's enough: activity selection, fractional knapsack, Huffman coding, and the exchange-argument proof technique.

The previous chapter spent a great deal of effort building tables, filling in cells, and carefully considering every combination of choices before committing to an answer — because, in problems like the 0-1 knapsack, the locally tempting choice (grab the most valuable item first!) can lead you into a dead end that only becomes visible several steps later. This chapter is about the opposite kind of problem: ones where, astonishingly, making the single best-looking choice at each step — and never reconsidering it — produces a solution that is not just good, but provably optimal. Algorithms built this way are called greedy algorithms, and the central question this chapter trains you to ask is not “how do I write a greedy algorithm?” (that part is usually easy — almost suspiciously easy) but “how do I know I’m allowed to trust it?”

That second question is what separates a greedy algorithm that happens to work on your test cases from a greedy algorithm that is correct. Greedy reasoning is dangerously seductive precisely because it is simple to state and simple to code — and simplicity has a way of feeling like correctness, even when it isn’t. Much of this chapter, therefore, is devoted not to writing greedy code (which you’ll find takes only a few lines per problem) but to the much harder and more valuable skill of proving that greedy is the right call — and, just as importantly, recognizing the warning signs that it isn’t.

What Makes a Problem “Greedy”?

A problem is amenable to a greedy solution when it has two properties that should sound familiar — they’re close cousins of the two hallmarks of dynamic programming from the last chapter, and noticing the family resemblance, with one crucial difference, is the key to telling the two techniques apart:

Optimal substructure — exactly as in DP: an optimal solution to the whole problem contains optimal solutions to its sub-problems.
The greedy-choice property — and here is the crucial difference: a globally optimal solution can always be reached by making a choice that looks best right now, without needing to know how the rest of the problem will play out, and without ever needing to revisit that choice later.

That second property is the rare and precious one. Dynamic programming exists precisely because, for most optimization problems, that property is false — the locally best choice can sabotage the global optimum, and you only find out several decisions later (the 0-1 knapsack is the canonical example: greedily taking the highest-value item first can leave you with leftover capacity too small for anything else useful, when a different first choice would have packed the sack perfectly). Dynamic programming’s tables exist to handle exactly that risk — to consider every possibility rather than commit early. A greedy algorithm is, in a real sense, a dynamic programming solution for the special, fortunate case where you can mathematically prove you’ll never need to look back. When that proof is available, you get to throw away the table, the recursion, and the O(n · W)-style cost — and walk away with something dramatically simpler and faster. When it isn’t, “greedy” is just another word for “wrong, but fast.”

Pause and think: This framing — greedy as “the lucky special case of DP” — is worth sitting with for a moment, because it inverts a common assumption. Many people meet greedy algorithms first (they’re simpler to code) and dynamic programming second (it feels like the “advanced” version). But conceptually, the relationship runs the other way: DP is the general, safe, always-applicable-when-the-structure-fits technique; greedy is the shortcut you only get to take after you’ve done the harder work of proving it’s safe. Keep this the right way around in your head, and you’ll never be tempted to reach for greedy reasoning out of laziness — only out of justified confidence.

Worked Example: Activity Selection

Here is a problem with an unusually clean greedy solution — clean enough that it makes an excellent first example of both “how simple the algorithm is” and “how much work the proof requires.”

The problem: you’re given n activities, each with a start time and a finish time, and a single resource that can host only one activity at a time (a lecture hall, a meeting room, a single CPU core). Select the largest possible set of mutually non-overlapping activities.

Several greedy strategies suggest themselves immediately — and most of them are wrong, which is exactly why this problem is such a good teaching example:

Pick the activity that starts earliest? Tempting, but a long activity that starts at time 0 might block out three short ones that would all have fit in the same span.
Pick the shortest activity first? Better intuition, but still fails — a short activity sitting awkwardly in the middle of the timeline can fragment the remaining time into two pieces, each too small to be useful, when leaving that slot alone would have allowed two longer non-overlapping activities elsewhere.
Pick the activity that finishes earliest? This one works — and the reason it works, not just the fact that it does, is the entire point of this section.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int start;
    int finish;
} Activity;

int compare_by_finish(const void *a, const void *b) {
    return ((const Activity *) a)->finish - ((const Activity *) b)->finish;
}

/* Selects a maximum set of non-overlapping activities; returns how many were chosen,
   and fills `chosen` with their indices into the (now finish-time-sorted) array. */
int select_activities(Activity acts[], int n, int chosen[]) {
    qsort(acts, n, sizeof(Activity), compare_by_finish);

    chosen[0] = 0;
    int count = 1;
    int last_finish = acts[0].finish;

    for (int i = 1; i < n; i++) {
        if (acts[i].start >= last_finish) {   /* no overlap with the last chosen activity */
            chosen[count++] = i;
            last_finish = acts[i].finish;
        }
    }
    return count;
}

Sort once — O(n log n) — then a single linear pass that never backtracks: O(n). The whole algorithm is dominated by the sort. Now for the part that actually matters: why is “always take the activity that finishes soonest” guaranteed to leave you with the largest possible remaining selection — every single time, on every input, with no exceptions?

The Exchange Argument: How to Prove a Greedy Choice Is Safe

The standard tool for proving a greedy-choice property is called an exchange argument, and it has a distinctive shape worth memorizing, because you will reuse this exact shape on every greedy proof you ever construct:

Take any optimal solution. If it doesn’t already make the greedy choice, show that you can swap something in it for the greedy choice — without making the solution any worse. Conclude that “an optimal solution that makes the greedy choice” always exists, so you lose nothing by making that choice yourself.

Applied here: suppose S is some optimal selection of activities, and let a be the activity in S that finishes earliest among all of S’s members, while g is the activity that finishes earliest overall (the greedy algorithm’s first pick). By definition of g, we know g.finish <= a.finish. Now build a new set S' by removing a from S and inserting g in its place. Does S' still consist of non-overlapping activities? Yes — every other activity in S started at or after a.finish (since a was the earliest-finishing member of a valid non-overlapping set), and g.finish <= a.finish, so every other activity in S also starts at or after g.finish; the swap introduces no new conflicts. And S' has exactly the same number of activities as S. So S' is also optimal — and it contains the greedy choice. This shows that some optimal solution always contains the earliest-finishing activity — which is exactly the license the greedy algorithm needed to pick it first without fear of regret.

Once g is locked in, the remaining problem — “select a maximum non-overlapping set from the activities that start after g finishes” — is the very same problem, just smaller (this is the optimal-substructure half of the argument, and it’s what licenses the greedy algorithm to repeat the same reasoning at every step, all the way down).

Try it yourself: Walk back through that exchange argument and find the one specific sentence where the property “g finishes no later than any other candidate” gets used to guarantee “no new conflicts are introduced.” Now ask: would the argument survive if you replaced “finishes earliest” with “starts earliest” everywhere? Try to construct the exact small example (three or four activities) where that substitution breaks the proof — and notice that the example you construct is, in essence, a formal version of the “long activity blocks three short ones” intuition from a few paragraphs ago. A good proof and a good counterexample are often two views of the same insight — and the ability to move fluidly between “why does this work” and “what would make this fail” is one of the most powerful diagnostic instincts you can build.

Worked Example: Fractional Knapsack — The Mirror Image of a Problem You Already Know

Recall the 0-1 knapsack from the previous chapter: items with weights and values, a capacity limit, and the rule “take an item whole, or not at all.” That “whole or not at all” rule is exactly what forced the DP table — it’s what created the possibility of a locally-tempting choice turning out to be globally wrong, several decisions downstream.

Now remove that one rule. Suppose items can be divided — you can take 60% of a sack of rice, or 2.5 litres from a barrel of oil — and you want to maximize total value within the same weight limit. This is the fractional knapsack problem, and changing that single rule changes everything about which technique applies:

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    double weight;
    double value;
} Item;

int compare_by_ratio_desc(const void *a, const void *b) {
    const Item *ia = (const Item *) a;
    const Item *ib = (const Item *) b;
    double ra = ia->value / ia->weight;
    double rb = ib->value / ib->weight;
    if (ra < rb) return 1;             /* descending order: higher value-per-weight first */
    if (ra > rb) return -1;
    return 0;
}

double fractional_knapsack(Item items[], int n, double capacity) {
    qsort(items, n, sizeof(Item), compare_by_ratio_desc);

    double total_value = 0.0;
    double remaining = capacity;

    for (int i = 0; i < n && remaining > 0.0; i++) {
        if (items[i].weight <= remaining) {
            total_value += items[i].value;            /* take the whole item */
            remaining -= items[i].weight;
        } else {
            double fraction = remaining / items[i].weight;
            total_value += items[i].value * fraction; /* take exactly the fraction that fits */
            remaining = 0.0;
        }
    }
    return total_value;
}

Sort by value-per-unit-weight, descending, then greedily fill the sack from the top of that list — taking each item whole until one doesn’t fit, then taking exactly the fraction of that item needed to reach capacity, and stopping. Done. O(n log n), dominated entirely by the sort, with a single linear pass after it.

Why Divisibility Is the Hinge the Whole Proof Swings On

Here is the exchange argument for this problem, and watch closely for the exact moment it needs divisibility: suppose an optimal solution does not take the full available amount of the item with the highest value-per-weight ratio, while it does take some positive amount of an item with a strictly lower ratio. Trade an infinitesimal amount ε of weight — move it from the lower-ratio item into the higher-ratio one. Total weight used is unchanged (you removed ε from one and added ε to the other), but total value strictly increases, because every unit of weight is now contributing at the higher ratio rather than the lower one. This contradicts optimality of the original solution. So an optimal solution must already be taking as much as possible of the highest-ratio item — which is exactly the greedy algorithm’s first move.

Now go back and try to run that exact argument on the 0-1 version. The argument demands that you be able to “trade an infinitesimal amount ε” between two items — and 0-1 knapsack’s central rule, items are atomic, take them whole or not at all, makes that operation outright illegal. The proof doesn’t degrade gracefully or get weaker — it simply has no foothold to begin with. One single rule change — divisibility — is the entire difference between “provably optimal in O(n log n) with five lines of greedy logic” and “needs a full DP table and O(n · W) time.” Sit with how dramatic that is: the problem statement barely changed, but the right tool changed completely, in both technique and asymptotic cost. This is, in miniature, the single most important lesson this chapter has to offer — more important than any individual algorithm in it.

Pause and think: Before reading on, predict the answer to this question and then verify it by trying to adapt the code: does the fractional knapsack greedy strategy — “sort by value-per-weight, take greedily from the top” — give you the correct total value for the 0-1 version too, if you simply refuse to take fractions? (Try it on items {weight: 10, value: 60}, {weight: 20, value: 100}, {weight: 30, value: 120} with capacity 50. The greedy-by-ratio order takes item 1 fully — best ratio — then item 2 fully, totalling 160 for 30 units of weight with 20 to spare, but nothing else fits. Now check: is 160 actually the best the 0-1 version can do with capacity 50? Run your knapsack function from the previous chapter on the same input and compare.) The gap you find between the two answers is, concretely and numerically, the price of losing the divisibility property — not an abstract claim, but a number you can point to.

Worked Example: Huffman Coding — Greedy Choices, Layer by Layer

Here is a beautiful problem where the greedy choice isn’t “pick the best item” but “repeatedly combine the two least-promising things you have, until only one remains” — an inversion that takes a moment to get used to, but that turns out to construct something remarkable: an optimal binary code.

The setup: you want to encode a piece of text using a binary code where no codeword is a prefix of another (this “prefix-free” property is what lets a decoder read a stream of bits and know, unambiguously, exactly where each codeword ends — without any separators). Different characters can have codewords of different lengths. Goal: assign shorter codewords to more frequent characters, to minimize the total encoded length.

The greedy idea: build a binary tree from the bottom up. Start with one leaf per character, labelled with its frequency. Repeatedly take the two nodes with the smallest frequencies, and merge them under a new internal node whose frequency is their sum. Keep going until one tree remains. Each character’s codeword is then the path from the root to its leaf — 0 for “go left,” 1 for “go right.”

#include <stdio.h>
#include <stdlib.h>

typedef struct HuffmanNode {
    char character;              /* meaningless for internal nodes */
    int frequency;
    struct HuffmanNode *left;
    struct HuffmanNode *right;
} HuffmanNode;

HuffmanNode *make_leaf(char c, int freq) {
    HuffmanNode *node = malloc(sizeof(HuffmanNode));
    node->character = c;
    node->frequency = freq;
    node->left = node->right = NULL;
    return node;
}

HuffmanNode *make_internal(HuffmanNode *left, HuffmanNode *right) {
    HuffmanNode *node = malloc(sizeof(HuffmanNode));
    node->character = '\0';
    node->frequency = left->frequency + right->frequency;
    node->left = left;
    node->right = right;
    return node;
}

/* Removes and returns the node with the smallest frequency from an unsorted array of pointers. */
HuffmanNode *extract_min(HuffmanNode *nodes[], int *count) {
    int min_index = 0;
    for (int i = 1; i < *count; i++) {
        if (nodes[i]->frequency < nodes[min_index]->frequency) {
            min_index = i;
        }
    }
    HuffmanNode *result = nodes[min_index];
    nodes[min_index] = nodes[--(*count)];     /* swap last element into the gap */
    return result;
}

HuffmanNode *build_huffman_tree(const char chars[], const int freqs[], int n) {
    HuffmanNode **nodes = malloc(n * sizeof(HuffmanNode *));
    for (int i = 0; i < n; i++) {
        nodes[i] = make_leaf(chars[i], freqs[i]);
    }

    int count = n;
    while (count > 1) {
        HuffmanNode *a = extract_min(nodes, &count);
        HuffmanNode *b = extract_min(nodes, &count);
        nodes[count++] = make_internal(a, b);    /* the merged pair becomes one new candidate */
    }

    HuffmanNode *root = nodes[0];
    free(nodes);
    return root;
}

void print_codes(const HuffmanNode *node, char *path, int depth) {
    if (!node->left && !node->right) {            /* a leaf: print its assigned code */
        path[depth] = '\0';
        printf("'%c': %s\n", node->character, path);
        return;
    }
    path[depth] = '0';
    print_codes(node->left, path, depth + 1);
    path[depth] = '1';
    print_codes(node->right, path, depth + 1);
}

Pause and think: extract_min here scans the whole array every time — O(n) per call, called O(n) times, for an O(n²) tree-building cost overall. You have, by this point in the course, met a data structure purpose-built for “repeatedly remove the smallest item efficiently.” Which one — and what would build_huffman_tree’s running time become if nodes were replaced with it? (If the name doesn’t come to you immediately, page back to the closing sections of the Sorting Algorithms chapter, where a particular tree-shaped structure was used to repeatedly extract the maximum element of a shrinking collection — the minimum-oriented version of that exact structure is the standard, textbook-correct way every real Huffman-coding implementation solves precisely this bottleneck.)

The exchange-argument proof that this greedy merge order is optimal is more intricate than the previous two — it requires showing, by induction on the number of distinct characters, that the two least-frequent characters can always be placed as siblings at the deepest level of some optimal tree (and that, if they aren’t in a given optimal tree, swapping them there cannot increase the total cost). It’s a genuinely satisfying proof to work through carefully — but it’s also long enough that reproducing it in full here would crowd out the broader point of this chapter. Treat tracking it down and working through it line by line as a research exercise: it’s an excellent, focused way to practice reading an exchange argument that’s more elaborate than the two you’ve now constructed for yourself, and a natural bridge toward the kind of literature you’ll need to get comfortable with as your study of algorithms deepens.

When Greedy Fails — and Why Recognizing the Failure Matters More Than Any Single Success

By now you may be developing a slightly uneasy feeling: greedy algorithms look so clean, and their proofs feel so satisfying once found, that it’s tempting to start trusting your gut about when one will work. Resist that temptation actively — it is precisely the trap that catches capable programmers. Here are a few situations engineered to break that gut feeling:

Coin change, with arbitrary denominations. With coins {1, 5, 10, 25} (ordinary currency), “always take the largest coin that fits” happens to produce the minimum number of coins for any amount — a pleasant, provable greedy result. But hand the same algorithm the denominations {1, 3, 4} and ask it to make 6: greedy takes a 4, then has to fill the remaining 2 with two 1s, for three coins total — when 3 + 3 does the job in two. The algorithm didn’t get worse. The input changed the rules underneath it, silently. This is precisely why the general coin-change problem is solved with dynamic programming (it’s a close relative of the knapsack recurrence you met last chapter) — the greedy-choice property simply isn’t a property of the problem; it’s a property of specific, well-behaved inputs to the problem, and conflating the two is the single most common way to ship a subtly-broken greedy algorithm into production.
Longest path in a graph. “Always walk to the unvisited neighbour that looks most promising right now” feels reasonable — and produces no guarantees whatsoever. (Worse: the general longest-path problem is NP-hard, a term you’ll meet formally near the end of this course — there may be no efficient correct algorithm of any shape, greedy or otherwise.)
The general knapsack family, which you’ve now seen from both sides: identical-looking problem statement, and the presence or absence of one small rule (divisibility) is the entire difference between “greedy, O(n log n), five lines” and “DP required, O(n · W), a full table.”

Try it yourself: For each of the three scenarios above, try to articulate — in your own words, in one sentence each — which specific piece of the greedy-choice property breaks down, and why. (For coin change with {1, 3, 4}: does taking the largest coin first ever force you into a worse position later? Construct the smallest amount where it does, smaller than 6 if you can find one, and check whether 6 is really the smallest example or whether you’ve just been handed the easiest one to see.) Building the reflex of asking “can I actually prove the greedy-choice property here, or does this just look like the activity-selection problem?” — every single time you’re tempted to reach for a greedy solution — is, far more than memorizing which famous problems are “the greedy ones,” what will keep you from shipping a beautifully simple, beautifully wrong algorithm into a system that depends on it.

A Decision Framework: “Can I Trust Greedy Here?”

When you encounter a new optimization problem and a greedy strategy suggests itself, work through these questions — roughly in the order a careful problem-solver actually would, rather than the order that makes for the tidiest list:

State the greedy rule precisely — not “pick the best one” (best by what measure, exactly?) but a single, concrete, checkable criterion: “earliest finish time,” “highest value-per-weight ratio,” “smallest combined frequency.”
Try to break it before you try to prove it. Spend real effort hunting for a counterexample — a small, concrete input where the greedy rule’s first choice provably leads somewhere worse than some other first choice would have. This is not pessimism; it’s the fastest possible route to the truth, because a five-minute counterexample search that succeeds saves you from building, testing, and trusting an algorithm that was broken from the start.
If you can’t break it, attempt an exchange argument. Take an arbitrary optimal solution, and try to show that you can always transform it — without making it worse — into one that starts with the greedy choice. If you get stuck partway through, that stuck point is informative: it’s often exactly the spot where a counterexample is hiding, waiting to be reverse-engineered from the very obstruction that stopped your proof.
Confirm optimal substructure: once the greedy choice is fixed, is what remains truly an instance of the same problem, only smaller — with no leftover entanglement between the choice you just made and the sub-problem you’re about to recurse into?
If any step fails, you haven’t wasted your effort — you’ve very likely just located the exact table your dynamic programming solution will need to build, because the place where the greedy proof broke is almost always the place where “the locally best choice might not be globally best” first becomes possible — which is precisely the situation a DP table exists to resolve, by keeping every option alive until enough information arrives to choose correctly among them.

Bringing It Together: Practice as a Problem-Solver

Prove or break it. The “job sequencing with deadlines” problem: given n jobs, each with a deadline and a profit, and the rule that each job takes one unit of time and at most one job can run per time unit, choose a schedule that maximizes total profit from jobs completed by their deadlines. A natural greedy rule: sort jobs by profit, descending, and place each one in the latest available time slot at or before its deadline. Either construct a careful exchange argument showing this rule is safe, or find a concrete counterexample showing it isn’t. (Hint: think hard about what “latest available slot” is doing for you here, and whether a different tie-breaking or placement rule for jobs with equal profit could possibly matter.)
Translate a table into a rule. Take the knapsack function from the Dynamic Programming chapter and the fractional_knapsack function from this one. Run both on the same five- or six-item input with the integer weights and values turned into doubles for the fractional version. Now construct, deliberately, an input where the two algorithms’ answers are identical — and a second input where they differ substantially. What property of your second input forced the divergence? Write that property down in one sentence; you have just written, in your own words, the dividing line between “greedy suffices” and “you need the table.”
Design from scratch. The “minimum number of platforms” problem: given the arrival and departure times of n trains at a station, find the minimum number of platforms needed so that no train ever has to wait. Design a greedy strategy (hint: consider processing arrival and departure events in sorted order, and tracking how many trains are “in the station” at once — a running count that rises on arrivals and falls on departures), implement it in C, and then write out — in your own words, in the style of this chapter’s exchange arguments — why you believe your strategy never overcounts or undercounts the platforms genuinely needed at any moment.
Reflection. Across this entire chapter, the most important sentence was arguably this one: “a greedy algorithm is a dynamic programming solution for the special, fortunate case where you can prove you’ll never need to look back.” Reread the fractional-vs-0-1 knapsack comparison with that sentence in mind, and then write — in two or three sentences, in plain language, as if explaining it to a capable friend who has read the DP chapter but not this one — exactly what “being able to prove you’ll never need to look back” means, concretely, in terms of the divisibility rule. If you can make that explanation land cleanly for someone else, you have firmly internalized the one idea this entire chapter exists to teach — and the rest of the chapter’s specific algorithms become, from that point on, just memorable illustrations of a principle you’ll be able to apply to problems no one has shown you yet.