Complexity Analysis
Big-O notation, best/worst/average cases, and comparing algorithms analytically.
Suppose two classmates each hand you a working solution to the same problem: given an array of a million names, find out whether “Alex” is in it. The first solution checks every name one by one. The second sorts the array once and then jumps straight to where “Alex” would be. Both programs are correct — they both produce the right answer. Run them on an array of ten names and you won’t see any difference; both finish before you can blink. Run them on an array of ten million, and one finishes instantly while the other takes noticeably longer.
This chapter is about developing the vocabulary and the habits of mind to predict that difference before you run either program — to look at an algorithm, or even just sketch one on paper, and reason confidently about how it will behave as the input grows. That skill — not “can I make this work?” but “how will this scale, and can I do better?” — is what separates someone who writes code from someone who designs algorithms.
Why Not Just Time It?
The obvious way to compare two programs is to run them and use a stopwatch. This is tempting, but it is a trap, for three reasons:
- Hardware varies. A program that takes 2 seconds on your laptop might take 0.2 seconds on a server and 20 seconds on a phone. Wall-clock time tells you about the machine as much as the algorithm.
- Implementation details leak in. A sloppy implementation of a great algorithm can lose to a careful implementation of a mediocre one, for small inputs. Timing measures both the idea and the execution, tangled together.
- It doesn’t generalize. Timing a program on an array of 1,000 items tells you almost nothing about how it will behave on an array of 100,000,000 items — and that gap is exactly where the differences that matter start to show up.
What we want instead is a way to describe how the amount of work an algorithm performs grows as the size of its input grows — a description that survives the move from one machine to another, from one programming language to another, from an array of a thousand items to an array of a billion. That description is asymptotic analysis, and its language is Big-O notation.
Pause and think: Before reading on, try to articulate in your own words why “this program ran in 4 seconds” is a much weaker statement than “this program does roughly
n²units of work for an input of sizen.” What information does the second statement carry that the first doesn’t?
Counting Operations, Not Seconds
Instead of measuring time, we count elementary operations — comparisons, assignments, arithmetic operations, array accesses — as a function of the input size, conventionally written n. We don’t care about the exact count (that depends on the machine and the compiler); we care about how the count grows as n grows.
Consider this function, which finds the largest value in an array of n integers:
int find_max(int items[], int n) {
int largest = items[0]; /* 1 operation */
for (int i = 1; i < n; i++) { /* loop runs n-1 times */
if (items[i] > largest) { /* 1 comparison per iteration */
largest = items[i]; /* at most 1 assignment per iteration */
}
}
return largest;
}
If the array has n items, this function performs roughly 1 + (n - 1) + (n - 1) operations in the worst case — call it 2n - 1. Now compare it to a function that checks whether an array contains any duplicate values by comparing every pair:
#include <stdbool.h>
bool has_duplicate(int items[], int n) {
for (int i = 0; i < n; i++) {
for (int j = i + 1; j < n; j++) {
if (items[i] == items[j]) {
return true;
}
}
}
return false;
}
Here, the inner loop runs , then , then times, and so on, for a total of roughly comparisons — about .
Look at how differently these two expressions behave as n grows:
n | 2n - 1 (find_max) | n²/2 (has_duplicate) |
|---|---|---|
| 10 | 19 | 50 |
| 100 | 199 | 5,000 |
| 1,000 | 1,999 | 500,000 |
| 1,000,000 | 1,999,999 | 500,000,000,000 |
At n = 10, the two functions are in the same ballpark. At n = 1,000,000, one does about two million operations and the other does five hundred billion — a gap of five orders of magnitude. The constants (2, -1, /2) become irrelevant; what matters is the shape of the growth: linear versus quadratic. That shape is what asymptotic notation captures.
Big-O: An Upper Bound on Growth
Big-O notation describes an upper bound on how an algorithm’s running time grows. Formally:
A function is if there exist positive constants and such that
Read in plain language: eventually, and up to a constant factor, grows no faster than . The phrase “up to a constant factor” is doing a lot of work here — it’s precisely what lets us say and are “the same kind of thing” (both ), while is a fundamentally different kind of thing ().
A few worked examples to build intuition:
- is . Choose and : for any , .
- is . Choose and : for , .
- is also technically , , and — Big-O is an upper bound, and any larger function is also an upper bound. In practice, though, we always state the tightest bound we can prove, because that’s the one that actually informs a decision.
Try it yourself: Using the definition above, convince yourself that
f(n) = 100(a constant, independent ofn) isO(1). What constantscandn₀would you choose? Now do the same forf(n) = log₂(n) + 5.
Big-Omega and Big-Theta
Big-O bounds growth from above — it’s a worst-case-shaped guarantee (“this won’t be any worse than…”). Two companion notations round out the picture:
- Big-Omega (
Ω) bounds growth from below:f(n)isΩ(g(n))iffgrows at least as fast asg, eventually and up to a constant. It’s a best-case-shaped guarantee (“this won’t be any better than…”). - Big-Theta (
Θ) is a tight bound:f(n)isΘ(g(n))if it is bothO(g(n))andΩ(g(n))— that is,fandggrow at exactly the same rate, up to constants.
For find_max, the function always performs roughly 2n - 1 operations regardless of the input’s contents, so we can say its running time is Θ(n) — a tight characterization. For some algorithms (binary search is the classic example, covered in the next chapter), the best case (Ω(1), the target is the very first element examined) and worst case (O(log n)) genuinely differ, so no single Θ bound captures both extremes — we describe each case separately.
In casual conversation, programmers often say “Big-O” when they really mean “Big-Theta” (a tight bound), because that’s usually the most informative statement to make. We’ll follow that convention loosely in this course, but it’s worth knowing the precise distinction — it will matter when you start proving things rigorously.
The Complexity Zoo: Common Growth Rates
Here are the growth rates you’ll encounter constantly, ordered from “best” to “worst,” together with a typical algorithm that exhibits each one:
| Notation | Name | Example | Doubling n from 1,000 to 2,000… |
|---|---|---|---|
O(1) | Constant | Array index lookup, hash table access | …changes nothing |
O(log n) | Logarithmic | Binary search, balanced BST operations | …adds one extra step |
O(n) | Linear | Scanning an array once | …doubles the work |
O(n log n) | Linearithmic | Merge sort, heap sort, sorting in general | …slightly more than doubles |
O(n²) | Quadratic | Bubble sort, comparing all pairs | …quadruples the work |
O(n³) | Cubic | Naive matrix multiplication | …multiplies work by 8 |
O(2ⁿ) | Exponential | Trying every subset, naive recursive Fibonacci | …squares the work |
O(n!) | Factorial | Trying every permutation (e.g. brute-force travelling salesman) | …becomes astronomically larger |
The jump from polynomial growth (n, n², n³, …) to exponential growth (2ⁿ, n!) is the single most important boundary in this table — and indeed in all of computer science. An algorithm that takes n² microseconds on an input of size 50 finishes in 2.5 milliseconds. An algorithm that takes 2ⁿ microseconds on the same input takes about 35 years. We will return to this boundary in the chapter on NP-completeness, where it becomes the dividing line between “hard but tractable” and “believed to be fundamentally intractable.”
Pause and think: A common intuition trap is believing that a “fast” algorithm with bad complexity will always lose to a “slow” algorithm with good complexity. That’s only true eventually. An
O(n²)algorithm with tiny constants can comfortably outperform anO(n log n)algorithm with large constants for every input size you’ll ever realistically see — which is exactly why many real-world systems switch strategies based on input size (for instance, sorting libraries that fall back to insertion sort for small sub-arrays; we’ll see this in the next chapter). Asymptotic analysis tells you about trends, not about which algorithm wins on your specific machine with your specific data today. Both pieces of information matter, and a good engineer keeps them separate.
A Toolkit for Analyzing Code
You rarely need to invoke the formal definition of Big-O directly. Instead, you build up an estimate by composing a small number of rules:
1. Sequential statements add (and the largest term wins). If a block of code does something that’s O(f(n)) followed by something that’s O(g(n)), the whole block is O(f(n) + g(n)), which simplifies to O(max(f(n), g(n))). Doing one O(n) pass and then one O(n²) pass is O(n²) overall — the linear pass is asymptotically irrelevant.
2. Loops multiply. A loop that runs k times, each iteration doing O(f(n)) work, costs O(k · f(n)). This is why a single loop over the input is O(n), and why nested loops — a loop inside a loop, each running roughly n times — give you O(n · n) = O(n²).
3. Conditionals take the worst branch. An if/else where one branch costs O(f(n)) and the other O(g(n)) costs O(max(f(n), g(n))) in the worst case — but remember that the best case might take the cheaper branch, which is why best-case and worst-case analyses can diverge.
4. Function calls cost what the function costs. If you call a function that’s O(f(n)) inside a loop that runs O(k) times, the total is O(k · f(n)). This rule trips people up constantly: a single linear scan tucked inside a loop — “is this value already in the array I’ve collected so far?” — can silently turn an O(n) algorithm into an O(n²) one.
5. Recursive calls require their own analysis — typically by writing down a recurrence relation that describes the cost in terms of smaller instances of the same problem, and then solving it. We’ll develop this systematically in the chapter on recursion and divide-and-conquer; for now, just notice that “how many times does this function call itself, and how does the input shrink each time?” are exactly the right questions to ask.
Worked example: spotting the dominant cost
typedef struct {
double value;
int category;
} Record;
void summarize(Record records[], int n, double *out_total,
int unique_categories[], int *out_unique_count) {
double total = 0.0;
for (int i = 0; i < n; i++) { /* O(n) */
total += records[i].value;
}
int unique_count = 0;
for (int i = 0; i < n; i++) { /* O(n) outer loop... */
bool seen = false;
for (int j = 0; j < unique_count; j++) { /* ...times an O(k) scan */
if (unique_categories[j] == records[i].category) {
seen = true;
break;
}
}
if (!seen) {
unique_categories[unique_count++] = records[i].category;
}
}
*out_total = total;
*out_unique_count = unique_count;
}
The first loop is a clean O(n). The second loop runs n times, and each iteration scans unique_categories, whose length grows up to k (the number of distinct categories, with k ≤ n) to check “have I already recorded this one?” In the worst case — every record has a different category — that inner scan costs O(n), making the whole second loop O(n²). By rule 1, sequential blocks combine by taking the larger term, so summarize is O(n) + O(n²) = O(n²) overall — even though, reading the code casually, you might say “it’s just two loops, so it’s O(n).”
Try it yourself: Redesign the data structure that backs
unique_categoriesso the “have I seen this before?” check costsO(1)on average instead ofO(k), bringing the whole function down toO(n). (Hint: think about computing an index directly from the category value, rather than scanning to find it. You’ll meet the general-purpose version of this idea — the hash table — in the Data Structures course, but see if you can sketch your own special-purpose version first, assuming category codes are small non-negative integers.)
Best Case, Worst Case, Average Case
The running time of many algorithms depends not just on the size of the input but on its contents and arrangement. Searching for a value that happens to be the first element you check is fast; searching for one that isn’t there at all forces you to look everywhere. We distinguish three perspectives:
- Best case — the most favorable input of a given size. Useful for understanding an algorithm’s potential, but dangerous to rely on, since you rarely control your input.
- Worst case — the most unfavorable input of a given size. This is the perspective we adopt by default, because it gives you a guarantee: “no matter what you throw at this, it will not be slower than this bound.” Guarantees are what you build reliable systems on.
- Average case — the expected running time over some distribution of inputs (often “all inputs equally likely”). This is more informative than the worst case for algorithms whose worst case is rare and pathological — quicksort is the textbook example, and we’ll dig into exactly why in the sorting chapter — but it requires you to state, and justify, an assumption about what “average” input looks like. An average-case analysis is only as trustworthy as that assumption.
Pause and think: Linear search through an unsorted array has a best case of
Ω(1)(the target is first), a worst case ofO(n)(the target is last, or absent), and — assuming the target is equally likely to be at any position — an average case of aboutn/2comparisons, which is stillΘ(n). Notice that the average case has the same asymptotic class as the worst case here, just a smaller constant. When does an average-case analysis actually change the class of an algorithm’s behavior, rather than just its constant factor? Keep this question in your back pocket — quicksort will answer it vividly.
Amortized Analysis: When “Worst Case Per Operation” Is the Wrong Question
Sometimes an individual operation is occasionally expensive, but expensive operations are so rare that the average cost per operation, over a long sequence, is small. Analyzing this requires a different lens: amortized analysis, which asks “what’s the total cost of n operations, divided by n?” rather than “what’s the worst possible cost of one operation?”
The canonical example is a dynamic array — a growable array you build yourself in C with malloc and realloc (the same idea underlies Python’s list, Java’s ArrayList, and C++‘s std::vector under the hood). When you append to a dynamic array that’s full, it must allocate a new, larger backing buffer — typically double the size — and copy every existing element across:
typedef struct {
int *data;
int length;
int capacity;
} IntVector;
void vector_push(IntVector *v, int value) {
if (v->length == v->capacity) {
int new_capacity = (v->capacity == 0) ? 1 : v->capacity * 2;
v->data = realloc(v->data, new_capacity * sizeof(int)); /* copies old data */
v->capacity = new_capacity;
}
v->data[v->length++] = value;
}
That single resizing call is O(n) — it copies every element seen so far. Doesn’t that make vector_push an O(n) operation?
It does not — and working out why is one of the most satisfying “aha” moments in this whole course. Because the capacity doubles each time it fills up, resizes happen when the array holds 1, 2, 4, 8, 16, … elements, and each resize copies that many elements. After pushes, the total copying work across all the resizes is at most:
(This is a geometric series — each term is at most half of the next, so the whole sum is bounded by twice the largest term.) Spread that total cost evenly across all pushes, and each push costs on average — even though any individual push might trigger an O(n) resize. We say the amortized cost of vector_push is O(1).
Try it yourself: What would happen to this argument if
new_capacitywere computed asv->capacity + 10(grow by a fixed amount) instead ofv->capacity * 2(grow by doubling)? Work out the total copying cost afternpushes under that strategy, and compare it to2n. This is exactly the kind of design decision — “grow by doubling vs. grow by a constant” — that separates a dynamic array withO(1)amortized appends from one withO(n)amortized appends. The difference is invisible in a dozen lines of code and enormous in a production system handling millions of operations.
Space Complexity
Everything above concerns time complexity — how the number of operations grows. Space complexity asks the parallel question about memory: how does the amount of extra storage an algorithm needs grow with n? The same Big-O vocabulary applies directly.
find_max uses a single extra variable (largest), regardless of how large the input array is — its space complexity is O(1), often called “in-place” or “constant extra space.” A function that allocates and returns a new array of length n uses O(n) extra space. A naive recursive function that calls itself n times before returning uses O(n) space on the call stack — every nested call keeps its local variables and return address alive in memory until it returns — even if it allocates no other memory with malloc. “How deep does the call stack go?” is a space question as much as “how many calls happen in total?” is a time question, and in C this isn’t an abstraction: a sufficiently deep recursion will exhaust the stack and crash your program with a segmentation fault.
Time and space can often be traded against each other — using more memory to remember previous results (an idea you’ll meet in full force in the Dynamic Programming chapter) can turn an exponential-time algorithm into a polynomial-time one. Recognizing that trade-off, and deciding which resource you can afford to spend, is itself a core algorithmic skill.
Common Pitfalls
As you start analyzing your own code, watch out for these recurring traps:
- “It’s just a loop inside a loop, so it must be
O(n²).” Not necessarily — if the inner loop’s range depends on the outer index and shrinks (as inhas_duplicateabove, or in many graph algorithms), the total work can beO(n)orO(n log n). Always do the sum, don’t just pattern-match on the shape of the code. - “This library function is free.” Operations like
strcat,strlen,memmove, or searching through a linked structure often hide a linear (or worse) scan inside a single, innocent-looking call. Callingstrleninside the condition of a loop that iterates over the same string is a classic way to silently turn anO(n)scan into anO(n²)one. Learn the cost of the standard-library functions you reach for — it’s some of the highest-leverage knowledge you can acquire. - “Big-O tells me which algorithm to use.” It tells you how things will trend as
ngrows large — an essential, but not the only, ingredient in a real engineering decision. Constant factors, memory limits, code complexity, and the actual sizes of inputs you expect to see all matter too. Asymptotic analysis is a lens, not a verdict. - Confusing the number of operations with the size of the numbers involved. An algorithm that does
nadditions of numbers with up tondigits each isn’t doingO(n)work in any meaningful physical sense — each addition itself costs more as the numbers grow. Most of the analysis in introductory courses (and in this one) assumes numbers fit in a fixed-size machine word — anintorlong— an assumption called the unit-cost model. It’s worth knowing you’re making it.
Bringing It Together: Practice as a Problem-Solver
Analyzing complexity is a skill you build by doing it, repeatedly, on code you didn’t write — including code that looks deceptively simple. Before moving on, work through these:
-
Classify each of these by tightest Big-O bound, and justify your answer in one or two sentences each:
- A function that prints every pair
(i, j)withi < jfrom an array ofnitems. - A function that, given a sorted array, repeatedly halves the search range until it finds a target or the range is empty.
- A function that computes the sum
1 + 2 + ... + nby adding numbers one at a time in a loop, versus one that uses the closed-form formulan*(n+1)/2. - A function that checks whether a string is a palindrome by comparing characters from both ends moving inward with two index variables.
- A function that prints every pair
-
Find the bug in this reasoning: “I tested my sorting function on arrays of size 10, 100, and 1,000, and the time roughly quadrupled each time I multiplied the size by 10. So my algorithm must be
O(n²).” Is the conclusion justified? What additional evidence — or what alternative explanation — would make you more or less confident? -
Design challenge: You’re given an array of
nintegers and need to determine whether any two of them sum to a target valueT. The obvious approach checks every pair —O(n²). Can you describe (in plain English or pseudocode — no need to write runnable code yet) a strategy that does this inO(n log n)time? InO(n)time? What does each faster strategy cost you in space, and is that cost worth paying? (This single problem, in its many disguises, is one of the most common building blocks in algorithm design — you will see its shape again and again.)
Carrying these habits forward — counting operations instead of guessing, writing down what grows and how fast, and always asking “can I do better, and what would I have to give up to do it?” — is the foundation everything else in this course is built on.