Trees
Binary trees, BSTs, traversals, deletion, AVL self-balancing, heaps — hierarchical data and the algorithms that keep it efficiently ordered.
Every data structure so far — arrays, linked lists, stacks, queues — is fundamentally linear: one thing after another. Trees break from that shape. A tree organises data hierarchically: one root, branching into children, branching again into their children, all the way down to leaves. This branching structure is precisely what makes trees powerful — at each node you eliminate half (or more) of the remaining candidates, reaching your target in O(log n) steps instead of O(n). File systems, HTML DOMs, compiler parse trees, database indexes, game state trees — all of these exploit the same branching principle. Understanding trees means understanding the structure underneath a huge fraction of the software you interact with every day.
Terminology
A tree is a connected, acyclic graph with a designated root node. Every node except the root has exactly one parent; a node with no children is a leaf:
- Depth of a node — number of edges on the path from the root to that node
- Height of a tree — maximum depth of any leaf; an empty tree has height -1, a single node has height 0
- Subtree — a node and all of its descendants; every node is the root of its own subtree
- Binary tree — every node has at most two children, called left and right
- Complete binary tree — every level is fully filled except possibly the last, which is filled left-to-right
- Balanced tree — height is O(log n); operations that descend root-to-leaf cost O(log n)
Binary Search Trees
A binary search tree (BST) maintains a strict ordering invariant: for every node n, all values in the left subtree are less than n.val, and all values in the right subtree are greater than n.val. This invariant is what makes search efficient — at each node, comparing the target to the current value tells you exactly which subtree to descend into, halving the remaining candidates:
#include <stdlib.h>
#include <stdio.h>
typedef struct BSTNode {
int val;
struct BSTNode *left;
struct BSTNode *right;
} BSTNode;
BSTNode *bst_search(BSTNode *node, int target) {
if (node == NULL || node->val == target) return node;
if (target < node->val) return bst_search(node->left, target);
return bst_search(node->right, target);
}
BSTNode *bst_insert(BSTNode *node, int val) {
if (node == NULL) {
BSTNode *n = malloc(sizeof(BSTNode));
n->val = val;
n->left = n->right = NULL;
return n;
}
if (val < node->val) node->left = bst_insert(node->left, val);
else if (val > node->val) node->right = bst_insert(node->right, val);
/* val == node->val: duplicate, ignore */
return node;
}
Deletion
Deleting from a BST has three cases depending on how many children the target node has:
- Leaf — no children: simply remove it and return
NULL - One child — splice out the node, returning the single child in its place
- Two children — find the in-order successor (the smallest node in the right subtree), copy its value into the current node, then delete the successor from the right subtree
The in-order successor is always the leftmost node in the right subtree — a node with at most one child — so the third case reduces to a simpler sub-problem:
/* Return the node with the minimum value in the subtree rooted at node */
static BSTNode *bst_min_node(BSTNode *node) {
while (node->left != NULL) node = node->left;
return node;
}
BSTNode *bst_delete(BSTNode *node, int val) {
if (node == NULL) return NULL;
if (val < node->val) node->left = bst_delete(node->left, val);
else if (val > node->val) node->right = bst_delete(node->right, val);
else {
/* Found the node to delete */
if (node->left == NULL) { /* case 1 & 2: no left child */
BSTNode *tmp = node->right;
free(node);
return tmp;
}
if (node->right == NULL) { /* case 2: no right child */
BSTNode *tmp = node->left;
free(node);
return tmp;
}
/* case 3: two children — replace with in-order successor */
BSTNode *successor = bst_min_node(node->right);
node->val = successor->val;
node->right = bst_delete(node->right, successor->val);
}
return node;
}
Pause and think: Why use the in-order successor (minimum of the right subtree) rather than the in-order predecessor (maximum of the left subtree) to handle the two-children case? Either works — both produce a valid BST. The successor approach is conventional but not mandatory. Try drawing a small BST and deleting a two-child node using both approaches to convince yourself the result satisfies the BST invariant in each case.
Tree Traversals
A traversal visits every node exactly once in a defined order. Four traversals cover all practical uses:
void inorder(const BSTNode *node) {
if (node == NULL) return;
inorder(node->left);
printf("%d ", node->val); /* root visited between left and right */
inorder(node->right);
}
void preorder(const BSTNode *node) {
if (node == NULL) return;
printf("%d ", node->val); /* root visited before children */
preorder(node->left);
preorder(node->right);
}
void postorder(const BSTNode *node) {
if (node == NULL) return;
postorder(node->left);
postorder(node->right);
printf("%d ", node->val); /* root visited after children */
}
/* Level-order (BFS): visit all nodes at depth d before any at depth d+1 */
void level_order(BSTNode *root) {
if (!root) return;
BSTNode *queue[256];
int front = 0, back = 0;
queue[back++] = root;
while (front < back) {
BSTNode *node = queue[front++];
printf("%d ", node->val);
if (node->left) queue[back++] = node->left;
if (node->right) queue[back++] = node->right;
}
}
| Traversal | Visit order | Key use |
|---|---|---|
| Inorder (L → N → R) | Ascending sorted order | Print BST contents sorted |
| Preorder (N → L → R) | Root before children | Serialise / copy a tree |
| Postorder (L → R → N) | Children before root | Free a tree, evaluate expressions |
| Level-order (BFS) | Top to bottom, left to right | Find shortest path, print by level |
Try it yourself: Build the BST by inserting values in order
{5, 3, 7, 1, 4, 6, 8}. Draw the resulting tree. Then trace each of the four traversals and write down the sequence of values visited. Which traversal produces the sequence1 3 4 5 6 7 8? Which produces5 3 1 4 7 6 8?
BST Degradation
BST performance is O(log n) only when the tree is balanced — when the height is proportional to log₂(n). Inserting already-sorted data destroys this:
Insert: 1, 2, 3, 4, 5
1
\
2
\
3
\
4
\
5
Height is now n − 1 rather than log₂(n). Search, insert, and delete all degrade to O(n). The BST is effectively a linked list. This is not an edge case — inserting sorted data is extremely common (timestamps, sequential IDs, alphabetically ordered words).
Balanced Trees
Self-balancing BSTs restructure themselves on every insertion and deletion to maintain O(log n) height. The two most important designs:
Red-black trees — each node is coloured red or black. A set of colouring rules ensures the height stays ≤ 2 × log₂(n+1). Fewer rotations on average than AVL, making insertions and deletions faster in practice. Used in Java TreeMap, C++ std::map, and Linux kernel scheduler.
AVL trees — first self-balancing design (1962). After every modification, each node on the path from the changed node to the root has its balance factor (left-subtree height minus right-subtree height) checked. If any node has a balance factor outside {-1, 0, +1}, it is immediately fixed with one or two rotations — pointer rewirings that preserve the BST invariant while restoring balance.
AVL Rotations
typedef struct AVLNode {
int val;
int height;
struct AVLNode *left;
struct AVLNode *right;
} AVLNode;
static int avl_height(const AVLNode *n) { return n ? n->height : 0; }
static int avl_bf(const AVLNode *n) { return avl_height(n->left) - avl_height(n->right); }
static void avl_update_height(AVLNode *n) {
int lh = avl_height(n->left), rh = avl_height(n->right);
n->height = 1 + (lh > rh ? lh : rh);
}
/* Right rotation — fixes left-heavy imbalance:
*
* y x
* / \ / \
* x C → A y
* / \ / \
* A B B C
*/
static AVLNode *rotate_right(AVLNode *y) {
AVLNode *x = y->left;
AVLNode *B = x->right;
x->right = y;
y->left = B;
avl_update_height(y);
avl_update_height(x);
return x;
}
static AVLNode *rotate_left(AVLNode *x) {
AVLNode *y = x->right;
AVLNode *B = y->left;
y->left = x;
x->right = B;
avl_update_height(x);
avl_update_height(y);
return y;
}
static AVLNode *avl_rebalance(AVLNode *n) {
avl_update_height(n);
int bf = avl_bf(n);
if (bf > 1) { /* left-heavy */
if (avl_bf(n->left) < 0)
n->left = rotate_left(n->left); /* left-right: fix child first */
return rotate_right(n);
}
if (bf < -1) { /* right-heavy */
if (avl_bf(n->right) > 0)
n->right = rotate_right(n->right); /* right-left: fix child first */
return rotate_left(n);
}
return n;
}
AVLNode *avl_insert(AVLNode *node, int val) {
if (!node) {
AVLNode *n = malloc(sizeof(AVLNode));
n->val = val; n->height = 1; n->left = n->right = NULL;
return n;
}
if (val < node->val) node->left = avl_insert(node->left, val);
else if (val > node->val) node->right = avl_insert(node->right, val);
else return node;
return avl_rebalance(node);
}
Every insertion touches at most O(log n) nodes and performs at most two rotations, each O(1). The guaranteed height bound is ≤ 1.44 × log₂(n+2) — tighter than red-black trees, which is why AVL trees are preferred when lookups dominate and update speed matters less.
Heaps
A heap is a complete binary tree (fully filled except the last row, which fills left-to-right) satisfying the heap property: in a max-heap, every node’s value is ≥ its children’s values. Because a complete binary tree maps perfectly onto an array (the children of element at index i are at 2i+1 and 2i+2; the parent is at (i-1)/2), heaps avoid all pointer overhead:
#define MAX_HEAP 256
typedef struct {
int data[MAX_HEAP];
int size;
} MaxHeap;
void heap_init(MaxHeap *h) { h->size = 0; }
int heap_peek(const MaxHeap *h) { return h->data[0]; }
static void heap_swap(MaxHeap *h, int i, int j) {
int tmp = h->data[i]; h->data[i] = h->data[j]; h->data[j] = tmp;
}
void heap_push(MaxHeap *h, int val) {
int i = h->size++;
h->data[i] = val;
while (i > 0) { /* bubble up */
int parent = (i - 1) / 2;
if (h->data[i] <= h->data[parent]) break;
heap_swap(h, i, parent);
i = parent;
}
}
int heap_pop(MaxHeap *h) {
int max = h->data[0];
h->data[0] = h->data[--h->size]; /* move last element to root */
int i = 0;
for (;;) { /* sift down */
int left = 2*i + 1, right = 2*i + 2, largest = i;
if (left < h->size && h->data[left] > h->data[largest]) largest = left;
if (right < h->size && h->data[right] > h->data[largest]) largest = right;
if (largest == i) break;
heap_swap(h, i, largest);
i = largest;
}
return max;
}
| Operation | Cost |
|---|---|
Insert (heap_push) | O(log n) |
Remove max (heap_pop) | O(log n) |
| Peek max | O(1) |
| Build heap from n elements | O(n) — not O(n log n) |
The heap underlies the priority queue (covered in the Stacks and Queues chapter) and heapsort — an O(n log n) in-place sorting algorithm that needs no auxiliary memory beyond the heap array.
Choosing the Right Tree
| Situation | Use |
|---|---|
| Ordered lookup, no rebalancing | BST (careful about input order) |
| Ordered lookup, arbitrary input | AVL tree or red-black tree |
| Priority queue, max/min retrieval | Heap |
| Key–value lookup without ordering | Hash table (next chapter) |
| Both ordering and fast lookup | Balanced BST |
Bringing It Together: Practice as a Problem-Solver
-
BST deletion in full. Start with an empty BST and insert
{10, 5, 15, 3, 7, 12, 18}. Draw the tree after each insertion. Now delete 10 (a two-child node). Trace throughbst_deletestep by step: identify the in-order successor, copy its value, and show the tree after the successor is deleted from the right subtree. Draw the final tree and verify the BST invariant holds at every node. -
Traversal prediction. Without running code, predict the output of all four traversal functions (
inorder,preorder,postorder,level_order) on the tree from exercise 1 (after deletion). Then verify by tracing the functions by hand. Identify which traversal would be most useful if you wanted to (a) print all values in sorted order, (b) count the total number of nodes, (c) make a copy of the tree in the same shape. -
AVL insertion trace. Insert values
{3, 2, 1}into an empty AVL tree, one at a time. After inserting 3 and 2, show the balance factors. After inserting 1, show which node becomes unbalanced, identify the rotation case (left-left, left-right, right-left, or right-right), and draw both the before and after trees. Verify that the final height is 1, not 2. -
Heap for top-K. You receive a stream of n integers one at a time and need to return the k largest at the end, using only O(k) memory (not O(n)). Describe a strategy using a min-heap of size k: what do you do when the heap has fewer than k elements? What do you do when it is full and the new element is larger than the heap’s minimum? What do you do when it is full and the new element is smaller? Why does a min-heap work better than a max-heap for this problem?