# Balanced BSTs

## Background

Suppose we want to store a collection of distinct integers (henceforth, we call them *keys*, for no good reason except that I can refer to them with the đź”‘ emoji) with support for:

`insert(x)`

insert`x`

into the collection if it is not there already`delete(x)`

deleting`x`

from the collection if it is present`search(x)`

determine if`x`

belongs to the collection

The motivation for building binary search trees is to capture a happy middle-ground between the following extremes:

- if I store my data fully sorted, then it is hard to insert and delete
^{1}, but easy to search^{2}, but - if I store my data in chaos-mode, then it is easy to insert and delete
^{3}, but hard to search^{4}.

Binary search trees are the natural answer to the question:

If binary search was a data structure, what would it look like?

To achieve O(\log n) complexity for search *and* insertion (*and* deletion), we want a relatively â€ślooseâ€ť sense of structure on the data: it should be fluid enough for insertion and deletion to be manageable, but meaningful enough for searches to be fast.

One way to do this is to store everything in a binary tree where every node `u`

stores a key k_u with the assurance that:

- all nodes that belong to the subtree rooted at the right child of
`u`

have keys whose values are*greater*than k_u; - all nodes that belong to the subtree rooted at the left child of
`u`

have keys whose values are*smaller*than k_u.

With this structure in place, `search(x)`

works exactly as you would expect: if the binary search tree is rooted at r, check if `x`

equals k_r: if yes, then weâ€™ve found what we are looking for, so abort and celebrate; on the other hand, if not, then repeat the following until youâ€™ve either found `x`

or have run out of elements in the tree:

- if
`x`

is greater than k_r, continue searching in the đźŚł rooted at the right child of r - if
`x`

is smaller than k_r, continue searching in the đźŚł rooted at the left child of r

Notice that if you hit a leaf without a match, you can safely conclude that `x`

does not belong to the collection. Also notice that the complexity of search is proportional to the length of the longest root-to-leaf path in the tree^{5} in the worst case.

To `insert(x)`

, first `search(x)`

: if `x`

is found, thereâ€™s nothing to do; if `x`

is not found, then make note of the leaf where the search stopped â€” say \ell: then if `x`

is greater than k_\ell, introduce a right child of \ell and assign to it the key value `x`

; on the other hand if `x`

is less than k_\ell, introduce a left child of \ell and assign to it the key value `x`

. Notice the tree with the newly added node is still a valid binary search tree.

What about `delete(x)`

? If `x`

is not in the collection, thereâ€™s nothing to do. If `x`

does belong to the collection and is associated with a leaf node, then the node can simply be knocked out, no further action required. However, what if the node whose key value is `x`

â€” say v â€” is an serious one, with a left and a right child? One possibility is to swap `x`

with an appropriate value ascribed to some leaf node, and *then* knock out the leaf: which now carries the value of `x`

. Which leaf should we swap with? Itâ€™s clear that it canâ€™t be just *any* leaf: but you could pick the smallest value in the right subtree or the largest value in the left subtree â€” both of these reside and leaves, and the swaps with them would leave you with a valid BST after the deed is done.

To find the smallest value in the right subtree, you can take the right turn from v and then go left until you are stuck. If you prefer the largest value in left subtree instead, take the left turn from v and go right until you are stuck. Either of these should work fine. You can find some examples of deletions in a BST here.

Notice that the complexity of all three operations are proportional to the height of the tree: so if we can control the height, we control the complexity of these operations! Notice also that the height of a BST â€” if built by implementing the algorithms above â€” can be as bad as linear in the number of keys^{6}, which is no good. Play around with this visualization of BSTs to get a feel!

## (2,3) Trees

One way of controlling the height of a BST is to simply demand it: this is called solving a problem by definition :) What if we could insist on having BSTs where the underlying tree structure must have the following additional property:

- all nodes have either 0 or two children
- all leaves are at the same distance from the root.

This would automatically force the structure of the underlying tree to be *complete* i.e, if the tree has height h, any node whose distance from the root is less than h must be a branching node (indeed, if not, then it is a leaf by property (1), but then we will have a violation of property (2)). Why is this great? Because if you have n nodes packed into a complete tree of height h, then the height must, in fact, be O(\log n) â€” which is what we wanted.

There is, however, a small catch with this: such trees may not exist for some values of n, the total number of keys that we are trying to accommodate :( We can fix this in two ways: add some dummy keys with values that either duplicate existing values till we have a feasible number of keys to work with (ugh), or make room for *multi-ary* search trees (cool!).

Since we want the elegant approach, letâ€™s push our definition further:

- all nodes have either 0, two, or three children
- nodes that have two children store
onekey value and satisfy the standard BST property- nodes with three children store
twokey values, say`a`

and`b`

; and further:

- the subtree rooted at the left-most child only hosts values smaller than
`a`

- the subtree rooted at the right-most child only hosts values larger than
`b`

- the subtree rooted at the middle child only hosts values strictly between
`a`

and`b`

- all leaves are at the same distance from the root.

Adapting `search(x)`

to multi-ary trees is extremely natural.

If the tree is rooted at r and r is a node with two children, proceed as before.

Otherwise, suppose r hosts the values a_r and b_r. Then check if `x`

equals either a_r or b_r: if yes, then weâ€™ve found what we are looking for, so abort and celebrate; on the other hand, if not, then repeat the following until youâ€™ve either found `x`

or have run out of elements in the tree:

- if
`x`

is greater than b_r, continue searching in the đźŚł rooted at the right-most child of r - if
`x`

is smaller than a_r, continue searching in the đźŚł rooted at the left-mostt child of r - otherwise, continue searching in the đźŚł rooted at the middle child of r

What about `insert(x)`

? We search for `x`

as before. If found, thereâ€™s nothing to be done. If not, there are two scenarios:

- The search terminates at a leaf that has one key value.
- The search terminates at a leaf that has two key values.

Note that *unlike* before, we cannot just have the leaf sprout a child to accommodate the new key value (remember that all leaves have to be at the same distance from the root). So our algorithm *pushes upwards*, and works as follows:

If thereâ€™s room in the current node (i.e, the node is one with two children or a leaf with one key value), accommodate the new value into that node, reorganizing as necessary. In particular, when a node with two children + one value transforms into one with three children + two values, then the subtrees have to be appropriately reorganized.

If thereâ€™s

**no**room in the current node, then move up to the parent. If thereâ€™s no parent: then youâ€™ve hit the root! If the root turns out to be crowded too, then youâ€™ll have to split the root (note that this causes the height of the tree to increase, but does not happen too often), but if the root has room then follow the procedure in the previous step.

## Footnotes

can be as bad as linear timeâ†©ď¸Ž

can be as good as logarithmic timeâ†©ď¸Ž

constant time!â†©ď¸Ž

can be as bad as linear timeâ†©ď¸Ž

Since itâ€™s going to be awkward to keep talking about the â€ślength of the longest root-to-leaf path in the treeâ€ť, we will simply refer to this as the

*height*of the tree.â†©ď¸ŽTry, for example, inserting 1,2,3,\ldots,n in that order.â†©ď¸Ž