Trees
In recursion.html#exprtrees we
looked at some simple tree structures used for expressing "hierarchical"
information, such as expressions:
2*(3+5)
      *
     / \
    2   +
       / \
      3   5
A tree consists of nodes, each of which has zero or more subnodes.
A binary tree is one in which each node has up to two
subnodes.
Binary tree nodes thus contain:
- data
- a left subtree
- a right subtree
The code might look like this (from bintree.cs):
class treenode<T> {
    private T data_;
    private treenode<T> left_, right_;
    public treenode(T d, treenode<T> l, treenode<T> r) {data_ = d; left_ = l; right_ = r;}
    public T data() {return data_;}
    public treenode<T> left() {return left_;}
    public treenode<T> right() {return right_;}
}
A binary tree is ordered if there is an ordering
relation on the data and, for each node, every data value appearing in the
left subtree is less than the node's own data, and every data value
appearing in the right subtree is greater than the node's own data.
Quite often, eg for expression trees, no ordering is involved.
The root node is at the top of the tree. Nodes with no
subtrees (ie in which the subtree fields are null) are leaf
nodes. Other nodes are interior nodes.
The depth of a node is how far it is, in node-to-node
links, from the root. Level n of a tree consists of all
nodes with depth n.
Examples:
- Abstract expression trees
- Parse trees
- Ancestry trees
- Decision trees (Bailey p 288)
None of these is ordered!
Tree Traversal
To traverse a tree is to visit each node, perhaps
printing out the data values or perhaps taking some other action.
Traversal methods are usually recursive, and usually the initial call is
on the tree's root.
Three common forms of traversal are preorder, inorder,
and postorder, depending on whether a node's data is
visited before, between or after the left and right subtrees are visited.
All of these are considered to be depth-first traversal.
Here is a sample inorder traversal
Demo:
traverse the expression tree for 4+x*(y+1)
    inorder
    postorder (cf Jan Łukasiewicz)
traverse the following tree from inttree.cs:
       6
     /   \
    4     8
   / \   / \
  1   5 7   37
We can also do breadth-first traversal, by going along
levels: 6, 4, 8, 1, 5, 7, 37.
Example code for inorder traversal:
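What follows is a minimal sketch rather than the code from bintree.cs or inttree.cs. It assumes a simple node class, here called Node (a hypothetical stand-in for treenode holding string data), and collects the visited values into a list instead of printing them. Preorder and postorder are included for comparison; the only difference is where the node's own data is handled relative to the two recursive calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the notes' treenode class, holding string data.
class Node {
    public string Data;
    public Node Left, Right;
    public Node(string d, Node l, Node r) { Data = d; Left = l; Right = r; }
}

static class Traversals {
    // Inorder: left subtree, then the node itself, then the right subtree.
    public static void InOrder(Node t, List<string> output) {
        if (t == null) return;
        InOrder(t.Left, output);
        output.Add(t.Data);
        InOrder(t.Right, output);
    }

    // Preorder: the node itself first, then both subtrees.
    public static void PreOrder(Node t, List<string> output) {
        if (t == null) return;
        output.Add(t.Data);
        PreOrder(t.Left, output);
        PreOrder(t.Right, output);
    }

    // Postorder: both subtrees first, then the node itself.
    public static void PostOrder(Node t, List<string> output) {
        if (t == null) return;
        PostOrder(t.Left, output);
        PostOrder(t.Right, output);
        output.Add(t.Data);
    }

    // Builds the expression tree for 4+x*(y+1) from the bottom up.
    public static Node Sample() {
        Node yPlus1 = new Node("+", new Node("y", null, null), new Node("1", null, null));
        Node times = new Node("*", new Node("x", null, null), yPlus1);
        return new Node("+", new Node("4", null, null), times);
    }
}
```

On the sample tree, inorder yields 4 + x * y + 1 (the original expression with its parentheses lost), while postorder yields 4 x y 1 + * +, which is the reverse Polish form.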
Tree Building
How do we build trees? One way is from the bottom up; that is much easier
if the class treenode is public. We could do this (assuming treenode is
equivalent to treenode<string>):
treenode<string> n1 = new treenode<string>("a", null, null);
treenode<string> n2 = new treenode<string>("3", null, null);
treenode<string> n3 = new treenode<string>("*", n1, n2);
treenode<string> n4 = new treenode<string>("1", null, null);
treenode<string> root = new treenode<string>("+", n3, n4);
What does this build?
We can build ordered trees by inserting in order; see inttree.cs:
public void insert(int val) {
    if (root_ == null) {
        root_ = new treenode(val, null, null);
        return;
    }
    rinsert(val, root_);
}
// recursive insert: does not get called with t==null
private void rinsert(int val, treenode t) {
    if (val == t.data() ) return; // should we do something else?
    if (val < t.data()) {
        if (t.left() == null) {
            t.setleft(new treenode(val, null, null));
        } else {
            rinsert(val, t.left());
        }
    } else { // val > t.data()
        if (t.right() == null) {
            t.setright(new treenode(val, null, null));
        } else {
            rinsert(val, t.right());
        }
    }
}
If a treenode is null, we cannot modify it; we can only modify the
parent. Bailey addresses this in some contexts by including a pointer to
the parent node, but that won't help us here.
Note that the tree we get by inserting values depends on the order of
insertion.
In-class demos:
- try inserting the values in order
- what orders lead to the same tree?
- convert to preorder traversal
Lab 4: use this idea to implement a StrList that has
O(log(n)) times for both search and insert.
n-ary trees and ordered n-ary trees
A tree doesn't have to be binary to be ordered. But the number of data
elements at each node must be one less than the number of subtrees.
An n-ary tree has, for each node, a list of subnodes:
class Treenode<T> {
    private T data;
    private List<Treenode<T>> children;
    ...
}
Different nodes of a tree can have different numbers of subnodes.
An ordered n-ary tree means that if a node has n child
nodes it also has a List of n-1 data values ⟨d0,..., dn-2⟩,
and all data values within child node k lie between dk-1 and
dk (child 0 holds the values below d0, and child n-1 the values above
dn-2). That is, the ⟨d0,..., dn-2⟩ divide the
data into n intervals, and the subtrees correspond to these intervals.
Consider:
                  37
             /          \
       (23 34)        (43 56 71 88)
       /  |  \        /  |   |   |  \
     16  29  35     39  47  60  75  103
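To make the interval idea concrete, here is a rough sketch of searching an ordered n-ary tree. The class NaryNode and the method names are hypothetical, chosen for this illustration only; each node holds a sorted list of values and either no children (a leaf) or one more child than it has values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical ordered n-ary node: k sorted values and, unless it is a leaf, k+1 children.
class NaryNode {
    public List<int> Values = new List<int>();
    public List<NaryNode> Children = new List<NaryNode>();
    public NaryNode(params int[] values) { Values.AddRange(values); }
    public NaryNode Add(params NaryNode[] kids) { Children.AddRange(kids); return this; }
}

static class NarySearch {
    public static bool Contains(NaryNode t, int x) {
        if (t == null) return false;
        int k = 0;
        // Skip past every value smaller than x; k then names the interval x falls in.
        while (k < t.Values.Count && x > t.Values[k]) k++;
        if (k < t.Values.Count && t.Values[k] == x) return true;
        if (t.Children.Count == 0) return false;   // leaf: nowhere else to look
        return Contains(t.Children[k], x);         // descend into the matching interval
    }

    // The example tree drawn above.
    public static NaryNode Sample() {
        NaryNode left = new NaryNode(23, 34).Add(
            new NaryNode(16), new NaryNode(29), new NaryNode(35));
        NaryNode right = new NaryNode(43, 56, 71, 88).Add(
            new NaryNode(39), new NaryNode(47), new NaryNode(60),
            new NaryNode(75), new NaryNode(103));
        return new NaryNode(37).Add(left, right);
    }
}
```

Searching the sample tree for 60 visits three nodes: the root (37), then (43 56 71 88), then the leaf holding 60.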
HTML documents as a data structure (Document Object Model, or DOM)
The structure is basically that of an n-ary tree.
Recursion on trees
Let us suppose we have an inttree. We want one of the following:
- The depth of the tree
- The maximum value in the tree
- The sum of all the numbers in the tree
- Whether 7 is in the tree
The last one can be done efficiently (O(log N), provided the tree is also
reasonably balanced) if we know the tree is ordered. Otherwise, it is the
same as the others: we proceed recursively,
checking the left and right subtrees and then the root node.
private static int sum(treenode t) {
    if (t==null) return 0;
    else return sum(t.left()) + t.data() + sum(t.right());
}
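The other three questions follow the same recursive pattern as sum(). The sketch below is one way they might be written; IntNode is a hypothetical stand-in for the node class in inttree.cs, and the depth of an empty tree is taken to be -1 so that a single node has depth 0 (an assumption, consistent with counting links from the root).

```csharp
using System;

// Hypothetical int-valued node, standing in for the treenode used by inttree.cs.
class IntNode {
    public int Data;
    public IntNode Left, Right;
    public IntNode(int d, IntNode l = null, IntNode r = null) { Data = d; Left = l; Right = r; }
}

static class TreeRecursion {
    // Depth of the deepest node, in links from the root; -1 for an empty tree.
    public static int Depth(IntNode t) {
        if (t == null) return -1;
        return 1 + Math.Max(Depth(t.Left), Depth(t.Right));
    }

    // Largest value in a non-empty tree; makes no use of any ordering.
    public static int Max(IntNode t) {
        if (t == null) throw new ArgumentException("empty tree has no maximum");
        int m = t.Data;
        if (t.Left != null) m = Math.Max(m, Max(t.Left));
        if (t.Right != null) m = Math.Max(m, Max(t.Right));
        return m;
    }

    // Membership test for an arbitrary tree: may have to visit every node.
    public static bool Contains(IntNode t, int val) {
        if (t == null) return false;
        return t.Data == val || Contains(t.Left, val) || Contains(t.Right, val);
    }

    // Membership test for an ordered tree: follows a single root-to-leaf path.
    public static bool ContainsOrdered(IntNode t, int val) {
        if (t == null) return false;
        if (val == t.Data) return true;
        return val < t.Data ? ContainsOrdered(t.Left, val) : ContainsOrdered(t.Right, val);
    }
}
```

Only ContainsOrdered() takes advantage of the ordering; it abandons one whole subtree at each step, which is where the O(log N) comes from when the tree is balanced.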
Traversing Trees
What if we want to traverse a tree only until we reach a particular
position, n (that is, the nth value in traversal order)? The interface
method might be
public string Get(int n) {
    if (n>=count_) return null;
    getstr_ = null;
    get(n,root_);
    return getstr_;
}
Now the recursive helper method get(n, treenode p) must be defined. How
would we do this?
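One possible answer, offered as a sketch and not as the version used in class: let the helper return the number of nodes it visited, so that the caller knows how many positions have been used up by the left subtree. The classes StrNode and StrTree and the Insert() method are hypothetical scaffolding added so the example is self-contained; only Get() and the fields count_, getstr_ and root_ come from the text above. The sketch assumes no node stores null as its data, since getstr_ == null is used to mean "not found yet".

```csharp
using System;

class StrNode {
    public string Data;
    public StrNode Left, Right;
    public StrNode(string d) { Data = d; }
}

class StrTree {
    private StrNode root_;
    private int count_;
    private string getstr_;

    // Ordinary ordered insertion (iterative); duplicates are ignored.
    public void Insert(string s) {
        if (root_ == null) { root_ = new StrNode(s); count_ = 1; return; }
        StrNode p = root_;
        while (true) {
            int c = string.CompareOrdinal(s, p.Data);
            if (c == 0) return;
            if (c < 0) {
                if (p.Left == null) { p.Left = new StrNode(s); count_++; return; }
                p = p.Left;
            } else {
                if (p.Right == null) { p.Right = new StrNode(s); count_++; return; }
                p = p.Right;
            }
        }
    }

    public string Get(int n) {
        if (n < 0 || n >= count_) return null;
        getstr_ = null;
        get(n, root_);
        return getstr_;
    }

    // Inorder walk of the subtree at p, looking for its nth value (0-based).
    // Returns the number of nodes visited; sets getstr_ and stops early on success.
    private int get(int n, StrNode p) {
        if (p == null) return 0;
        int leftCount = get(n, p.Left);
        if (getstr_ != null) return leftCount;          // found in the left subtree
        if (leftCount == n) { getstr_ = p.Data; return leftCount + 1; }
        // Otherwise the target is in the right subtree, at a reduced index.
        return leftCount + 1 + get(n - leftCount - 1, p.Right);
    }
}
```

After inserting "m", "c", "x", "a", "e" in that order, Get(0) returns "a" and Get(2) returns "e". Note that this still costs O(n) steps to reach position n; it simply avoids visiting anything after that position.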
In traversers.cs, I have implemented
several different ways to traverse a tree.
1. inTraverse(): recursive traversal, now with a depth parameter
2. MyEnumerator: using IEnumerable, and yield return,
with recursion. This makes use of the idea of coroutines.
There is a demo in demoEnumerator(int n).
3. inTraverseStack(): stack-based iterative traversal, using a "trick"
A slightly more natural version of the stack-based iteration is this;
however, it requires being able to push both treenodes and data as
separate entities. This is
4. inTraverseStackAlt()
s.pushNode(root_);
while (!s.isEmpty()) {
    Object x = s.pop();
    if (x is treenode) {
        treenode t = x as treenode;
        if (t.right() != null) s.pushNode(t.right());
        s.pushData(t.data());
        if (t.left() != null) s.pushNode(t.left());
    } else {
        string str = x as string;
        Console.WriteLine(str);
    }
}
Next we replace the Stack with a Queue. (Normally the queue operations are
not called push() and pop() but rather enqueue() and dequeue(), but I
wanted a very simple drop-in replacement for a stack.)
5. inTraverseQueue
6. inTraverseQueueAlt
In what order do these visit the nodes of the tree? (#5 is a little broken;
the "real" version is #6).
As a related question, why are there so few actual problems that are
solved with Queues?
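For comparison, here is the conventional queue-based traversal, written with the standard Queue<T> class and its Enqueue()/Dequeue() methods. This is a sketch under the assumption of a simple IntNode class (hypothetical); it is not claimed to match inTraverseQueue or inTraverseQueueAlt from traversers.cs line for line.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical int-valued node class for this illustration.
class IntNode {
    public int Data;
    public IntNode Left, Right;
    public IntNode(int d, IntNode l = null, IntNode r = null) { Data = d; Left = l; Right = r; }
}

static class QueueTraversal {
    // Visits nodes in the order they leave the queue, collecting their data.
    public static List<int> Traverse(IntNode root) {
        var result = new List<int>();
        if (root == null) return result;
        var q = new Queue<IntNode>();
        q.Enqueue(root);
        while (q.Count > 0) {
            IntNode t = q.Dequeue();
            result.Add(t.Data);
            // Children join the back of the queue, behind everything on the current level.
            if (t.Left != null) q.Enqueue(t.Left);
            if (t.Right != null) q.Enqueue(t.Right);
        }
        return result;
    }
}
```

Because a queue is first-in, first-out, every node on one level is removed before any node on the next level, so on the inttree example this produces 6, 4, 8, 1, 5, 7, 37: the breadth-first order.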
Then we return to the iterator approach, this time using the stack, with
7. MyEnumerator2
The final mechanism uses a stack and a simple iterator-like interface:
8. NextSetup()/Next():
Tree Balancing
The problem with ordered trees is that the worst-case depth is O(N), rather
than O(log N). We would like to make sure that the tree remains at least
somewhat balanced as nodes are added (remaining balanced as nodes are
deleted is a related problem, which we will not consider). Some approaches
in widespread use are AVL trees, red-black trees, Bayer trees (B-trees), and
splay trees.
Tree Rotations
Tree rotations are a simple reorganization of a tree or subtree. The
idea is to consider a (subtree) root node A and one child B (say the left
child); the child B moves up to the root and the former root A
moves down. The root node A may be a subnode of a larger tree; since none of
the values in the A-subtree change, any such larger tree would remain
ordered. The transformation shown here, from the left to the right diagram,
is a right rotation about A (the
topmost node A moves down to the right); an example of a left rotation
(about B) would be the reverse.
Right rotation:
        A                    B
       / \                  / \
      B   T3      =>      T1   A
     / \                      / \
    T1  T2                   T2  T3
This is legal because we have T1 ≤ B ≤ T2 ≤ A ≤ T3 (where T1,T2,T3 represent
all values in those subtrees).
It is straightforward to write code to implement a rotation. The ordered
node data type is D. Note that we swap the data in the nodes occupied by A
and B; we have to preserve the node occupied originally by A because nodes
above may point to it.
void rotateRight(TreeNode<D> A) {
    if (A==null) return;
    TreeNode<D> B = A.left();
    if (B==null) return;
    TreeNode<D> T1 = B.left(), T2 = B.right(), T3 = A.right();
    // swap the data: node A now holds B's value, and node B holds A's
    D temp = B.data();
    B.setData(A.data());
    A.setData(temp);
    B.setLeft(T2);     // this is now the node that holds A's data
    B.setRight(T3);
    A.setLeft(T1);     // this is now the node that holds B's data
    A.setRight(B);
}
Compare Bailey, p 355.
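An alternative to swapping data is to relink the nodes and return the new subtree root, leaving it to the caller to store that root wherever the old one was attached. The following is a sketch of that variant, using a hypothetical RNode class with public fields; it is not the approach taken in the code above, just a common alternative.

```csharp
using System;

// Hypothetical node class with public fields, for this illustration only.
class RNode {
    public int Data;
    public RNode Left, Right;
    public RNode(int d, RNode l = null, RNode r = null) { Data = d; Left = l; Right = r; }
}

static class Rotations {
    // Right rotation about a: a's left child b moves up, a moves down to the right.
    // Returns the new root of the subtree; the caller must re-attach it.
    public static RNode RotateRight(RNode a) {
        if (a == null || a.Left == null) return a;   // nothing to rotate
        RNode b = a.Left;
        a.Left = b.Right;   // T2 moves from under B to under A
        b.Right = a;        // A becomes B's right child
        return b;
    }

    // Left rotation about b: the mirror image, undoing RotateRight.
    public static RNode RotateLeft(RNode b) {
        if (b == null || b.Right == null) return b;
        RNode a = b.Right;
        b.Right = a.Left;
        a.Left = b;
        return a;
    }

    // Inorder listing, used to confirm that a rotation preserves the ordering.
    public static string InOrder(RNode t) {
        if (t == null) return "";
        return InOrder(t.Left) + t.Data + " " + InOrder(t.Right);
    }
}
```

A typical call looks like parent.Left = Rotations.RotateRight(parent.Left). The inorder listing is the same before and after either rotation, which is another way of saying that the tree remains ordered.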
Splay Trees
Splay trees are binary search trees where, on every access,
we move the value in question (if found) to the root, in an operation known
as splaying. Note that accesses are
now mutation operations! The idea is that frequently accessed values will
gravitate over time towards the root.
A predecessor to splay trees was rotate-to-root,
in which we can move a node up to the root through a series of rotations,
each time rotating the node with its (new) parent. This does succeed in
bringing a given value to the root.
Splay Trees modify the rotate-to-root strategy with the addition of
"grandfather" rotations. Rotations occur in pairs (except the last). Let x
be a node, p the parent, and g the grandparent. If x is the left subnode of
p and p is the right subnode of g (or x is right and p is left), we rotate
around p and then g as in the rotate-to-root method. This is sometimes
called the zig-zag step, as the two rotations are in different directions.
However, if x is left and p is left (or both are right), we rotate first
around g and then around p (note the reversed order!), in a step sometimes
known as "zig-zig".
Consider all the nodes of the tree on the path between x and the root. After
splaying x up to the root, the depth of x is, of course, now 0. However, the
average depth of all the nodes on that root-to-x path is now roughly halved.
Thus, while splaying doesn't necessarily improve the tree balance, it does
move a number of nodes closer to the root.
Examples:
- splaying a leaf in a balanced tree (eg 6 3 8 2 5 7 9)
- splaying a leaf in a degenerate tree (2 3 5 6 7 8 9)
To insert x into a splay tree, we first find the node y which, under ordinary
insertion, x would be inserted immediately below. Then y is splayed to the
root, at which time we insert x above y. Assume that x would have been
inserted to the right of y. This means x>y, but also that x<z for every z in
the tree with z>y. In inserting x at the root, we make y its left
subnode, and move y's right subtree to the right of x.
Splaying wreaks havoc on iterators. The main problem is that normally
accesses are iterator-safe (that is, it is safe to access values in a data
structure while an iterator is "in progress"; you just can't insert),
but here accesses are not safe.
AVL Trees
These are named for Adelson-Velskii and Landis, from their 1962 paper. The
idea behind AVL trees is that at each node we store a value representing
depth(left_subtree) - depth(right_subtree); we'll call this the balance
factor. We will then use rotations to keep the balance factor
small.
We can compute balance factors easily enough using recursion, but it is
better to cache the value at each node to avoid excessive
computational time. Our goal is to maintain the balance factor for every
node as -1, 0, or 1. When we insert a node, we have to do appropriate
rotations to maintain the balance factor for every ancestor to the new node
(and also be sure that the rotations do not introduce any unbalancing of
their own).
As we work up the path from the newly inserted value to the tree root, we
consider the new balance factor of each node. If it is -1, 0, or 1, we do
nothing. If it is -2 or 2, we do rotations.
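As a small illustration of the definition, the balance factor can be computed directly by recursion. This sketch uses a hypothetical ANode class and deliberately does not cache anything, so it is the slow version that the text above recommends against in practice; it is useful mainly for checking a tree by hand. Subtree depth is counted here in levels (0 for an empty subtree), which is an assumption; any consistent convention gives the same balance factors.

```csharp
using System;

// Hypothetical node class for this illustration.
class ANode {
    public int Data;
    public ANode Left, Right;
    public ANode(int d, ANode l = null, ANode r = null) { Data = d; Left = l; Right = r; }
}

static class AvlCheck {
    // Depth of a subtree counted in levels: 0 if empty, 1 for a single node.
    public static int Depth(ANode t) {
        return t == null ? 0 : 1 + Math.Max(Depth(t.Left), Depth(t.Right));
    }

    // depth(left_subtree) - depth(right_subtree), recomputed from scratch.
    public static int BalanceFactor(ANode t) {
        return Depth(t.Left) - Depth(t.Right);
    }

    // True if every node has balance factor -1, 0 or 1.
    public static bool IsAvl(ANode t) {
        if (t == null) return true;
        int bf = BalanceFactor(t);
        return bf >= -1 && bf <= 1 && IsAvl(t.Left) && IsAvl(t.Right);
    }
}
```

A tree with root 6, left child 3, and right child 9 whose subtree holds 8, 10 and 7 (with 7 below 8) has balance factor 1 - 3 = -2 at the root, so IsAvl() reports false.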
Let X be the node in question, with right and left subnodes R and L. If the
balance factor of X is +2, then the left subtree L is too deep. We know L
has a balance factor of -1, 0, or 1, but it matters which. Let the left
child of L be LL, and the right child be LR. The tree now looks like this:
        X
       / \
      L   R
     / \
    LL  LR
(Note that LL, LR, and R are entire
subtrees, not just nodes. However, we know that their depths are
all similar, because of the balance-factor requirement.)
We will eventually do a right rotation about X; however
we might first have to do a left
rotation about L. Doing the right rotation about X would leave us with:
        X                    L
       / \                  / \
      L   R       =>      LL   X
     / \                      / \
    LL  LR                   LR  R
If the balance factor of L had been 0 or 1, this is sufficient. Assume for a
moment that the balance factor of L is 1, so depth(LL) = depth(LR)+1. In this
argument the depth of a subtree means its total depth: the depth of its
deepest node, measured from the top of the diagram.
Because X has balance-factor +2, we know depth(R) = depth(LL)-2. After the
rotation, the depth of LL has decreased by 1, the total depth of LR is
unchanged, and the total depth of R has increased by 1 to match exactly
the depth of LR. So the balance factors of X and of L (the new root) are now
both 0. If the original balance factor of L had been 0, the post-rotation
balance factor of X is 1 and that of L becomes -1. Either way, it still works.
But if LR is deeper than LL (depth(LL)+1 = depth(LR)), then the new L is
unbalanced. So we first do a left
rotation about L to move some of LR up:
        X                     X                      LR
       / \                   / \                   /    \
      L   R        =>      LR   R        =>       L      X
     / \                   / \                   / \    / \
    LL  LR                L   LRR              LL  LRL LRR  R
        / \              / \
      LRL  LRR         LL  LRL
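We argue that after the left rotation about L, the left side of LR is at
least as deep as its right side, and that the right rotation about X as the
next step then achieves AVL balance at the new root (which will be LR). Taking
each subtree's depth on its own, LR being one deeper than LL means that
max(depth(LRL), depth(LRR)) = depth(LL), and X having balance factor +2 means
that depth(R) = depth(LL) as well; the shallower of LRL and LRR is at most one
less. In the final tree these four subtrees all hang at the same level, so L
has balance factor 0 or 1, X has balance factor 0 or -1, and the new root LR
has balance factor 0.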
Example
Start with the tree
       6
     /   \
    3     8
   / \   / \
  2   5 7   9
Now add 10, 11, 12, 13, 14, 15, which I'll do in hex with A B C D E F.
Add A; no rotations are needed:
       6
     /   \
    3     8
   / \   / \
  2   5 7   9
             \
              A
After adding B, below A above, we need to rotate around node 9 above:
       6
     /   \
    3     8
   / \   / \
  2   5 7   A
           / \
          9   B
After adding C, below B above, we will need to rotate around node 8 above:
       6
     /   \
    3     A
   / \   / \
  2   5 8   B
       / \   \
      7   9   C
Now we add D below C, and rotate around B to get
       6
     /   \
    3     A
   / \   /  \
  2   5 8    C
       / \  / \
      7   9 B  D
Now we add E below D, and this time the node we rotate around is all the way
at the root, 6:
            A
          /   \
        6       C
       / \     / \
      3   8   B   D
     / \ / \       \
    2  5 7  9       E
Actually, this is just a little
misleading; all the rebalancing rotations above involved having the deepest
tree to the "right-right" of the pivot node; that is, the right subtree of
the right subtree. In that case, a simple rotation (to the left) about the
pivot node is all we need. The "left-left" case is similar. However, the
right-left (and left-right) cases are not quite as simple: here, we need to
do a preliminary rotation around the right subtree (right child) first.
Suppose the tree is:
    6
   / \
  3   9
     / \
    8   A
We now insert a 7, making the tree unbalanced (BF = - 2) at node 6:
    6
   / \
  3   9
     / \
    8   A
   /
  7
Rotation about 6 alone gives:
      9
     / \
    6   A
   / \
  3   8
     /
    7
This still has a balance factor of +/- 2! This rotation did not help!
However, the AVL rule in this case is to first
rotate about the 9 (the right child), and then
about the 6. Also note that the rotations are in different directions, and
the BF sign changes (it is +1 at the node labeled 9, and -2 at the node
labeled 6).
After rotating right about the 9 we get:
    6
   / \
  3   8
     / \
    7   9
         \
          A
Then rotating left about the 6, we get
      8
     / \
    6   9
   / \   \
  3   7   A
This tree has had its balance improved.
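The examples above can be reproduced with a short recursive insertion routine. This is a sketch, not a reference implementation: the class and method names are hypothetical, each node caches its height (in levels) in place of the balance factor itself, and rotations relink nodes and return the new subtree root instead of swapping data.

```csharp
using System;

class AvlNode {
    public int Data;
    public int Height = 1;          // cached height in levels; a leaf has height 1
    public AvlNode Left, Right;
    public AvlNode(int d) { Data = d; }
}

static class Avl {
    static int H(AvlNode t) { return t == null ? 0 : t.Height; }
    static void Update(AvlNode t) { t.Height = 1 + Math.Max(H(t.Left), H(t.Right)); }
    static int Bf(AvlNode t) { return H(t.Left) - H(t.Right); }   // balance factor

    static AvlNode RotateRight(AvlNode x) {
        AvlNode l = x.Left;
        x.Left = l.Right;
        l.Right = x;
        Update(x); Update(l);       // x is now below l, so update it first
        return l;
    }

    static AvlNode RotateLeft(AvlNode x) {
        AvlNode r = x.Right;
        x.Right = r.Left;
        r.Left = x;
        Update(x); Update(r);
        return r;
    }

    // Inserts val below t and returns the (possibly new) root of that subtree.
    public static AvlNode Insert(AvlNode t, int val) {
        if (t == null) return new AvlNode(val);
        if (val < t.Data) t.Left = Insert(t.Left, val);
        else if (val > t.Data) t.Right = Insert(t.Right, val);
        else return t;                                    // duplicate: no change
        Update(t);
        int bf = Bf(t);
        if (bf == 2) {                                    // left side too deep
            if (Bf(t.Left) < 0) t.Left = RotateLeft(t.Left);      // left-right case
            return RotateRight(t);
        }
        if (bf == -2) {                                   // right side too deep
            if (Bf(t.Right) > 0) t.Right = RotateRight(t.Right);  // right-left case
            return RotateLeft(t);
        }
        return t;
    }

    // Preorder listing, which pins down the shape of the tree.
    public static string PreOrder(AvlNode t) {
        return t == null ? "" : t.Data + " " + PreOrder(t.Left) + PreOrder(t.Right);
    }
}
```

Inserting 6, 3, 9, 8, 10, 7 in that order (with 10 in place of A) triggers the right-left case at node 6, and the resulting preorder listing is 8 6 3 7 9 10, matching the last diagram above.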
Bayer Trees
Generally these are known as B-trees. Bayer named them that in his paper,
though he did not spell out what the B stood for. B-trees are not
binary trees; in fact, you might look at them as evidence that being binary
makes life much harder.
B-trees have what I will call an order, B. (Some books, and Wikipedia, call this degree 2B.) Each interior node
(other than the root) has k values, B<=k<=2B, and k+1 children. The k
values, a0...ak-1, divide the leaf data into k+1
categories: x<a0, ai<x<ai+1 for
i=0..k-2, and ak-1<x. These categories form the k+1 children;
thus, a B-tree is still an ordered tree
even if it is not binary. All leaf nodes are at the same depth in the tree.
Some B-tree visualizations:
Examples of how ordered trees with more than one data item per node might
look
Bayer's idea is that when a node becomes full, we split it in half, and push
the median element up a level. The
pushed-up node may cause a split in the parent as well; we keep pushing
until the process stops or we end up pushing a new root node.
Bayer trees grow deeper only when a new root node is pushed up.
Insertion of new values always starts by finding the appropriate leaf node
for that value. If that leaf node still has room, the value is inserted;
otherwise, push-up is used as necessary.
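The split step on its own is easy to sketch. The method below is a hypothetical helper, not part of any particular B-tree library: given the 2B sorted values of a full node and one new value, it returns the two halves and the median that would be pushed up. It assumes the new value is not already present, and it leaves out everything else a real B-tree needs (child pointers, the walk down to the leaf, and the repeated push-up through the parents).

```csharp
using System;
using System.Collections.Generic;

static class BTreeSplit {
    // full holds exactly 2*order sorted values; val is the value being inserted.
    // Returns the left half, the median to push up, and the right half.
    public static (List<int> left, int median, List<int> right)
            InsertAndSplit(List<int> full, int val, int order) {
        if (full.Count != 2 * order) throw new ArgumentException("node is not full");
        var all = new List<int>(full);
        int pos = all.BinarySearch(val);
        if (pos < 0) pos = ~pos;            // not found: ~pos is the insertion point
        all.Insert(pos, val);               // now 2*order + 1 values, still sorted
        List<int> left = all.GetRange(0, order);
        int median = all[order];
        List<int> right = all.GetRange(order + 1, order);
        return (left, median, right);
    }
}
```

For a B-tree of order 1, adding 3 to the full node (1,2) gives the pieces (1) and (3) with 2 pushed up, which is the first split that occurs when inserting 1, 2, 3, ... in order.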
B-tree of order 1
Insert 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
into a B tree of order 1
Insert 4,17,8,7,2,19,13,15,3,10,1,16,20,9,18,11,14,5,12,6
B-tree of order 2 (2-4 values per node; also known as a 2-4 tree)
Red-Black Trees
These are binary trees that, like AVL trees, remain reasonably balanced
because we do restructuring on each insert. The restructuring runs in time
O(height), that is, O(log N). Here's the Wikipedia
definition:
1. A node is either red or black.
2. The root is black.
3. All leaves are black.
4. Both children of every red node are black.
5. Every simple path from a given node to any of its descendant leaves
contains the same number of black nodes.
A black node can have black subnodes. Note that property 4 implies that the
parent of a red node is black.
Let the black-height of the tree be the number of black nodes in any path
from the root to a leaf, constant as per property 5. Property 4 ensures
that, along any path, the path
length is no more than twice the black-height.
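The properties can be checked mechanically, which is handy when experimenting with insertion by hand. The sketch below uses a hypothetical RbNode class with a boolean color flag, and treats null subtrees as the black leaves of property 3 (an assumption, though a common convention); each null counts as one black node.

```csharp
using System;

// Hypothetical red-black node for this illustration.
class RbNode {
    public int Data;
    public bool Red;
    public RbNode Left, Right;
    public RbNode(int d, bool red, RbNode l = null, RbNode r = null) {
        Data = d; Red = red; Left = l; Right = r;
    }
}

static class RbCheck {
    static bool IsRed(RbNode t) { return t != null && t.Red; }

    // Returns the black-height of the subtree at t, or -1 if property 4 or 5 fails
    // anywhere inside it. A null subtree is a black leaf with black-height 1.
    public static int BlackHeight(RbNode t) {
        if (t == null) return 1;
        if (t.Red && (IsRed(t.Left) || IsRed(t.Right))) return -1;   // property 4
        int lh = BlackHeight(t.Left);
        int rh = BlackHeight(t.Right);
        if (lh < 0 || rh < 0 || lh != rh) return -1;                 // property 5
        return lh + (t.Red ? 0 : 1);
    }

    // Property 2 (black root) together with properties 4 and 5.
    public static bool IsRedBlack(RbNode root) {
        return !IsRed(root) && BlackHeight(root) > 0;
    }
}
```

A black root with two red children passes the check with black-height 2; a red node with a red child, or a black node with a black child on one side only, fails it.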
B-Tree analogy
We create a B-tree-like structure by consolidating each black node together
with any red offspring. Let us refer to the consolidation of a black node
with its red children as a B-node; it then follows from rule 5 that the tree
made from the B-nodes has all paths of the same length. It follows from rule
4 that the offspring of any red offspring are black, and so if we identify a
B-node in the original tree, the immediate children of this node must all be
black.
The catch is that the tree we create this way isn't a proper B-tree. A
B-tree of order 1 has 1 or 2 data values per node, and 2 or 3 subtrees; a
B-tree of order 2 has 2, 3 or 4 data values per non-root node with 3, 4 or 5
children. The B-like tree here has 1, 2 or 3 data values per node, and 2, 3
or 4 subtrees.
Given a B-like tree with 1-3 data values per node, maintaining the B-tree
property of all paths having equal length, we can convert it to a red-black
tree as follows:
- If a B-node has 1 value, it becomes a single black node
- If a B-node has two values, it becomes two nodes. The upper one will
be black and the lower one red; it does not matter which is which.
- If a B-node has three values, the middle value becomes a black node
and the two other values become red nodes representing the roots of the
left and right subtrees.
Realistically, there is no reason to prefer the red-black formulation to a
proper B-tree formulation. There is no especial benefit to having trees be
binary.
Inserting into a B-like tree with 1-3 data values per node is similar to
insertion into a regular B-tree, except that when a node is "full" with
three values, and we split it and push up a middle value, the split will be
into pieces of sizes 1 and 2. It does not matter which piece is which; if a
node contains values (10,15,20) and we add 18, we can push up either the 15,
with split nodes (10) and (18,20), or the 18, with split nodes (10,15) and
(20).
Red-Black Node Insertion
Let us refer to the sibling of the parent of a node as the uncle;
the uncle of a node, if it exists, is always unique because the tree is
binary. The first step is to add the new node to the tree in the usual
binary-tree-insertion manner; we must then rotate and recolor in order to
restore red-blackness. The important red-black requirements are 4 and
especially 5.
As we work up the tree, we will denote the current node by
N. Originally, N is the leaf node where the new data value is inserted, but
N then takes values along the path from that leaf to the root. When we
originally install the leaf node, we will color it red. As we move upwards,
the node designated current will always be red.
We will also let P be the parent of N, G the grandparent and U the uncle. We
will follow the five cases of Wikipedia.
      G
     / \
    U   P
       / \
      N
Case 1: N is the root of the tree. We color it black and are done.
Case 2: P is black. In this case we can leave N red and be done.
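Case 3: P and U are both red. Then we can change P and U both to black and
change G to red. Property 4 now holds for G (both its children are black),
and property 5 holds for the tree because all paths formerly through black
node G now pass through exactly one of the newly black nodes U and P.
However, G is now red and its own parent may be red as well, so we continue
up the tree with G as the new current node N.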
Case 4: P is red but U is black, and N is the right child of P and P is the
left child of G (or N is the left child of P and P is the right child of G):
      G
     / \
    P   U
   / \
      N
     / \
In this case we start with a left rotation about P:
        G
       / \
      N   U
     / \
    P
The number of black nodes along any path through P or N is unchanged. Rule
4, however, still fails, since N and its new left child P are both red. We
fix this by setting the just-lowered P to current
and moving to case 5, below.
Case 5: P is red, U is black, N is the left child of P which is the left
child of G (or N is right of P which is right of G):
        G
       / \
      P   U
     / \
    N   T
In this case we do a right rotation about G, changing P to black and G to
red:
      P
     / \
    N   G
       / \
      T   U
Property 4 now holds, and so does property 5 for the tree now rooted at P:
the number of black nodes along any path through N, T or U is unchanged
although which specific nodes are black has changed.