In recursion.html#exprtrees we looked at some simple tree structures used for expressing hierarchical information, such as arithmetic expressions:

2*(3+5)

```
    *
   / \
  2   +
     / \
    3   5
```

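A tree consists of nodes, each holding some data and links to zero or more subtrees. A **binary tree** is one in which each node has at most two subtrees.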

Binary tree nodes thus contain:

- data
- a left subtree
- a right subtree

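The code might look like this (from bintree.cs):

```csharp
class treenode<T> {
    private T data_;
    private treenode<T> left_, right_;

    public treenode(T d, treenode<T> l, treenode<T> r) { data_ = d; left_ = l; right_ = r; }

    public T data() { return data_; }
    public treenode<T> left() { return left_; }
    public treenode<T> right() { return right_; }

    // setters (assumed here; the insertion code below uses them)
    public void setleft(treenode<T> l) { left_ = l; }
    public void setright(treenode<T> r) { right_ = r; }
}
```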

A binary tree is **ordered** if there is an ordering
relation on the data and, for each node, every data value appearing in the
left subtree is less than the node's own data, and every data value
appearing in the right subtree is greater than the node's own data.

Quite often, eg for expression trees, no ordering is involved.

The **root** node is at the top of the tree. Nodes with no
subtrees (ie in which the subtree fields are null) are **leaf**
nodes. Other nodes are **interior** nodes.

The **depth** of a node is how far it is, in node-to-node
links, from the root. **Level n** of a tree consists of all
nodes with depth n.

Examples:

- Abstract expression trees
- Parse trees
- Ancestry trees
- Decision trees (Bailey p 288)

None of these is ordered!

To **traverse** a tree is to visit each node, perhaps
printing out the data values or perhaps taking some other action.
Traversal methods are usually recursive, and usually the initial call is
on the tree's root.

Three common forms of traversal are **preorder**, **inorder**,
and **postorder**, depending on whether a node's data is
visited before, between or after the left and right subtrees are visited.
All of these are considered to be **depth-first** traversal.
Here is a sample inorder traversal
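The three depth-first orders differ only in where a node's own data is visited. The following is a minimal Java sketch, not the original sample from bintree.cs; the class and method names are hypothetical. It builds the expression tree for the demo expression 4+x*(y+1) and lists its nodes in each order.

```java
import java.util.ArrayList;
import java.util.List;

class Traversals {
    // Minimal immutable binary-tree node (hypothetical; mirrors treenode above).
    static class Node<T> {
        final T data;
        final Node<T> left, right;
        Node(T data, Node<T> left, Node<T> right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // preorder: the node's data, then the left subtree, then the right subtree
    static <T> void preorder(Node<T> t, List<T> out) {
        if (t == null) return;
        out.add(t.data);
        preorder(t.left, out);
        preorder(t.right, out);
    }

    // inorder: left subtree, then the node's data, then the right subtree
    static <T> void inorder(Node<T> t, List<T> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.data);
        inorder(t.right, out);
    }

    // postorder: left subtree, right subtree, then the node's data
    static <T> void postorder(Node<T> t, List<T> out) {
        if (t == null) return;
        postorder(t.left, out);
        postorder(t.right, out);
        out.add(t.data);
    }

    static Node<String> leaf(String s) { return new Node<>(s, null, null); }

    // expression tree for 4+x*(y+1)
    static Node<String> exprTree() {
        Node<String> yPlus1 = new Node<>("+", leaf("y"), leaf("1"));
        Node<String> times = new Node<>("*", leaf("x"), yPlus1);
        return new Node<>("+", leaf("4"), times);
    }

    // returns the visiting order ("pre", "in" or "post") as a space-separated string
    static String walk(String order) {
        List<String> out = new ArrayList<>();
        Node<String> root = exprTree();
        if (order.equals("pre")) preorder(root, out);
        else if (order.equals("in")) inorder(root, out);
        else postorder(root, out);
        return String.join(" ", out);
    }
}
```

Inorder output loses the parentheses around y+1, while the postorder listing needs none: it is exactly the reverse Polish form.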

Demo:

- traverse the expression tree for 4+x*(y+1)
  - inorder
  - postorder (cf Jan Łukasiewicz)
- traverse the following tree from simpletrees.zip:

```
      6
     / \
    4   8
   / \ / \
  1  5 7  37
```

We can also do

Example

How do we build trees? One way is from the bottom up; that is much easier
if the class treenode is public. We could do this (assuming treenode here is
equivalent to treenode<String>):

```csharp
treenode n1 = new treenode("a", null, null);
treenode n2 = new treenode("3", null, null);
treenode n3 = new treenode("*", n1, n2);
treenode n4 = new treenode("1", null, null);
treenode root = new treenode("+", n3, n4);
```

What does this build?

We can build **ordered** trees by inserting each new value in its proper place:

```csharp
public void insert(int val) {
    if (root_ == null) {
        root_ = new treenode(val, null, null);
        return;
    }
    rinsert(val, root_);
}

// recursive insert: does not get called with t==null
private void rinsert(int val, treenode t) {
    if (val == t.data()) return;    // should we do something else?
    if (val < t.data()) {
        if (t.left() == null) {
            t.setleft(new treenode(val, null, null));
        } else {
            rinsert(val, t.left());
        }
    } else {    // val > t.data()
        if (t.right() == null) {
            t.setright(new treenode(val, null, null));
        } else {
            rinsert(val, t.right());
        }
    }
}
```

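If a subtree is null, there is no node there for us to modify; we can only modify the parent's link to it. That is why rinsert() checks for a null child before recursing, and why insert() handles an empty tree separately. Bailey addresses this in some contexts by including a pointer to the parent node, but that won't help us here.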

Note that the tree we get by inserting values depends on the order of insertion.
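A common way around the null-subtree problem (an alternative to rinsert(), not necessarily what bintree.cs does) is to have the recursive insert return the root of the subtree it was given, so the caller stores the result back into its own link. The Java sketch below uses hypothetical names; its height() method also makes the dependence on insertion order visible.

```java
import java.util.List;

class ReturnInsert {
    static class IntNode {
        final int data;
        IntNode left, right;
        IntNode(int data) { this.data = data; }
    }

    // Returns the root of subtree t after inserting val. An empty subtree
    // (t == null) is handled by returning a brand-new node, which the caller
    // then stores in its own left or right field.
    static IntNode insert(IntNode t, int val) {
        if (t == null) return new IntNode(val);
        if (val < t.data) t.left = insert(t.left, val);
        else if (val > t.data) t.right = insert(t.right, val);
        // val == t.data: duplicate, ignored (as in rinsert above)
        return t;
    }

    static IntNode build(int... vals) {
        IntNode root = null;
        for (int v : vals) root = insert(root, v);
        return root;
    }

    // number of levels (0 for an empty tree)
    static int height(IntNode t) {
        return t == null ? 0 : 1 + Math.max(height(t.left), height(t.right));
    }

    static void inorder(IntNode t, List<Integer> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.data);
        inorder(t.right, out);
    }
}
```

Inserting 1 through 7 in increasing order produces a chain with 7 levels, while inserting 4 2 6 1 3 5 7 produces a perfectly balanced tree with 3 levels, although both trees hold the same values.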

In-class demos:

- try inserting the values in increasing order
- what orders lead to the same tree?
- convert to preorder traversal

An n-ary tree has, for each node, a list of subtrees:

```csharp
class Treenode<T> {
    private T data;
    private List<Treenode<T>> children;
    ...
}
```

Different nodes of a tree can have different numbers of subnodes.

Consider the following tree, in which some nodes hold several values:

```
                  37
                /    \
         (23 34)      (43 56 71 88)
         /  |  \      /  |  |  |  \
       16  29  35   39  47 60  75  103
```

HTML documents as a data structure (the Document Object Model, or DOM): the structure is basically that of an n-ary tree.
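A minimal Java sketch of an n-ary node (names hypothetical): each node keeps a list of subtrees, and recursive methods simply loop over that list. A tiny HTML-like tree can be built by hand with it; this does not use any real DOM API.

```java
import java.util.ArrayList;
import java.util.List;

class NaryTree {
    // An n-ary node: data plus a list of subtrees.
    static class Node<T> {
        final T data;
        final List<Node<T>> children = new ArrayList<>();
        Node(T data) { this.data = data; }

        // appends a subtree and returns this node, so calls can be chained
        Node<T> add(Node<T> child) { children.add(child); return this; }
    }

    // number of nodes: this one plus the sizes of all its subtrees
    static <T> int size(Node<T> t) {
        int n = 1;
        for (Node<T> c : t.children) n += size(c);
        return n;
    }

    // depth of the deepest node below t, counting links (a leaf gives 0)
    static <T> int depth(Node<T> t) {
        int d = 0;
        for (Node<T> c : t.children) d = Math.max(d, 1 + depth(c));
        return d;
    }
}
```

For example, an "html" node with children "head" (containing "title") and "body" (containing "h1", "p", "p") has 7 nodes and depth 2.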

Let us suppose we have an inttree. We want one of the following:

- The depth of the tree
- The maximum value in the tree
- The sum of all the numbers in the tree
- Whether 7 is in the tree

The last one can be done efficiently (O(log N)) if we know the tree is ordered and reasonably balanced, since we then need to follow only one root-to-leaf path. Otherwise, it is the same as the others: we proceed recursively, checking the left and right subtrees and then the root node.

```csharp
private static int sum(treenode t) {
    if (t == null) return 0;
    else return sum(t.left()) + t.data() + sum(t.right());
}
```
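The other questions follow the same recursive pattern as sum(). Here is a Java sketch; IntTree is a hypothetical stand-in for an int-valued treenode, and depth follows this document's definition (links from the root, so a one-node tree has depth 0). The last method shows why membership is cheaper in an ordered tree: it follows a single root-to-leaf path instead of visiting every node.

```java
class IntTree {
    final int data;
    final IntTree left, right;
    IntTree(int data, IntTree left, IntTree right) {
        this.data = data; this.left = left; this.right = right;
    }

    // depth of the tree: links from the root to its deepest node (-1 if empty)
    static int depth(IntTree t) {
        return t == null ? -1 : 1 + Math.max(depth(t.left), depth(t.right));
    }

    static int sum(IntTree t) {
        return t == null ? 0 : sum(t.left) + t.data + sum(t.right);
    }

    // largest value; Integer.MIN_VALUE for an empty tree
    static int max(IntTree t) {
        if (t == null) return Integer.MIN_VALUE;
        return Math.max(t.data, Math.max(max(t.left), max(t.right)));
    }

    // any binary tree: may have to look everywhere, O(N)
    static boolean contains(IntTree t, int v) {
        return t != null && (t.data == v || contains(t.left, v) || contains(t.right, v));
    }

    // ordered tree: follow a single root-to-leaf path
    static boolean containsOrdered(IntTree t, int v) {
        while (t != null) {
            if (v == t.data) return true;
            t = v < t.data ? t.left : t.right;
        }
        return false;
    }
}
```

For the tree from simpletrees.zip (6 at the root, then 4 and 8, then 1 5 7 37), this gives depth 2, sum 68 and maximum 37.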

What if we want to traverse a tree only until we reach the nth value
(counting in traversal order), and return it? The interface method might be:

```csharp
public String Get(int n) {
    if (n >= count_) return null;
    getstr_ = null;
    get(n, root_);
    return getstr_;
}
```

Now the recursive helper method get(n, treenode p) must be defined. How
would we do this?
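One possible answer, sketched in Java under two assumptions: "nth" means the nth node in inorder, counting from 0, and a hypothetical counter field seen_ records how many nodes have been passed so far. Get() is otherwise as above.

```java
class IndexedTree {
    static class Node {
        final String data;
        final Node left, right;
        Node(String data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    private final Node root_;
    private final int count_;
    private String getstr_;
    private int seen_;          // hypothetical: nodes already passed in this search

    IndexedTree(Node root, int count) { root_ = root; count_ = count; }

    public String Get(int n) {
        if (n >= count_) return null;
        getstr_ = null;
        seen_ = 0;
        get(n, root_);
        return getstr_;
    }

    // Inorder walk that records the nth node (counting from 0); once
    // seen_ > n, every remaining call returns immediately.
    private void get(int n, Node p) {
        if (p == null || seen_ > n) return;
        get(n, p.left);
        if (seen_ == n) getstr_ = p.data;
        seen_++;
        get(n, p.right);
    }
}
```

The early return is what makes this "traverse until found": after the nth node is recorded, the rest of the tree is skipped.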

In the BlueJ project traversers, I have implemented
several different ways to traverse a tree.

1. inTraverse(): recursive traversal, now with a depth parameter

3. inTraverseStack(): stack-based iterative traversal, using a "trick"

A slightly more natural version of the stack-based iteration is this;
however, it requires being able to push both treenodes and data as
separate entities. This is

4. inTraverseStackAlt()

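```csharp
s.pushNode(root_);
while (!s.isEmpty()) {
    Object x = s.pop();
    if (x is treenode) {
        treenode t = x as treenode;
        // push right, then data, then left: the stack pops them in the
        // opposite order, giving left subtree, data, right subtree (inorder)
        if (t.right() != null) s.pushNode(t.right());
        s.pushData(t.data());
        if (t.left() != null) s.pushNode(t.left());
    } else {
        // a data item: visit it
        string str = x as string;
        Console.WriteLine(str);
    }
}
```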
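For comparison, another common stack-based inorder traversal pushes only nodes: it pushes a node and its whole chain of left children, pops the leftmost one, visits it, and then moves to its right subtree. This Java sketch (names hypothetical) is a standard technique, not necessarily the "trick" used in inTraverseStack().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class StackInorder {
    static class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Iterative inorder traversal. The stack holds nodes whose left subtrees
    // are being processed but whose own data has not been visited yet.
    static List<Integer> inorder(Node root) {
        List<Integer> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        Node t = root;
        while (t != null || !stack.isEmpty()) {
            while (t != null) {         // push t and its whole chain of left children
                stack.push(t);
                t = t.left;
            }
            t = stack.pop();            // leftmost node not yet visited
            out.add(t.data);
            t = t.right;                // now handle its right subtree
        }
        return out;
    }
}
```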

Next we replace the stack with a queue. (Normally the queue operations are
not called push() and pop() but rather enqueue() and dequeue(), but I
wanted a very simple drop-in replacement for a stack.)

5. inTraverseQueue

6. inTraverseQueueAlt

What order do these visit the nodes of the tree? (#5 is a little broken;
the "real" version is #6).

As a related question, why are there so few actual problems that are
solved with Queues?
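For reference, the standard queue-based traversal is breadth-first, or level-order: nodes come out level by level, left to right. This Java sketch uses java.util.ArrayDeque as the queue; the names are hypothetical, and it is the generic technique, not a copy of inTraverseQueue() or inTraverseQueueAlt().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class LevelOrder {
    static class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Breadth-first traversal: a FIFO queue of nodes yields level 0, then level 1, ...
    static List<Integer> levelOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        Deque<Node> queue = new ArrayDeque<>();
        if (root != null) queue.addLast(root);
        while (!queue.isEmpty()) {
            Node t = queue.removeFirst();
            out.add(t.data);
            if (t.left != null) queue.addLast(t.left);
            if (t.right != null) queue.addLast(t.right);
        }
        return out;
    }
}
```

For the tree from simpletrees.zip this visits 6, 4, 8, 1, 5, 7, 37.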

For the next traverser, we define a Next() method that uses the stack-based approach:

7. NextSetup()/Next()

The drawback to that is there can be only one active iteration at a time. The final version solves that, and also enables for-each loops:

8. Java Iterator (allowing a for-each loop), internally based on a stack.
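A sketch of the idea behind #8, in Java with hypothetical names: each iterator object carries its own stack of pending nodes, so two iterations over the same tree can be active at once, and implementing Iterable is what makes the for-each loop work.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

class IterTree implements Iterable<Integer> {
    static class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    private final Node root;
    IterTree(Node root) { this.root = root; }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            // Each iterator has its own stack, so several can be active at once.
            private final Deque<Node> stack = new ArrayDeque<>();
            { pushLeft(root); }

            // push t and its chain of left children
            private void pushLeft(Node t) {
                for (; t != null; t = t.left) stack.push(t);
            }

            @Override
            public boolean hasNext() { return !stack.isEmpty(); }

            @Override
            public Integer next() {
                if (stack.isEmpty()) throw new NoSuchElementException();
                Node t = stack.pop();
                pushLeft(t.right);      // nodes after t come from its right subtree, then the stack
                return t.data;
            }
        };
    }
}
```

For example, `for (int v : tree)` visits 1 4 5 6 7 8 37 for the tree from simpletrees.zip, and a for-each loop nested inside another over the same tree also works, because each loop gets its own iterator and stack.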

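A **rotation** (here, a right rotation about A) moves a child up and its parent down while preserving the ordering:

```
        A                  B
       / \                / \
      B   T3     =>     T1   A
     / \                    / \
   T1   T2                T2   T3
```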

This is legal because we have T1 ≤ B ≤ T2 ≤ A ≤ T3 (where T1,T2,T3 represent all values in those subtrees)

It is straightforward to write code to implement a rotation. The ordered node data type is D. Note that we swap the data in the nodes occupied by A and B; we have to keep the node originally occupied by A in place, because nodes above may point to it.

```
void rotateRight(TreeNode<D> A) {      // method name is illustrative
    if (A == null) return;
    TreeNode<D> B = A.left();
    if (B == null) return;
    TreeNode<D> T1 = B.left(), T2 = B.right(), T3 = A.right();
    D temp = B.data();
    B.setData(A.data());
    A.setData(temp);
    B.setLeft(T2);    // this is now the node that holds A's data
    B.setRight(T3);
    A.setLeft(T1);    // this is now the node that holds B's data
    A.setRight(B);
}
```

Compare Bailey, p 355.
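The mirror-image left rotation can be written in the same swap-the-data style. This Java sketch assumes a mutable node class with the accessors used above (data(), left(), right(), setData(), setLeft(), setRight()); the class names are hypothetical.

```java
class LeftRotation {
    // Mutable node with the accessors used in the rotation code above.
    static class TreeNode<D> {
        private D data;
        private TreeNode<D> left, right;
        TreeNode(D data, TreeNode<D> left, TreeNode<D> right) {
            this.data = data; this.left = left; this.right = right;
        }
        D data() { return data; }
        TreeNode<D> left() { return left; }
        TreeNode<D> right() { return right; }
        void setData(D d) { data = d; }
        void setLeft(TreeNode<D> l) { left = l; }
        void setRight(TreeNode<D> r) { right = r; }
    }

    //      A                B
    //     / \              / \
    //    T1  B     =>     A   T3
    //       / \          / \
    //      T2  T3       T1  T2
    // The top node object stays in place (nodes above may point to it);
    // only the data values of A and B are swapped.
    static <D> void rotateLeft(TreeNode<D> A) {
        if (A == null) return;
        TreeNode<D> B = A.right();
        if (B == null) return;
        TreeNode<D> T1 = A.left(), T2 = B.left(), T3 = B.right();
        D temp = B.data();
        B.setData(A.data());   // node B now holds A's data
        A.setData(temp);       // node A (still on top) now holds B's data
        B.setLeft(T1);
        B.setRight(T2);
        A.setLeft(B);
        A.setRight(T3);
    }

    // fully parenthesized inorder form, e.g. "(0 1 (2 3 4))"
    static <D> String show(TreeNode<D> t) {
        if (t == null) return ".";
        if (t.left() == null && t.right() == null) return String.valueOf(t.data());
        return "(" + show(t.left()) + " " + t.data() + " " + show(t.right()) + ")";
    }
}
```

For example, with A = 1, T1 = 0, B = 3, T2 = 2 and T3 = 4, the tree prints as (0 1 (2 3 4)) before the rotation and ((0 1 2) 3 4) after it, and the top node object is the same one throughout.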

A predecessor to splay trees was rotate-to-root, in which we can move a node up to the root through a series of rotations, each time rotating the node with its (new) parent. This does succeed in bringing a given value to the root.

Splay trees modify the rotate-to-root strategy with the addition of "grandfather" rotations. Rotations occur in pairs (except possibly the last, when x's parent is the root). Let x be a node, p its parent, and g its grandparent. If x is the left child of p and p is the right child of g (or x is right and p is left), we rotate around p and then around g, as in the rotate-to-root method. This is sometimes called the zig-zag step, as the two rotations are in different directions. However, if x and p are both left children (or both right children), we rotate first around g and then around p (note the reversed order!), in a step sometimes known as "zig-zig".

Consider all the nodes of the tree on the path between x and the root. After splaying x up to the root, the depth of x is, of course, now 0. However, the average depth of all the nodes on that root-to-x path is also roughly halved. Thus, while splaying doesn't necessarily improve the overall tree balance, it does move a number of nodes closer to the root.

Examples:

- splaying a leaf in a balanced tree (eg 6 3 8 2 5 7 9)
- splaying a leaf in a degenerate tree (2 3 5 6 7 8 9)

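To insert x into a splay tree, we first find the node y immediately below which x would be inserted under ordinary insertion. Then y is splayed to the root, at which time we insert x above y. Assume that x would have been inserted to the right of y. This means x>y, but also that x is less than every value in y's right subtree, since no existing value lies between y and x. So x becomes the new root, with y (and y's left subtree) as its left subtree and y's former right subtree as its right subtree.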
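A Java sketch of bottom-up splaying and of the insertion scheme just described. It assumes nodes with parent pointers; the names are hypothetical. The zig-zig case rotates about g before p, and the zig-zag case rotates about p before g, as described above.

```java
import java.util.List;

class SplayTree {
    static class Node {
        final int key;
        Node left, right, parent;
        Node(int key, Node parent) { this.key = key; this.parent = parent; }
    }

    Node root;

    // Rotates x above its parent p: a right rotation about p if x is p's left
    // child, otherwise a left rotation about p.
    private void rotateUp(Node x) {
        Node p = x.parent, g = p.parent;
        if (x == p.left) {
            p.left = x.right;
            if (x.right != null) x.right.parent = p;
            x.right = p;
        } else {
            p.right = x.left;
            if (x.left != null) x.left.parent = p;
            x.left = p;
        }
        p.parent = x;
        x.parent = g;
        if (g == null) root = x;
        else if (g.left == p) g.left = x;
        else g.right = x;
    }

    // Bottom-up splay: rotations in pairs, plus one single rotation ("zig")
    // at the end if x finishes one level below the root.
    private void splay(Node x) {
        while (x.parent != null) {
            Node p = x.parent, g = p.parent;
            if (g == null) {
                rotateUp(x);                        // zig: p is the root
            } else if ((x == p.left) == (p == g.left)) {
                rotateUp(p);                        // zig-zig: rotate about g first,
                rotateUp(x);                        // then about p
            } else {
                rotateUp(x);                        // zig-zag: rotate about p,
                rotateUp(x);                        // then about g
            }
        }
    }

    // Lookup that splays the node found (or the last node on the search path).
    boolean contains(int key) {
        Node t = root, last = null;
        while (t != null && t.key != key) {
            last = t;
            t = key < t.key ? t.left : t.right;
        }
        if (t != null) splay(t); else if (last != null) splay(last);
        return t != null;
    }

    // Splay insertion: splay y (the node x would be attached below) to the
    // root, then make x the new root above y.
    void insert(int key) {
        if (root == null) { root = new Node(key, null); return; }
        Node y = root;
        for (Node next = root; next != null; ) {
            y = next;
            if (key == y.key) { splay(y); return; } // already present
            next = key < y.key ? y.left : y.right;
        }
        splay(y);                                   // y is now the root
        Node x = new Node(key, null);
        if (key > y.key) {                          // y < x < everything in y's right subtree
            x.right = y.right; y.right = null; x.left = y;
        } else {                                    // mirror image
            x.left = y.left; y.left = null; x.right = y;
        }
        if (x.left != null) x.left.parent = x;
        if (x.right != null) x.right.parent = x;
        root = x;
    }

    static int height(Node t) {
        return t == null ? 0 : 1 + Math.max(height(t.left), height(t.right));
    }

    static void inorder(Node t, List<Integer> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.key);
        inorder(t.right, out);
    }
}
```

For instance, inserting 1 through 7 in increasing order with this insert() leaves a 7-level chain (each new key becomes the root, with the old root as its left child); contains(1) then splays the deepest node to the root using three zig-zig steps, and the tree shrinks to 5 levels.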

Splaying wreaks havoc on iterators. The main problem is that accesses are normally iterator-safe (that is, it is safe to look up values in a data structure while an iterator is "in progress"; you just can't insert), but in a splay tree even a lookup restructures the tree, so accesses are not safe.

We can compute balance factors easily enough using recursion, but it is better to store them (or the subtree heights) in the nodes and update them as we insert.
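Computing balance factors by recursion is simple, as this Java sketch shows (names hypothetical; the balance factor is depth(left) minus depth(right), the sign convention used below, so +2 means the left side is too deep). Because height() walks the whole subtree, calling it at every node can cost O(N²) for a degenerate tree, which is why storing the information in the nodes is preferable.

```java
class Balance {
    static class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    // height in levels: 0 for an empty tree, 1 for a single node
    static int height(Node t) {
        return t == null ? 0 : 1 + Math.max(height(t.left), height(t.right));
    }

    // balance factor of a (non-null) node: depth of left subtree minus depth of right subtree
    static int balanceFactor(Node t) {
        return height(t.left) - height(t.right);
    }

    // AVL condition: every node has balance factor -1, 0 or 1
    static boolean isAVL(Node t) {
        if (t == null) return true;
        int bf = balanceFactor(t);
        return bf >= -1 && bf <= 1 && isAVL(t.left) && isAVL(t.right);
    }
}
```

For the tree 6 (3, 9 (8 (7), A)) used in the AVL example, balanceFactor gives -2 at the root and +1 at the node labeled 9.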

As we work up the path from the newly inserted value to the tree root, we consider the new balance factor of each node. If it is -1, 0, or 1, we do nothing. If it is -2 or 2, we do rotations.

Let X be the node in question, with left and right subnodes L and R. If the balance factor of X is +2, then the left subtree L is too deep. We know L has a balance factor of -1, 0, or 1, but it matters which. Let the left child of L be LL, and the right child be LR. The tree now looks like this:

```
        X
       / \
      L   R
     / \
   LL   LR
```

(Note that LL, LR, and R are entire subtrees, not just nodes. However, we know that their depths are all similar, because of the balance-factor requirement.)

We will eventually do a right rotation about X; however we might first have to do a left rotation about L. Doing the right rotation about X would leave us with:

```
        X                 L
       / \               / \
      L   R     =>     LL   X
     / \                   / \
   LL   LR               LR   R
```

If the balance factor of L had been 0 or 1, this is sufficient. (In what follows, depth(S) for a subtree S means the depth, measured from X, of the deepest node in S.) Assume for a moment that the balance factor of L is 1, so depth(LL) = depth(LR)+1. Because X has balance factor +2, we know depth(R) = depth(LL)-2. After the rotation, the depth of LL has decremented by 1, the total depth of LR is unchanged, and the total depth of R has incremented by 1 to match exactly the depth of LR. So the balance factor of L (the new root) is now 0. If the original balance factor of L had been 0, the post-rotation balance factor of L becomes -1. Either way, it still works.

But if LR is deeper than LL (depth(LL)+1 = depth(LR)), then the new L is unbalanced. So we first do a left rotation about L to move some of LR up:

```
        X                      X                    LR
       / \                    / \                 /    \
      L   R       =>        LR   R       =>      L      X
     / \                   /  \                 / \    / \
   LL   LR                L   LRR             LL  LRL LRR  R
        / \              / \
      LRL  LRR         LL   LRL
```

We argue that LR now has balance factor 0 or 1, and so when we do the right rotation about X as the next step, the previous argument shows that we have achieved AVL balance at the new root (which will be LR). We know that

depth(LL) +1 = max(depth(LRL), depth(LRR))
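Putting the single and double rotations together gives a compact recursive AVL insertion. This Java sketch stores each node's height so that balance factors cost O(1); the names are hypothetical, and it is one standard formulation rather than the code from this course.

```java
import java.util.ArrayList;
import java.util.List;

class AVL {
    static class Node {
        final int key;
        Node left, right;
        int height = 1;                         // stored height; a new leaf has height 1
        Node(int key) { this.key = key; }
    }

    static int height(Node t) { return t == null ? 0 : t.height; }

    static void update(Node t) { t.height = 1 + Math.max(height(t.left), height(t.right)); }

    // balance factor as in the text: depth(left) - depth(right)
    static int bf(Node t) { return height(t.left) - height(t.right); }

    // right rotation about x: its left child L becomes the subtree root
    static Node rotateRight(Node x) {
        Node l = x.left;
        x.left = l.right;
        l.right = x;
        update(x);                              // x is now below l, so update it first
        update(l);
        return l;
    }

    // left rotation about x: its right child becomes the subtree root
    static Node rotateLeft(Node x) {
        Node r = x.right;
        x.right = r.left;
        r.left = x;
        update(x);
        update(r);
        return r;
    }

    // Inserts key into subtree t and returns the (possibly new) subtree root,
    // rebalancing on the way back up the insertion path.
    static Node insert(Node t, int key) {
        if (t == null) return new Node(key);
        if (key < t.key) t.left = insert(t.left, key);
        else if (key > t.key) t.right = insert(t.right, key);
        else return t;                          // duplicate: nothing to do
        update(t);
        int b = bf(t);
        if (b == 2) {                           // left subtree L too deep
            if (bf(t.left) < 0) t.left = rotateLeft(t.left);    // LR deeper: rotate about L first
            return rotateRight(t);
        }
        if (b == -2) {                          // right subtree too deep (mirror image)
            if (bf(t.right) > 0) t.right = rotateRight(t.right);
            return rotateLeft(t);
        }
        return t;                               // balance factor -1, 0 or 1: nothing to do
    }

    static void preorder(Node t, List<Integer> out) {
        if (t == null) return;
        out.add(t.key);
        preorder(t.left, out);
        preorder(t.right, out);
    }

    // builds a tree by inserting keys in the given order; returns its preorder listing
    static List<Integer> preorderOf(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        List<Integer> out = new ArrayList<>();
        preorder(root, out);
        return out;
    }
}
```

Inserting 6 3 8 2 5 7 9 and then 10 through 14 reproduces the hex example that follows (A becomes the root once E is added), and inserting 6 3 9 8 10 7 exercises the right-left case, producing the tree rooted at 8 shown at the end of the example.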

Example

Start with the tree

```
        6
      /   \
     3     8
    / \   / \
   2   5 7   9
```

Now add 10, 11, 12, 13, 14, 15, which I'll do in hex with A B C D E F.

Add A; no rotations are needed:

```
        6
      /   \
     3     8
    / \   / \
   2   5 7   9
              \
               A
```

After adding B below A, we need to rotate around node 9, giving:

```
        6
      /   \
     3     8
    / \   / \
   2   5 7   A
            / \
           9   B
```

After adding C below B, we need to rotate around node 8, giving:

```
        6
      /   \
     3     A
    / \   / \
   2   5 8   B
        / \   \
       7   9   C
```

Now we add D below C, and rotate around B to get

```
        6
      /   \
     3     A
    / \   / \
   2   5 8   C
        / \ / \
       7  9 B  D
```

Now we add E below D, and this time the node we rotate around is all the way at the root, 6:

```
           A
        /     \
       6       C
      / \     / \
     3   8   B   D
    / \ / \       \
   2  5 7  9       E
```

Actually, this is just a little misleading; all the rebalancing rotations above involved having the deepest subtree to the "right-right" of the pivot node; that is, in the right subtree of the right subtree. In that case, a single rotation (to the left) about the pivot node is all we need. The "left-left" case is similar. However, the right-left (and left-right) cases are not quite as simple: here, we need to do a preliminary rotation around the right subtree (right child) first. Suppose the tree is:

```
      6
     / \
    3   9
       / \
      8   A
```

We now insert a 7, making the tree unbalanced (BF = -2) at node 6:

```
      6
     / \
    3   9
       / \
      8   A
     /
    7
```

Rotation about 6 alone gives:

```
        9
       / \
      6   A
     / \
    3   8
       /
      7
```

This still has a balance factor of +2, now at the node labeled 9! This rotation did not help. However, the AVL rule in this case is to first rotate about the 9 (the right child), and then about the 6. Note also that the rotations are in different directions, and that the BF signs differ (before the rotations, the BF is +1 at the node labeled 9, and -2 at the node labeled 6).

After rotating right about the 9 we get:

```
      6
     / \
    3   8
       / \
      7   9
           \
            A
```

Then rotating left about the 6, we get

```
        8
       / \
      6   9
     / \   \
    3   7   A
```

This tree is now AVL-balanced: every node has a balance factor of -1, 0 or 1.

B-trees have what I will call an

Some B-tree visualizations:

Examples of how ordered trees with more than one data item per node might look

Bayer's idea is that when a node becomes full, we split it in half and push the median element up a level. The pushed-up element may cause a split in the parent as well; we keep pushing until the process stops or we end up pushing a new root node.
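A sketch of Bayer-style insertion in Java, assuming (as later in this section) that a B-tree of order d holds between d and 2d values per non-root node; the class and method names are hypothetical. Insertion descends to a leaf; any node that ends up with 2d+1 values splits into two nodes of d values and hands its median to the parent, and only a split of the root makes the tree deeper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class BTree {
    static class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> kids = new ArrayList<>();      // empty for a leaf
        boolean leaf() { return kids.isEmpty(); }
    }

    // what a child hands back to its parent after splitting
    static class Split {
        final int median;
        final Node right;
        Split(int median, Node right) { this.median = median; this.right = right; }
    }

    final int order;                                    // d: at most 2d keys per node
    Node root = new Node();

    BTree(int order) { this.order = order; }

    void insert(int key) {
        Split s = insert(root, key);
        if (s != null) {                                // the root split: grow a new root
            Node r = new Node();
            r.keys.add(s.median);
            r.kids.add(root);
            r.kids.add(s.right);
            root = r;
        }
    }

    private Split insert(Node n, int key) {
        int i = 0;
        while (i < n.keys.size() && n.keys.get(i) < key) i++;
        if (i < n.keys.size() && n.keys.get(i) == key) return null;   // duplicate
        if (n.leaf()) {
            n.keys.add(i, key);
        } else {
            Split s = insert(n.kids.get(i), key);
            if (s == null) return null;
            n.keys.add(i, s.median);                   // the child pushed its median up
            n.kids.add(i + 1, s.right);
        }
        if (n.keys.size() <= 2 * order) return null;   // still fits
        // Overflow (2d+1 keys): keep the lower d keys here, move the upper d
        // into a new node, and push the median up to the parent.
        int median = n.keys.get(order);
        Node right = new Node();
        right.keys.addAll(n.keys.subList(order + 1, n.keys.size()));
        n.keys.subList(order, n.keys.size()).clear();
        if (!n.leaf()) {
            right.kids.addAll(n.kids.subList(order + 1, n.kids.size()));
            n.kids.subList(order + 1, n.kids.size()).clear();
        }
        return new Split(median, right);
    }

    void inorder(Node n, List<Integer> out) {
        for (int i = 0; i < n.keys.size(); i++) {
            if (!n.leaf()) inorder(n.kids.get(i), out);
            out.add(n.keys.get(i));
        }
        if (!n.leaf()) inorder(n.kids.get(n.keys.size()), out);
    }

    // records the depth of every leaf; in a B-tree there should be exactly one
    void leafDepths(Node n, int depth, Set<Integer> out) {
        if (n.leaf()) { out.add(depth); return; }
        for (Node k : n.kids) leafDepths(k, depth + 1, out);
    }

    // non-root nodes hold d..2d keys; internal nodes have one more child than keys
    boolean sizesOk(Node n, boolean isRoot) {
        int k = n.keys.size();
        if (k > 2 * order || (!isRoot && k < order)) return false;
        if (!n.leaf() && n.kids.size() != k + 1) return false;
        for (Node c : n.kids) if (!sizesOk(c, false)) return false;
        return true;
    }
}
```

Replaying the insertion exercises below (1 through 20, and the shuffled sequence) with order 1 or 2 gives trees whose inorder listing is 1..20 and whose leaves are all at the same depth.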

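Bayer trees grow deeper only at the top: the tree gains a level only when the root splits and a new root is pushed up, so all leaves always stay at the same depth.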

B-tree of order 1

Insert 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 into a B tree of order 1

Insert 4,17,8,7,2,19,13,15,3,10,1,16,20,9,18,11,14,5,12,6

B-tree of order 2 (2-4 values per non-root node, and 3-5 children)

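A **red-black tree** is an ordered binary tree whose nodes satisfy these properties:

1. A node is either red or black.
2. The root is black.
3. All leaves are black. (The leaves here are the null subtrees, not nodes holding data.)
4. Both children of every red node are black.
5. Every simple path from a given node to any of its descendant leaves contains the same number of black nodes.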

Let the black-height of the tree be the number of black nodes in any path from the root to a leaf, constant as per property 5. Property 4 ensures that, along any path, the path length is no more than twice the black-height.
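Properties 4 and 5 can be checked in a single recursive pass that returns each subtree's black-height, or -1 as soon as something is wrong. A Java sketch (names hypothetical); here the black-height counts the null leaf as one black node, so it is one more than the count of black data-holding nodes.

```java
class RBCheck {
    static final boolean RED = true, BLACK = false;

    static class Node {
        final int key;
        final boolean color;
        final Node left, right;
        Node(int key, boolean color, Node left, Node right) {
            this.key = key; this.color = color; this.left = left; this.right = right;
        }
    }

    static boolean isRed(Node t) { return t != null && t.color == RED; }

    // Returns the number of black nodes on every path from t down to a null leaf
    // (the null leaf itself counted as black, per property 3), or -1 if
    // property 4 or 5 fails anywhere in the subtree.
    static int blackHeight(Node t) {
        if (t == null) return 1;
        if (isRed(t) && (isRed(t.left) || isRed(t.right))) return -1;  // property 4
        int l = blackHeight(t.left), r = blackHeight(t.right);
        if (l < 0 || r < 0 || l != r) return -1;                        // property 5
        return l + (isRed(t) ? 0 : 1);
    }

    // property 2 (black root) plus properties 4 and 5; 1 and 3 hold by construction
    static boolean isRedBlack(Node root) {
        return !isRed(root) && blackHeight(root) > 0;
    }
}
```

For example, a black 2 with a black left child 1 and a red right child 4 (whose children 3 and 5 are black) passes; recoloring 1 red breaks property 5, since paths through 1 then see one fewer black node.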

The catch is that the tree we create this way isn't a proper B-tree. A B-tree of order 1 has 1 or 2 data values per node, and 2 or 3 subtrees; a B-tree of order 2 has 2, 3 or 4 data values per non-root node with 3, 4 or 5 children. The B-like tree here has 1, 2 or 3 data values per node, and 2, 3 or 4 subtrees.

Given a B-like tree with 1-3 data values per node, maintaining the B-tree property of all paths having equal length, we can convert it to a red-black tree as follows:

- If a B-node has 1 value, it becomes a single black node
- If a B-node has two values, it becomes two nodes. The upper one will be black and the lower one red; it does not matter which is which.
- If a B-node has three values, the middle value becomes a black node and the two other values become red nodes representing the roots of the left and right subtrees.
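The conversion rules translate directly into a recursive function. This Java sketch (names hypothetical) puts the red node of a two-value B-node on the left, one of the two equally valid choices, and checks that every path gets the same number of black nodes.

```java
import java.util.List;

class ToRedBlack {
    static final boolean RED = true, BLACK = false;

    // B-like node with 1-3 keys and either no children or (keys + 1) children
    static class BNode {
        final int[] keys;
        final BNode[] kids;
        BNode(int[] keys, BNode... kids) { this.keys = keys; this.kids = kids; }
        BNode kid(int i) { return kids.length == 0 ? null : kids[i]; }
    }

    static class RB {
        final int key;
        final boolean color;
        final RB left, right;
        RB(int key, boolean color, RB left, RB right) {
            this.key = key; this.color = color; this.left = left; this.right = right;
        }
    }

    static RB convert(BNode b) {
        if (b == null) return null;
        int[] k = b.keys;
        RB[] sub = new RB[k.length + 1];             // converted subtrees, left to right
        for (int i = 0; i < sub.length; i++) sub[i] = convert(b.kid(i));
        switch (k.length) {
            case 1:                                  // one value: a single black node
                return new RB(k[0], BLACK, sub[0], sub[1]);
            case 2:                                  // two values: black on top, red below
                return new RB(k[1], BLACK, new RB(k[0], RED, sub[0], sub[1]), sub[2]);
            case 3:                                  // three values: black middle, red sides
                return new RB(k[1], BLACK,
                        new RB(k[0], RED, sub[0], sub[1]),
                        new RB(k[2], RED, sub[2], sub[3]));
            default:
                throw new IllegalArgumentException("a B-node must hold 1-3 values");
        }
    }

    // black nodes on any path down to a null, or -1 if paths disagree or a red node has a red child
    static int blackHeight(RB t) {
        if (t == null) return 0;
        int l = blackHeight(t.left), r = blackHeight(t.right);
        if (l < 0 || l != r) return -1;
        boolean redKid = (t.left != null && t.left.color == RED)
                || (t.right != null && t.right.color == RED);
        if (t.color == RED && redKid) return -1;
        return l + (t.color == BLACK ? 1 : 0);
    }

    static void inorder(RB t, List<Integer> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.key);
        inorder(t.right, out);
    }
}
```

Converting the B-like tree with root (10, 20) and children (5), (12, 15, 18) and (25) gives a black 20 with a red left child 10; every path then contains exactly 2 black nodes, matching the two levels of the B-like tree.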

Inserting into a B-like tree with 1-3 data values per node is similar to insertion into a regular B-tree, except that when a node is "full" with three values, and we split it and push up a middle value, the split will be into pieces of sizes 1 and 2. It does not matter which piece is which; if a node contains values (10,15,20) and we add 18, we can push up either the 15, with split nodes (10) and (18,20), or the 18, with split nodes (10,15) and (20).

As we work up the tree, we will denote the current node (initially the newly inserted node, which is colored red) by N.

We will also let P be the parent of N, G the grandparent and U the uncle. We will follow the five cases of Wikipedia.

```
      G
     / \
    U   P
       / \
```

Case 1: N is the root of the tree. We color it black and are done.

Case 2: P is black. In this case we can leave N red and be done.

Case 3: P and U are both red. Then we can change P and U both to black and change G to red. Property 4 now holds for G and its children, and property 5 holds for the tree because all paths formerly through black node G now pass through exactly one of the newly black nodes U and P. However, G's own parent may be red, so we repeat the process with G as the new N.

Case 4: P is red but U is black, and N is the right child of P and P is the left child of G (or N is the left child of P and P is the right child of G):

```
        G
       / \
      P   U
     / \
        N
       / \
```

In this case we start with a left rotation about P:

```
        G
       / \
      N   U
     / \
    P
```

The number of black nodes along any path through P or N is unchanged. Rule 4, however, fails, since N and P are both red. We fix this by setting the just-lowered P to be the new N and proceeding to case 5.

Case 5: P is red, U is black, N is the left child of P which is the left child of G (or N is right of P which is right of G):

```
        G
       / \
      P   U
     / \
    N   T
```

In this case we do a right rotation about G, changing P to black and G to red:

```
      P
     / \
    N   G
       / \
      T   U
```

Property 4 now holds, and so does property 5 for the tree now rooted at P: the number of black nodes along any path through N, T or U is unchanged, although which of P and G is black has changed.
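The five cases combine into one loop that works up from the new node. This Java sketch (names hypothetical) uses parent pointers and a single rotateUp(x) helper that lifts x above its parent: case 4 lifts N above P, and case 5 lifts P above G. It also includes a checker for properties 2, 4 and 5.

```java
import java.util.List;

class RedBlack {
    static final boolean RED = true, BLACK = false;

    static class Node {
        final int key;
        boolean color = RED;                    // new nodes start red
        Node left, right, parent;
        Node(int key, Node parent) { this.key = key; this.parent = parent; }
    }

    Node root;

    void insert(int key) {
        Node parent = null, t = root;           // ordinary ordered-tree insertion
        while (t != null) {
            if (key == t.key) return;           // ignore duplicates
            parent = t;
            t = key < t.key ? t.left : t.right;
        }
        Node n = new Node(key, parent);
        if (parent == null) root = n;
        else if (key < parent.key) parent.left = n;
        else parent.right = n;
        fixup(n);
    }

    // Restores the red-black properties, working up from the red node n.
    private void fixup(Node n) {
        while (true) {
            Node p = n.parent;
            if (p == null) { n.color = BLACK; return; }            // case 1
            if (p.color == BLACK) return;                          // case 2
            Node g = p.parent;                  // p is red, so it is not the root
            Node u = (p == g.left) ? g.right : g.left;
            if (u != null && u.color == RED) {                     // case 3
                p.color = BLACK; u.color = BLACK; g.color = RED;
                n = g;                          // G's parent may be red: go round again
                continue;
            }
            if ((n == p.right) != (p == g.right)) {                // case 4: inner grandchild
                rotateUp(n);                    // rotate about P
                n = p;
                p = n.parent;
            }
            p.color = BLACK;                                       // case 5
            g.color = RED;
            rotateUp(p);                        // rotate about G
            return;
        }
    }

    // Rotates x above its parent p (right rotation about p if x is p's left child).
    private void rotateUp(Node x) {
        Node p = x.parent, g = p.parent;
        if (x == p.left) {
            p.left = x.right;
            if (x.right != null) x.right.parent = p;
            x.right = p;
        } else {
            p.right = x.left;
            if (x.left != null) x.left.parent = p;
            x.left = p;
        }
        p.parent = x;
        x.parent = g;
        if (g == null) root = x;
        else if (g.left == p) g.left = x;
        else g.right = x;
    }

    // black nodes on every path to a null leaf (counting the null), or -1 on a violation
    static int blackHeight(Node t) {
        if (t == null) return 1;
        boolean redKid = (t.left != null && t.left.color == RED)
                || (t.right != null && t.right.color == RED);
        if (t.color == RED && redKid) return -1;
        int l = blackHeight(t.left), r = blackHeight(t.right);
        if (l < 0 || l != r) return -1;
        return l + (t.color == BLACK ? 1 : 0);
    }

    boolean valid() {
        return (root == null || root.color == BLACK) && blackHeight(root) > 0;
    }

    static void inorder(Node t, List<Integer> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.key);
        inorder(t.right, out);
    }
}
```

Inserting 1 through 20 in increasing order, or the shuffled sequence used for the B-tree exercises, leaves a tree that passes valid().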