In recursion.html#exprtrees we looked at some simple tree structures used for expressing "hierarchical" information, such as expressions:

2*(3+5)

          *
         / \
        2   +
           / \
          3   5

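A tree consists of nodes, each holding data and links to zero or more subtrees.

A **binary tree** is a tree in which each node has at most two subtrees, a left one and a right one.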

Binary tree nodes thus contain:

- data
- a left subtree
- a right subtree

The code might look like this (from bintree.cs):

    class treenode<T> {
        private T data_;
        private treenode<T> left_, right_;
        public treenode(T d, treenode<T> l, treenode<T> r) {data_ = d; left_ = l; right_ = r;}
        public T data() {return data_;}
        public treenode<T> left() {return left_;}
        public treenode<T> right() {return right_;}
    }

A binary tree is **ordered** if there is an ordering
relation on the data and, for each node, every data value appearing in the
left subtree is less than the node's own data, and every data value
appearing in the right subtree is greater than the node's own data.
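The ordering condition can be checked recursively by passing down the bounds that each subtree must respect. The following is a minimal Java sketch assuming an int-valued node class; `IntNode`, `OrderCheck` and `isOrdered` are hypothetical names, not part of bintree.cs.

```java
// Hypothetical int-valued node, for illustration only.
class IntNode {
    final int data;
    final IntNode left, right;
    IntNode(int data, IntNode left, IntNode right) {
        this.data = data; this.left = left; this.right = right;
    }
}

class OrderCheck {
    // True if every value in t lies strictly between lo and hi
    // (null means "no bound on that side") and each subtree is ordered.
    static boolean isOrdered(IntNode t, Integer lo, Integer hi) {
        if (t == null) return true;                  // an empty tree is ordered
        if (lo != null && t.data <= lo) return false;
        if (hi != null && t.data >= hi) return false;
        return isOrdered(t.left, lo, t.data)         // left values must be < t.data
            && isOrdered(t.right, t.data, hi);       // right values must be > t.data
    }

    static boolean isOrdered(IntNode root) {
        return isOrdered(root, null, null);
    }
}
```

Checking only each node against its two immediate children is not enough; a value deep in the left subtree could still exceed the root, which is why the bounds are carried down.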

Quite often, eg for expression trees, no ordering is involved.

The **root** node is at the top of the tree. Nodes with no
subtrees (ie in which the subtree fields are null) are **leaf**
nodes. Other nodes are **interior** nodes.

The **depth** of a node is how far it is, in node-to-node
links, from the root. **Level n** of a tree consists of all
nodes with depth n.

Examples:

- Abstract expression trees
- Parse trees
- Ancestry trees
- Decision trees (Bailey p 288)

None of these is ordered!

To **traverse** a tree is to visit each node, perhaps
printing out the data values or perhaps taking some other action.
Traversal methods are usually recursive, and usually the initial call is
on the tree's root.

Three common forms of traversal are **preorder**, **inorder**,
or **postorder**, depending on whether a node's data is
visited before, between or after the left and right subtrees are visited.
All of these are considered to be **depth-first** traversal.
Here is a sample inorder traversal
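A minimal Java sketch of all three traversal orders, applied to the expression tree for 4+x*(y+1), might look like the following. The names `ExprNode` and `Traversals` are hypothetical, and the output is collected into a string rather than printed.

```java
// Hypothetical string-valued node, for illustration only.
class ExprNode {
    final String data;
    final ExprNode left, right;
    ExprNode(String data, ExprNode left, ExprNode right) {
        this.data = data; this.left = left; this.right = right;
    }
}

class Traversals {
    // Visit the node's data before its subtrees.
    static void preorder(ExprNode t, StringBuilder out) {
        if (t == null) return;
        out.append(t.data).append(' ');
        preorder(t.left, out);
        preorder(t.right, out);
    }

    // Visit the node's data between its subtrees.
    static void inorder(ExprNode t, StringBuilder out) {
        if (t == null) return;
        inorder(t.left, out);
        out.append(t.data).append(' ');
        inorder(t.right, out);
    }

    // Visit the node's data after its subtrees.
    static void postorder(ExprNode t, StringBuilder out) {
        if (t == null) return;
        postorder(t.left, out);
        postorder(t.right, out);
        out.append(t.data).append(' ');
    }

    static ExprNode leaf(String s) { return new ExprNode(s, null, null); }

    // Builds the tree for 4 + x*(y+1), bottom up.
    static ExprNode sample() {
        ExprNode yPlus1 = new ExprNode("+", leaf("y"), leaf("1"));
        ExprNode times = new ExprNode("*", leaf("x"), yPlus1);
        return new ExprNode("+", leaf("4"), times);
    }
}
```

On this tree the inorder result is `4 + x * y + 1` (the parentheses are lost), while postorder gives `4 x y 1 + * +`.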

Demo:

traverse the expression tree for 4+x*(y+1)

inorder

postorder (cf Jan Łukasiewicz)

traverse the following tree from simpletrees.zip:

            6
          /   \
         4     8
        / \   / \
       1   5 7   37

We can also do

Example

How do we build trees? One way is from the bottom up; that is much easier
if the class treenode is public. We could do this (assuming treenode is
equivalent to treenode<string>):

    treenode n1 = new treenode("a", null, null);
    treenode n2 = new treenode("3", null, null);
    treenode n3 = new treenode("*", n1, n2);
    treenode n4 = new treenode("1", null, null);
    treenode root = new treenode("+", n3, n4);

What does this build?

We can build **ordered** trees by inserting in order:

    public void insert(int val) {
        if (root_ == null) {
            root_ = new treenode(val, null, null);
            return;
        }
        rinsert(val, root_);
    }

    // recursive insert: does not get called with t==null
    private void rinsert(int val, treenode t) {
        if (val == t.data()) return;    // should we do something else?
        if (val < t.data()) {
            if (t.left() == null) {
                t.setleft(new treenode(val, null, null));
            } else {
                rinsert(val, t.left());
            }
        } else {    // val > t.data()
            if (t.right() == null) {
                t.setright(new treenode(val, null, null));
            } else {
                rinsert(val, t.right());
            }
        }
    }

If a treenode is null, we cannot modify it; we can only modify the parent. Bailey addresses this in some contexts by including a pointer to the parent node, but that won't help us here.

Note that the tree we get by inserting values depends on the order of insertion.
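To see the dependence on insertion order concretely, here is a small Java sketch. It uses a hypothetical `BstNode` class and an insert that returns the subtree root (a different style from the rinsert() above, chosen only to keep the example short).

```java
// Hypothetical names throughout; a sketch, not the bintree code.
class BstNode {
    int data;
    BstNode left, right;
    BstNode(int data) { this.data = data; }
}

class InsertOrderDemo {
    // Recursive insert that returns the (possibly new) subtree root.
    static BstNode insert(BstNode t, int val) {
        if (t == null) return new BstNode(val);
        if (val < t.data) t.left = insert(t.left, val);
        else if (val > t.data) t.right = insert(t.right, val);
        return t;                        // duplicates are ignored
    }

    // Inserts the values in the order given, starting from an empty tree.
    static BstNode build(int... vals) {
        BstNode root = null;
        for (int v : vals) root = insert(root, v);
        return root;
    }

    // Number of levels in the tree (0 for an empty tree).
    static int levels(BstNode t) {
        return t == null ? 0 : 1 + Math.max(levels(t.left), levels(t.right));
    }

    // Preorder listing; for an ordered tree this pins down the shape.
    static String preorder(BstNode t) {
        if (t == null) return "";
        return t.data + " " + preorder(t.left) + preorder(t.right);
    }
}
```

With these helpers, `build(6,4,8,1,5,7,37)` and `build(6,8,4,37,7,5,1)` produce the same three-level tree, while `build(1,4,5,6,7,8,37)` (values in increasing order) produces a seven-level chain.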

In-class demos:

- try inserting the values in order
- what orders lead to the same tree?
- convert to preorder traversal

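An n-ary tree has, for each node, a list of subtrees rather than exactly two:

    class Treenode<T> {
        private T data;
        private List<Treenode<T>> children;    // field name is an assumption
        ...
    }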

Different nodes of a tree can have different numbers of subnodes.

Consider:

                        37
                  /            \
            (23 34)            (43 56 71 88)
            /  |  \           /   |   |   |   \
          16  29  35        39   47  60  75  103

html documents as a data structure (Document Object Model, or DOM)

The structure is basically that of an n-ary tree

Let us suppose we have an inttree. We want one of the following:

- The depth of the tree
- The maximum value in the tree
- The sum of all the numbers in the tree
- Whether 7 is in the tree

The last one can be done efficiently (O(log N)) if we know the tree is ordered and reasonably balanced. Otherwise, it is the same as the others: we proceed recursively, checking the left and right subtrees and then the root node.

    private static int sum(treenode t) {
        if (t==null) return 0;
        else return sum(t.left()) + t.data() + sum(t.right());
    }
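The other three questions follow the same recursive pattern. Below is a hedged Java sketch for an unordered tree of ints; `IntTreeNode` and `TreeQueries` are hypothetical names, and the choice of -1 for the depth of an empty tree is an assumption made so that a single node has depth 0.

```java
// Hypothetical int-valued node, for illustration only.
class IntTreeNode {
    final int data;
    final IntTreeNode left, right;
    IntTreeNode(int data, IntTreeNode left, IntTreeNode right) {
        this.data = data; this.left = left; this.right = right;
    }
}

class TreeQueries {
    // Depth of the deepest node, in links from the root; -1 for an empty tree.
    static int depth(IntTreeNode t) {
        if (t == null) return -1;
        return 1 + Math.max(depth(t.left), depth(t.right));
    }

    // Largest value in the tree; Integer.MIN_VALUE for an empty tree.
    static int max(IntTreeNode t) {
        if (t == null) return Integer.MIN_VALUE;
        return Math.max(t.data, Math.max(max(t.left), max(t.right)));
    }

    // Whether v occurs anywhere in the tree; makes no use of ordering.
    static boolean contains(IntTreeNode t, int v) {
        if (t == null) return false;
        return t.data == v || contains(t.left, v) || contains(t.right, v);
    }
}
```

Each method handles the empty tree first and then combines the answers for the two subtrees with the node's own data, just as sum() does.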

What if we want to implement traversing a tree until we find a
particular value, n? The interface method might be

    public String Get(int n) {
        if (n>=count_) return null;
        getstr_ = null;
        get(n,root_);
        return getstr_;
    }

Now the recursive helper method get(n, treenode p) must be defined.
How would we do this?

In the BlueJ project traversers,
I have implemented several different ways to traverse a tree.

1. inTraverse(): recursive traversal, now with a depth parameter

3. inTraverseStack(): stack-based iterative traversal, using a
"trick"

A slightly more natural version of the stack-based iteration is
this; however, it requires being able to push both treenodes and
data as separate entities. This is

4. inTraverseStackAlt()

    s.pushNode(root_);
    while (!s.isEmpty()) {
        Object x = s.pop();
        if (x is treenode) {
            treenode t = x as treenode;
            if (t.right() != null) s.pushNode(t.right());
            s.pushData(t.data());
            if (t.left() != null) s.pushNode(t.left());
        } else {
            string str = x as string;
            Console.WriteLine(str);
        }
    }
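For comparison, here is roughly the same idea as a self-contained Java sketch. It assumes a single `Deque<Object>` holding both nodes and data strings in place of the pushNode()/pushData() pair; `StrNode` and `StackTraversal` are hypothetical names, and data values are assumed to be non-null.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical string-valued node, for illustration only.
class StrNode {
    final String data;
    final StrNode left, right;
    StrNode(String data, StrNode left, StrNode right) {
        this.data = data; this.left = left; this.right = right;
    }
}

class StackTraversal {
    // Iterative inorder traversal; returns the data values in visit order.
    static List<String> inorder(StrNode root) {
        List<String> out = new ArrayList<>();
        if (root == null) return out;
        Deque<Object> s = new ArrayDeque<>();    // holds StrNode or String
        s.push(root);
        while (!s.isEmpty()) {
            Object x = s.pop();
            if (x instanceof StrNode) {
                StrNode t = (StrNode) x;
                // Push in reverse of the desired order: right, data, left.
                if (t.right != null) s.push(t.right);
                s.push(t.data);
                if (t.left != null) s.push(t.left);
            } else {
                out.add((String) x);             // a data item: visit it
            }
        }
        return out;
    }
}
```

Because a stack is last-in-first-out, the left subtree (pushed last) is fully processed before the node's own data is popped, which is what makes this an inorder traversal.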

Next we replace the Stack with a Queue. (Normally the queue
operations are not called push() and pop() but rather enqueue() and
dequeue(), but I wanted a very simple drop-in replacement for a
stack.)

5. inTraverseQueue

6. inTraverseQueueAlt

In what order do these visit the nodes of the tree? (#5 is a little
broken; the "real" version is #6).
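As a point of reference, the standard queue-based traversal can be sketched in Java as below. This is the textbook form and is not necessarily what inTraverseQueue or inTraverseQueueAlt do; `QNode` and `QueueTraversal` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical int-valued node, for illustration only.
class QNode {
    final int data;
    final QNode left, right;
    QNode(int data, QNode left, QNode right) {
        this.data = data; this.left = left; this.right = right;
    }
}

class QueueTraversal {
    // Visits a node when it is dequeued, then enqueues its children.
    static List<Integer> levelOrder(QNode root) {
        List<Integer> out = new ArrayList<>();
        Queue<QNode> q = new ArrayDeque<>();
        if (root != null) q.add(root);
        while (!q.isEmpty()) {
            QNode t = q.remove();
            out.add(t.data);
            if (t.left != null) q.add(t.left);
            if (t.right != null) q.add(t.right);
        }
        return out;
    }
}
```

Because a queue is first-in-first-out, all nodes at depth n are dequeued before any node at depth n+1, so the visit order is level by level (breadth-first) rather than depth-first.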

As a related question, why are there so few actual problems that are
solved with Queues?

For the next traverser, we define a Next() method that uses the stack-based approach:

7. NextSetup()/Next()

The drawback to that is there can be only one active iteration at a
time. The final version solves that, and also enables for-each
loops:

8. Java Iterator (allowing a for-each loop), internally based on a
stack.
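A stack-based iterator of this kind might be sketched as follows. This is an illustration under assumed names (`ItNode`, `InorderTree`), not the code from the traversers project; each call to iterator() creates its own stack, which is what allows several iterations to be active at once.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical generic node, for illustration only.
class ItNode<T> {
    final T data;
    final ItNode<T> left, right;
    ItNode(T data, ItNode<T> left, ItNode<T> right) {
        this.data = data; this.left = left; this.right = right;
    }
}

class InorderTree<T> implements Iterable<T> {
    private final ItNode<T> root;
    InorderTree(ItNode<T> root) { this.root = root; }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            // Nodes whose data has not yet been returned, nearest first.
            private final Deque<ItNode<T>> stack = new ArrayDeque<>();
            { pushLeftSpine(root); }

            // Push t and all of its left descendants.
            private void pushLeftSpine(ItNode<T> t) {
                for (; t != null; t = t.left) stack.push(t);
            }

            public boolean hasNext() { return !stack.isEmpty(); }

            public T next() {
                if (stack.isEmpty()) throw new NoSuchElementException();
                ItNode<T> t = stack.pop();
                pushLeftSpine(t.right);   // the right subtree comes after t
                return t.data;
            }
        };
    }
}
```

With this in place, `for (int v : tree) { ... }` visits the values in inorder, and nested for-each loops over the same tree work because the two iterators do not share state.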

            A                B
           / \              / \
          B   T3    =>    T1   A
         / \                  / \
        T1  T2              T2   T3

This is legal because we have T1 ≤ B ≤ T2 ≤ A ≤ T3 (where T1,T2,T3 represent all values in those subtrees)

It is straightforward to write code to implement a rotation. The ordered node data type is D. Note that we swap the data in the node occupied by A and B; we have to preserve the node occupied originally by A because nodes above may point to it.

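    // The method name rotateRight and the accessors left()/right() are assumptions.
    void rotateRight(TreeNode<D> A) {
        if (A==null) return;
        TreeNode<D> B = A.left();
        if (B==null) return;
        TreeNode<D> T1 = B.left(), T2 = B.right(), T3 = A.right();
        D temp = B.data();
        B.setData(A.data());
        A.setData(temp);
        B.setLeft(T2);     // this is now the node that holds A's data
        B.setRight(T3);
        A.setLeft(T1);     // this is now the node that holds B's data
        A.setRight(B);
    }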

Compare Bailey, p 355.

A predecessor to splay trees was rotate-to-root, in which we can move a node up to the root through a series of rotations, each time rotating the node with its (new) parent. This does succeed in bringing a given value to the root.

Splay Trees modify the rotate-to-root strategy with the addition of "grandfather" rotations. Rotations occur in pairs (except the last). Let x be a node, p the parent, and g the grandparent. If x is the left subnode of p and p is the right subnode of g (or x is right and p is left), we rotate around p and then g as in the rotate-to-root method. This is sometimes called the zig-zag step, as the two rotations are in different directions. However, if x is left and p is left, we rotate first around g and then around p (note the reversed order!), in a step sometimes known as "zig-zig".
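The zig-zig and zig-zag steps can be sketched in Java as below. This is a recursive variant that pairs the rotations starting from the root and working downward, so when the path has odd length the single unpaired rotation happens at the bottom rather than at the top; it is an illustration under assumed names (`SplayNode`, `Splay`), not necessarily the formulation used in class.

```java
// Hypothetical names; int keys only, for illustration.
class SplayNode {
    int key;
    SplayNode left, right;
    SplayNode(int key) { this.key = key; }
}

class Splay {
    static SplayNode rotateRight(SplayNode h) {
        SplayNode x = h.left; h.left = x.right; x.right = h; return x;
    }
    static SplayNode rotateLeft(SplayNode h) {
        SplayNode x = h.right; h.right = x.left; x.left = h; return x;
    }

    // Brings key (or the last node on its search path) to the root of g's subtree.
    static SplayNode splay(SplayNode g, int key) {
        if (g == null || key == g.key) return g;
        if (key < g.key) {
            SplayNode p = g.left;
            if (p == null) return g;               // key absent; stop at g
            if (key < p.key) {                     // zig-zig: left of left
                p.left = splay(p.left, key);
                g = rotateRight(g);                // rotate about g first...
            } else if (key > p.key) {              // zig-zag: right of left
                p.right = splay(p.right, key);
                if (p.right != null) g.left = rotateLeft(p);   // about p first
            }
            return g.left == null ? g : rotateRight(g);        // ...then finish
        } else {
            SplayNode p = g.right;
            if (p == null) return g;
            if (key > p.key) {                     // zig-zig: right of right
                p.right = splay(p.right, key);
                g = rotateLeft(g);
            } else if (key < p.key) {              // zig-zag: left of right
                p.left = splay(p.left, key);
                if (p.left != null) g.right = rotateRight(p);
            }
            return g.right == null ? g : rotateLeft(g);
        }
    }

    // Ordinary (non-splaying) insert, used only to build test trees.
    static SplayNode insertPlain(SplayNode t, int key) {
        if (t == null) return new SplayNode(key);
        if (key < t.key) t.left = insertPlain(t.left, key);
        else if (key > t.key) t.right = insertPlain(t.right, key);
        return t;
    }

    static String inorder(SplayNode t) {
        return t == null ? "" : inorder(t.left) + t.key + " " + inorder(t.right);
    }

    static int levels(SplayNode t) {
        return t == null ? 0 : 1 + Math.max(levels(t.left), levels(t.right));
    }
}
```

Splaying 9 in the degenerate chain 2 3 5 6 7 8 9 brings 9 to the root, keeps the inorder sequence unchanged, and leaves a tree with fewer levels than the original chain.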

Consider all the nodes of the tree on the path between x and the root. After splaying x up to the root, the depth of x is, of course, now 0. However, the average depth of all the nodes on that root-to-x path is now roughly halved. Thus, while splaying doesn't necessarily improve the tree balance, it does move a number of nodes closer to the root.

Examples:

- splaying a leaf in a balanced tree (eg 6 3 8 2 5 7 9)
- splaying a leaf in a degenerate tree (2 3 5 6 7 8 9)

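To insert x into a splay tree, we first find the node y which, under ordinary insertion, x would be inserted immediately below. y is then splayed to the root, at which time we insert x above y. Assume that x would have been inserted to the right of y. This means x>y, but also that x is less than every other value in the tree that is greater than y; after the splay, those values are exactly y's right subtree. So x can become the new root, with y as its left child and y's former right subtree as its right child.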

Splaying wreaks havoc on iterators. The main problem is that normally accesses are iterator-safe (that is, it is safe to access values in a data structure while an iterator is "in progress"; you just can't insert), but here accesses are not safe.

We can compute balance factors easily enough using recursion, but it is better to

As we work up the path from the newly inserted value to the tree root, we consider the new balance factor of each node. If it is -1, 0, or 1, we do nothing. If it is -2 or 2 (the worst it can be after a single insertion), we do rotations.

Let X be the node in question, with right and left subnodes R and L. If the balance factor of X is +2, then the left subtree L is too deep. We know L has a balance factor of -1, 0, or 1, but it matters which. Let the left child of L be LL, and the right child be LR. The tree now looks like this:

            X
           / \
          L   R
         / \
        LL  LR

(Note that LL, LR, and R are entire subtrees, not just nodes. However, we know that their depths are all similar, because of the balance-factor requirement.)

We will eventually do a right rotation about X; however we might first have to do a left rotation about L. Doing the right rotation about X would leave us with:

            X                L
           / \              / \
          L   R     =>    LL   X
         / \                  / \
        LL  LR              LR   R

If the balance factor of L had been 0 or 1, this is sufficient. In what follows, the depth of a subtree means the depth of its deepest node, measured from the top of the diagram. Assume for a moment that the balance factor of L is 1, so depth(LL) = depth(LR)+1. Because X has balance factor +2 and LL is the deepest part of L, we know depth(R) = depth(LL)-2. After the rotation, the depth of LL has decreased by 1, the depth of LR is unchanged, and the depth of R has increased by 1 to match exactly the depth of LR. So the balance factor of L (the new root) is now 0. If the original balance factor of L had been 0, the post-rotation balance factor of L becomes -1. Either way, it still works.

But if LR is deeper than LL (depth(LL)+1 = depth(LR)), then the new L is unbalanced. So we first do a left rotation about L to move some of LR up:

            X                  X                  LR
           / \                / \               /    \
          L   R     =>      LR   R     =>      L      X
         / \               /  \               / \    / \
        LL  LR            L    LRR          LL  LRL LRR  R
           /  \          / \
         LRL  LRR      LL   LRL

We argue that LR now has balance factor 0 or 1, and so when we do the right rotation about X as the next step, the previous argument shows that we have achieved AVL balance at the new root (which will be LR). We know that

The general rule is that we need to do two rotations (about L and then X
in the first diagram) if the balance factors at L and at X have different
sign (*eg* +2 at X and -1 at L, or -2 at X and +1 at L).

Example

Start with the tree

            6
          /   \
         3     8
        / \   / \
       2   5 7   9

Now add 10, 11, 12, 13, 14, 15, which I'll do in hex with A B C D E F.

Add A; no rotations are needed:

            6
          /   \
         3     8
        / \   / \
       2   5 7   9
                  \
                   A

After adding B, below A above, we need to rotate around node 9 above:

            6
          /   \
         3     8
        / \   / \
       2   5 7   A
                / \
               9   B

After adding C, below B above, we will need to rotate around node 8 above:

              6
            /   \
           3      A
          / \   /   \
         2   5 8     B
              / \     \
             7   9     C

Now we add D below C, and rotate around B to get

              6
            /   \
           3      A
          / \   /   \
         2   5 8     C
              / \   / \
             7   9 B   D

Now we add E below D, and this time the node we rotate around is all the way at the root, 6:

                 A
              /     \
            6         C
          /   \      / \
         3     8    B   D
        / \   / \        \
       2   5 7   9        E

Actually, this is just a little misleading; all the rebalancing rotations above involved having the deepest tree to the "right-right" of the pivot node; that is, the right subtree of the right subtree. In that case, a simple rotation (to the left) about the pivot node is all we need. The "left-left" case is similar. However, the right-left (and left-right) cases are not quite as simple: here, we need to do a preliminary rotation around the right subtree (right child) first. Suppose the tree is:

          6
         / \
        3   9
           / \
          8   A

We now insert a 7, making the tree unbalanced (BF = - 2) at node 6:

          6
         / \
        3   9
           / \
          8   A
         /
        7

Rotation about 6 alone gives:

            9
           / \
          6   A
         / \
        3   8
           /
          7

This still has a balance factor of +/- 2! This rotation did not help! However, the AVL rule in this case is to first rotate about the 9 (the right child), and then about the 6. Also note that the rotations are in different directions, and the BF sign changes (it is +1 at the node labeled 9, and -2 at the node labeled 6).

After rotating right about the 9 we get:

          6
         / \
        3   8
           / \
          7   9
               \
                A

Then rotating left about the 6, we get

            8
          /   \
         6     9
        / \     \
       3   7     A

This tree has had its balance improved.
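The rebalancing rules worked through above can be collected into a short Java sketch. It assumes each node stores the height of its subtree, and uses the same sign convention as the text (a positive balance factor means the left side is deeper); `AvlNode` and `Avl` are hypothetical names.

```java
// Hypothetical names; int keys only, for illustration.
class AvlNode {
    int key, height = 1;       // height counts levels; a leaf has height 1
    AvlNode left, right;
    AvlNode(int key) { this.key = key; }
}

class Avl {
    static int height(AvlNode t) { return t == null ? 0 : t.height; }

    // Balance factor: positive means the left subtree is deeper.
    static int balance(AvlNode t) { return height(t.left) - height(t.right); }

    static void update(AvlNode t) {
        t.height = 1 + Math.max(height(t.left), height(t.right));
    }

    static AvlNode rotateRight(AvlNode x) {
        AvlNode l = x.left;
        x.left = l.right;
        l.right = x;
        update(x); update(l);              // x is now below l, so update x first
        return l;
    }

    static AvlNode rotateLeft(AvlNode x) {
        AvlNode r = x.right;
        x.right = r.left;
        r.left = x;
        update(x); update(r);
        return r;
    }

    // Inserts key and rebalances on the way back up; returns the new subtree root.
    static AvlNode insert(AvlNode t, int key) {
        if (t == null) return new AvlNode(key);
        if (key < t.key) t.left = insert(t.left, key);
        else if (key > t.key) t.right = insert(t.right, key);
        else return t;                     // duplicate: nothing to do
        update(t);
        int bf = balance(t);
        if (bf == 2) {                     // left side too deep
            if (balance(t.left) < 0)       // signs differ: preliminary rotation
                t.left = rotateLeft(t.left);
            return rotateRight(t);
        }
        if (bf == -2) {                    // right side too deep
            if (balance(t.right) > 0)
                t.right = rotateRight(t.right);
            return rotateLeft(t);
        }
        return t;
    }
}
```

Inserting 6, 3, 8, 2, 5, 7, 9 and then 10 through 14 with this sketch ends with 10 (A) at the root, as in the first example; inserting 6, 3, 9, 8, 10, 7 triggers the double rotation and ends with 8 at the root, as in the second.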

Bayer's co-author was McCreight, who in 2013 said the following:

Bayer and I were in a lunchtime where we get to think [of] a name. And ... B is, you know ... We were working for Boeing at the time, we couldn't use the name without talking to lawyers. So, there is a B. [The B-tree] has to do with balance, another B. Bayer was the senior author, who [was] several years older than I am and had many more publications than I did. So there is another B. And so, at the lunch table we never did resolve whether there was one of those that made more sense than the rest. What really lives to say is: the more you think about what the B in B-trees means, the better you understand B-trees.

A B-tree has an *order*,
d. Each interior node (other than the root) has k values,
d<=k<=2d, and k+1 children. The k values, a_{0}...a_{k-1},
divide the leaf data into k+1 categories: x<a_{0},
a_{i}<x<a_{i+1} for i=0..k-2, and a_{k-1}<x.
These categories form the k+1 children; thus, a B-tree is
still an ordered tree
even if it is not binary. All leaf nodes are at the same
depth in the tree.
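The way the k values select one of the k+1 children can be sketched as a small Java function. The name `childIndex` and the convention of returning -1 when x equals one of the node's values are assumptions made for illustration.

```java
class BTreeSearch {
    // Given a node's sorted values a_0..a_{k-1}, returns the index (0..k) of
    // the child whose range contains x, or -1 if x is one of the values.
    static int childIndex(int[] keys, int x) {
        int i = 0;
        while (i < keys.length && x > keys[i]) i++;   // skip values smaller than x
        if (i < keys.length && x == keys[i]) return -1;
        return i;   // i == 0: x < a_0;  i == k: x > a_{k-1}
    }
}
```

For a node holding (43 56 71 88), a search for 39 descends into child 0, a search for 60 into child 2, and a search for 103 into child 4.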

Some authors instead describe a B-tree by its **degree**, which is the maximum
number of children a node can have. A B-tree of order d,
as above, has degree 2d+1.

Some B-tree visualizations:

- https://www.cs.usfca.edu/~galles/visualization/BTree.html
- http://slady.net/java/bt/view.php
(maybe doesn't work any more)

Bayer's idea is that when a node becomes overfull, we split it in half, and push the median element up a level. The pushed-up node may cause a split in the parent as well; we keep pushing until the process stops or we end up pushing a new root node.

Bayer trees grow deeper

B-tree of order 1

Insert 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 into a B tree of order 1

Insert 4,17,8,7,2,19,13,15,3,10,1,16,20,9,18,11,14,5,12,6

B-tree of order 2 (2-4 values per non-root node)

Before proceeding to red-black trees we consider 2-4 trees, which are a form of B-trees. The tree is ordered, and each node is allowed to have between two and four children. The number of data values at a node is one less than the number of children. An additional requirement is that all leaf nodes must be at the same depth.

Insertion is similar to a B-tree. We start by adding the new value u to the appropriate leaf node, finding that leaf node using the tree ordering. Leaf nodes can contain between 1 and 3 data values. If adding u would give the leaf node four data values, we push the median value of the node up to the parent, and split the node into two (those less than the median and those greater than the median, which go on the left and right respectively of the pushed-up value).

This may cause the parent node to be too big, in which case we repeat the
process. If we must add a new node at every level, then we end up
splitting the existing root node and creating a new root above it.
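The split step itself is simple enough to sketch in Java. The names `SplitResult` and `NodeSplit` are hypothetical, and choosing the lower of the two middle values when the count is even is an arbitrary assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the three pieces of a split.
class SplitResult {
    final List<Integer> left;
    final int median;
    final List<Integer> right;
    SplitResult(List<Integer> left, int median, List<Integer> right) {
        this.left = left; this.median = median; this.right = right;
    }
}

class NodeSplit {
    // Splits the sorted values of an overfull node around a middle value.
    // The middle value is pushed up; left and right become separate nodes.
    static SplitResult split(List<Integer> sorted) {
        int mid = (sorted.size() - 1) / 2;    // lower middle when size is even
        return new SplitResult(
            new ArrayList<>(sorted.subList(0, mid)),
            sorted.get(mid),
            new ArrayList<>(sorted.subList(mid + 1, sorted.size())));
    }
}
```

Splitting (10, 15, 18, 20) this way pushes up 15 and leaves the nodes (10) and (18, 20); the parent then receives 15 and may itself need to be split in the same manner.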

A red-black tree is an ordered binary tree satisfying the following rules:

1. A node is either red or black.
2. The root is black.
3. All leaves are black.
4. Both children of every red node are black (that is, a parent and child cannot both be red)
5. Every simple path from a given node to any of its descendant leaves contains the same number of black nodes.
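These rules can be checked mechanically. The Java sketch below verifies that the root is black, that no red node has a red child, and that all root-to-leaf paths contain the same number of black nodes; it treats null links as the (black) leaves and does not check the ordering of the data. `RbNode` and `RbCheck` are hypothetical names.

```java
// Hypothetical names; for illustration only.
class RbNode {
    final int key;
    final boolean red;
    final RbNode left, right;
    RbNode(int key, boolean red, RbNode left, RbNode right) {
        this.key = key; this.red = red; this.left = left; this.right = right;
    }
}

class RbCheck {
    static boolean isRed(RbNode t) { return t != null && t.red; }

    // Returns the number of black nodes on any path from t down to a null
    // leaf (counting the leaf), or -1 if a rule is violated somewhere below t.
    static int blackHeight(RbNode t) {
        if (t == null) return 1;                           // null leaves are black
        if (t.red && (isRed(t.left) || isRed(t.right)))    // red node with red child
            return -1;
        int l = blackHeight(t.left), r = blackHeight(t.right);
        if (l < 0 || r < 0 || l != r) return -1;           // unequal black counts
        return l + (t.red ? 0 : 1);
    }

    static boolean isRedBlack(RbNode root) {
        return !isRed(root) && blackHeight(root) > 0;
    }
}
```

A black root with two red leaf children passes; a red node with a red child fails, as does a black root whose only child is black (the two sides then have different black counts).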

From a red-black tree we can create a 2-4 tree by consolidating each black node with its red children, so that the black grandchildren reached through a red child become children of the consolidated node. If a node N is black and has one red child, this will give N three black children. If N has two red children, it will now have four black children.

We can convert a 2-4 tree back to a red-black tree in a unique way if we
require that the red-black tree be **left-leaning**; that
is, that if a black node has a red and a black child, then the red child
is the left one. Given a node N in the 2-4 tree, we do the following:

- if N has two children, we leave it alone.
- if N has three children n1, n2 and n3, we create a new red child node to N's left and move n1 and n2 so they are children of the red node. N's right child is n3.
- if N has four children, we create two red child nodes. The first red node gets the first two of N's children, and the second red node gets the second two.

We know how to insert into a 2-4 tree so as to maintain the 2-4 property. Because the 2-4 property is functionally equivalent to the red-black property for the corresponding red-black tree, we can now maintain red-black trees.

We create a B-tree-like structure by consolidating each black node
together with any red offspring. Let us refer to the consolidation of a
black node with its red children as a B-node; a B-node will now have
between 1 and 3 data values, and between 2 and 4 children. It then follows
from rule 5 that the tree made from the B-nodes has all paths of the same
length. It follows from rule 4 that the offspring of any red offspring are
black, and so if we identify a B-node in the original tree, the immediate
children of this node must all be black.

The catch is that the tree we create this way isn't *quite* a
proper B-tree. A B-tree of order 1 has 1 or 2 data values per node, and 2
or 3 subtrees; a B-tree of order 2 has 2, 3 or 4 data values per non-root
node with 3, 4 or 5 children. The B-*like* tree here has 1, 2 or 3
data values per node, and 2, 3 or 4 subtrees.

Given a B-like tree with 1-3 data values per node, maintaining the B-tree
property of all paths having equal length, we can convert it to a
red-black tree as follows:

- If a B-node has 1 value, it becomes a single black node
- If a B-node has two values, it becomes two nodes. The upper one will be black and the lower one red; it does not matter which is which.
- If a B-node has three values, the middle value becomes a black node and the two other values become red nodes representing the roots of the left and right subtrees.

Inserting into a B-like tree with 1-3 data values per node is similar to insertion into a regular B-tree, except that when a node is "full" with three values, and we split it and push up a middle value, the split will be into pieces of sizes 1 and 2. It does not matter which piece is which; if a node contains values (10,15,20) and we add 18, we can push up either the 15, with split nodes (10) and (18,20), or the 18, with split nodes (10,15) and (20).

Realistically, there is no reason to prefer the red-black formulation to a proper B-tree formulation. There is no especial benefit to having trees be binary.

Here is the algorithm for "directly" inserting into a red-black tree,
without converting to a 2-4 tree or a B-like tree.

Let N be the newly inserted node, initially colored red. We will need the
notion of a node's **uncle**, the sibling of its parent; the uncle of a
node, if it exists, is always unique because the tree is
binary. The first step is to add the new node to the tree
in the usual binary-tree-insertion manner; we must then
rotate and recolor in order to restore red-blackness. The
important red-black requirements are 4 and especially 5.

We will also let P be the parent of N, G the grandparent and U the uncle. We will follow the five cases of Wikipedia.

          G
         / \
        U   P
           / \

Case 1: N is the root of the tree. We color it black and are done.

Case 2: P is black. In this case we can leave N red and be done.

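Case 3: P and U are both red. Then we can change P and U both to black and change G to red. Property 4 now holds for G, and property 5 holds for the tree because all paths formerly through black node G now pass through exactly one of the newly black nodes U and P. However, G is now red, and it may be the root or have a red parent of its own, so we repeat the whole process with G in the role of N.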

Case 4: P is red but U is black, and N is the right child of P and P is the left child of G (or N is the left child of P and P is the right child of G):

            G
           / \
          P   U
         / \
            N
           / \

In this case we start with a left rotation about P:

            G
           / \
          N   U
         / \
        P

The number of black nodes along any path through P or N is unchanged. Rule 4, however, fails. We fix this by setting the just-lowered P to

Case 5: P is red, U is black, N is the left child of P which is the left child of G (or N is right of P which is right of G):

            G
           / \
          P   U
         / \
        N   T

In this case we do a right rotation about G, changing P to black and G to red:

          P
         / \
        N   G
           / \
          T   U

Property 4 now holds, and so does property 5 for the tree now rooted at P: the number of black nodes along any path through N, T or U is unchanged although which