Comp 271 Lab 5 - Tree-based dictionaries



The end result is again to count the occurrence of each word in the file. This time we'll use a tree as the underlying data structure.

The project is in, together with three data files: wirth.text, tiny.text, and spacedoctor.text (a summary of General Relativity using just the 1000 most common English words).

To read words we will use the Tokenizer class in; the token() method returns each word as a string. To initialize it, use
    Tokenizer t = new Tokenizer("wirth.text");
It also happens  to return punctuation and (if the punctuation isn't consistent with mini-java) the string "ILLEGAL TOKEN"; you should skip these. So to get the individual (alphabetic) words, you can do this:
    string word = t.token();
    while (word == null  || word == "ILLEGAL TOKEN" || !Char.IsLetter(word[0])) {   // why check word==null?
if (word == null) break;
 word = t.token(); }
The basic idea of counting words is as follows, where getword() returns the next alphabetic word as above or, if the end of the file has been reached, null. The code shown here, however, uses the built-in HashMap class:
    HashMap<String,Integer> d = new HashMap<String,Integer>();
 String word = getword(); while (word != null) { if (d.containsKey(word)) {
int c = d.get(word);
d.put(word, c+1); } else { d.put(word,1); } word = getword(); }
Your job is for the above loop to work after d is replaced with a TreeDict , which you will implement. This is a tree-based Map/Dictionary class.

Implementing a Dictionary with an Ordered Binary Tree

You are to implement the following:

I've started all of these for you; typically the public version calls the private version that also has a TreeNode as a parameter. containsKey() is probably the simplest: if you get to a null TreeNode, the word is not found. If you find the word at TreeNode p, return true. Otherwise search the left or right subnodes of p, recursively. Note the use of k.compareTo(p.key()), which returns a negative value, zero, or a positive value depending on whether k<p.key(), k==p.key() or k>p.key(), respectively (but you can't use <, == or > on String objects in Java).

get() is a lot like containsKey(), except that if you find the word at TreeNode p you return the value stored there (p.value()), rather than true. For a null node, I returned -1. Again, if you're at a non-null node and the word you're looking for is not at that node, search the left or the right subtree, recursively, as appropriate. For containsKey() and get(), I added return statements necessary to make it compile; edit these out.

update() is more like rinsert(), in that the recursive version, with parameter p of type TreeNode, should never be called with p==null. The problem is that you may have to insert the new word as p's left or right child, which don't exist if p is null.

For print(), use inTraverse() but get rid of the indentation. Here is the output for tiny.text:

filename: tiny.text
a  (2)
an  (1)
attribute  (1)
can  (2)
confidence  (1)
depend  (1)
for  (1)
is  (2)
mechanism  (1)
of  (1)
on  (1)
or  (1)
organization  (1)
person  (1)
reliable  (1)
that  (3)
the  (1)
trust  (1)
worthy  (1)
you  (2)
your  (1)

Because this project involves more than one file, your best bet for submission is as a .zip file.