Comp 271 Lab 4 - Tree-based dictionaries
Goals
- use a tree to implement a Map (dictionary)
- Count the occurrence of each word in a file
- Print the words and their occurrences in alphabetical order
Overview
The end result is again to count the occurrence of each word in the file.
This time we'll use a tree as the underlying data structure.
The project is in countwords2.zip, together
with three data files: wirth.text, tiny.text, and spacedoctor.text
(a summary of General Relativity using just the 1000 most common English
words).
To read words we will use the Tokenizer class in tokenizer.java; the token()
method returns each word as a string. To initialize it, use
Tokenizer t = new Tokenizer("wirth.text");
It also happens to return punctuation and (if the punctuation isn't
consistent with mini-java) the string "ILLEGAL TOKEN"; you should skip
these. So to get the individual (alphabetic) words, you can do this:
string word = t.token();
while (word == null || word == "ILLEGAL TOKEN" || !Char.IsLetter(word[0])) { // why check word==null?
if (word == null) break;
word = t.token();
}
The basic idea of counting words is as follows, where getword()
returns the next alphabetic word as above or, if the end of the file has
been reached, null. The code shown here, however, uses the built-in HashMap
class:
HashMap<String,Integer> d = new HashMap<String,Integer>();
...
String word = getword();
while (word != null) {
if (d.containsKey(word)) {
int c = d.get(word);
d.put(word, c+1);
} else {
d.put(word,1);
}
word = getword();
}
Your job is for the above loop to work after d is replaced with a TreeDict ,
which you will implement. This is a tree-based Map/Dictionary class.
Implementing a Dictionary with an Ordered Binary Tree
You are to implement the following:
- boolean containsKey(String w):
returns true if w is found as a word in the dictionary.
- int get(string w):
returns the int corresponding to w. You could return 0 if not
found, but int cannot be null. The get() method should return the
current count corresponding to the given word.
- void put(String w, int count):
adds or updates the count associated with the word w. You may, if you
wish, make a distinction between put(w,c) for new entries and
update(w,c) for existing entries.
- void print(): prints each
word and its corresponding count. This is actually done for you; it is
inTraverse(), except you have to get rid of the indentation.
I've started all of these for you; typically the public version calls the
private version that also has a TreeNode as a parameter. containsKey() is
probably the simplest: if you get to a null TreeNode, the word is not found.
If you find the word at TreeNode p, return true. Otherwise search the left
or right subnodes of p, recursively. Note the use of k.compareTo(p.key()),
which returns a negative value, zero, or a positive value depending on
whether k<p.key(), k==p.key() or k>p.key(), respectively (but you
can't use <, == or > on String objects in Java).
get() is a lot like containsKey(), except that if you find the word at
TreeNode p you return the value stored there (p.value()), rather than true.
For a null node, I returned -1. Again, if you're at a non-null node and the
word you're looking for is not at that node, search the left or the right
subtree, recursively, as appropriate. For containsKey() and get(), I added
return statements necessary to make it compile; edit these out.
update() is more like rinsert(), in that the recursive version, with
parameter p of type TreeNode, should never be called with p==null. The
problem is that you may have to insert the new word as p's left or right
child, which don't exist if p is null.
For print(), use inTraverse() but get rid of the indentation. Here is the
output for tiny.text:
filename:
tiny.text
a (2)
an (1)
attribute (1)
can (2)
confidence (1)
depend (1)
for (1)
is (2)
mechanism (1)
of (1)
on (1)
or (1)
organization (1)
person (1)
reliable (1)
that (3)
the (1)
trust (1)
worthy (1)
you (2)
your (1)
Because this project involves more than one file, your best bet for
submission is as a .zip file.