Comp 388 Lab 5 - Tree-based lookup

Goals

Overview

The goal is to count the occurrences of each word in the file wirth.text, as in Lab 4. This time, however, we will build the dictionary with a tree.

As an alternative input file, here's an article by Randall Munroe (of xkcd fame) entitled The Space Doctor's Big Idea. It is about Einstein's theory of General Relativity, and is supposedly written using only the 1,000 most common English words. How many unique words does it contain? (I've removed contractions.)

To read words we will use the Tokenizer class in tokenizer.cs; the token() method returns each word as a string. To initialize it, use
    Tokenizer t = new Tokenizer("wirth.text");
The token() method also returns punctuation and (if the punctuation isn't consistent with mini-java) the string "ILLEGAL TOKEN"; you should skip these. So to get the individual (alphabetic) words, you can do this:
    string word = t.token();
    while (word == null || word == "ILLEGAL TOKEN" || !Char.IsLetter(word[0])) {   // why check word==null?
        if (word == null) break;
        word = t.token();
    }
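The fragment above fetches one word; to process a whole file the same skipping has to be repeated until the input runs out. Here is a minimal sketch of that outer loop. It assumes token() returns null at end of input, which the null check above suggests but which you should confirm in tokenizer.cs. FakeTokenizer, WordReader and NextWord are hypothetical names introduced only so the example runs without the lab files.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the lab's Tokenizer, so this sketch runs on its own.
// Assumption: like the real class, token() returns null once input is exhausted.
class FakeTokenizer {
    private readonly Queue<string> tokens_;
    public FakeTokenizer(params string[] tokens) { tokens_ = new Queue<string>(tokens); }
    public string token() { return tokens_.Count > 0 ? tokens_.Dequeue() : null; }
}

static class WordReader {
    // Returns the next alphabetic word, or null at end of input.
    public static string NextWord(FakeTokenizer t) {
        string word = t.token();
        // Skip "ILLEGAL TOKEN", empty strings, and anything not starting with a letter.
        while (word != null &&
               (word == "ILLEGAL TOKEN" || word.Length == 0 || !Char.IsLetter(word[0]))) {
            word = t.token();
        }
        return word;
    }

    static void Main() {
        FakeTokenizer t = new FakeTokenizer("the", ",", "ILLEGAL TOKEN", "tree", ".", "the");
        for (string w = NextWord(t); w != null; w = NextWord(t)) {
            Console.WriteLine(w);   // this is where a word would be counted
        }
    }
}
```

Run as written, this prints the, tree, the on separate lines. Testing word != null first means no method is ever called on a null word.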
The project's files are the following:
The previous lab outlined the basic idea of counting words. Now we're doing it with a different underlying data structure.

The strtree.cs file contains a class strtree. Its nodes have a string field called data_ and an integer count field called wordcount_. The class also contains the following methods:

You will have to build an interface more suitable for counting words. Here is a suggestion.

You do not have to make your dictionary a generic one, with <K,V> types.
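One possible shape for a non-generic word-counting interface is sketched here, hard-wired to string words and int counts. Treat it as an illustration only: the names WordTree, Node, Increment and Count are hypothetical and are not taken from strtree.cs; only the field names data_ and wordcount_ come from the description above. Comparing with String.CompareOrdinal is also an assumption; it sorts capitalized words before lowercase ones.

```csharp
using System;

// Hypothetical word-counting tree; only data_ and wordcount_ are names from
// the handout, everything else is an assumption made for illustration.
class WordTree {
    private class Node {
        public string data_;
        public int wordcount_;
        public Node left_, right_;
        public Node(string word) { data_ = word; wordcount_ = 1; }
    }

    private Node root_;

    // Adds one occurrence of word, creating a node the first time it is seen.
    public void Increment(string word) {
        if (word == null) return;               // never touch a null word
        if (root_ == null) { root_ = new Node(word); return; }
        Node n = root_;
        while (true) {
            int cmp = String.CompareOrdinal(word, n.data_);
            if (cmp == 0) { n.wordcount_++; return; }
            if (cmp < 0) {
                if (n.left_ == null) { n.left_ = new Node(word); return; }
                n = n.left_;
            } else {
                if (n.right_ == null) { n.right_ = new Node(word); return; }
                n = n.right_;
            }
        }
    }

    // Returns the number of occurrences recorded for word, or 0 if absent.
    public int Count(string word) {
        if (word == null) return 0;
        Node n = root_;
        while (n != null) {
            int cmp = String.CompareOrdinal(word, n.data_);
            if (cmp == 0) return n.wordcount_;
            n = (cmp < 0) ? n.left_ : n.right_;
        }
        return 0;
    }
}

static class WordTreeDemo {
    static void Main() {
        WordTree tree = new WordTree();
        foreach (string w in new[] { "tree", "hash", "tree" }) tree.Increment(w);
        Console.WriteLine(tree.Count("tree"));   // 2
        Console.WriteLine(tree.Count("hash"));   // 1
        Console.WriteLine(tree.Count("list"));   // 0
    }
}
```

Keeping the node type private means callers see only words and counts, so the tree could later be swapped for another structure without changing the calling code.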

Your Print() method should print the words in alphabetical order. This is the main reason for the ordered-tree implementation; last week's hash-dictionary implementation didn't maintain the words in any sensible order.
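Alphabetical output falls out of an in-order traversal: visit the left subtree, then the node, then the right subtree. The sketch below shows the idea on a small hand-built tree. CountNode, Insert and Collect are hypothetical names, not the methods of strtree.cs, and ordinal string comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type; the field names data_ and wordcount_ follow the handout.
class CountNode {
    public string data_;
    public int wordcount_ = 1;
    public CountNode left_, right_;
    public CountNode(string word) { data_ = word; }
}

static class InOrderDemo {
    // Recursive insert; returns the (possibly new) root of the subtree.
    public static CountNode Insert(CountNode n, string word) {
        if (n == null) return new CountNode(word);
        int cmp = String.CompareOrdinal(word, n.data_);
        if (cmp == 0) n.wordcount_++;
        else if (cmp < 0) n.left_ = Insert(n.left_, word);
        else n.right_ = Insert(n.right_, word);
        return n;
    }

    // In-order traversal: left subtree, this node, right subtree.
    public static void Collect(CountNode n, List<string> lines) {
        if (n == null) return;
        Collect(n.left_, lines);
        lines.Add(n.data_ + " " + n.wordcount_);
        Collect(n.right_, lines);
    }

    static void Main() {
        CountNode root = null;
        foreach (string w in new[] { "pear", "apple", "fig", "apple" }) root = Insert(root, w);
        List<string> lines = new List<string>();
        Collect(root, lines);
        foreach (string line in lines) Console.WriteLine(line);
    }
}
```

For the four input words this prints apple 2, fig 1, pear 1, one per line. The number of lines collected is also the number of unique words in the input.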

Be sure you don't call any methods on a null word!