Comp 388 Lab 5 - Tree-based dictionaries
Goals
- use a tree to implement a Dictionary
- Count the occurrence of each word in a file
- Print the words and their occurrences in alphabetical order
Overview
The end result is to count the occurrence of each word in the file wirth.text
(a classic essay on software engineering).
To read words we will use the Tokenizer class in tokenizer.cs;
the token() method returns each word as a string. To initialize it, use
Tokenizer t = new Tokenizer("wirth.text");
It also happens to return punctuation and (if the punctuation isn't
consistent with mini-java) the string "ILLEGAL TOKEN"; you should skip
these. So to get the individual (alphabetic) words, you can do this:
string word = t.token();
while (word == null || word == "ILLEGAL TOKEN" || !Char.IsLetter(word[0])) { // why check word==null?
if (word == null) break;
word = t.token();
}
The basic idea of counting words is as follows, where getword()
returns the next alphabetic word as above or else null. The code shown here,
however, uses the built-in Dictionary class:
Dictionary<string,int> d = new Dictionary<string,int>();
string word = getword();
while (word != null) {
if (d.ContainsKey(word)) {
d[word] += 1;
} else {
d.Add(word,1);
}
word = getword();
}
foreach (KeyValuePair<string,int> entry in d) Console.WriteLine("{0}: {1}", entry.Key, entry.Value);
The built-in Dictionary class defines operator[] for its purposes; you can
do that, but a simpler idea is to use a conventional interface; here is one
possibility:
- bool ContainsKey(string w): returns true if w is found as a word in
the dictionary.
- int Get(string w): returns the int corresponding to w. You could
return 0 if not found, but int cannot be null (alternatively, you could
use int?). The Get() method would look somewhat like Dictionary.TryGetValue(),
but would return the result as an ordinary function return rather than
by using an out parameter.
- void Put(string w, int count): adds or updates the count associated
with the word w. Many implementations, though, prefer to keep adding a new
(K,V) pair separate from updating the V of an existing (K,V)
pair.
- void Print(): prints each word (in order, since you used a tree!) and
its corresponding count
You do not have to make your dictionary a generic one,
with <K,V> types.
Your Print() method should print the words in alphabetical order.
This is the main reason for the ordered-tree implementation; you can in
principle use an ordinary built-in Dictionary but then you would have to
sort the output.
Be sure you don't call any methods on a null word!
Implementing a Dictionary with an Ordered Binary Tree
For the tree implementation, the treenode of Lab 4's strtree.cs
already has fields for both string
data_ and int count_.
You would use the strtree.cs inTraverse()
method, slightly modified to include count_,
to print the data, and a modification of the insert()
method for Put().
You must to provide a search method for ContainsKey()
and Get(). Recursive search is
easy to do; here is a simple version that returns the treenode in which the
string is found. The Get()
method would then return the treenode's count()
field; the ContainsKey()
method would return true if the search()
method here returned a non-null value.
private treenode search(string s, treenode t) {
if (t==null) return null;
if (t.data() == s) return t;
else if (s.CompareTo(t.data()) < 0) return search(s, t.left());
else return search(s, t.right());
}