Comp 271 Lab 3 - Your Own Hash Map

Goals

Overview

The end result is to count the occurrence of each word in the file wirth.text (a classic essay on software engineering). An alternative file, for when you need to print debugging information, is tiny.text. A third file is spacedoctor.text, which is an explanation of general relativity using only the 1000 most common English words. Verify that not more than 1000 different words are used! (All three files are in the project zipfile below.)

The idea is to read each word, and look it up in a hash table that uses linked lists as buckets (that is, the bins that things are put into by hashcode). If the word is already present, increment its count; otherwise install the word with a count of 1. The linked lists are based on class Cell, below, which has space for the word, its count, and the next Cell. This form of implementation exposes all the internals of Cell, but as everything is still "hidden" in class HashDict, that's ok. Arguably ok, at least.

To read words we will use the Tokenizer class in Tokenizer.java; the token() method returns each word as a string. To initialize it, use
    Tokenizer t = new Tokenizer(filename);
It also happens  to return punctuation and (if the punctuation isn't consistent with mini-java) the string "ILLEGAL TOKEN"; you should skip these. So to get the individual (alphabetic) words, you can do this:
    string word = t.token();
    while (word == null  || word.equals("ILLEGAL TOKEN") || !Character.isLetter(word.charAt(0))) {   // why check word==null?
if (word == null) break;
 word = t.token(); }
The project is in three java files (plus wirth.text and tiny.text), all within an IntelliJ project in countwords1.zip.
The basic idea of counting words is as follows, where getword() returns the next alphabetic word as above or, if the end of the file has been reached, null. The code shown here, however, uses the built-in HashMap class:
    HashMap<String,Integer> d = new HashMap<String,Integer>();
...
 String word = getword(); while (word != null) { if (d.containsKey(word)) {
int c = d.get(word);
d.put(word, c+1); } else { d.put(word,1); } word = getword(); }
// the Dark Runes:
 for (Map.Entry<String,Integer> entry : d.entrySet()) System.out.printf("%s: %d%n", entry.getKey(), entry.getValue());
Your job is for the above loop to work after d is replaced with a HashDict, which you will define. This is a hashtable-based Map/Dictionary class that uses linked lists for the "buckets". You are to implement the following:
The get/put interface is optional, actually. Note that in the word-found case in the code above, the hashmap is searched for the word three times! If you can improve on that, go for it. One approach is to have get() return a reference to the Cell containing the word, which can then be incremented without a second put(). (The drawback here is that Cell, below, isn't really meant to be exposed to the user programmer.) Another approach is an increment() method to do a second search and increment the count in one swoop. A third approach might be a method install_or_increment_count(), which would be called without a previous containsKey() search.

Be sure you don't call any methods on a null word!

Implementing a Dictionary with a Hash Table

Our hash table can't be just the words themselves; we also need a place to keep count. Therefore the objects in the linked lists will ultimately be the following:

    private class Cell {
       private String data_;
       private int count_;
       private Cell next_;
       public Cell(String s, Cell n) {data_ = s; next_ = n;}
       public String data() {return data_;}
       public int count() {return count_;}
       public void count(int c) {count_ = c;}
       public Cell next() {return next_;}
    }
The "buckets" of the hash table will now be of type Cell, which corresponds to a linked list of Cell. The table itself is declared like this:

    Cell [] HTable

To initialize this, see the code in HashDict.java. If word w, with hash(w) = h, is not found, you will add it to the front of list HTable[h]. You do that like this:

   
HTable[h] = new Cell(w, 1, HTable[h]);

The 1 here is the initial count. The old list is the second HTable[h], which now plays the role of the list following the newly inserted Cell.

The hash function will be the following (as before, we have a little extra work to make sure it returns values >= 0):
    private int hash(string s) {
	int val = s.GetHashCode() % HMAX;
	if (val < 0) val += HMAX;
	return val;
    }
I have provided a search method for containsKey(). You must write get() , put() and print(). The first two will have a structure similar to containsKey(); print will have a structure like this:
        for (int i=0; i<HMAX; i++) {
            p = HTable[i];
while (p!= null) {
// print p.data(), p.count()
p = p.next();
           }
       }
Here is the output of a run on the file tiny.text:

trust:  1
a:  2
or:  1
depend:  1
confidence:  1
for:  1
worthy:  1
is:  2
your:  1
an:  1
the:  1
reliable:  1
that:  3
can:  2
person:  1
organization:  1
of:  1
attribute:  1
mechanism:  1
you:  2
on:  1

Because this project involves more than one file, your best bet for submission is as a .zip file.