Comp 271 Lab 4 - Our Own Hash Map



The end result is to count the occurrence of each word in the file wirth.text (a classic essay on software engineering). An alternative file, for when you need to print debugging information, is tiny.text. Both files are in the BlueJ folder.

The idea is to read each word, and look it up in a hash table that uses linked lists as buckets. If the word is already there, increment its count; otherwise install the word with a count of 1. The linked lists are based on class Cell, below, which has space for the word, its count, and the next Cell. This form of implementation exposes all the internals of Cell, but as everything is still "hidden" in class HashDict, that's ok.

To read words we will use the Tokenizer class in; the token() method returns each word as a string. To initialize it, use
    Tokenizer t = new Tokenizer(filename);
It also happens  to return punctuation and (if the punctuation isn't consistent with mini-java) the string "ILLEGAL TOKEN"; you should skip these. So to get the individual (alphabetic) words, you can do this:
    string word = t.token();
    while (word == null  || word.equals("ILLEGAL TOKEN") || !Character.isLetter(word.charAt(0))) {   // why check word==null?
if (word == null) break;
 word = t.token(); }
The project is in three java files (plus wirth.text and tiny.text), all within
The basic idea of counting words is as follows, where getword() returns the next alphabetic word as above or, if the end of the file has been reached, null. The code shown here, however, uses the built-in HashMap class:
    HashMap<String,Integer> d = new HashMap<String,Integer>();
 String word = getword(); while (word != null) { if (d.containsKey(word)) {
int c = d.get(word);
d.put(word, c+1); } else { d.put(word,1); } word = getword(); }
// the Dark Runes:
 for (Map.Entry<String,Integer> entry : d.entrySet()) System.out.printf("%s: %d%n", entry.getKey(), entry.getValue());
Your job is for the above loop to work after d is replaced with a HashDict, which you will define. This is a hashtable-based Map/Dictionary class that uses linked lists for the "buckets". You are to implement the following:
Be sure you don't call any methods on a null word!

Implementing a Dictionary with a Hash Table

Our hash table can't be just the words themselves; we also need a place to keep count. Therefore the objects in the linked lists will ultimately be the following:

    private class Cell {
       private String data_;
       private int count_;
       private Cell next_;
       public Cell(String s, Cell n) {data_ = s; next_ = n;}
       public String data() {return data_;}
       public int count() {return count_;}
       public void count(int c) {count_ = c;}
       public Cell next() {return next_;}
The "buckets" of the hash table will now be of type Cell, which corresponds to a linked list of Cell. The table itself is declared like this:

    Cell [] HTable

To initialize this, see the code in If word w, with hash(w) = h, is not found, you will add it to the front of list HTable[h]. You do that like this:

HTable[h] = new Cell(w, 1, HTable[h]);

The 1 here is the initial count. The old list is the second HTable[h], which now plays the role of the list following the newly inserted Cell.

The hash function will be the following (as before, we have a little extra work to make sure it returns values >= 0):
    private int hash(string s) {
	int val = s.GetHashCode() % HMAX;
	if (val < 0) val += HMAX;
	return val;
I have provided a search method for containsKey(). You must write get() , put() and print(). The first two will have a structure similar to containsKey(); print will have a structure like this:
        for (int i=0; i<HMAX; i++) {
            p = HTable[i];
while (p!= null) {
// print, p.count()
p =;
Here is the output of a run on the file tiny.text:

trust:  1
a:  2
or:  1
depend:  1
confidence:  1
for:  1
worthy:  1
is:  2
your:  1
an:  1
the:  1
reliable:  1
that:  3
can:  2
person:  1
organization:  1
of:  1
attribute:  1
mechanism:  1
you:  2
on:  1

Because this project involves more than one file, your best bet for submission is as a .zip file.