Comp 271 Lab 4 - Our Own Hash Map
Goals
- Use a hash table to implement a Map (Dictionary)
- Implement the hash table with linked lists
- Count the occurrence of each word in a file
- Print the words and their occurrences (not necessarily in alphabetical
order)
Overview
The end result is to count the occurrence of each word in the file
wirth.text (a classic essay on software engineering). An alternative file,
for when you need to print debugging information, is tiny.text. Both files
are in the BlueJ folder.
The idea is to read each word, and look it up in a hash table that uses
linked lists as buckets. If the word is already there, increment its count;
otherwise install the word with a count of 1. The linked lists are based on
class Cell, below, which has space for the word, its count, and the next
Cell. This form of implementation exposes all the internals of Cell, but as
everything is still "hidden" in class HashDict, that's ok.
To read words we will use the Tokenizer class in Tokenizer.java; the token()
method returns each word as a string. To initialize it, use
Tokenizer t = new Tokenizer(filename);
It also happens to return punctuation and (if the punctuation isn't
consistent with mini-java) the string "ILLEGAL TOKEN"; you should skip
these. So to get the individual (alphabetic) words, you can do this:
string word = t.token();
while (word == null || word.equals("ILLEGAL TOKEN") || !Character.isLetter(word.charAt(0))) { // why check word==null?
if (word == null) break;
word = t.token();
}
The project is in three java files (plus wirth.text and tiny.text), all within countwords1.zip.
- CountWords.java: contains the
top-level Main()
- HashDict.java:
contains the hash-based counting structure
- Tokenizer.java: as
above
The basic idea of counting words is as follows, where getword()
returns the next alphabetic word as above or, if the end of the file has
been reached, null. The code shown here, however, uses the built-in HashMap class:
HashMap<String,Integer> d = new HashMap<String,Integer>();
...
String word = getword();
while (word != null) {
if (d.containsKey(word)) {
int c = d.get(word);
d.put(word, c+1);
} else {
d.put(word,1);
}
word = getword();
}
// the Dark Runes:
for (Map.Entry<String,Integer> entry : d.entrySet()) System.out.printf("%s: %d%n", entry.getKey(), entry.getValue());
Your job is for the above loop to work after d is replaced with a HashDict, which you will define. This is a hashtable-based Map/Dictionary class that uses linked lists for the "buckets". You are to implement the following:
- boolean containsKey(String w):
returns true if w is found as a word in the dictionary.
- int get(string w):
returns the int corresponding to w. You could return 0 if not
found, but int cannot be null. The get() method should return the current count corresponding to the given word.
- void put(String w, int count):
adds or updates the count associated with the word w. You may, if you wish, make a distinction between put(w,c) for new entries and update(w,c) for existing entries.
- void print(): prints each
word and its corresponding count. Do this with an outer i<HMAX loop that goes through each element of HTable, and an inner p!=null loop that goes through each element of the linked list.
Be sure you don't call any methods on a null word!
Implementing a Dictionary with a Hash Table
Our hash table can't be just the words themselves; we also need a place
to keep count. Therefore the objects in the linked lists will ultimately be the
following:
private class Cell {
private String data_;
private int count_;
private Cell next_;
public Cell(String s, Cell n) {data_ = s; next_ = n;}
public String data() {return data_;}
public int count() {return count_;}
public void count(int c) {count_ = c;}
public Cell next() {return next_;}
}
The "buckets" of the hash table will now be of type Cell, which corresponds to a linked list of Cell.
The table itself is declared like this:
Cell []
HTable
To initialize this, see the code in HashDict.java. If word w, with hash(w) = h, is not found, you will add it to the front of list HTable[h]. You do that like this:
HTable[h] = new Cell(w, 1, HTable[h]);
The 1 here is the initial count. The old list is the second HTable[h], which now plays the role of the list following the newly inserted Cell.
The hash function will be the following (as before, we have a little extra work to make
sure it returns values >= 0):
private int hash(string s) {
int val = s.GetHashCode() % HMAX;
if (val < 0) val += HMAX;
return val;
}
I have provided a search method for containsKey().
You must write get() , put() and print().
The first two will have a structure similar to containsKey(); print will
have a structure like this:
for (int i=0; i<HMAX; i++) {
p = HTable[i];
while (p!= null) {
// print p.data(), p.count()
p = p.next();
}
}
Here is the output of a run on the file tiny.text:
trust: 1
a: 2
or: 1
depend: 1
confidence: 1
for: 1
worthy: 1
is: 2
your: 1
an: 1
the: 1
reliable: 1
that: 3
can: 2
person: 1
organization: 1
of: 1
attribute: 1
mechanism: 1
you: 2
on: 1
Because this project involves more than one file, your best bet for submission is as a .zip file.