Comp 388 Lab 4 - Hash lookup

Goals

Overview

The end result is to count the occurrence of each word in the file wirth.text (a classic essay on software engineering). An alternative file, for when you need to print debugging information, is tiny.text.

To read words we will use the Tokenizer class in tokenizer.cs; the token() method returns each word as a string. To initialize it, use
    Tokenizer t = new Tokenizer("wirth.text");
It also happens  to return punctuation and (if the punctuation isn't consistent with mini-java) the string "ILLEGAL TOKEN"; you should skip these. So to get the individual (alphabetic) words, you can do this:
    string word = t.token();
    while (word == null  || word == "ILLEGAL TOKEN" || !Char.IsLetter(word[0])) {   // why check word==null?
if (word == null) break;
 word = t.token(); }
The project is in three files:
The basic idea of counting words is as follows, where getword() returns the next alphabetic word as above or, if the end of the file has been reached, null. The code shown here, however, uses the built-in Dictionary class:
    Dictionary<string,int> d = new Dictionary<string,int>();
    string word = getword();
    while (word != null) {
	if (d.ContainsKey(word)) {
	    d[word] += 1;
	} else {
	     d.Add(word,1);
	}
	word = getword();
    }
    foreach (KeyValuePair<string,int> entry in d) Console.WriteLine("{0}:  {1}", entry.Key, entry.Value);
The built-in Dictionary class defines operator[] for its purposes; you can do that, but a simpler idea is to use a conventional interface. We will also avoid using generic types like <string,int>. Here is one possibility:
Be sure you don't call any methods on a null word!

Implementing a Dictionary with a Hash Table

Our hash table can't be just the words themselves; we also need a place to keep count. Therefore the objects in the table will ultimately be the following:

class StrIntPair {
private string _word;
private int _count;
public StrIntPair(string w, int c) {_word = w; _count = c;}
public string getWord() { return _word; }
public int getCount() { return _count; }
public void setCount(int c) { _count = c; }
}
The "buckets" of the hash table will now be of type List<StrIntPair>. The table itself will be declared like this:

    List<StrIntPair>[] HTable

To initialize this, we set HTable =  new List<StrIntPair>[HMAX] and then initialize each HTable[i] (why do we need this?)

    for (int i=0; i<HMAX; i++) HTable[i] = new List<StrIntPair>();

The hash function will be the following (we have a little extra work to make sure it returns values >= 0):
    private int hash(string s) {
	int val = s.GetHashCode() % HMAX;
	if (val < 0) val += HMAX;
	return val;
    }
I have provided a search method for ContainsKey(). You must write Get() , Put() and Print(). The first two will have a structure similar to ContainsKey(); Print will have a structure like this:
        for (int i=0; i<HMAX; i++) {
            foreach(StrIntPair sip in HTable[i]) {
// print sip.getWord(), sip.getCount()            
            }
       }