Comp 271 Lab 10: hash maps

Comp 271 lab 0x0A - hash maps

Goals

implement a hash map
optimization through caching

Overview

Start with the hashmap.zip project. You are to implement the by-now-usual TMap interface using the hashtable approach:

get(K key)
put(K key, D data)
containsKey(K key)
clear
size

With these implemented, you should be able to run the WordCount.wordCounter() method and (with the definition of WHITESPACE that I included) get 375 unique words in the Wirth essay. I've also provided you the file small.text, for debugging; it has 10 unique words and 20 words total.

When using a hashtable and you want to look up a key, you do the following:

calculate int hashcode = key.hashCode()
calculate the position in the table, hashindex = hashcode % hashTable.length;
Search the list of Nodes pointed to by hashTable[hashindex];

Due to a problem with the possibility that hashcode could be negative, I've given you a hindex(K key) method that calculates the table index. Note that the Node class definition is a simple inner-class defined in HashMap.java. It represents a singly linked list; new entries are always to be inserted at the front of the list: hashTable[hashindex] = new Node(key, data, hashTable[hashindex]). Lists are searched using the searchKey(K key) method, also provided.

Additional features

You are also to implement an iterator for the hashmap class; you proceed linearly along each list in sequence. The hasNext() is a little tricky; you have to find the next entry to tell if it's there. (It's also possible to do this by keeping careful count of how many entries have been returned, versus size()).

The trickiest part is what to do when you've reached the end of one of the linked lists; I've given you findNonNull() that moves to the next nonempty "bucket", or to the end of the table.

Finally, for extra credit you can implement table resizing. If the table is too big, space is wasted; if it is too small, the lists are too long. I've provided a resize() method as a "hook"; it is called when size() exceeds 50% of hashTable.length. This 50% ratio is sometimes called the load factor; the java.util.HashMap implementation resizes when the load factor is 75% by default but you can set this in one variant of the constructor.

Note that when resizing, you simply create a new table, and re-enter everything in the old table. You can recalculate the hashCode of every key value, but caching this value in the Node makes for faster resizing.

To submit your project, create a zipfile and email it to me at pld@cs.luc.edu.