Tokenizer t = new Tokenizer("wirth.text");

Each call to t.token() returns the next token from the file. It also returns punctuation and (if the punctuation isn't consistent with mini-java) the string "ILLEGAL TOKEN"; you should skip these. So to get the individual (alphabetic) words, you can do this:
string word = t.token();
while (word == null || word == "ILLEGAL TOKEN" || !Char.IsLetter(word[0])) {
    // why check word==null?
    if (word == null) break;
    word = t.token();
}
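The counting code that follows calls a getword() method, which is essentially the loop above packaged so it can be called repeatedly. Here is a minimal sketch of that packaging. It assumes the tokenizer can be represented by a Func<string> that behaves like t.token() and returns null once the tokens run out. The WordReader class is a hypothetical name, and the token list in Main is invented for illustration. The loop condition is inverted relative to the version above (word != null && ...), so no explicit break is needed.

```csharp
using System;

// Hypothetical wrapper: nextToken stands in for t.token() from the course Tokenizer.
class WordReader {
    private readonly Func<string> _nextToken;

    public WordReader(Func<string> nextToken) { _nextToken = nextToken; }

    // Returns the next token that starts with a letter, or null once tokens run out.
    public string getword() {
        string word = _nextToken();
        // word != null is tested first, so word[0] is only reached for a real string;
        // the Length test also skips an empty token instead of indexing into it.
        while (word != null &&
               (word == "ILLEGAL TOKEN" || word.Length == 0 || !Char.IsLetter(word[0]))) {
            word = _nextToken();
        }
        return word;
    }
}

class Demo {
    static void Main() {
        // Invented token stream standing in for the contents of wirth.text.
        string[] tokens = { "begin", ";", "ILLEGAL TOKEN", "x", ":=", "end" };
        int next = 0;
        WordReader reader = new WordReader(() => next < tokens.Length ? tokens[next++] : null);
        for (string w = reader.getword(); w != null; w = reader.getword())
            Console.WriteLine(w);   // prints begin, x and end, one per line
    }
}
```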
Using the built-in Dictionary class, with the word-skipping loop packaged as a method getword(), counting the words looks like this:

Dictionary<string,int> d = new Dictionary<string,int>();
string word = getword();
while (word != null) {
    if (d.ContainsKey(word)) {
        d[word] += 1;
    } else {
        d.Add(word, 1);
    }
    word = getword();
}
foreach (KeyValuePair<string,int> entry in d)
    Console.WriteLine("{0}: {1}", entry.Key, entry.Value);

The built-in Dictionary class defines an indexer (the operator[] used in d[word]) for its purposes; you could do the same, but a simpler idea is to use a conventional method-based interface. We will also avoid using generic types like <string,int>. Here is one possibility:
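Here is a sketch of what such a method-based interface and the counting loop might look like. The IWordTable name and every signature in it are assumptions, modeled on the ContainsKey(), Get(), Put() and Print() methods this assignment names. SimpleTable is a hypothetical, deliberately naive stand-in that scans plain arrays. It is not the hash table you are asked to write; it only shows that the counting loop no longer depends on an indexer or on Dictionary<string,int>.

```csharp
using System;

// Assumed interface; the exact signatures are a guess.
interface IWordTable {
    bool ContainsKey(string key);
    int Get(string key);               // count stored for key
    void Put(string key, int value);   // insert, or overwrite an existing count
    void Print();
}

// Hypothetical linear-scan table built on plain arrays (no generic collections).
class SimpleTable : IWordTable {
    private string[] _keys = new string[16];
    private int[] _values = new int[16];
    private int _n = 0;

    private int IndexOf(string key) {
        for (int i = 0; i < _n; i++)
            if (_keys[i] == key) return i;
        return -1;
    }

    public bool ContainsKey(string key) { return IndexOf(key) >= 0; }

    public int Get(string key) {
        int i = IndexOf(key);
        if (i < 0) throw new ArgumentException("key not found: " + key);
        return _values[i];
    }

    public void Put(string key, int value) {
        int i = IndexOf(key);
        if (i >= 0) { _values[i] = value; return; }
        if (_n == _keys.Length) {          // grow both arrays when they are full
            Array.Resize(ref _keys, _n * 2);
            Array.Resize(ref _values, _n * 2);
        }
        _keys[_n] = key; _values[_n] = value; _n++;
    }

    public void Print() {
        for (int i = 0; i < _n; i++)
            Console.WriteLine("{0}: {1}", _keys[i], _values[i]);
    }
}

class CountDemo {
    // The counting loop, with method calls in place of d[word] += 1 and d.Add(word, 1).
    public static void CountWords(IWordTable table, string[] words) {
        foreach (string word in words) {
            if (table.ContainsKey(word)) table.Put(word, table.Get(word) + 1);
            else table.Put(word, 1);
        }
    }

    static void Main() {
        IWordTable table = new SimpleTable();
        CountWords(table, new[] { "to", "be", "or", "not", "to", "be" });
        table.Print();   // to: 2, be: 2, or: 1, not: 1 (one per line, insertion order)
    }
}
```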
Our hash table can't be just the words themselves; we also need a place to keep count. Therefore the objects in the table will ultimately be the following:
class StrIntPair {
    private string _word;
    private int _count;
    public StrIntPair(string w, int c) { _word = w; _count = c; }
    public string getWord() { return _word; }
    public int getCount() { return _count; }
    public void setCount(int c) { _count = c; }
}

The "buckets" of the hash table will now be of type List<StrIntPair>. The table itself will be declared like this:
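One plausible form of the table declaration is sketched below. The field name HTable and the constant HMAX come from the code in this section, but the value 101, the WordTable class name and allocating every bucket in the constructor are assumptions. The point it illustrates is that creating an array of lists does not create the lists themselves, so a loop like the Print() skeleton would fail on a null bucket unless each one is allocated first.

```csharp
using System;
using System.Collections.Generic;

// Copy of the StrIntPair class, so this sketch compiles on its own.
class StrIntPair {
    private string _word;
    private int _count;
    public StrIntPair(string w, int c) { _word = w; _count = c; }
    public string getWord() { return _word; }
    public int getCount() { return _count; }
    public void setCount(int c) { _count = c; }
}

// Hypothetical class holding the table.
class WordTable {
    private const int HMAX = 101;   // assumed size; a prime is a common choice

    // An array of HMAX buckets, each a List of (word, count) pairs.
    private List<StrIntPair>[] HTable = new List<StrIntPair>[HMAX];

    public WordTable() {
        // new List<StrIntPair>[HMAX] only creates HMAX null references,
        // so every bucket has to be allocated before anything is added to it.
        for (int i = 0; i < HMAX; i++)
            HTable[i] = new List<StrIntPair>();
    }

    public int BucketCount() { return HTable.Length; }
    public int BucketSize(int i) { return HTable[i].Count; }
}

class Demo {
    static void Main() {
        WordTable table = new WordTable();
        Console.WriteLine(table.BucketCount());   // prints 101
        Console.WriteLine(table.BucketSize(0));   // prints 0
    }
}
```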
private int hash(string s) {
    // GetHashCode() may be negative, and C#'s % then gives a negative remainder,
    // so shift the result back into the range 0..HMAX-1.
    int val = s.GetHashCode() % HMAX;
    if (val < 0) val += HMAX;
    return val;
}

I have provided a search method, ContainsKey(). You must write Get(), Put() and Print(). The first two will have a structure similar to ContainsKey(); Print() will have a structure like this:
for (int i = 0; i < HMAX; i++) {
    foreach (StrIntPair sip in HTable[i]) {
        // print sip.getWord(), sip.getCount()
    }
}
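For reference, here is a minimal sketch of the whole table. Because the provided ContainsKey() is not reproduced in this text, the version below is only a guess at it. The class name WordTable, HMAX = 101, Get() returning 0 for a missing word, and Put() overwriting an existing count are all assumptions. Get() and Put() follow the same pattern as ContainsKey(): hash the word, then scan only that one bucket.

```csharp
using System;
using System.Collections.Generic;

class StrIntPair {
    private string _word;
    private int _count;
    public StrIntPair(string w, int c) { _word = w; _count = c; }
    public string getWord() { return _word; }
    public int getCount() { return _count; }
    public void setCount(int c) { _count = c; }
}

// Hypothetical class name; HMAX value is an assumption.
class WordTable {
    private const int HMAX = 101;
    private List<StrIntPair>[] HTable = new List<StrIntPair>[HMAX];

    public WordTable() {
        for (int i = 0; i < HMAX; i++) HTable[i] = new List<StrIntPair>();
    }

    private int hash(string s) {
        int val = s.GetHashCode() % HMAX;
        if (val < 0) val += HMAX;   // % keeps the sign of a negative hash code
        return val;
    }

    // Guess at the provided search method: scan only the word's bucket.
    public bool ContainsKey(string word) {
        foreach (StrIntPair sip in HTable[hash(word)])
            if (sip.getWord() == word) return true;
        return false;
    }

    // Same structure as ContainsKey(); returns 0 when the word is absent (an assumption).
    public int Get(string word) {
        foreach (StrIntPair sip in HTable[hash(word)])
            if (sip.getWord() == word) return sip.getCount();
        return 0;
    }

    // Updates the count if the word is already in its bucket, otherwise appends a new pair.
    public void Put(string word, int count) {
        List<StrIntPair> bucket = HTable[hash(word)];
        foreach (StrIntPair sip in bucket) {
            if (sip.getWord() == word) { sip.setCount(count); return; }
        }
        bucket.Add(new StrIntPair(word, count));
    }

    public void Print() {
        for (int i = 0; i < HMAX; i++) {
            foreach (StrIntPair sip in HTable[i])
                Console.WriteLine("{0}: {1}", sip.getWord(), sip.getCount());
        }
    }
}

class Program {
    static void Main() {
        WordTable table = new WordTable();
        foreach (string w in new[] { "to", "be", "or", "not", "to", "be" })
            table.Put(w, table.Get(w) + 1);
        table.Print();   // output follows bucket order, not insertion order
    }
}
```

Calling setCount() inside the foreach is safe because it changes a pair's contents, not the list itself; only bucket.Add(), which runs after the loop has finished, modifies the list.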