    Tokenizer t = new Tokenizer("wirth.text");

It also happens to return punctuation and (if the punctuation isn't
consistent with mini-java) the string "ILLEGAL TOKEN"; you should skip
these. So to get the individual (alphabetic) words, you can do this:

    string word = t.token();
    while (word == null || word == "ILLEGAL TOKEN" || !Char.IsLetter(word[0])) { // why check word==null?
        if (word == null) break;
        word = t.token();
    }
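The counting loop in this handout calls getword(), so it may help to see the skipping loop wrapped up as a method. The following is only a sketch: StubTokenizer is a hypothetical stand-in for the course's Tokenizer (so that the example runs on its own), WordReader is an invented wrapper name, and the sketch assumes token() returns null at end of input and never returns an empty string.

```csharp
using System;

// Hypothetical stand-in for the course's Tokenizer class (an assumption,
// not the real one): it hands back the strings of an array one at a time.
class StubTokenizer {
    private string[] _tokens;
    private int _next = 0;
    public StubTokenizer(string[] tokens) { _tokens = tokens; }

    // Returns the next token, or null once the input is exhausted.
    public string token() {
        return _next < _tokens.Length ? _tokens[_next++] : null;
    }
}

// Invented wrapper class; only getword() matters here.
class WordReader {
    private StubTokenizer t;
    public WordReader(StubTokenizer tok) { t = tok; }

    // Returns the next alphabetic word, or null at end of input.
    public string getword() {
        string word = t.token();
        while (word == null || word == "ILLEGAL TOKEN" || !Char.IsLetter(word[0])) {
            if (word == null) break;   // end of input: stop skipping
            word = t.token();          // punctuation, number or illegal token: skip it
        }
        return word;
    }
}
```

With the stub loaded with the tokens "(", "hello", "ILLEGAL TOKEN", ";", "world", "42", successive calls to getword() return "hello", then "world", then null.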
The project is in three files:

    Dictionary<string,int> d = new Dictionary<string,int>();
    string word = getword();
    while (word != null) {
        if (d.ContainsKey(word)) {
            d[word] += 1;
        } else {
            d.Add(word,1);
        }
        word = getword();
    }
    foreach (KeyValuePair<string,int> entry in d)
        Console.WriteLine("{0}: {1}", entry.Key, entry.Value);
The built-in Dictionary class defines an indexer (the d[word] syntax, C#'s
counterpart to operator[]) for its purposes; you can do that, but a simpler
idea is to use a conventional interface. We will also avoid using generic
types like <string,int>.
Here is one possibility:

The entries in our hash table can't be just the words themselves; we also
need a place to keep the count. Therefore the objects in the table will
ultimately be the following:

    class StrIntPair {
        private string _word;
        private int _count;
        public StrIntPair(string w, int c) {_word = w; _count = c;}
        public string getWord() { return _word; }
        public int getCount() { return _count; }
        public void setCount(int c) { _count = c; }
    }
The "buckets" of the hash table will now be of type List<StrIntPair>.
The table itself will be declared like this:

    private int hash(string s) {
        int val = s.GetHashCode() % HMAX;
        if (val < 0) val += HMAX;
        return val;
    }
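To make the bucket idea concrete, here is a minimal sketch of how an array of List<StrIntPair> buckets and the hash function might fit together. It is not the code supplied with the assignment: the class name HashTableSketch, the value 101 for HMAX, the array declaration and the BucketSize helper are all assumptions chosen to match the names HTable and HMAX used in this handout.

```csharp
using System;
using System.Collections.Generic;

// Same pair class as in the handout.
class StrIntPair {
    private string _word;
    private int _count;
    public StrIntPair(string w, int c) { _word = w; _count = c; }
    public string getWord() { return _word; }
    public int getCount() { return _count; }
    public void setCount(int c) { _count = c; }
}

class HashTableSketch {
    public const int HMAX = 101;           // assumed size; the handout does not give one
    private List<StrIntPair>[] HTable;     // assumed declaration: an array of buckets

    public HashTableSketch() {
        HTable = new List<StrIntPair>[HMAX];
        // Each bucket must be created before foreach or Add can touch it.
        for (int i = 0; i < HMAX; i++) HTable[i] = new List<StrIntPair>();
    }

    // The handout's hash(), made public here only so it can be called from outside.
    public int hash(string s) {
        int val = s.GetHashCode() % HMAX;  // % can give a negative remainder in C#
        if (val < 0) val += HMAX;          // shift the result into 0..HMAX-1
        return val;
    }

    // Hypothetical helper: number of pairs currently in the bucket that s maps to.
    public int BucketSize(string s) { return HTable[hash(s)].Count; }
}
```

Whatever string is passed in, hash() yields a valid index into HTable. The particular index is not something to rely on: on current .NET runtimes, string hash codes can differ from one run of the program to the next.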
I have provided the search method ContainsKey().
You must write Get(), Put() and Print().
The first two will have a structure similar to ContainsKey(); Print will
have a structure like this:

    for (int i=0; i<HMAX; i++) {
        foreach (StrIntPair sip in HTable[i]) {
            // print sip.getWord(), sip.getCount()
        }
    }