Lab 7: Python strings & dictionaries

Comp 150, Dordal, March 17, 2006 (St Patrick's Day)

Goals:

Counting words

Suppose you have a file consisting of a short text passage; for example: The prayer, probably appropriately, contains a lot more repetition. (By the way, you can either click on the above links to view them, or right-click and choose "Save link as" to save them. I've deleted the punctuation.) Anyway, suppose you want to know how many distinct words a file contains, or how often each word occurs. There are two main steps:

Step 1: Strings and lists

This is done through a sequence of string operations (you will need import string): Try this with the sonnet. Then look at words and len(words) (that is, just type these at the Python prompt). Do you get something sensible?
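The exact operations aren't reproduced above, so here is one possible sequence, offered as an assumption rather than the lab's official list: lowercase the text, turn every punctuation character into a space, then split on whitespace. It is written for Python 3, where lower, replace and split are string methods (the old string.lower-style functions are gone, but string.punctuation still exists). The sample line is just stand-in text; substitute the sonnet.

```python
import string

# Stand-in text (hypothetical); in the lab you would read the sonnet file instead.
text = "Shall I compare thee to a summer's day? Thou art more lovely and more temperate."

# 1. Lowercase everything so "Thou" and "thou" count as the same word.
text = text.lower()

# 2. Replace each punctuation character with a space.
for ch in string.punctuation:
    text = text.replace(ch, " ")

# 3. Split on runs of whitespace to get a list of words.
words = text.split()

print(words)
print(len(words))
```

Note that string.punctuation includes the apostrophe, so "summer's" becomes the two words "summer" and "s"; whether that matters depends on your text.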

Try to write a function that does the above steps, taking the filename as a parameter and returning words. (You don't have to do this, but if you don't, you'll have to show me your results rather than email them.)

def getwords(fname):
    text = open(fname, 'r').read()    # read the whole file into one string
    ...                               # your string operations go here, producing words
    return words
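One way to fill in the skeleton, sketched for Python 3. It deliberately swaps in a different technique from the string-operation sequence: a regular expression (re.findall) that pulls out runs of lowercase letters and apostrophes. The pattern [a-z']+ is an assumption about what counts as a word; adjust it if your text has digits or hyphens.

```python
import re

def getwords(fname):
    """Return a list of the lowercase words in the file fname."""
    # 'with' closes the file automatically, even if an error occurs.
    with open(fname, "r") as f:
        text = f.read()
    # Lowercase, then keep only runs of letters and apostrophes;
    # everything else (spaces, punctuation, digits) acts as a separator.
    return re.findall(r"[a-z']+", text.lower())
```

Usage would be something like words = getwords("sonnet.txt"), where the filename is hypothetical. Unlike replacing punctuation with spaces, this keeps contractions such as "don't" as single words.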

Step 2: Dictionaries

Now we need to count how often each word occurs, which basically means checking, for each word in the list, whether we've seen it before and updating its count. Python dictionaries form a really snazzy way to do this. Here's a simple dictionary example (ready for copy/paste into Python); note that a dictionary must be created (here as the empty dictionary {}) before you can store entries in it.

d = {}                    # CREATE the dictionary (avoid naming it dict: that's a built-in)
d["foo"] = 1
d["bar"] = 2
d["baz"] = 1
d["foo"] += 2             # add 2 to "foo"'s count, giving 3
Try this and then type d and see what it looks like: the words are keys, each used to look up a numeric value.
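Once entries exist, a key looks up its value, the in operator tests whether a key is present, and assigning to an existing key replaces its value rather than adding a second entry. A small sketch reusing the counts from the example above (the key "qux" is made up):

```python
# Same final contents as the example above.
d = {"foo": 3, "bar": 2, "baz": 1}

print(d["foo"])        # look up the value stored under "foo"
print("qux" in d)      # membership test on keys: "qux" is not one
print(len(d))          # number of distinct keys

d["bar"] = 10          # assigning to an existing key replaces its value
print(len(d))          # still the same number of keys; keys are never duplicated
print(d)
```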

We'll call the dictionary of word counts counts. To add a new word to counts, or increment the count of a word already there, the following works (where w is the word in question):

if w in counts:
    counts[w] += 1       # increment; same as counts[w] = counts[w] + 1
else:
    counts[w] = 1        # create new entry
You can't use counts[w] += 1 when w isn't already a key in counts: the += operator needs the existing value to add to, and there isn't one, so Python raises a KeyError.
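To see that failure directly, and one common alternative to the if/else (the dictionary method get, which returns a default value when the key is missing), here is a short sketch; the words are made up.

```python
counts = {}

# Incrementing a missing key fails: there is no old value to add 1 to.
try:
    counts["spring"] += 1
except KeyError as err:
    print("KeyError:", err)

# dict.get(key, default) returns default when key is absent,
# so this one line handles both the new-word and seen-before cases.
for w in ["spring", "rain", "spring"]:
    counts[w] = counts.get(w, 0) + 1

print(counts)
```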

Do the above for each word w in words. Hint: use a for w in words: loop. If you make this into a function, it will start like
def dictify(words):
    counts = {}      # create the dictionary
    for w in words:
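A possible completion of dictify, as a sketch: it simply wraps the if/else from above in the loop and returns the dictionary. The sample word list is made up.

```python
def dictify(words):
    """Return a dictionary mapping each word in words to its count."""
    counts = {}                  # create the dictionary
    for w in words:
        if w in counts:
            counts[w] += 1       # seen before: increment
        else:
            counts[w] = 1        # first occurrence: create entry
    return counts

print(dictify(["to", "be", "or", "not", "to", "be"]))
```

The standard library's collections.Counter does the same counting in one call (Counter(words)), but writing the loop yourself is the point of this step.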

Now we have to analyze the dictionary. It's moderately large. You can print it nicely with

for w in counts:
    print(w, counts[w])    # each word and its count (Python 3 print syntax)
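Looping over the dictionary directly gives no useful ordering (insertion order in Python 3.7 and later). To answer the questions from the start, how many distinct words there are and which occur most often, one possible sketch counts the keys and sorts the (word, count) pairs by count. The small counts dictionary here is made up, and showing the top 3 is an arbitrary choice.

```python
counts = {"the": 5, "love": 3, "thy": 3, "summer": 1}

# The number of distinct words is just the number of keys.
print("distinct words:", len(counts))

# Sort (word, count) pairs: highest count first, ties broken alphabetically.
pairs = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Print the top 3 in aligned columns.
for w, c in pairs[:3]:
    print(f"{w:<10}{c:>5}")
```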
Other things you can do:

Email me your Python file, or, if you do everything at the console, just show me your final steps.

Much of this lab comes from Zelle's book, pp. 370-373.