Comp 150-001, TTh, 11:00-2:00, DH-339, Class 11

  1. Default routing
  2. Virtual Machines
  3. Python dictionaries; word-counting
  4. AI

Network routing tables: default route
Most real routing tables do not list every destination. They list some destinations (usually local ones), and then have a default destination, to which all traffic not otherwise covered is routed (thus including all "unknown" traffic). For a site with a connection to the Internet provided by its ISP, the default route is towards the ISP. Only "core" routers (or routers in the "default-free" zone) have tables with no default entry (therefore requiring that every network be listed).
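
As a rough illustration of the idea, here is a minimal Python sketch of a routing-table lookup with a default entry. The networks and next-hop names are made up for the example, and picking the most specific matching network is an assumption about how the table is searched; the point is only that traffic matching no listed destination falls through to the default.

```python
import ipaddress

# Hypothetical routing table for a small site: (destination network, next hop).
# All addresses and next-hop names here are invented for illustration.
ROUTES = [
    (ipaddress.ip_network("10.0.1.0/24"), "eth0 (local LAN)"),
    (ipaddress.ip_network("10.0.2.0/24"), "eth1 (lab network)"),
    (ipaddress.ip_network("0.0.0.0/0"), "ISP gateway"),   # the default route
]

def next_hop(dest):
    """Return the next hop for dest, preferring the most specific matching network."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    if not matches:
        return None   # can only happen if the table has no default entry
    net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

print(next_hop("10.0.2.7"))   # a listed local destination
print(next_hop("8.8.8.8"))    # an "unknown" destination: uses the default
```

Removing the 0.0.0.0/0 entry turns this into a "default-free" table, and next_hop() then returns None for anything not explicitly listed.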

Artificial Intelligence

Chapter 13 of Dale & Lewis: Artificial Intelligence, or AI (sections 13.1, 13.2, 13.5)

There was a 2001 movie with AI as its title, but that was weird. David is a robot boy who loves his human "mommy"

Classic: 2001, Stanley Kubrick/ Arthur C Clarke 1968
    HAL the computer goes nutso.
    Nominal plot of movie: space flight to Jupiter, where there's a Monolith.
    Real plot: HAL is tormented by conflict:
    So he tries to kill them.

What is AI?

The "Turing Test"

Turing's 1950 paper, Computing Machinery and Intelligence, in the journal MIND,
outlined the "imitation game", now known as the Turing test.

Version 1:

Man:     A
Woman:    B
Interrogator:    C

C talks to A & B via teletype.
C's goal: guess who is who

A's goal: make C be wrong


Version 2:

A is a MACHINE
B is a person
C is the same
A tries to make C guess wrong.

A lot of effort has been expended on the computer version of this, but all attempts so far fail badly. The usual workaround is to require that all conversation stay within a given restrictive domain, and this just isn't realistic.

Loebner prize

A competition to write programs that participate in the Turing test.
One 2005 entry (Alice, by Richard Wallace)
http://loebner.net/Prizef/2005_Contest/Alice/ALICE-Judge1.htm

Hi Alice, if you are indeed human, you should be able to tell me which is bigger, my big toe or a Boeing 747

[12:05:51] Alice:
Hi there Judge 1. Um, I think I sometimes am.I am a real person.  As you are sitting in front of your computer screen, you may be wondering if I am a computer or a person talking.

[12:06:10] Judge 1:
Can you answer the question about the toe and the 747?

[12:06:50] Alice:
Please rephrase the question and try asking me again.  As you hear the sound of keys typing you may begin to feel more comfortable chatting with me.

[12:07:12] Judge 1:
Which is bigger, a 747 or my big toe.  Come on, answer it.

[12:08:01] Alice:
The obvious one.  Please rephrase your question, Judge 1.  The feel of the keys on the tips of your fingers may remind you of some questions you wanted to ask me about.

The winner was Rollo Carpenter's Jabberwacky:
http://loebner.net/Prizef/2005_Contest/Jabberwacky/Jabberwacky_Judge_session1.htm

2005-09-18-10-01-22 PROGRAM: Hello, my name is George and I am the human.

2005-09-18-10-02-57 JUDGE: Hi George, cslkjslw weicmslw weos, right?

2005-09-18-10-03-02 PROGRAM: Don't repeat your name, csthiang!

2005-09-18-10-03-36 JUDGE: If lsjkel lwehlx wouldn't you say?

2005-09-18-10-03-38 PROGRAM: No, he's still good.

2005-09-18-10-03-55 JUDGE: Who.

2005-09-18-10-04-11 PROGRAM: You and me.

2005-09-18-10-04-44 JUDGE: If you are human, tell me which is larger, a salt crystal or the Empire State Building.

2005-09-18-10-04-57 PROGRAM: It's caring about another person's well being more than your own.

2005-09-18-10-07-01 JUDGE: What if I care about another person's well being more than I should?

2005-09-18-10-07-01 PROGRAM: How would you do that?

2005-09-18-10-07-43 JUDGE: Which is further away from us, Queens or Inner Mongolia?

2005-09-18-10-07-57 PROGRAM: HA! HA! That was stupid. Tell me a GOOD joke this time.

2005-09-18-10-08-57 JUDGE: You say you are human but you can't answer a simple question.

2005-09-18-10-09-01 PROGRAM: Because I am alive.

2005-09-18-10-10-35 JUDGE: You are not following the rules. You are supposed to convince me that you are a human, not convince me that you are mysterious.


Not a lot better.

These are sometimes called "chatterbots".

Compare with the following hypothetical sample in Turing's 1950 paper:

Judge: Please write me a sonnet on the subject of the Forth Bridge.
Respondent:  Count me out on this one. I never could write poetry.
Judge: Add 34957 to 70764.
Respondent: (pause for about 30 seconds) 105621 [note correct answer is 105721]
Judge: Do you play chess?
Respondent: Yes.
Judge: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?
Respondent: (pause for about 15 seconds) R-R8 mate.

The October 2008 winner was Elbot, by Fred Roberts and Artificial Solutions.

[2008-10-12 08:40:40.187] Judge: Hello.
[2008-10-12 08:40:43.453] Respondent: A very non-mechanical hello to you, also.
[2008-10-12 08:41:00.062] Judge: Are you sure?
[2008-10-12 08:41:04.343] Respondent: Speaking of truth, I say this could be beneficial to me.
[2008-10-12 08:41:48.906] Judge: What could be beneficial to you?
[2008-10-12 08:41:54.156] Respondent: That question was rather indiscreet and I prefer not to answer it!
[2008-10-12 08:42:48.875] Judge: So I have already know that you are a real human. [pld: note grammar trick]
[2008-10-12 08:42:56.109] Respondent: Can we agree to abide by a minimum standard of civilized behavior? I will not listen to insults!
[2008-10-12 08:43:58.328] Judge: Oh. I am sorry.
[2008-10-12 08:44:05.578] Respondent: The expression 'oh' dates to Middle English as used in the 13th century. What a surprising etymology!
[2008-10-12 08:44:51.312] Judge: Really?
[2008-10-12 08:44:53.531] Respondent: Then again, maybe not.
[2008-10-12 08:45:22.062] Judge: Seems the knowledge is rather sophasticate. [pld: note spelling trick!]
[2008-10-12 08:45:27.359] Respondent: This is confusing. Why bring intellectual perception into this?

====================================================================================

What do people do?

    Think effortlessly: visual recognition, natural-language processing, conversation
    Think deeply: chess, reasoning, poetic metaphor

Trivial problem:
    Given a collection of photographs, put those with a cat in pile A; the rest in pile B

    This is ENORMOUSLY difficult for computers.

    Even OCR is not easy, and often uses very specific assumptions about fonts.

Language processing

Turing's "test" is based on this. There are lots of other AI areas of research, but language processing is relatively concrete. Some issues:
    analyze grammar
    resolve ambiguous pronouns
        Stan went to the restaurant. The waiter brought his meal.
        Afterwards he left a big tip. [Stan or the waiter?]

    resolve ambiguous references
        the chickens are ready to eat [who eats whom here?]

    resolve ambiguous prepositional phrases
        the cat is sleeping on the tv in pajamas  [who is wearing pajamas? The cat or the tv?]

    resolve words with multiple meanings
        time flies like an arrow, but fruit flies like a banana
        ron lies asleep in his bed [lies as in reclines, v lies as in doesn't tell the truth?]

    resolve situations
        Sally was fed up. She got up from her table, leaving just enough to cover the check.
        The waitress sneered as she walked out
        [Who walked out? What did Sally leave on the table? Why did the waitress sneer?]


These are HARD.



Easy language processing (relatively easy, anyway):

    put the red block on top of the blue block:    EASY!

    who is the father of the oldest sister of sam?

    what is the average sale amount for salespeople in Nebraska?

================

Parsing is hard.

Restricted Turing test:

same thing as Turing's original test, except questions submitted to the human must be "parsed" by a
computerized judge & found to be within the relevant area.

================

ELIZA session on p 430 of Dale & Lewis
ELIZA was based on pattern matching, which is marginally more sophisticated than the tech-support chatterbot's keyword matching. Patterns have variables (beginning with ? in the examples here); if the pattern matches, the pieces matched by these variables can be recycled into the response.

Pattern:(my ?X me ?Y)
Data:   (my FATHER MADE me COME HERE)

Result: (your ?X you ?Y)
    (your FATHER MADE you COME HERE)

Keyword matches: DEPRESSED ANGRY HELP ....
More: MOTHER FATHER ...
    TELL ME MORE ABOUT YOUR FAMILY
    WHAT ELSE COMES TO MIND WHEN YOU THINK ABOUT YOUR ?X


Pattern: (I am ?X)
Response: (DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE ?X)

Pattern: (?X like ?Y)
WHAT RESEMBLANCE DO YOU SEE?

(you are ?X but ?Y)
WHAT MAKES YOU THINK I AM ?X
DOES IT PLEASE YOU TO THINK I AM ?X
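
The pattern-and-response idea above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not ELIZA's actual implementation: patterns and sentences are split on whitespace, a ?variable absorbs one or more words, and the function names match() and respond() are invented for the example.

```python
def match(pattern, words):
    """Match a list of pattern tokens against a list of words.

    Tokens starting with '?' are variables that absorb one or more words.
    Returns a dict of variable bindings, or None if there is no match.
    """
    if not pattern:
        return {} if not words else None
    first, rest = pattern[0], pattern[1:]
    if first.startswith("?"):
        # Let the variable absorb 1, 2, ... words until the rest matches.
        for i in range(1, len(words) + 1):
            bindings = match(rest, words[i:])
            if bindings is not None:
                bindings[first] = " ".join(words[:i])
                return bindings
        return None
    if words and words[0].lower() == first.lower():
        return match(rest, words[1:])
    return None

def respond(pattern, template, sentence):
    """Fill the response template with the pieces matched by the pattern."""
    bindings = match(pattern.split(), sentence.split())
    if bindings is None:
        return None
    return " ".join(bindings.get(tok, tok) for tok in template.split())

print(respond("my ?X me ?Y", "your ?X you ?Y", "my FATHER MADE me COME HERE"))
print(respond("I am ?X",
              "DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE ?X",
              "I am depressed"))
```

Nothing here involves any understanding of what FATHER or DEPRESSED means; the matched words are simply copied into the response.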

Weizenbaum, ELIZA's author, was horrified at the number of people who really thought ELIZA was intelligent, and (worse) was understanding them.

Clearly, pattern-matching is suggestive but NOT intelligent.


Scripts

Fast-forward 10-15 years to SCRIPTS: sequences of events with lots of "slots" to be filled in.

Restaurant script:
Each line of the script begins with an action (ENTERS, ORDERS, SERVES, PAYS below). The slots are ACTOR (who's doing the thing, that is, the subject), OBJECT (what it's being done to, the direct object), PLACE (where), or various prepositional phrases modifying the action (eg TO). (Note that prepositional phrases modifying the actor or object should properly be included as part of those slots.) Some slots place restrictions on the slot-filler, eg requiring that the actor be a person.

(ENTERS  (ACTOR ?PERSON) (PLACE RESTAURANT))
(ORDERS (ACTOR ?PERSON) (OBJECT ?FOOD) (ORDER_TAKER ?WAITER))
(SERVES (ACTOR ?WAITER) (OBJECT ?FOOD) (TO ?PERSON))
(PAYS (ACTOR ?PERSON)...

Goal: be able to print a summary and answer questions.
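
One possible way to hold such a script in Python is as a list of (action, slots) pairs, with the ?variables filled in from a particular story. This is only a sketch: the representation, the function names, and the story values (Stan, lasagna) are assumptions made for illustration, and the PAYS entry keeps only its ACTOR slot because the notes elide the rest.

```python
# The restaurant script as (action, slots) pairs; ?NAMES are variables.
RESTAURANT_SCRIPT = [
    ("ENTERS", {"ACTOR": "?PERSON", "PLACE": "RESTAURANT"}),
    ("ORDERS", {"ACTOR": "?PERSON", "OBJECT": "?FOOD", "ORDER_TAKER": "?WAITER"}),
    ("SERVES", {"ACTOR": "?WAITER", "OBJECT": "?FOOD", "TO": "?PERSON"}),
    ("PAYS",   {"ACTOR": "?PERSON"}),
]

def instantiate(script, bindings):
    """Replace ?variables in every slot with story values (unbound ones stay as-is)."""
    return [(action, {slot: bindings.get(v, v) for slot, v in slots.items()})
            for action, slots in script]

def who(events, action):
    """Answer 'who did <action>?' by returning the ACTOR slot of that event."""
    for act, slots in events:
        if act == action:
            return slots["ACTOR"]
    return None

# Hypothetical story: "Stan went to a restaurant and ordered lasagna."
story = instantiate(RESTAURANT_SCRIPT,
                    {"?PERSON": "Stan", "?FOOD": "lasagna", "?WAITER": "the waiter"})

for action, slots in story:      # a crude "summary"
    print(action, slots)
print(who(story, "PAYS"))        # the story never said who paid; the script supplies it
```

The useful part is the last line: the question is answered from the script's default sequence of events, not from anything stated explicitly in the story.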

Actions
Actors: people, etc
    by name, by role, specific v nonspecific

    Even degree of specificity is unclear: in a story, "sam" might denote a character precisely but in a factual article we might wonder which "sam smith"


Things
    food, money, clothing, cars,


More on SCRIPTS
For some cases (superficial language translation), it suffices to do English_text => Internal_format => Spanish_text

BUT often English_text => Internal_format requires fairly deep "common knowledge" to succeed.

SOMETIMES, though less often, Internal_format => Spanish requires additional common knowledge.

More info about ACTORS:

They can be assigned emotional/mental states, either as consequences of the script or as "inputs" to the script
    happy, hungry, angry



Social situations:
    Someone enters store
        selects items
        goes to checkout
        pays
        leaves

    Clothing store: try things on?

How do we represent common knowledge?
How do we apply it in translating to Internal_format?

SAM represented knowledge as a bunch of SCRIPTS.

Ambiguity again

Pronoun-antecedent ambiguity examples:

What is flying:
Who is they?
What is it?


Processing newspaper articles

Can we build a data structure representing the article?
Can we ANSWER QUESTIONS about the article?
This involves understanding the questions as well as manipulating the database

Who is the story about?
Was anyone hurt? Who?
Was anyone arrested?


Processing short (very short) stories: need for a model of social motivation


At what point can we use a DICTIONARY? Probably not early.
Consider "which is bigger, a 747 or a grain of rice"?
Where in the dictionary would it say a rice grain is small?

============

Two issues in natural language:
  1. Translating a sentence to a "frame" structure by resolving pronouns, ambiguities, modifiers, references

  2. Being able to answer questions about a story.


For #1, we can try both (or more) alternates and see which makes more sense. Database of basic facts resolves most sentences in isolation.
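
A toy Python sketch of "try each alternative and see which makes more sense", applied to the earlier question of who left the tip. The fact database, the role table, and the function name resolve() are all invented for this example; a real system would need vastly more common knowledge than this.

```python
# Tiny, made-up database of basic facts: which roles plausibly do which actions.
PLAUSIBLE = {
    ("customer", "leaves tip"),
    ("customer", "orders meal"),
    ("waiter", "brings meal"),
}

# Roles of the people in the story (also assumed for the example).
ROLES = {"Stan": "customer", "the waiter": "waiter"}

def resolve(candidates, action):
    """Try each candidate referent; keep those the fact database finds plausible."""
    return [c for c in candidates if (ROLES[c], action) in PLAUSIBLE]

# "Afterwards he left a big tip." Who is "he"?
print(resolve(["Stan", "the waiter"], "leaves tip"))
```

If resolve() returns more than one candidate, or none, the sentence cannot be settled by the fact database alone.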

Expert Systems

These have been more successful than natural-language processing. Dale & Lewis has a section on them, 13.3. Note that many natural language systems use some form of "expert system" internally, to encode the world knowledge that is necessary for language interpretation.

Databases

Dale & Lewis 12.3
You can keep a simple database as a spreadsheet, that is, as a single table. Columns represent fields; rows represent one record about an entity. For example, see the Movie database on p 389 of D&L, or the customer table on p 390.

But there's a problem. Suppose we try to keep track of movie rentals with one big table, containing columns Title, Rating, Renter_Name,  Address, and Year. It might look like this:

    title          rating   name    address       year
    2001           PG-13    Peter   Rogers Park   1998
    Monty Python   PG-13    Peter   Rogers Park   1999
    2001           PG-13    Peter   Shabbona      2000
    AI             PG-13    Peter   Shabbona      2002

The problem is that Peter moved, and the old address is still embedded in some of the old records!

The generally accepted right way to do this is to have separate tables for movie information and for customer information, and then to have a third table of just the MovieID, CustomerID, and the date (in D&L, Fig 12.9 on p 391, the DateDue is also added here).

The CustomerID is the key for the Customer table. The MovieID is the key for the Movie table. Only the keys appear in the Rents table; note that the key to Rents is probably the three columns CustomerID, MovieID, and DateRented. If we want to update a customer's address, we do so only once, in the Customer table. The real problem with the table above is that the address should  depend on the name, but because the name is not the key, it is difficult to enforce that.

To put it another way, in our original table there was a non-key dependency: the address depends on the name, but the name is not a key. (The rating is also a non-key dependency on the title.) The general rule is that whenever there is a dependency of one field on another within a larger table, eg (name, address), the two fields should be moved to a separate table in which the field depended on is the key.
This process is called normalization.
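
Here is a small sketch of the three-table layout using Python's built-in sqlite3 module. The table and column names follow the discussion above; the ID values are made up, the rows are taken from the rental table above, and the year stands in for DateRented. It is meant only to show that the address now lives in exactly one place.

```python
import sqlite3

db = sqlite3.connect(":memory:")      # throwaway in-memory database
db.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
    CREATE TABLE Movie    (MovieID INTEGER PRIMARY KEY, Title TEXT, Rating TEXT);
    CREATE TABLE Rents    (CustomerID INTEGER, MovieID INTEGER, DateRented TEXT,
                           PRIMARY KEY (CustomerID, MovieID, DateRented));
    INSERT INTO Customer VALUES (1, 'Peter', 'Rogers Park');
    INSERT INTO Movie VALUES (1, '2001', 'PG-13'), (2, 'Monty Python', 'PG-13');
    INSERT INTO Rents VALUES (1, 1, '1998'), (1, 2, '1999');
""")

# Peter moves: one update, in one table.
db.execute("UPDATE Customer SET Address = 'Shabbona' WHERE CustomerID = 1")

# Rebuild the "one big table" view by joining on the keys.
rows = db.execute("""
    SELECT Movie.Title, Customer.Name, Customer.Address, Rents.DateRented
    FROM Rents
    JOIN Customer ON Customer.CustomerID = Rents.CustomerID
    JOIN Movie    ON Movie.MovieID = Rents.MovieID
    ORDER BY Rents.DateRented
""").fetchall()

for row in rows:
    print(row)
```

Every joined row shows the new address, because no old copy of the address is stored in Rents. (Whether the old rentals should show the old address is a separate design question; if so, the address at rental time would have to be recorded deliberately.)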

The standard mechanism for working with database tables is Structured Query Language, SQL, usually pronounced Sequel. Actually, "Sequel" was the trade name of an early IBM product that later became SQL, so this isn't a case of trying to sound out letters.

Virtual Machines

How are windows and linux programs different?  Both consist of generic sequences of x86 machine code, more or less interchangeable, interspersed with OS-specific system calls, the latter generally made through trap instructions, below.

We discussed earlier the two-level user/supervisor cpu model:
Now suppose you want to run Windows under Linux.
Method 1: interpreted machine. You write software to interpret each instruction of the Windows code. Alas, this tends to be slow by a factor of about 10.

Method 2:
When the windows process begins to run, replace the linux trap handler with a modified windows trap handler. The modified handler is formed by jumping to the regular windows trap handler, but first switching back to user mode! We also make one further set of additions to the windows trap/exception handlers, below.

Now the windows trap handler runs, as part of an ordinary linux process. It does the usual validation/testing stuff, but eventually it reaches a privileged operation: an I/O instruction, or an update of a page table, or an update to video RAM. At this point, the linux trap mechanism gains control, and now the second part of the VM gains control: the special trap/exception handlers for these privileged operations. These replace the video-RAM write with a window-update, or an I/O call with a linux system call to handle the I/O (perhaps replacing a physical disk with a "disk file" that is part of the linux filesystem), or a virtual-memory page-table update with an update consistent with the physical memory allocated to the VM.

The end result: the windows system call appears to have completed normally, even though the "host system" is running linux.

And this is all you need to do to get windows to run as a "guest" system on a linux host!
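
The trap-and-emulate idea of Method 2 can be mimicked with a toy Python simulation. Nothing here touches real hardware: the "instructions" are just tuples, a Python exception stands in for the CPU trap, and all the names (PrivilegedOp, run_in_user_mode, host_trap_handler, run_guest) are invented for this sketch.

```python
class PrivilegedOp(Exception):
    """Stands in for the trap raised when user-mode code tries a privileged operation."""
    def __init__(self, op, arg):
        super().__init__(op)
        self.op, self.arg = op, arg

def run_in_user_mode(instr):
    """Stand-in for the CPU: ordinary instructions just run, privileged ones trap."""
    op, arg = instr
    if op in ("WRITE_VIDEO", "DISK_IO"):
        raise PrivilegedOp(op, arg)
    return "ran %s %s" % (op, arg)

def host_trap_handler(trap, log):
    """Stand-in for the VM's special handlers: emulate the operation on the host."""
    if trap.op == "WRITE_VIDEO":
        log.append("update host window with %r" % trap.arg)
    elif trap.op == "DISK_IO":
        log.append("read/write host disk file for %r" % trap.arg)

def run_guest(program):
    """Run guest code directly, letting the host step in only when a trap occurs."""
    log = []
    for instr in program:
        try:
            log.append(run_in_user_mode(instr))
        except PrivilegedOp as trap:
            host_trap_handler(trap, log)
    return log

# A pretend guest trap handler: mostly ordinary work, plus two privileged operations.
guest_handler = [("ADD", 1), ("WRITE_VIDEO", "hello"), ("DISK_IO", "sector 7")]
for line in run_guest(guest_handler):
    print(line)
```

The contrast with Method 1 is that only the two privileged operations cost extra work; the ordinary instruction runs "natively" without being interpreted.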

(There are a few x86 instructions that don't quite work properly for the above scheme: they execute without trapping in user mode, but they reveal to the windows handler (which thinks it is running natively) something that it "should not" see. One approach is to scan the windows trap handler for such instructions when initially loading it, and then modify them in place.)


   



Python

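def collatz(n):
    # number of Collatz steps needed to get from n down to 1 (requires n >= 1)
    count = 0
    while n!=1:
        if n%2 == 0: n=n//2     # integer division, so n stays an int
        else: n=3*n+1
        count += 1
    return count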


index versus value: lab 3
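    vals = list(map(collatz, range(1, 1000)))   # start at 1: collatz(0) would loop forever
    max(vals)                # the largest VALUE, 178
    vals.index(178)          # its INDEX, 870; vals[0] is collatz(1), so this is n = 871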

map (function, list)
filter (function, list)
    [x for x in list if function(x)]
reduce(function(x,y), list)
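
A short sketch of the three functions in action. The helper functions and the list are made up for the example. It is written for Python 3, where map and filter return iterators (hence the list() calls) and reduce must be imported from functools.

```python
from functools import reduce   # reduce is a builtin in Python 2, but not in Python 3

def square(x):
    return x * x

def is_even(x):
    return x % 2 == 0

nums = [1, 2, 3, 4, 5, 6]

squares = list(map(square, nums))           # apply square to every element
evens = list(filter(is_even, nums))         # keep elements where is_even is true
evens2 = [x for x in nums if is_even(x)]    # the same thing as a list comprehension
total = reduce(lambda x, y: x + y, nums)    # combine pairwise: ((((1+2)+3)+4)+5)+6

print(squares, evens, evens2, total)
```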

face(): returns a LIST of drawing pieces? Each drawn with x.draw(w)?
    for x in L: x.draw(w)
Here's an example:

from graphics import *

def face(p):
   leyep = p.clone()
   leyep.move(-40, -20)
   reyep = p.clone()
   reyep.move(40, -20)
   nosep = p.clone()
   nosep.move(0,5)
   m1 = p.clone()
   m1.move(-35, 45)
   m2 = p.clone()
   m2.move(35, 45)
   mc = p.clone()
   mc.move(0,60)
   # collect the drawing pieces in a list; the caller draws each one
   parts = [
      Circle(p, 100),        # head
      Circle(leyep, 10),     # left eye
      Circle(reyep, 10),     # right eye
      Circle(nosep, 5),      # nose
      Line(m1, mc),          # left half of mouth
      Line(mc, m2),          # right half of mouth
   ]
   return parts

w = GraphWin("pld", 400, 400)

for x in face(Point(200, 200)): x.draw(w)

Word-Count example

How many unique words are there in, say, Shakespeare's Sonnet 18?

Shall I compare thee to a summer's day
Thou art more lovely and more temperate
Rough winds do shake the darling buds of May
And summer's lease hath all too short a date
Sometime too hot the eye of heaven shines
And often is his gold complexion dimmed
And every fair from fair sometime declines,
By chance, or nature's changing course untrimmed
But thy eternal summer shall not fade
Nor lose possession of that fair thou ow'st
Nor shall death brag thou wander'st in his shade
When in eternal lines to time thou grow'st
So long as men can breathe, or eyes can see
So long lives this, and this gives life to thee

Counting the words themselves is pretty easy, but how do we detect duplicates?
First we need the text as a list of words. You can try this with the sonnet above, and see what the resulting list, words, looks like. Here's all this rolled into one function (it uses string methods, so no import is needed):
def getwords():
    text = open("sonnet18", 'r').read()
    text = text.lower()
    text = text.replace('\n', ' ')        # replace newlines with spaces
    words = text.split()                # split into words
    # note: punctuation stays attached, so "declines," is a different word from "declines"
    return words

Next, we create a dictionary:
    wcounts = {}       

Now we'll write a function to add all the words to the dictionary:
def addwords(wlist):
    for w in wlist:
        if w in wcounts:
            wcounts[w] += 1
        else:
            wcounts[w] = 1

We can then fill the dictionary with
    w = getwords()
    addwords(w)

Demo: type "wcounts" to see the dictionary

Here's a simple function to print those entries in the dictionary for which the count is >= threshold:

def printcounts(thresh):
    for w in wcounts:
        if wcounts[w]>=thresh: print(w, wcounts[w])

Demo: try it.
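
To get back to the original question (how many unique words?), here is a self-contained sketch that uses two lines of the sonnet directly instead of reading the file. The function name clean_words() is made up, and stripping punctuation from the ends of each word is an added assumption about what should count as "the same word".

```python
import string

lines = """And every fair from fair sometime declines,
By chance, or nature's changing course untrimmed"""

def clean_words(text):
    """Lowercase, split on whitespace, and strip punctuation from the ends of each word."""
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    return [w for w in words if w]      # drop anything that was only punctuation

words = clean_words(lines)
unique = set(words)                     # a set keeps one copy of each word

print(len(words), "words,", len(unique), "unique")
```

The keys of the wcounts dictionary form the same collection as this set, so len(wcounts) answers the unique-word question for the whole sonnet once the dictionary has been filled.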