Comp 271 Week 3


Vector

Bailey, in §4.2.2, works through an example similar to MyList; he calls it Vector. Some differences:

SetVector: §3.7

Using Vectors/MyLists to implement an abstract Set. Note the more limited set of operations: there is no get() and no set(). (There is still an iterator, but there are no guarantees about the order in which elements are returned.) Note how many other things I had to pull in to get this to work.

add() now works very differently. Could we improve addAll()? On the face of it, to form the union of two sets A and B of size N, we need N² equality comparisons: each element of A has to be compared with each element of B to determine if it is already there. This cost is said to be O(N²), notation which indicates we don't care whether it is N², N²/2, or 3N².
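Concretely, a straight-line version of that N² union might look like this (a hypothetical sketch using ArrayList rather than our MyList/SetVector, so that it runs standalone):

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveUnion {
    // O(N^2): each element of b is compared against every element seen so far.
    static <E> List<E> union(List<E> a, List<E> b) {
        List<E> result = new ArrayList<>(a);   // assumes a itself has no duplicates
        for (E elem : b) {
            boolean found = false;
            for (E existing : result) {        // up to N comparisons per element of b
                if (existing.equals(elem)) { found = true; break; }
            }
            if (!found) result.add(elem);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(union(List.of(1, 2, 3), List.of(3, 4, 5)));  // [1, 2, 3, 4, 5]
    }
}
```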

Later we'll make this faster with hashing. Brief summary: choose a relatively large M, perhaps quite a bit larger than N. Define h(obj) = hashCode(obj) % M. Now create a big array ht (for hash table) of size M, initially all nulls. For each a in A, mark the table at ht[h(a)]. Then, for each b in B, if ht[h(b)] is still null, b is not a duplicate: put it in! If ht[h(b)] is occupied already, then we have to check "the long way", but in general we save a great deal.
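Here is a minimal sketch of that idea (not Bailey's code; the names are made up, and it resolves collisions by linear probing, stepping to the next slot, which goes slightly beyond the summary above; it assumes M is comfortably larger than the number of distinct elements):

```java
import java.util.ArrayList;
import java.util.List;

public class HashedUnion {
    // Union of a and b, using a hash table of size m to avoid most comparisons.
    static <E> List<E> union(List<E> a, List<E> b, int m) {
        Object[] ht = new Object[m];           // the hash table, initially all nulls
        List<E> result = new ArrayList<>();
        for (E elem : a) place(ht, result, elem, m);
        for (E elem : b) place(ht, result, elem, m);
        return result;
    }

    static <E> void place(Object[] ht, List<E> result, E elem, int m) {
        int h = Math.floorMod(elem.hashCode(), m);
        while (ht[h] != null) {
            if (ht[h].equals(elem)) return;    // occupied by a duplicate: check "the long way"
            h = (h + 1) % m;                   // occupied by something else: probe the next slot
        }
        ht[h] = elem;                          // slot was null: elem is new
        result.add(elem);
    }

    public static void main(String[] args) {
        System.out.println(union(List.of(1, 2, 3), List.of(3, 4, 5), 17));  // [1, 2, 3, 4, 5]
    }
}
```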

Is there an intersect option?

Pre- and Post-conditions

A precondition is something that must be true before a method is executed, or else we're not guaranteed sensible results. Virtually all API documentation is filled with preconditions and postconditions.

Bailey's Assert library has methods for this. When an error occurs, an exception is thrown. Note how this would be a simpler way to handle array-bounds checking!
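In the same spirit (this is not Bailey's actual Assert code, just a sketch of the pattern), a one-line pre() method can stand in for manual bounds checking:

```java
public class MyAssert {
    // Sketch of an Assert-style helper: throw when a stated precondition fails.
    static void pre(boolean condition, String message) {
        if (!condition) throw new IllegalStateException("Precondition failed: " + message);
    }

    // pre: 0 <= i < data.length -- one call replaces an if/else bounds check.
    static Object get(Object[] data, int i) {
        pre(0 <= i && i < data.length, "index " + i + " in bounds");
        return data[i];
    }

    public static void main(String[] args) {
        Object[] data = { "a", "b", "c" };
        System.out.println(get(data, 1));   // b
    }
}
```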

Example: see the use of pre- and post-conditions in the Vector class of Chapter 3, pp 51-52. They're not enforced in code! However, some preconditions do appear in the actual library, commented out.

Matrix

Things to note:

Thursday

Matrix questions

Iterator note: iterators seldom involve loops internally. They are used in loops, but they do not generally contain them.

A simple example of a precondition is that the function Math.sqrt(double x) requires that x >= 0. The postcondition is something that is true afterwards, on the assumption that the precondition held (in this case, that the value returned is a "good" floating-point approximation to the square root of x). Note that sometimes precondition X is replaced in Java with the statement that "an exception is thrown if X is false"; this is probably best thought of as amounting to the same thing.
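A sketch of the "exception if the precondition fails" convention, using a hypothetical mySqrt wrapper (note that Java's real Math.sqrt does not throw for negative input; it quietly returns NaN):

```java
public class SqrtPre {
    // Precondition: x >= 0, enforced here by throwing an exception.
    static double mySqrt(double x) {
        if (x < 0) throw new IllegalArgumentException("x must be >= 0, got " + x);
        // Postcondition: the returned r is a good floating-point
        // approximation to the square root of x, given the precondition held.
        return Math.sqrt(x);
    }

    public static void main(String[] args) {
        System.out.println(mySqrt(2.0));   // approximately 1.4142135623730951
    }
}
```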

An invariant is a statement that is both pre and post: if it holds at the start, then it still holds at the end. The classic example is a loop invariant.

    int sum = 0;
    int n = 0;
    while (n < 100) {    // invariant: sum = 1+2+...+n
       n += 1;
       sum += n;
    }
    // on exit the invariant still holds, now with n == 100

We're not going to obsess about these, but they're good to be familiar with. Most loop invariants are either not helpful or are hard to write down; sometimes, however, they can really help clear up what is going on.
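One way to make a loop invariant pay its way is to assert it on every iteration (run with java -ea so assertions are enabled). This is the sum loop above with the invariant checked against the closed form 1+2+...+n = n(n+1)/2:

```java
public class SumInvariant {
    static int sumTo(int limit) {
        int sum = 0, n = 0;
        while (n < limit) {
            assert sum == n * (n + 1) / 2;   // invariant: sum = 1+2+...+n
            n += 1;
            sum += n;
        }
        assert sum == n * (n + 1) / 2;       // still holds on exit, with n == limit
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100));      // 5050
    }
}
```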

Back to the Ratio class

The gcd() method is recursive: it calls itself. How could we create an iterative (looping) version? Here's one possibility.
// pre: a >= 0, b >= 0
int gcd(int a, int b) {
    while (a > 0 && b > 0) {
       if (a >= b) a = a % b;
       else b = b % a;
    }
    if (a == 0) return b; else return a;
}


Is there an invariant we can use here? Basically, at every pass through the loop, gcd(a, b) is equal to the gcd of the original arguments. How do we write that?
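One way is to check the invariant with an assert on every iteration, comparing against a recursive reference version (a sketch with made-up names; run with -ea):

```java
public class GcdInvariant {
    // Recursive reference version, used only to state the invariant.
    static int gcdRec(int a, int b) { return b == 0 ? a : gcdRec(b, a % b); }

    // pre: a >= 0, b >= 0
    static int gcd(int a, int b) {
        final int g = gcdRec(a, b);       // gcd of the original arguments
        while (a > 0 && b > 0) {
            assert gcdRec(a, b) == g;     // invariant: gcd(a, b) never changes
            if (a >= b) a = a % b;
            else b = b % a;
        }
        return (a == 0) ? b : a;
    }

    public static void main(String[] args) {
        System.out.println(gcd(1071, 462));   // 21
    }
}
```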

Chapter 5: analysis

Big-O notation: a function f(n) is said to be O(g(n)) for some other function g(n) if, eventually, f(n)<= C * g(n). See Figure 5.1 on page 83.
O(n), O(n²). Some examples:
O(1): appending to the end of an ArrayList
O(n): inserting in the "middle" of an ArrayList requires O(n) moves. (What is the "middle"?)
    Recall that 1+2+...+n is O(n²).
Searching an ArrayList for a randomly located value requires O(n) comparisons.
Adding an element to a SetVector takes O(n) comparisons.
Taking the union or intersection of two sets is O(n²).
Building a list up by inserting each element at the front (or inserting each element at random) is O(n²). (This is the last example on page 87.)
Finding if a number is prime by checking every k <= sqrt(n) is O(sqrt(n)).
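For instance, the trial-division primality test from the last item, as a standalone sketch:

```java
public class Primality {
    // O(sqrt(n)): trial division by every k with 2 <= k <= sqrt(n).
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int k = 2; (long) k * k <= n; k++) {   // k*k <= n avoids computing sqrt
            if (n % k == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrime(97));   // true
        System.out.println(isPrime(91));   // false: 91 = 7 * 13
    }
}
```

Note the bound must include k = sqrt(n) itself; stopping at k < sqrt(n) would wrongly declare 49 prime.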

Binary Search

Suppose we're searching for something in a sorted array A with N elements: say, an array of Strings in alphabetical order. How can we use the fact that things are in order? It's more or less the same way we look things up in a dictionary.
// searching for the i for which A[i] == x; assumes N >= 1
int search(int x) {
    int lo = 0, hi = N-1;        // we know that either A[i] == x for some lo <= i <= hi, or else x is not found
    do {
        int mid = (lo+hi)/2;     // lo <= mid <= hi
        if (A[mid] == x) return mid;
        if (A[mid] < x) lo = mid+1;   // x can only be in the upper half
        else hi = mid-1;              // could result in hi = lo-1!
    } while (lo < hi);
    if (lo == hi && A[lo] == x) return lo;
    return -1;
}
There are two issues here: one is the invariant for the loop, which is the first comment above, and the other is that the number of times we can divide N in half before the loop terminates is log N. (Strictly speaking, that would be log₂(N), but it doesn't much matter.)

We will frequently encounter algorithms with running time O(log N) or O(N log N).
   

Table of Factors

This is the example on page 88. Let us construct a table of all the k <= n, each with a list of all the factors (prime or not) of k, and ask how much space is needed. This turns out to be O(n log n). The running time to construct the table varies with how clever the algorithm is: it can be O(n²) [check all i < k for divisibility], O(n^(3/2)) [check all i <= sqrt(k)], or O(n log n) [Sieve of Eratosthenes].
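A sketch of the O(n log n) construction: rather than factoring each k separately, loop over each potential factor i and append it to the list of every multiple of i. The total work is n(1 + 1/2 + ... + 1/n), which is O(n log n):

```java
import java.util.ArrayList;
import java.util.List;

public class FactorTable {
    // Returns a table where entry k is the list of all factors of k, in order.
    static List<List<Integer>> build(int n) {
        List<List<Integer>> factors = new ArrayList<>();
        for (int k = 0; k <= n; k++) factors.add(new ArrayList<>());
        for (int i = 1; i <= n; i++) {          // i is a factor of each of its multiples
            for (int k = i; k <= n; k += i) {   // n/i multiples of i, so n/1 + n/2 + ... overall
                factors.get(k).add(i);
            }
        }
        return factors;
    }

    public static void main(String[] args) {
        System.out.println(build(12).get(12));   // [1, 2, 3, 4, 6, 12]
    }
}
```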

Space in a string

The answer depends on whether we're concerned with the worst case or the average case (we are almost never interested in the best case). If the average case, then the answer typically depends on the probability distribution of the data.
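Assuming the question here is the cost of finding the first space in a string (a guess at the example; the names below are made up), a linear scan makes the worst-case/average-case distinction concrete:

```java
public class FirstSpace {
    // Returns the index of the first space in s, or -1 if there is none.
    // Worst case: n comparisons (no space at all, or a space only at the end).
    // Average case: depends on the probability distribution of spaces in the input.
    static int firstSpace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstSpace("hello world"));   // 5
        System.out.println(firstSpace("nospace"));       // -1
    }
}
```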

More complexity

A function is said to be polynomial if it is O(n^k) for some fixed k; quadratic growth (k = 2) is a special case.
So far we've been looking mainly at running time. We can also consider space needs.