Comp 271 Week 3


Vector

Bailey, in §4.2.2, works through an example similar to MyList; he calls it Vector. Some differences:

SetVector: §3.7

Using Vectors/MyLists to implement an abstract Set. Note the more limited set of operations: there is no get() and no set(). (There is still an iterator, but there are no guarantees about the order in which elements are returned.) Note how many other things I had to pull in to get this to work.

add() now works very differently. Could we improve addAll()? On the face of it, to form the union of two sets A and B of size N, we need N² equality comparisons: each element of A has to be compared with each element of B to determine if it is already there. This cost is said to be O(N²), notation which indicates we don't care whether it is N², N²/2, or 3N².
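Concretely, a straight-line version of that N² union might look like this (a hypothetical sketch using ArrayList rather than our MyList/SetVector, so that it runs standalone):

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveUnion {
    // O(N^2): each element of b is compared against every element seen so far.
    static <E> List<E> union(List<E> a, List<E> b) {
        List<E> result = new ArrayList<>(a);   // assumes a itself has no duplicates
        for (E elem : b) {
            boolean found = false;
            for (E existing : result) {        // up to N comparisons per element of b
                if (existing.equals(elem)) { found = true; break; }
            }
            if (!found) result.add(elem);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(union(List.of(1, 2, 3), List.of(3, 4, 5)));  // [1, 2, 3, 4, 5]
    }
}
```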

Later we'll make this faster with hashing. Brief summary: choose a relatively large M, perhaps quite a bit larger than N. Define h(obj) = hashCode(obj) % M. Now create a big array ht (for hash table) of size M, initially all nulls. For each a in A, mark the table at ht[h(a)]. Then, for each b in B, if ht[h(b)] is still null, b is not a duplicate: put it in! If ht[h(b)] is occupied already, then we have to check "the long way", but in general we save a great deal.
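Here is a minimal sketch of that idea (not Bailey's code; the names are made up, and it resolves collisions by linear probing, stepping to the next slot, which goes slightly beyond the summary above; it assumes M is comfortably larger than the number of distinct elements):

```java
import java.util.ArrayList;
import java.util.List;

public class HashedUnion {
    // Union of a and b, using a hash table of size m to avoid most comparisons.
    static <E> List<E> union(List<E> a, List<E> b, int m) {
        Object[] ht = new Object[m];           // the hash table, initially all nulls
        List<E> result = new ArrayList<>();
        for (E elem : a) place(ht, result, elem, m);
        for (E elem : b) place(ht, result, elem, m);
        return result;
    }

    static <E> void place(Object[] ht, List<E> result, E elem, int m) {
        int h = Math.floorMod(elem.hashCode(), m);
        while (ht[h] != null) {
            if (ht[h].equals(elem)) return;    // occupied by a duplicate: check "the long way"
            h = (h + 1) % m;                   // occupied by something else: probe the next slot
        }
        ht[h] = elem;                          // slot was null: elem is new
        result.add(elem);
    }

    public static void main(String[] args) {
        System.out.println(union(List.of(1, 2, 3), List.of(3, 4, 5), 17));  // [1, 2, 3, 4, 5]
    }
}
```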

Is there an intersect option?

Pre- and Post-conditions

A precondition is something that must be true before a method is executed, or else we're not guaranteed sensible results. Virtually all API documentation is filled with preconditions and postconditions.

Bailey's Assert library has methods for this. When an error occurs, an exception is thrown. Note how this would be a simpler way to handle array-bounds checking!
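In the same spirit (this is not Bailey's actual Assert code, just a sketch of the pattern), a one-line pre() method can stand in for manual bounds checking:

```java
public class MyAssert {
    // Sketch of an Assert-style helper: throw when a stated precondition fails.
    static void pre(boolean condition, String message) {
        if (!condition) throw new IllegalStateException("Precondition failed: " + message);
    }

    // pre: 0 <= i < data.length -- one call replaces an if/else bounds check.
    static Object get(Object[] data, int i) {
        pre(0 <= i && i < data.length, "index " + i + " in bounds");
        return data[i];
    }

    public static void main(String[] args) {
        Object[] data = { "a", "b", "c" };
        System.out.println(get(data, 1));   // b
    }
}
```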

Example: see the use of pre- and post-conditions in the Vector class of Chapter 3, pp 51-52. They're not enforced in code! However, some preconditions do appear in the actual library, commented out.

Matrix

Things to note:

Thursday

Matrix questions

Iterator note: iterators seldom involve loops internally. They are used in loops, but they do not generally contain them.

A simple example of a precondition is that the function Math.sqrt(double x) requires that x >= 0. The postcondition is something that is true afterwards, on the assumption that the precondition held (in this case, that the value returned is a "good" floating-point approximation to the square root of x). Note that sometimes precondition X is replaced in Java with the statement that "an exception is thrown if X is false"; this is probably best thought of as amounting to the same thing.
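A sketch of the "exception if the precondition fails" convention, using a hypothetical mySqrt wrapper (note that Java's real Math.sqrt does not throw for negative input; it quietly returns NaN):

```java
public class SqrtPre {
    // Precondition: x >= 0, enforced here by throwing an exception.
    static double mySqrt(double x) {
        if (x < 0) throw new IllegalArgumentException("x must be >= 0, got " + x);
        // Postcondition: the returned r is a good floating-point
        // approximation to the square root of x, given the precondition held.
        return Math.sqrt(x);
    }

    public static void main(String[] args) {
        System.out.println(mySqrt(2.0));   // approximately 1.4142135623730951
    }
}
```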

An invariant is a statement that is both pre and post: if it holds at the start, then it still holds at the end. The classic example is a loop invariant.

    int sum = 0;
    int n = 0;
    while (n < 100) {    // invariant: sum = 1+2+...+n
       n += 1;
       sum += n;
    }
    // on exit the invariant still holds, now with n == 100

We're not going to obsess about these, but they're good to be familiar with. Most loop invariants are either not helpful or are hard to write down; sometimes, however, they can really help clear up what is going on.
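One way to make a loop invariant pay its way is to assert it on every iteration (run with java -ea so assertions are enabled). This is the sum loop above with the invariant checked against the closed form 1+2+...+n = n(n+1)/2:

```java
public class SumInvariant {
    static int sumTo(int limit) {
        int sum = 0, n = 0;
        while (n < limit) {
            assert sum == n * (n + 1) / 2;   // invariant: sum = 1+2+...+n
            n += 1;
            sum += n;
        }
        assert sum == n * (n + 1) / 2;       // still holds on exit, with n == limit
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100));      // 5050
    }
}
```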

Back to the Ratio class

The gcd() method is recursive: it calls itself. How could we create an iterative (looping) version? Here's one possibility.
// pre: a >= 0, b >= 0
int gcd(int a, int b) {
    while (a > 0 && b > 0) {
       if (a >= b) a = a % b;
       else b = b % a;
    }
    if (a == 0) return b; else return a;
}


Is there an invariant we can use here? Basically, at every pass through the loop, gcd(a, b) is equal to the gcd of the original arguments. How do we write that?
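One way is to check the invariant with an assert on every iteration, comparing against a recursive reference version (a sketch with made-up names; run with -ea):

```java
public class GcdInvariant {
    // Recursive reference version, used only to state the invariant.
    static int gcdRec(int a, int b) { return b == 0 ? a : gcdRec(b, a % b); }

    // pre: a >= 0, b >= 0
    static int gcd(int a, int b) {
        final int g = gcdRec(a, b);       // gcd of the original arguments
        while (a > 0 && b > 0) {
            assert gcdRec(a, b) == g;     // invariant: gcd(a, b) never changes
            if (a >= b) a = a % b;
            else b = b % a;
        }
        return (a == 0) ? b : a;
    }

    public static void main(String[] args) {
        System.out.println(gcd(1071, 462));   // 21
    }
}
```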

Chapter 5: analysis

Big-O notation: a function f(n) is said to be O(g(n)) for some other function g(n) if, eventually, f(n)<= C * g(n). See Figure 5.1 on page 83.
O(n), O(n²). Some examples:
O(1): appending to the end of an ArrayList
O(n): inserting in the "middle" of an ArrayList requires O(n) moves. (What is the "middle"?)
    Recall that 1+2+...+n is O(n²).
Searching an ArrayList for a randomly located value requires O(n) comparisons.
Adding an element to a SetVector takes O(n) comparisons.
Taking the union or intersection of two sets is O(n²).
Building a list up by inserting each element at the front (or inserting each element at random) is O(n²). (This is the last example on page 87.)
Finding if a number is prime by checking every k <= sqrt(n) is O(sqrt(n)).
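For instance, the trial-division primality test from the last item, as a standalone sketch:

```java
public class Primality {
    // O(sqrt(n)): trial division by every k with 2 <= k <= sqrt(n).
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int k = 2; (long) k * k <= n; k++) {   // k*k <= n avoids computing sqrt
            if (n % k == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrime(97));   // true
        System.out.println(isPrime(91));   // false: 91 = 7 * 13
    }
}
```

Note the bound must include k = sqrt(n) itself; stopping at k < sqrt(n) would wrongly declare 49 prime.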

Binary Search

Suppose we're searching for something in a sorted array A with N elements: say, an array of Strings in alphabetical order. How can we use the fact that things are in order? It's more or less the same way we look things up in a dictionary.
// searching for the i for which A[i] == x; assumes N >= 1
int search(int x) {
    int lo = 0, hi = N-1;        // we know that either A[i] == x for some lo <= i <= hi, or else x is not found
    do {
        int mid = (lo+hi)/2;     // lo <= mid <= hi
        if (A[mid] == x) return mid;
        if (A[mid] < x) lo = mid+1;   // x can only be in the upper half
        else hi = mid-1;              // could result in hi = lo-1!
    } while (lo < hi);
    if (lo == hi && A[lo] == x) return lo;
    return -1;
}
There are two issues here: one is the invariant for the loop, which is the first comment above, and the other is that the number of times we can divide N in half before the loop terminates is log N. (Strictly speaking, that would be log₂(N), but it doesn't much matter.)

We will frequently encounter algorithms with running time O(log N) or O(N log N).
   

Table of Factors

This is the example on page 88. Let us construct a table of all the k <= n, each with a list of all the factors (prime or not) of k, and ask how much space is needed. This turns out to be O(n log n). The running time to construct the table varies with how clever the algorithm is: it can be O(n²) [check all i < k for divisibility], O(n^(3/2)) [check all i <= sqrt(k)], or O(n log n) [Sieve of Eratosthenes].
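A sketch of the O(n log n) construction: rather than factoring each k separately, loop over each potential factor i and append it to the list of every multiple of i. The total work is n(1 + 1/2 + ... + 1/n), which is O(n log n):

```java
import java.util.ArrayList;
import java.util.List;

public class FactorTable {
    // Returns a table where entry k is the list of all factors of k, in order.
    static List<List<Integer>> build(int n) {
        List<List<Integer>> factors = new ArrayList<>();
        for (int k = 0; k <= n; k++) factors.add(new ArrayList<>());
        for (int i = 1; i <= n; i++) {          // i is a factor of each of its multiples
            for (int k = i; k <= n; k += i) {   // n/i multiples of i, so n/1 + n/2 + ... overall
                factors.get(k).add(i);
            }
        }
        return factors;
    }

    public static void main(String[] args) {
        System.out.println(build(12).get(12));   // [1, 2, 3, 4, 6, 12]
    }
}
```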

Space in a string

The answer depends on whether we're concerned with the worst case or the average case (we are almost never interested in the best case). If the average case, then the answer typically depends on the probability distribution of the data.
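Assuming the question here is the cost of finding the first space in a string (a guess at the example; the names below are made up), a linear scan makes the worst-case/average-case distinction concrete:

```java
public class FirstSpace {
    // Returns the index of the first space in s, or -1 if there is none.
    // Worst case: n comparisons (no space at all, or a space only at the end).
    // Average case: depends on the probability distribution of spaces in the input.
    static int firstSpace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstSpace("hello world"));   // 5
        System.out.println(firstSpace("nospace"));       // -1
    }
}
```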

More complexity

A function is said to be polynomial if it is O(n^k) for some fixed k; quadratic growth (k = 2) is a special case.
So far we've been looking mainly at running time. We can also consider space needs.