Data Structures Week 2

Comp 388-005 Week 2

Lewis Tower 415, 4:15-8:15

Welcome

Readings:

    Bailey chapter 1, on objects generally.
    Bailey chapter 2, on assertions. We will come back to this later; you can skim it for now.
    Bailey chapter 3, on a Vector class

    Morin chapter 1, sections 1.1 and 1.2
        One slight peculiarity of Morin is that he refers to the array-based List implementation of chapter 2 as an ArrayStack.

Class rule #1: all class data fields should be private. This way, you who writes the class has full control over its internal structure. Bailey prefers protected, but there is considerable argument against that.

Note that protected is often considered to be a bad idea, despite Bailey's enthusiasm; we will use private instead!

Who are we keeping data private (or protected) from? The idea is that you are the class implementor, and some other programmer (possibly you at a later date) is the client programmer. See Bailey §1.9.

Back to the Ratio class

The gcd() method on Bailey page 9 is recursive: it calls itself. How does this work?

There are a few separate issues. First, we note that gcd(a,b) = gcd(a,b%a), always; any divisor of a and b is a divisor of b%a (which has the form b-ka), and any divisor of a and b%a is a divisor of b.

The second issue, though, is how it can even be legal for a function to call itself. Internally, the runtime system handles this by creating a separate set of local variables for each call to gcd(). This is done on the so-called runtime stack. This means that different calls to gcd(), with different parameter values, don't interact or interfere.

Finally, there's the question of whether rgcd() ever returns. One way to prove this is to argue that the first parameter to rgcd() keeps getting smaller. We stop when it reaches 0, as it must. The atomic case in the recursion is the case that involves no further recursive calls; in the gcd() example it is the case when a==0.

How could we create an iterative (looping) version? Here's one possibility.

// pre: a>=0,
      b>=0, a>0 or b>0

      int gcd(int a, int b) {

          while (a>0 && b>0) {

             if (a>=b) a = a % b;

             else b = b % a;

          }

          if (a==0) return b; else return a;

Hangman

The Hangman example (with embedded class WordList) starts at page 18. What is different about the WordList class? How do words get accessed?

This is in §1.6; part of the goal here is the example in §1.8 of an Interface. On p. 20 is the basic interface of WordList as a standalone class. On pp 22-23, an interface Structure is defined and WordList is then declared to implement that interface:
public class WordList implements Structure
That's a Java/C# feature; C++ doesn't quite have "implements".

A Java/C# class can extend just one parent class, but can implement multiple interfaces. In particular, a WordList could extend, say, StrList, and also implement Structure.

C++ does in fact allow classes to extend from multiple parents; this is called multiple inheritance. The general case is not nearly as useful as one might think; most (almost all?) reasonable examples of multiple inheritance involve cases where all but one of the inheritances is really an "implement".

Big-O notation and Bailey Chapter 5: Analysis

See lists.html#bigO

Binary Search

See sorting.html#binsearch

Pre- and Post-conditions

Bailey addresses these in Chapter 2.

A simple example of a precondition is that the function Math.sqrt(double x) requires that x>=0. The postcondition is something that is true afterwards,on the assumption that the precondition held (in this case, that the value returned is a "good" floating-point approximation to the squareroot of x). Note that sometimes precondition X is replaced in Java with the statement that "an exception is thrown if X is false"; this is probably best thought of as amounting to the same thing.

Note that it is up to the caller of a function to verify the precondition. Sometimes (though not always) the function verifies that the preconditions hold.

An invariant is a statement that is both pre and post: if it holds at the start, then it still holds at the end. The classic example is a loop invariant.

    int sum = 0;
    int n=0;
    while (n<=100) {    // invariant: sum = 1+2+...+n
       n += 1;
       sum += n;
    }

We're not going to obsess about these, but they're good to be familiar with. Most loop invariants are either not helpful or are hard to write down; sometimes, however, they can really help clear up what is going on.

Consider again the Ratio class. One version of the gcd() method was recursive: it calls itself. But we also had an iterative version:

// pre: a>=0,
      b>=0

      int gcd(int a, int b) {

          while (a>0 && b>0) {

             if (a>=b) a = a % b;

             else b = b % a;

          }

          if (a==0) return b; else return a;

Is there an invariant we can use here? Basically, the gcd of a and b never changes. How do we write that?

Object Semantics

See objects.html#semantics.

Linked List

See lists.html#linked.

List-related examples:

Table of Factors

This is the example on Bailey page 88. Let us construct a table of all the k<=n and a list of all the factors (prime or not) of k, and ask how much space is needed. This turns out to be n log n. The running time to construct the table varies with how clever the algorithm is, it can be O(n²) [check all i<k for divisibility], O(n^3/2) [check all i<sqrt(k)], or O(n log n) [Sieve of Eratosthenes].

Space in a string

The answer depends on whether we're concerned with the worst case or the average case (we are almost never interested in the best case). If the average case, then the answer typically depends on the probability distribution of the data.

More complexity

A function is said to be polynomial if it is O(n^k) for some fixed k; quadratic growth is a special case.
So far we've been looking mainly at running time. We can also consider space needs.

Chapter 6: Sorting

See sorting.html#sorting

We did not look at the details of QuickSort.

Recursion starts at Bailey page 94
See recursion.html

We got as far as a first look at factorial(n).