Data Structures Week 2

Comp 271-400 Week 2

Lewis Tower 415, 4:15-8:15

Welcome

Readings:

    Bailey chapter 1, on objects generally.
    Bailey chapter 2, on assertions. We will come back to this later; you can skim it for now.
    Bailey chapter 3, on a Vector class

    Morin chapter 1, sections 1.1 and 1.2
        One slight peculiarity of Morin is that he refers to the array-based List implementation of chapter 2 as an ArrayStack.

There is no difference between System.out.format() and System.out.printf().

Downey and Mayfield discuss ArrayList starting on page 216.

Class rule #1: all class data fields should be private. This way you, the author of the class, have full control over its internal structure. Bailey prefers protected, but there is considerable argument against that.

Note that protected is often considered to be a bad idea, despite Bailey's enthusiasm; we will use private instead!

Who are we keeping data private (or protected) from? The idea is that you are the class implementor, and some other programmer (possibly you at a later date) is the client programmer. See Bailey §1.9.

Back to the Ratio class

The gcd() method on Bailey page 9 is recursive: it calls itself. How does this work?

There are a few separate issues. First, we note that gcd(a,b) = gcd(a,b%a) whenever a>0: any divisor of a and b is a divisor of b%a (which has the form b-ka), and any divisor of a and b%a is a divisor of b.

The second issue, though, is how it can even be legal for a function to call itself. Internally, the runtime system handles this by creating a separate set of local variables for each call to gcd(). This is done on the so-called runtime stack. This means that different calls to gcd(), with different parameter values, don't interact or interfere.

Finally, there's the question of whether gcd() ever returns. One way to prove this is to argue that the first parameter to gcd() keeps getting smaller: it is a nonnegative integer that strictly decreases with each call, so it must eventually reach 0, at which point we stop. The atomic case in the recursion is the case that involves no further recursive calls; in the gcd() example it is the case when a==0.
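Here is a minimal sketch of a recursive gcd() that is consistent with the identity above. It is not necessarily the exact code on Bailey page 9, and the class name GcdRecursiveDemo is made up for illustration. Each call gets its own a and b on the runtime stack, and the first argument shrinks because b % a is always less than a.

```java
class GcdRecursiveDemo {
    // pre: a >= 0, b >= 0, and a > 0 or b > 0
    // post: returns the greatest common divisor of a and b
    static int gcd(int a, int b) {
        if (a == 0) return b;     // atomic case: no further recursive call
        return gcd(b % a, a);     // b % a < a, so the first argument shrinks
    }

    public static void main(String[] args) {
        System.out.println(gcd(12, 18));   // prints 6
    }
}
```

The call gcd(12, 18) becomes gcd(6, 12), then gcd(0, 6), which returns 6.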

How could we create an iterative (looping) version? Here's one possibility.

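    // pre: a>=0, b>=0, a>0 or b>0
    // post: returns the greatest common divisor of a and b
    int gcd(int a, int b) {
        while (a>0 && b>0) {
            if (a>=b) a = a % b;    // replace the larger value by a remainder
            else b = b % a;
        }
        if (a==0) return b; else return a;
    }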

Hangman

The Hangman example (with embedded class WordList) starts at page 18. What is different about the WordList class? How do words get accessed?

This is in §1.6; part of the goal here is the example in §1.8 of an Interface. On p. 20 is the basic interface of WordList as a standalone class. On pp 22-23, an interface Structure is defined and WordList is then declared to implement that interface:
public class WordList implements Structure
That's a Java/C# feature; C++ doesn't quite have "implements".

A Java/C# class can extend just one parent class, but can implement multiple interfaces. In particular, a WordList could extend, say, StrList, and also implement Structure.
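A small sketch of "extend one class, implement several interfaces" follows. All of the names here (Sized, Describable, StrListBase, WordListSketch) are hypothetical stand-ins, not Bailey's Structure or WordList.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interfaces: each lists methods but supplies no fields.
interface Sized {
    int size();
}

interface Describable {
    String describe();
}

// Hypothetical parent class that holds the actual data.
class StrListBase {
    private final List<String> items = new ArrayList<>();

    public void add(String s) { items.add(s); }
    public int size() { return items.size(); }
}

// One parent class (extends), two interfaces (implements).
class WordListSketch extends StrListBase implements Sized, Describable {
    // size() is inherited from StrListBase and satisfies Sized.
    public String describe() {
        return "word list with " + size() + " words";
    }
}
```

A WordListSketch object can be passed anywhere a StrListBase, a Sized, or a Describable is expected.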

C++ does in fact allow classes to inherit from multiple parents; this is called multiple inheritance. The general case is not nearly as useful as one might think; most (almost all?) reasonable examples of multiple inheritance involve cases where all but one of the inheritances are really an "implement".

Some classes with two fields

Ratio

Both fields are integers. There are accessors for numerator and denominator, but no mutators. Also, the numerator and denominator stored may not equal the numerator and denominator supplied by the constructor.
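To see why the stored values can differ from the supplied ones, here is a rough sketch of a Ratio-like class that reduces to lowest terms in its constructor. The class name RatioSketch and the sign-handling details are assumptions made for this sketch; Bailey's version may differ.

```java
class RatioSketch {
    private final int numerator;
    private final int denominator;

    // pre: bottom != 0
    // post: stores top/bottom in lowest terms with a positive denominator
    RatioSketch(int top, int bottom) {
        if (bottom == 0) throw new IllegalArgumentException("zero denominator");
        int g = gcd(Math.abs(top), Math.abs(bottom));
        if (bottom < 0) g = -g;          // keep the stored denominator positive
        numerator = top / g;
        denominator = bottom / g;
    }

    // Accessors only; there are no mutators.
    int getNumerator() { return numerator; }
    int getDenominator() { return denominator; }

    private static int gcd(int a, int b) {
        if (a == 0) return b;
        return gcd(b % a, a);
    }
}
```

With this sketch, new RatioSketch(2, 4) stores numerator 1 and denominator 2, not the 2 and 4 that were passed in.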

Point

Again, both fields are integers. There are accessors for both x and y, but, again, no mutators. However, the x and y supplied by the constructor are the x and y actually used.

BankAccount

Bailey includes a BankAccount example on page 11. There are two fields: an account_number and a balance (Bailey has the account_number be a String, though it could be an integer as well).

There is no mutator for the account field; we do not anticipate changing that. Also, there is no field mutator for balance; we can deposit() and withdraw() money, but we can't just set the balance to whatever we want. In a real banking system, this helps the programmers verify that when money is deposited, it has to come from somewhere else.

Notice also the pre- and post-conditions for the methods. These are a good idea, though they take some getting used to and "trivial" preconditions can be inscrutable.

Also note the .equals() method.
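The following is a sketch of the shape described above, assuming a String account identifier and a double balance. The class name BankAccountSketch, the exception choices, and the decision to compare accounts by identifier are assumptions for illustration, not a copy of Bailey's code.

```java
class BankAccountSketch {
    private final String account;   // never changes after construction
    private double balance;         // changed only by deposit() and withdraw()

    BankAccountSketch(String account, double balance) {
        this.account = account;
        this.balance = balance;
    }

    String getAccount() { return account; }
    double getBalance() { return balance; }

    // pre: amount >= 0
    // post: balance is increased by amount
    void deposit(double amount) {
        if (amount < 0) throw new IllegalArgumentException("negative deposit");
        balance += amount;
    }

    // pre: 0 <= amount <= balance
    // post: balance is decreased by amount
    void withdraw(double amount) {
        if (amount < 0 || amount > balance) {
            throw new IllegalArgumentException("bad withdrawal");
        }
        balance -= amount;
    }

    // In this sketch two accounts are equal when their identifiers match.
    @Override
    public boolean equals(Object other) {
        return other instanceof BankAccountSketch
            && account.equals(((BankAccountSketch) other).account);
    }

    @Override
    public int hashCode() { return account.hashCode(); }
}
```

Note that there is no setBalance(): the only way to change the balance is through an operation that corresponds to money moving.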

Association

The Association class in §1.5 (class on p 16, example on p 15) is simply a "pair", <key,value>, where we provide accessors for both fields but a mutator only for the value. That is, we do not allow ourselves to change the key (however, we can create a new Association object with a new key). We also provide an equals() method. (The two fields here can be any type.)

Note that there are two constructors for this class. Furthermore, the single-parameter constructor calls the two-param constructor; if we restructure the underlying implementation only the two-param constructor needs to be rewritten.
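The constructor-calls-constructor idea can be sketched as follows. This version uses Java generics and the name AssociationSketch, both of which are choices made for this sketch; comparing by key only in equals() is likewise an assumption here.

```java
class AssociationSketch<K, V> {
    private final K key;    // no mutator: the key is fixed
    private V value;

    // pre: key is not null
    AssociationSketch(K key, V value) {
        this.key = key;
        this.value = value;
    }

    // Delegates to the two-parameter constructor, so only that one
    // needs to change if the internal representation changes.
    AssociationSketch(K key) {
        this(key, null);
    }

    K getKey() { return key; }
    V getValue() { return value; }
    void setValue(V value) { this.value = value; }

    // In this sketch, two associations are equal when their keys are equal.
    @Override
    public boolean equals(Object other) {
        return other instanceof AssociationSketch
            && key.equals(((AssociationSketch<?, ?>) other).key);
    }

    @Override
    public int hashCode() { return key.hashCode(); }
}
```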

How does Association differ from Point?

Rectangle

Bailey's Rectangle class (page 22) contains two Point objects. There are mutators to set the left x-coordinate, and the width (and presumably the lower y-coordinate and the height). But none of these is directly a field mutator for the two internal Point objects with which the rectangle is constructed.

Note the drawOn() precondition that the window is a valid one. Note also the relatively rich set of operations. Note also that for left() and width() the accessor and the mutator have the same name! Java distinguishes between the two by the presence of the parameter. Some people find this approach helpful; others find it too confusing by half. The most common naming strategy is probably left() for the accessor and setLeft() for the mutator.
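Here is a tiny illustration of the same-name accessor and mutator style. SpanSketch is a hypothetical one-dimensional class invented for this example, not Bailey's Rectangle; Java picks the method by whether an argument is supplied.

```java
class SpanSketch {
    private int left;
    private int width;

    SpanSketch(int left, int width) {
        this.left = left;
        this.width = width;
    }

    int left() { return left; }          // accessor: no parameter
    void left(int x) { left = x; }       // mutator: same name, one parameter

    int width() { return width; }
    void width(int w) { width = w; }

    int right() { return left + width; } // derived value, not a field
}
```

After s.left(10), the call s.left() returns 10. The setLeft() naming style would make the two calls easier to tell apart at a glance.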

Normally, drawing a Rectangle is a primitive operation in the graphics library.

Big-O notation and Bailey Chapter 5: Analysis

See lists.html#bigO

Binary Search

See sorting.html#binsearch

Pre- and Post-conditions

Bailey addresses these in Chapter 2.

A simple example of a precondition is that the function Math.sqrt(double x) requires that x>=0. The postcondition is something that is true afterwards, on the assumption that the precondition held (in this case, that the value returned is a "good" floating-point approximation to the square root of x). Note that sometimes precondition X is replaced in Java with the statement that "an exception is thrown if X is false"; this is probably best thought of as amounting to the same thing.

Note that it is up to the caller of a function to ensure that the precondition holds. Sometimes (though not always) the function itself also checks its preconditions.
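As a sketch of a function that checks its own precondition, here is a hypothetical wrapper checkedSqrt() (the name and the choice of IllegalArgumentException are assumptions). Math.sqrt() itself does not throw for a negative argument; it returns NaN.

```java
class SqrtDemo {
    // pre: x >= 0
    // post: returns a floating-point approximation to the square root of x
    static double checkedSqrt(double x) {
        if (x < 0) {
            throw new IllegalArgumentException("precondition x >= 0 violated: " + x);
        }
        return Math.sqrt(x);
    }
}
```

A caller that has already guaranteed x >= 0 never sees the exception; the check only matters when the caller has broken its side of the contract.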

An invariant is a statement that is both pre and post: if it holds at the start, then it still holds at the end. The classic example is a loop invariant.

    int sum = 0;
    int n=0;
    while (n<=100) {    // invariant: sum = 1+2+...+n
       n += 1;
       sum += n;
    }

We're not going to obsess about these, but they're good to be familiar with. Most loop invariants are either not helpful or are hard to write down; sometimes, however, they can really help clear up what is going on.
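One way to get a feel for an invariant is to check it explicitly while the loop runs. The sketch below wraps the loop above in a hypothetical method sumWithCheck() and uses the closed form n*(n+1)/2 for 1+2+...+n. Note that with the condition n<=100 the loop exits with n equal to 101, so the final sum is 1+2+...+101.

```java
class InvariantDemo {
    static int sumWithCheck() {
        int sum = 0;
        int n = 0;
        while (n <= 100) {
            // invariant: sum == 1+2+...+n, which equals n*(n+1)/2
            if (sum != n * (n + 1) / 2) {
                throw new AssertionError("invariant broken at n=" + n);
            }
            n += 1;
            sum += n;
        }
        // The invariant still holds after the loop; here n == 101.
        if (sum != n * (n + 1) / 2) {
            throw new AssertionError("invariant broken on exit");
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumWithCheck());   // prints 5151
    }
}
```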

Consider again the Ratio class. One version of the gcd() method was recursive: it calls itself. But we also had an iterative version:

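    // pre: a>=0, b>=0
    // post: returns the greatest common divisor of a and b
    int gcd(int a, int b) {
        while (a>0 && b>0) {
            if (a>=b) a = a % b;    // replace the larger value by a remainder
            else b = b % a;
        }
        if (a==0) return b; else return a;
    }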

Is there an invariant we can use here? Basically, the gcd of a and b never changes. How do we write that?
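One possible way to write it: save the gcd of the original arguments, and state that gcd(a,b) equals that saved value every time the loop test is reached. The sketch below checks this at run time using a separate recursive helper refGcd(), which is a hypothetical name introduced only for the check.

```java
class GcdInvariantDemo {
    // Reference gcd, used only to check the invariant.
    static int refGcd(int a, int b) {
        return (a == 0) ? b : refGcd(b % a, a);
    }

    // pre: a >= 0, b >= 0
    static int gcd(int a, int b) {
        final int g = refGcd(a, b);   // gcd of the original arguments
        while (a > 0 && b > 0) {
            // invariant: gcd(a, b) == gcd(original a, original b)
            if (refGcd(a, b) != g) throw new AssertionError("invariant broken");
            if (a >= b) a = a % b;
            else b = b % a;
        }
        // On exit one of a, b is 0, and gcd(0, x) == x, so the other is the answer.
        return (a == 0) ? b : a;
    }
}
```

The invariant, together with the exit condition that a or b is 0, is exactly what justifies the final return statement.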

Object Semantics

See objects.html#semantics.

Linked List

See lists.html#linked.

List-related examples:

Table of Factors

This is the example on Bailey page 88. Let us construct a table of all the k<=n and a list of all the factors (prime or not) of k, and ask how much space is needed. This turns out to be O(n log n). The running time to construct the table varies with how clever the algorithm is: it can be O(n²) [check all i<k for divisibility], O(n^(3/2)) [check all i<=sqrt(k)], or O(n log n) [Sieve of Eratosthenes].
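Here is a sketch of the sieve-style construction, assuming the table is a list of lists indexed by k (the class and method names are made up for this example). Instead of testing each k for divisors, each candidate divisor d is appended to the lists of its multiples d, 2d, 3d, ...; the inner loop runs about n/d times, and the sum of n/d for d from 1 to n is roughly n log n.

```java
import java.util.ArrayList;
import java.util.List;

class FactorTable {
    // Returns a table where table.get(k) lists every factor of k, 1 <= k <= n.
    // Index 0 is present but unused.
    static List<List<Integer>> build(int n) {
        List<List<Integer>> table = new ArrayList<>();
        for (int k = 0; k <= n; k++) {
            table.add(new ArrayList<>());
        }
        for (int d = 1; d <= n; d++) {
            for (int k = d; k <= n; k += d) {   // k ranges over the multiples of d
                table.get(k).add(d);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(build(12).get(12));   // prints [1, 2, 3, 4, 6, 12]
    }
}
```

Every entry added to the table costs constant time, so the running time and the space used are both proportional to the total number of factors stored.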

Space in a string

How long does it take to find the first space in a string? The answer depends on whether we're concerned with the worst case or the average case (we are almost never interested in the best case). If the average case, then the answer typically depends on the probability distribution of the data.
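Assuming the question is how long it takes to locate the first space in a string of length n, a straightforward scan looks like the sketch below (findSpace() is a hypothetical name). The worst case is a string with no space at all, which costs n comparisons; the best case is a space in position 0, which costs one.

```java
class SpaceSearch {
    // Returns the index of the first space in s, or -1 if there is none.
    static int findSpace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') return i;   // stop at the first space
        }
        return -1;                              // scanned all n characters
    }
}
```

The average cost depends on where spaces tend to fall, which is why the probability distribution of the input matters.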

More complexity

A function is said to be polynomial if it is O(n^k) for some fixed k; quadratic growth is a special case.
So far we've been looking mainly at running time. We can also consider space needs.

Chapter 6: Sorting

See sorting.html#sorting

Introduction to C++

Here are a few notes on this: Intro to C++

What about installing it?

Macs sometimes have Xcode already installed; if not, you can get it from the Apple App Store. Here's a link to a simple hello-world Xcode project.

Recursion

Recursion starts at Bailey page 94.
See recursion.html