Comp 271-400 Week 1
Crown 103, 4:15-8:15
Welcome
Readings:
Bailey chapter 1, on objects generally.
Bailey chapter 2, on assertions. We will come back to
this later; you can skim it for now.
Bailey chapter 3, on a Vector class
Morin chapter 1, sections 1.1 and 1.2
One slight peculiarity of Morin is
that he refers to the array-based List implementation of chapter 2 as an
ArrayStack.
Primary text: Bailey, online, and maybe Morin, also online.
What the heck is a data structure? Any structure that
involves data! Generally, data that is in addition to that needed to
implement the structure itself.
On page 1 of Morin there is a list of some applications of data structures:
- Opening a file and finding the right data blocks on a disk
- Looking up someone in your phone's contacts list
- Logging in to a social network
- Looking up something with Google (or Bing)
- Calling 911
All of these examples in some sense involve search, and
retrieval of the correct data.
Data structures themselves can be a general-purpose container,
like a list, or an application-specific structure. Note,
however, that it is usually a good idea to try to make application-specific
structures into something more general, if only to separate conceptually the
app-specific part from the general part.
Examples we will look at this semester
- Array-based list
- linked list
- ordered tree
- unordered expression tree
- dictionary / tree
- dictionary / hash table
- set / hash table
A couple we won't do much with:
- directed graph
- undirected graph
What else will we do?
We will look at algorithms, specifically for searching
and sorting. The "obvious" algorithms can be much improved on. We will
quantify this by looking at the approximate running time
of algorithms.
We will look at a few advanced programming ideas, though these will be
limited.
We will look at recursion, and how to handle complex
recursive data structures.
As an example of recursion, and as an example of how programming works
"under the hood", we will look at a simple compiler for
a simple language, miniJava.
We will look at how objects and polymorphism
can simplify programming.
Finally, we may look at Java's dynamic memory management (with
garbage collection), and compare it to, say, the C++ approach.
Interfaces
In object-oriented languages, we implement data structures such as in the
list above with classes. The class interface specifies
the publicly available operations, in the form of methods
(or, especially in C++, member functions) that the class provides. The
interface is all the client programmer
needs to know; the implementation can be changed at will. It is not
uncommon, for example, for a programmer to change a list implementation from
array to linked, or vice-versa.
Morin discusses interfaces in Section 1.2. Examples: queue, list, set,
sorted set
Classes let us:
- Keep the client-programmer interface abstract
- Separate interface from implementation
- Allow the implementation to be improved later
Long before object-oriented programming was a thing, there were abstract
data types: data structures that presented to the user-programmer
only the minimum interface needed; access to the underlying implementation
was restricted. For example, a stack implemented using an array would not
grant access to individual array components; the programmer could use only
the interface operations push() and pop().
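Here is a minimal Java sketch of such an array-based stack. The class name IntStack, the fixed capacity and the isEmpty() helper are assumptions made for illustration; only push() and pop() come from the description above.

```java
// Sketch of a stack as an abstract data type: the array is private,
// so clients can reach the data only through push() and pop().
class IntStack {
    private final int[] items;   // underlying storage, hidden from clients
    private int count = 0;       // number of items currently on the stack

    IntStack(int capacity) {
        items = new int[capacity];
    }

    void push(int value) {
        if (count == items.length) throw new IllegalStateException("stack is full");
        items[count++] = value;
    }

    int pop() {
        if (count == 0) throw new IllegalStateException("stack is empty");
        return items[--count];
    }

    boolean isEmpty() {
        return count == 0;
    }
}
```

Because items is private, a client cannot write to, say, the middle of the array; the implementation could later switch to a linked structure without any client code changing.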
General-purpose container classes typically involve type
parameters, called generic classes in Java. The parameter type is
enclosed in angled brackets, a tradition that comes from the C++ notion of a
template.
ArrayList<Integer> theCounts;   // Java type parameters must be reference types: Integer, not int
ArrayList<Message> mailbox;
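As a rough sketch of how such declarations get used in Java: a type parameter must be a reference type, so a list of int values is declared with the wrapper class Integer, and Java converts between int and Integer automatically. The Message class and the GenericsDemo wrapper here are hypothetical stand-ins, not anything from Bailey or Morin.

```java
import java.util.ArrayList;

// Hypothetical stand-in for whatever a mailbox would hold.
class Message {
    private final String text;
    Message(String text) { this.text = text; }
    String getText() { return text; }
}

class GenericsDemo {
    // Builds a list of counts; the int values are boxed to Integer automatically.
    static ArrayList<Integer> makeCounts() {
        ArrayList<Integer> theCounts = new ArrayList<>();
        theCounts.add(3);
        theCounts.add(5);
        return theCounts;
    }

    public static void main(String[] args) {
        ArrayList<Integer> theCounts = makeCounts();
        ArrayList<Message> mailbox = new ArrayList<>();
        mailbox.add(new Message("hello"));

        int first = theCounts.get(0);             // unboxed back to int
        String text = mailbox.get(0).getText();   // no cast needed: get() returns a Message
        System.out.println(first + " " + text);
    }
}
```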
The central idea of a class is that we can define the public interface
however we want, and keep the internals to ourselves. Thus, outside users
have only the interface we
provide. If there is a length field, or a pointer to the first cell, then if
we keep the field private we need not fear that some client programmer will
modify it and thus make a mess of it.
Class rule #1: all class data fields should be private.
This way you, as the writer of the class, have full control over its internal
structure. Bailey prefers protected, but protected is often considered to be a
bad idea, despite Bailey's enthusiasm; we will use private instead!
Who are we keeping data private (or protected) from?
The idea is that you are the class implementor, and some other programmer
(possibly you at a later date) is the client
programmer. See Bailey §1.9.
Class methods can be divided into accessors, which are
guaranteed never to modify internal class fields, and mutators,
which may (though perhaps only in selected cases). An accessor that grants
access to a specific class field, for example getXcoord() or getName(), is
called a field accessor; corresponding field
mutators might be setXcoord() and setName().
A class need not have any mutators; such a class is said to be immutable.
This means that, once the object is created, it cannot be changed. If one
wants a modified object, one has to create a new one.
String
Example 1: Bailey, p 7: two ways of implementing a string. Possible
interface:
charAt(int), length()
Note that how we implement the class has no bearing on the interface!
Other examples of objects:
- Point (x and y coordinates)
- Ratio (representing an exact fraction)
- Student record (name, address, registrations, etc)
- BankAccount record
- Association
- Stack
- Rectangle
The Point, Ratio, BankAccount, Association and Rectangle objects each
consist of two member values; Point, Ratio and BankAccount consist of two
integer member values. But there is quite a difference!
Point
Here is a pretty minimal class. We have a constructor and two (field)
accessors. There is no mutator, and so the class is immutable. If you want
to move a point, you have to create a new one.
Note the convention that fields begin with _; many find some such convention
helpful. (To be fair, others find this kind of convention irksome.)
class Point {
    private int _xcoord;
    private int _ycoord;
    public Point(int x, int y) {
        _xcoord = x;
        _ycoord = y;
    }
    public int getX() {return _xcoord;}
    public int getY() {return _ycoord;}
}
Note the methods getX() and getY(),
which are field accessors. Field mutators,
to allow "moving" a point, would make sense, but are not necessary as it's
just as easy to create a new point at the new location.
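To make "create a new point at the new location" concrete, here is a sketch of the same class with one extra method. The translate() method is an assumption added for illustration; it is not part of the class shown above. The fields are also marked final here to make the immutability explicit.

```java
// Immutable point: "moving" it produces a new object and leaves the old one alone.
class Point {
    private final int _xcoord;
    private final int _ycoord;

    public Point(int x, int y) {
        _xcoord = x;
        _ycoord = y;
    }

    public int getX() { return _xcoord; }
    public int getY() { return _ycoord; }

    // Hypothetical helper: returns a new Point shifted by (dx, dy).
    public Point translate(int dx, int dy) {
        return new Point(_xcoord + dx, _ycoord + dy);
    }
}
```

A client would write p = p.translate(5, -1); the variable p now refers to a different object, but no existing Point was modified.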
Ratio example, Bailey p 8
We are creating a class to represent rational numbers. Note especially the
gcd() and reduce() methods (we return to gcd() below). Also note that the
Ratio class is again immutable. To change a value, we assign an entirely new
value.
The ratio class has field accessors getNumerator() and getDenominator(), but
these don't quite work the way you might think. What is returned by
getNumerator for new Ratio(4,6)?
For new Ratio(4,-6)?
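One way to explore these questions is with a small sketch of such a class. This is not the textbook code: it is an assumption about how a reduced fraction might be stored, with the reduction step folded into the constructor and the sign kept in the numerator.

```java
// Sketch of an immutable rational-number class, stored in lowest terms
// with a positive denominator.
class Ratio {
    private final int numerator;
    private final int denominator;

    // pre: bottom != 0
    public Ratio(int top, int bottom) {
        if (bottom == 0) throw new IllegalArgumentException("zero denominator");
        int divisor = gcd(Math.abs(top), Math.abs(bottom));
        if (bottom < 0) divisor = -divisor;   // move any minus sign to the numerator
        numerator = top / divisor;
        denominator = bottom / divisor;
    }

    public int getNumerator() { return numerator; }
    public int getDenominator() { return denominator; }

    // pre: a >= 0, b >= 0, not both zero
    private static int gcd(int a, int b) {
        if (a == 0) return b;
        return gcd(b % a, a);
    }
}
```

With this sketch, new Ratio(4,6).getNumerator() returns 2 rather than 4, and new Ratio(4,-6) is stored as -2 over 3.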
toString()
Note the toString() method of the Ratio class. You can call this explicitly
whenever needed, as r.toString(). However, note that toString() works (for
us) more generally; it is our first example of inheritance.
(There is nothing special about toString(); any method can, in the right
circumstances, have its workings affected by inheritance.) The master parent
class Object defines toString(); any subclass can override
that definition, as is being done here. In System.out.println(), when
something needs to be printed then toString() is called implicitly; the
rules of inheritance ensure that the most
specific version of toString is the one to be invoked.
See objects.html#object_toString.
Class demo:
- change Ratio.toString() so it prints differently (in
demos/ratio.java). Perhaps parentheses should be included: (19/37)
- verify that conversion is not automatic;
we can't do String s = r1.
System.out.println() must be explicitly
calling toString!
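A small sketch of the toString() mechanism, using a hypothetical Fraction class (a stand-in, so as not to guess at the contents of demos/ratio.java):

```java
// Sketch: overriding Object.toString() and letting println() find it.
class Fraction {
    private final int top;
    private final int bottom;

    Fraction(int top, int bottom) {
        this.top = top;
        this.bottom = bottom;
    }

    @Override
    public String toString() {
        return "(" + top + "/" + bottom + ")";
    }
}

class ToStringDemo {
    public static void main(String[] args) {
        Fraction r1 = new Fraction(19, 37);
        Object o = r1;               // declared type Object, actual type Fraction
        System.out.println(o);       // the Fraction version of toString() is invoked
        String s = r1.toString();    // explicit call is fine
        // String t = r1;            // would not compile: no automatic conversion
        System.out.println(s);
    }
}
```

Even though o is declared as an Object, the overriding version in Fraction is the one that runs.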
Loops
Suppose we have ArrayList<String> L, that has data in it. How can we
print out the entries?
1. while loop
int i = 0;    // Java
while (i < L.size()) {
    System.out.println(L.get(i));
    i++;
}
For a while loop, the loop
variable 'i' must be declared
before the while.
2. for loop ("classic for")
for (int i = 0; i < L.size(); i++) {    // Java
    System.out.println(L.get(i));
}
Note that I've chosen to declare i within the loop here. You can do that or
else declare the loop variable as in the while loop example above.
3. for-each loop
for (String s : L)
    System.out.println(s);
Note that we don't have get(i) here; the for-each loop uses the String
variable s as the "loop variable". Note that s must
be declared within the loop, as shown. Java takes care of assigning to s
each element of L, in turn.
4. Iterator loop
Iterator<String> it = L.iterator();
while (it.hasNext())
    System.out.println(it.next());
This is an iterator. Iterators were sort of a predecessor to the for-each
loop. Both Iterators and for-each work for any Collection,
not just ArrayList. Why would you use an Iterator, rather than the for-each
loop? There are times when the loop structure isn't so simple; consider a
single loop that takes elements from two lists, one from each list on each loop
pass. You can't do that with a for-each loop, because the for-each loop
would go through just one of the lists.
An iterator is a precise
way of keeping track of the "current position" in a list. The actual object
representing the iterator has two pieces: a reference to the original list,
and also a current position.
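The two-list case mentioned above might be sketched as follows. The class and method names are made up for illustration, and stopping as soon as either list runs out is just one possible choice.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class Interleave {
    // Takes one element from each list per pass, stopping when either runs out.
    static List<String> interleave(List<String> first, List<String> second) {
        List<String> result = new ArrayList<>();
        Iterator<String> it1 = first.iterator();   // current position in first
        Iterator<String> it2 = second.iterator();  // current position in second
        while (it1.hasNext() && it2.hasNext()) {
            result.add(it1.next());
            result.add(it2.next());
        }
        return result;
    }
}
```

Each iterator keeps its own current position, which is exactly what a single for-each loop cannot do for two lists at once.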
Loop Patterns
It is also possible to approach loops from the perspective of how to
write different loops to accomplish different things. For a good summary
of loop patterns, see www.cs.uni.edu/~wallingf/patterns/loops.html.
Here are a few examples:
1. Process All Items: Here the for-each, classic for or
while loops all work. The latter two give you the position value ("i").
You would use a process-all-items loop to find the sum of the values in a
list. How would you find the maximum? Here you might use a conditional
process-all-items approach:
int i = 0;
int max = Integer.MIN_VALUE;    // starting at 0 would fail for an all-negative list
while (i < A.size()) {
    if (A.get(i) > max) max = A.get(i);
    i++;
}
2. Loop-and-a-half, or Process-items-until-an-event:
Suppose we want to process items until something happens. Suppose we're reading
data values, and a special sentinel value (-1) is returned at the end. We can do
this as:
val = getvalue();
while (val != sentinel) {
    processvalue(val);
    val = getvalue();
}
A popular alternative is the break loop:
while (true) {
    val = getvalue();
    if (val == sentinel) break;
    processvalue(val);
}
This approach irritates some language purists, but I am not one of them.
3. Searching a list: this is like the preceding, except
that we're searching a list for a value v for which valtest(v) is true.
However, there might not be any such value in the list, so we have to
ensure that we don't run off the end of the list:
int i = 0;
while (i < A.size() && !valtest(A.get(i))) {
    // not valtest
    i++;
}
This loop looks suspicious! The loop body is too plain! But it works. If
the loop terminates with i<A.size(), then A.get(i) is the value for
which valtest() succeeded; if it terminates with i==A.size() then no value
was found. The alternative is the break loop:
int i = 0;
while (true) {
    if (i >= A.size() || valtest(A.get(i)))
        break;    // found it, or ran off the end of the list
    i++;
}
Note the sense of the condition is reversed.
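As a sketch, the search pattern can be wrapped in a method that reports where it stopped. Here valtest is passed in as a java.util.function.Predicate, and the names ListSearch and indexOfFirst are invented for this example.

```java
import java.util.List;
import java.util.function.Predicate;

class ListSearch {
    // Returns the index of the first element satisfying valtest, or -1 if none does.
    static <T> int indexOfFirst(List<T> A, Predicate<T> valtest) {
        int i = 0;
        while (i < A.size() && !valtest.test(A.get(i))) {
            i++;
        }
        // Either i < A.size() and A.get(i) passed the test, or we ran off the end.
        return (i < A.size()) ? i : -1;
    }
}
```

The order of the two conditions matters: i < A.size() is checked first, so A.get(i) is never called with an out-of-range index.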
Back to the Ratio class
The gcd() method on Bailey page 9 is recursive:
it calls itself. How does this work?
There are a few separate issues. First, we note that gcd(a,b) = gcd(a,b%a)
whenever a>0; any divisor of a and b is a divisor of b%a (which has the form
b-ka), and any divisor of a and b%a is a divisor of b.
The second issue, though, is how it can even be legal for a function to call
itself. Internally, the runtime system handles this by creating a separate
set of local variables for each call to gcd(). This is done on the so-called
runtime stack. This means that different calls to gcd(), with
different parameter values, don't interact or interfere.
Finally, there's the question of whether gcd() ever returns. One
way to prove this is to argue that the first parameter to gcd() keeps
getting smaller. We stop when it reaches 0, as it must. The atomic
case in the recursion is the case that involves no further
recursive calls; in the gcd() example it is the case when a==0.
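A recursive version along these lines might look like the following sketch. It is not necessarily the code on Bailey page 9; it assumes non-negative arguments and puts the remainder in the first position, so that the first parameter is the one that shrinks.

```java
class Gcd {
    // pre: a >= 0, b >= 0, not both zero
    static int gcd(int a, int b) {
        if (a == 0) return b;     // atomic case: no further recursive call
        return gcd(b % a, a);     // b % a < a, so the first parameter gets smaller
    }
}
```

Tracing gcd(4,6): it calls gcd(2,4), which calls gcd(0,2), which returns 2. Each call has its own a and b on the runtime stack.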
How could we create an iterative
(looping) version? Here's one possibility.
// pre: a>=0, b>=0, a>0 or b>0
int gcd(int a, int b) {
    while (a>0 && b>0) {
        if (a>=b) a = a % b;
        else b = b % a;
    }
    if (a==0) return b; else return a;
}
More classes with two fields
Ratio
Both fields are integers. There are accessors for numerator and denominator,
but no mutators. Also, the numerator and denominator
stored may not equal the numerator and denominator supplied by the
constructor.
Point
Again, both fields are integers. There are accessors for both x and y, but,
again, no mutators. However, the x and y supplied by the constructor are the
x and y actually used.
Student and BankAccount examples
Bailey includes a BankAccount example on page 11. There are two fields: an
account_number and a balance (Bailey has the account_number be a String,
though it could be an integer as well).
A related example might be a Student class (which has more than
two fields!), with fields for name, address, and other personal details.
Each of these classes comes equipped with a nearly full range of field
accessors that return individual fields, and field
mutators that update them.
Note, however, that in the BankAccount class there is no mutator for the account field; we do not anticipate
changing that. Also, there is no field mutator for balance;
we can deposit() and withdraw() money, but we can't just set the balance to
whatever we want. In a real banking system, this helps the programmers
verify that when money is deposited, it has to come from somewhere else.
Notice also the pre- and post-conditions for the methods. These are a good
idea, though they take some getting used to and "trivial" preconditions can
be inscrutable.
Also note the .equals() method.
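The points above can be sketched as follows. This is an illustration rather than Bailey's code: the field names, the use of double for the balance, the exceptions thrown, and the choice to compare accounts by account number alone are all assumptions.

```java
// Sketch: no setBalance(); the balance changes only through deposit() and withdraw().
class BankAccount {
    private final String account;   // account number; no mutator is provided
    private double balance;

    // pre: account is a non-null account number
    // post: constructs an account with the given starting balance
    public BankAccount(String account, double balance) {
        this.account = account;
        this.balance = balance;
    }

    public String getAccount() { return account; }
    public double getBalance() { return balance; }

    // pre: amount >= 0
    // post: balance is increased by amount
    public void deposit(double amount) {
        if (amount < 0) throw new IllegalArgumentException("negative deposit");
        balance += amount;
    }

    // pre: 0 <= amount <= balance
    // post: balance is decreased by amount
    public void withdraw(double amount) {
        if (amount < 0 || amount > balance)
            throw new IllegalArgumentException("bad withdrawal");
        balance -= amount;
    }

    // Two accounts are considered equal when their account numbers match.
    @Override
    public boolean equals(Object other) {
        return other instanceof BankAccount
            && account.equals(((BankAccount) other).account);
    }

    @Override
    public int hashCode() { return account.hashCode(); }
}
```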
The Student and BankAccount examples can be deceptive, as they tend to focus
primarily on fields. In this sense
they are more like database records than Java classes. Java/C#/C++ classes
tend to focus on methods. (The
class Point also is dominated by its fields.) Compare with the class Ratio,
with nontrivial methods reduce() and gcd(). (The BankAccount class might potentially have some "nontrivial"
methods added to move funds around that verified the money was there, but
this simple example doesn't do that.)
Association
The Association class in §1.5 (class on p 16, example on p 15) is simply a
"pair", <key,value>, where we provide accessors for both fields but a
mutator only for the value. That is, we do not allow ourselves to change the
key (however, we can create a new Association object with a new key). We
also provide an equals() method.
Note that there are two constructors for this class. Furthermore, the
single-parameter constructor calls the two-param constructor; if we
restructure the underlying implementation only the two-param constructor
needs to be rewritten.
How does Association differ from Point?
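Here is a sketch of a class along the lines just described, written with type parameters. It is not Bailey's code; in particular, the field names and the decision to have equals() compare keys only are assumptions of this sketch.

```java
// Sketch of a key/value pair: the key is fixed, the value can be replaced.
class Association<K, V> {
    private final K theKey;
    private V theValue;

    // pre: key is non-null
    public Association(K key, V value) {
        if (key == null) throw new IllegalArgumentException("key must not be null");
        theKey = key;
        theValue = value;
    }

    // Delegates to the two-parameter constructor, with a null value.
    public Association(K key) {
        this(key, null);
    }

    public K getKey() { return theKey; }
    public V getValue() { return theValue; }

    // Replaces the value and returns the old one. There is no setKey().
    public V setValue(V value) {
        V old = theValue;
        theValue = value;
        return old;
    }

    // Assumption in this sketch: equality is decided by the key alone.
    @Override
    public boolean equals(Object other) {
        return other instanceof Association
            && theKey.equals(((Association<?, ?>) other).theKey);
    }

    @Override
    public int hashCode() { return theKey.hashCode(); }
}
```

Because the one-parameter constructor just calls this(key, null), any change to how the pair is stored touches only the two-parameter constructor.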
Rectangle
Bailey's Rectangle class (page 22) contains two Point objects. There are
mutators to set the left x-coordinate, and the width (and presumably the
lower y-coordinate and the height). But none of these is directly a field
mutator for the two internal Point objects with which the rectangle is
constructed.
Classes based on shapes form one of the most common examples of an
object hierarchy with polymorphism and inheritance. That is, there is a
base class Shape, and then child classes, say, Rectangle and Triangle. We
can create shapes as:
Shape s1 = new Rectangle(...);
Shape s2 = new Triangle(...);
and then draw the shape with
s1.draw();
What makes this work?
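A stripped-down sketch of the mechanism: each child class overrides draw(), and the call s1.draw() is dispatched at run time according to the object's actual class. To keep the example self-contained, draw() here merely returns a description and the constructors take no parameters; both are simplifications, not how a real graphics hierarchy would look.

```java
// Sketch of dynamic dispatch: the version of draw() that runs belongs to the
// object's actual class, not to the declared type Shape.
abstract class Shape {
    abstract String draw();   // returns a description instead of doing real graphics
}

class Rectangle extends Shape {
    @Override
    String draw() { return "rectangle"; }
}

class Triangle extends Shape {
    @Override
    String draw() { return "triangle"; }
}

class ShapeDemo {
    public static void main(String[] args) {
        Shape s1 = new Rectangle();
        Shape s2 = new Triangle();
        System.out.println(s1.draw());   // Rectangle's version runs
        System.out.println(s2.draw());   // Triangle's version runs
    }
}
```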
But the bulk of Bailey's Rectangle example does not involve inheritance. The
primary goal of this section is how we might come up with a good interface
for class Rectangle.
Note the drawOn() precondition that the window is a valid one. Note also the
relatively rich set of operations. Note also that for left()
and width() the accessor and the
mutator have the same name! Java distinguishes between the two by the
presence of the parameter. Some people find this approach helpful; others
find it too confusing by half. The most common naming strategy is probably left() for the accessor and setLeft()
for the mutator.
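The same-name convention relies on ordinary method overloading. A sketch, using a hypothetical Box class rather than Bailey's Rectangle:

```java
// Sketch: accessor and mutator share a name and differ only in their parameters.
class Box {
    private int leftX;
    private int boxWidth;

    Box(int left, int width) {
        leftX = left;
        boxWidth = width;
    }

    int left() { return leftX; }        // accessor: no parameter
    void left(int x) { leftX = x; }     // mutator: one parameter

    int width() { return boxWidth; }
    void width(int w) { boxWidth = w; }
}
```

So b.left() reads the coordinate and b.left(5) sets it; the compiler picks the method by the argument list.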
Normally, drawing a Rectangle is a primitive operation in the graphics
library.
Interfaces
See section 1.8 starting at p 23. Bailey starts with interface Structure,
which at first appears to implement a basic list. Note, however, that
there is no mechanism to access the ith element of the Structure; it is
not a list because you cannot retrieve elements in list order.
We could then have class List extend Structure, and also class Set extend
Structure.
Vector
See lists.html#vector.
Introduction to C++
Here are a few notes on this: Intro
to C++
What about installing it?
Macs sometimes have xcode. Or you can get it at https://developer.apple.com/xcode/
(or maybe the Apple App Store).
For Windows, you can install MS Visual Studio, or MinGW.
The link to the MSDNAA site for Visual Studio keeps changing; right now it
seems to be called Microsoft Imagine and is at docs.cs.luc.edu/syshandbook/academic-alliances-programs.html.
Be sure to click register the first time you connect.
Your account identifier is your Loyola email address, with the "@luc.edu".
Hangman
The Hangman example (with embedded class WordList) starts at page 18. What
is different about the WordList class? How do words get accessed?
This is in §1.6; part of the goal here is the example in §1.8 of an Interface.
On p. 20 is the basic interface of WordList as a standalone class. On pp
22-23, an interface Structure is
defined and WordList is then declared to implement
that interface:
public class WordList implements Structure
That's a Java/C# feature; C++ doesn't quite have "implements".
A Java/C# class can extend
just one parent class, but can implement
multiple interfaces. In particular, a WordList could extend, say, StrList,
and also implement Structure.
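A sketch of that last arrangement in Java. The Structure interface here is a trimmed-down assumption and is not meant to match Bailey's Structure method for method; StrList is likewise a hypothetical parent class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down interface for illustration.
interface Structure {
    int size();
    boolean isEmpty();
    void add(String value);
    boolean contains(String value);
}

// Hypothetical parent class that holds the storage.
class StrList {
    private final List<String> items = new ArrayList<>();

    public void add(String value) { items.add(value); }
    public boolean contains(String value) { return items.contains(value); }
    public int size() { return items.size(); }
}

// One parent class (extends) plus one interface (implements).
class WordList extends StrList implements Structure {
    // add(), contains() and size() are inherited from StrList and satisfy the
    // interface; only isEmpty() still has to be supplied.
    @Override
    public boolean isEmpty() { return size() == 0; }
}
```

A WordList can then be used wherever a Structure is expected, and also wherever a StrList is expected.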
C++ does in fact allow classes to extend from multiple parents; this is
called multiple inheritance. The general case is not
nearly as useful as one might think; most (almost all?) reasonable examples
of multiple inheritance involve cases where all but one of the inheritances
is really an "implement".
Big-O notation and Bailey Chapter 5: Analysis
See lists.html#bigO
Binary Search
See sorting.html#binsearch