Comp 388-005 Week 1
Lewis Tower 415, 4:15-8:15
Welcome
Readings:
Bailey chapter 1, on objects generally.
Bailey chapter 2, on assertions. We will come back to
this later; you can skim it for now.
Bailey chapter 3, on a Vector class
Morin chapter 1, sections 1.1 and 1.2
One slight peculiarity of Morin is
that he refers to the array-based List implementation of chapter 2 as an
ArrayStack.
IDEs: Xamarin vs. Eclipse vs. the command line.
Primary text: Bailey, online, and maybe Morin, also online. Both books use
Java for examples, but this makes surprisingly little difference, unless you
are trying to run the code directly as C#.
Information about MSDNAA is in the Intro to C++
section
What the heck is a data structure? Any structure that
involves data! Generally, data that is in addition to that needed to
implement the structure itself.
On page 1 of Morin there is a list of some applications of data structures:
- Opening a file and finding the right data blocks on a disk
- Looking up someone in your phone's contacts list
- Logging in to a social network
- Looking up something with Google (or Bing)
- Calling 911
All of these examples in some sense involve search, and
retrieval of the correct data.
Data structures themselves can be a general-purpose container,
like a list, or an application-specific structure. Note,
however, that it is usually a good idea to try to make application-specific
structures into something more general, if only to separate conceptually the
app-specific part from the general part.
Examples we will look at this semester
- Array-based list
- linked list
- ordered tree
- unordered expression tree
- dictionary / tree
- dictionary / hash table
- set / hash table
A couple we won't do much with:
- directed graph
- undirected graph
What else will we do?
We will look at algorithms, specifically for searching
and sorting. The "obvious" algorithms can be much improved on. We will
quantify this by looking at the approximate running time
of algorithms.
We will take a brief look at the C++ language, though
most assignments will still be in C#. But C++ has some important
differences with the way memory management works.
We will look at a few advanced programming ideas, though these will be
limited.
We will look at recursion, and how to handle complex
recursive data structures.
As an example of recursion, and as an example of how programming works
"under the hood", we will look at a simple compiler for
a simple language, miniJava.
We will look at how objects and polymorphism
can simplify programming.
Interfaces
In object-oriented languages, we implement data structures such as in the
list above with classes. The class interface specifies
the publicly available operations, in the form of methods
(or, especially in C++, member functions) that the class provides. The
interface is all the client programmer
needs to know; the implementation can be changed at will. It is not
uncommon, for example, for a programmer to change a list implementation from
array to linked, or vice-versa.
Morin discusses interfaces in Section 1.2. Examples: queue, list, set,
sorted set
Classes let us:
- Keep the client-programmer interface abstract
- Separate interface from implementation
- Allow the implementation to be improved later
Long before object-oriented programming was a thing, there were abstract
data types: data structures that presented to the user-programmer
only the minimum interface needed; access to the underlying implementation
was restricted. For example, a stack implemented using an array would not
grant access to individual array components; the programmer could use only
the interface operations push() and pop().
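As a rough illustration of the array-based stack just described, here is a minimal sketch in Java (the language Bailey and Morin use). The class name IntStack, the fixed capacity, and the isEmpty() method are assumptions made for this example; they are not taken from either book. The point is only that the array is private, so push() and pop() are the client's sole way in.

```java
// A minimal sketch of a stack as an abstract data type.
// IntStack, its capacity parameter, and isEmpty() are hypothetical names.
class IntStack {
    private int[] _data;   // underlying array, hidden from client programmers
    private int _size;     // number of elements currently on the stack

    public IntStack(int capacity) {
        _data = new int[capacity];
        _size = 0;
    }

    // pre: the stack is not full
    public void push(int value) {
        if (_size == _data.length) throw new IllegalStateException("stack is full");
        _data[_size] = value;
        _size++;
    }

    // pre: the stack is not empty
    public int pop() {
        if (_size == 0) throw new IllegalStateException("stack is empty");
        _size--;
        return _data[_size];
    }

    public boolean isEmpty() { return _size == 0; }
}
```

A client can write s.push(1) and s.pop(), but has no way to reach _data[0] directly.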
General-purpose container classes typically take type
parameters; such classes are called generic classes in C#. The parameter type is
enclosed in angle brackets, a tradition that comes from the C++ notion of a
template.
List<int> theCounts;
List<Message> mailbox;
The central idea of a class is that we can define the public interface
however we want, and keep the internals to ourselves. Thus, outside users
have only the interface we
provide. If there is a length field, or a pointer to the first cell, then if
we keep the field private we need not fear that some client programmer will
modify it and thus make a mess of it.
Class rule #1: all class data fields should be private.
This way you, the author of the class, have full control over its internal
structure. Bailey prefers protected, but protected is often considered to be
a bad idea, despite Bailey's enthusiasm; we will use private instead!
Who are we keeping data private (or protected) from?
The idea is that you are the class implementor, and some other programmer
(possibly you at a later date) is the client
programmer. See Bailey §1.9.
Class methods can be divided into accessors, which are
guaranteed never to modify internal class fields, and mutators,
which may (though perhaps only in selected cases). An accessor that grants
access to a specific class field, for example getXcoord() or getName(), is
called a field accessor; corresponding field
mutators might be setXcoord() and setName().
A class need not have any mutators; such a class is said to be immutable.
This means that, once the object is created, it cannot be changed. If one
wants a modified object, one has to create a new one.
String
Example 1: Bailey, p 7: two ways of implementing a string. Possible
interface:
charAt(int), length()
Note that how we implement the class has no bearing on the interface!
Other examples of objects:
- Point (x and y coordinates)
- Ratio (representing an exact fraction)
- Student record (name, address, registrations, etc)
- BankAccount record
- Association
- Stack
- Rectangle
Point
Here is a pretty minimal class. We have a constructor and two (field)
accessors. There is no mutator, and so the class is immutable. If you want
to move a point, you have to create a new one.
Note the convention that fields begin with _; many find some such convention
helpful. (To be fair, others find this kind of convention irksome.)
class Point {
private int _xcoord;
private int _ycoord;
public Point(int x, int y) {
_xcoord = x;
_ycoord = y;
}
public int getX() {return _xcoord;}
public int getY() {return _ycoord;}
}
Note the methods getX() and getY(),
which are field accessors. Field mutators, to allow "moving" a point, would
make sense, but are not necessary as it's just as easy to create a new
point.
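To make "create a new point" concrete, here is a small sketch in Java of the same Point class with one extra method. The method name translatedBy() is hypothetical; it is not part of the class above. The final keyword is also an addition, used here to have the compiler enforce immutability.

```java
// Sketch: "moving" an immutable point by building a new one.
// translatedBy() is a hypothetical method added for this example.
class Point {
    private final int _xcoord;
    private final int _ycoord;

    public Point(int x, int y) {
        _xcoord = x;
        _ycoord = y;
    }

    public int getX() { return _xcoord; }
    public int getY() { return _ycoord; }

    // Returns a new Point; this Point is left unchanged.
    public Point translatedBy(int dx, int dy) {
        return new Point(_xcoord + dx, _ycoord + dy);
    }
}
```

After Point q = p.translatedBy(3, 4), the original p still has its old coordinates.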
Ratio example, Bailey p 8
We are creating a class to represent rational numbers. Note especially the
gcd() and reduce() methods (we return to gcd() below). Also note that the
Ratio class is again immutable. To change a value, we assign an entirely new
value.
The ratio class has field accessors getNumerator() and getDenominator(), but
these don't quite work the way you might think. What is returned by
getNumerator for new Ratio(4,6)?
For new Ratio(4,-6)?
Here's my port of the Ratio class to C#.
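One way to explore the getNumerator() question is with a small sketch in Java. It assumes that the constructor reduces the fraction to lowest terms and keeps the denominator positive. That design is an assumption made for this example and is not copied from Bailey's code or from the C# port, so check those for the actual details.

```java
// Sketch of a Ratio stored in lowest terms with a positive denominator.
// The reduction policy here is an assumption made for illustration.
class Ratio {
    private final int _numerator;
    private final int _denominator;

    // pre: bottom != 0
    public Ratio(int top, int bottom) {
        if (bottom == 0) throw new IllegalArgumentException("denominator is zero");
        int divisor = gcd(Math.abs(top), Math.abs(bottom));
        if (bottom < 0) divisor = -divisor;   // moves any minus sign to the numerator
        _numerator = top / divisor;
        _denominator = bottom / divisor;
    }

    public int getNumerator() { return _numerator; }
    public int getDenominator() { return _denominator; }

    // pre: a >= 0, b >= 0, not both zero
    private static int gcd(int a, int b) {
        if (a == 0) return b;
        return gcd(b % a, a);
    }
}
```

In this sketch, new Ratio(4,6).getNumerator() returns 2 rather than 4, and new Ratio(4,-6).getNumerator() returns -2: the accessor reports the stored, reduced value, not the value passed to the constructor.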
ToString()
Note the ToString() method of the Ratio class. You can call this explicitly
whenever needed, as r.ToString(). However, note that ToString() works (for
us) more generally; it is our first example of inheritance.
(There is nothing special about ToString(); any method can, in the right
circumstances, have its workings affected by inheritance.) The master parent
class Object defines ToString(); any subclass can override
that definition, as is being done here. In WriteLine(), when something needs
to be printed then ToString() is called implicitly; the rules of inheritance
ensure that the most specific version
of ToString is the one to be invoked.
See objects.html#object_ToString.
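The same mechanism exists in Java, where the method is spelled toString() and the @Override annotation plays roughly the role of C#'s override keyword. Here is a minimal sketch; the class name Fraction is hypothetical and the class is deliberately stripped down.

```java
// Sketch: overriding toString() in Java (ToString() with "override" in C#).
// Fraction is a hypothetical class name used only for this example.
class Fraction {
    private final int _top;
    private final int _bottom;

    public Fraction(int top, int bottom) {
        _top = top;
        _bottom = bottom;
    }

    // Replaces the default version inherited from Object.
    @Override
    public String toString() {
        return _top + "/" + _bottom;
    }
}
```

Even if the object is held in a variable of type Object, as in Object o = new Fraction(19, 37), calling o.toString() or System.out.println(o) uses the Fraction version: the most specific definition wins.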
Class demo:
- change Ratio.ToString() so it prints differently (in demos/ratio.cs).
Perhaps parentheses should be included: (19/37)
- verify that conversion is not automatic; we can't do String s = r1.
WriteLine must be explicitly calling ToString!
(Actually, C++ does have some automatic type conversions.)
Loops
Suppose we have List<string> L, that has data in it. How can we print
out the entries?
1. while loop
int i = 0; // C#
while (i < L.Count) {
    Console.WriteLine(L[i]);
    i++;
}
For a while loop, the loop
variable 'i' must be declared
before the while.
2. for loop ("classic for")
for (int i = 0; i < L.Count; i++) { // C#
    Console.WriteLine(L[i]);
}
Note that I've chosen to declare i within the loop here. You can do that or
else declare the loop variable as in the while loop example above.
3. for-each loop
foreach (string s in L)
    Console.WriteLine(s);
Note that we don't use L[i] here; the for-each loop uses the string
variable s as the "loop variable". Note that s must
be declared within the loop, as shown. C# takes care of assigning to s each
element of L, in turn.
4. Iterator loop (java style)
Iterator<String> it = words.iterator(); // Java
while (it.hasNext())
    System.out.println(it.next());
This is an iterator. Iterators were sort of a predecessor to the for-each
loop. Both Iterators and for-each work for any Collection,
not just ArrayList. Why would you use an Iterator, rather than the for-each
loop? There are times when the loop structure isn't so simple; consider a
single loop that takes elements from two lists, one from each for each loop
pass. You can't do that with a for-each loop, because the for-each loop
would go through just one of the lists.
An iterator is a precise
way of keeping track of the "current position" in a list. The actual object
representing the iterator has two pieces: a reference to the original list,
and also a current position.
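The two-list situation mentioned above can be sketched in Java as follows. The class name Interleave and the method interleave() are hypothetical; the Iterator calls themselves (iterator(), hasNext(), next()) are the standard java.util ones.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: a single loop that draws from two lists, using one iterator per list.
// Interleave and interleave() are hypothetical names for this example.
class Interleave {
    // Alternates elements of a and b; once the shorter list runs out,
    // the remaining elements of the longer list follow in order.
    public static List<String> interleave(List<String> a, List<String> b) {
        List<String> result = new ArrayList<>();
        Iterator<String> ita = a.iterator();   // current position in a
        Iterator<String> itb = b.iterator();   // current position in b
        while (ita.hasNext() || itb.hasNext()) {
            if (ita.hasNext()) result.add(ita.next());
            if (itb.hasNext()) result.add(itb.next());
        }
        return result;
    }
}
```

Each iterator remembers its own position, which is exactly the bookkeeping a single for-each loop cannot do for two lists at once.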
C# has iterators also, but we won't use them.
Back to the Ratio class
The gcd() method on Bailey page 9 is recursive:
it calls itself. How does this work?
There are a few separate issues. First, we note that gcd(a,b) = gcd(a,b%a)
whenever a>0; any divisor of a and b is a divisor of b%a (which has the form
b-ka), and any divisor of a and b%a is a divisor of b.
The second issue, though, is how it can even be legal for a function to call
itself. Internally, the runtime system handles this by creating a separate
set of local variables for each call to gcd(). This is done on the so-called
runtime stack. This means that different calls to gcd(), with
different parameter values, don't interact or interfere.
Finally, there's the question of whether gcd() ever returns. One
way to prove this is to argue that the first parameter to gcd() keeps
getting smaller. We stop when it reaches 0, as it must. The atomic
case (or base case) in the recursion is the case that involves no further
recursive calls; in the gcd() example it is the case when a==0.
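For reference, a recursive gcd along these lines can be sketched in Java as below. This is a simplified version written for these notes, not a copy of the method on Bailey page 9; it assumes nonnegative arguments that are not both zero, and the class name GcdDemo is hypothetical.

```java
// Sketch of a recursive gcd. GcdDemo is a hypothetical wrapper class.
class GcdDemo {
    // pre: a >= 0, b >= 0, a > 0 or b > 0
    public static int gcd(int a, int b) {
        if (a == 0) return b;    // atomic case: no further recursive call
        return gcd(b % a, a);    // b % a < a, so the first parameter shrinks
    }
}
```

For example, gcd(4,6) calls gcd(2,4), which calls gcd(0,2), which returns 2. Each call gets its own copies of a and b on the runtime stack.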
How could we create an iterative
(looping) version? Here's one possibility.
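// pre: a>=0, b>=0, a>0 or b>0
int gcd(int a, int b) {
    while (a>0 && b>0) {
        // replace the larger value by its remainder modulo the smaller
        if (a>=b) a = a % b;
        else b = b % a;
    }
    // one of a, b is now 0; the other is the gcd
    if (a==0) return b; else return a;
}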
Finally, note the ToString() method, and the override
keyword, which marks this definition as replacing the one inherited from the
master parent class Object. This is the inheritance mechanism described in
the ToString() section above.
Student and BankAccount examples
Bailey includes a BankAccount example on page 11. A related example might be
a Student class, with fields for name, address, and other personal information.
Each of these classes comes equipped with a nearly full range of field
accessors that return individual fields, and field
mutators that update them.
Note, however, that there is no mutator for the account
field; we do not anticipate changing that. Also, there is no field mutator
for balance; we can deposit() and
withdraw() money, but we can't just set the balance to whatever we want. In
a real banking system, this helps the programmers verify that when money is
deposited, it has to come from somewhere else.
(C# also has properties, an optional way of generating accessors and
mutators automatically, but you don't have to use them.)
Notice also the pre- and post-conditions for the methods. These are a good
idea, though they take some getting used to and "trivial" preconditions can
be inscrutable.
Also note the .Equals() method (.equals() in Java).
The Student and BankAccount examples can be deceptive, as they tend to focus
primarily on fields. In this sense
they are more like database records than Java classes. Java/C#/C++ classes
tend to focus on methods. (The
class Point also is dominated by its fields.) Compare with the class Ratio,
with nontrivial methods reduce() and gcd(). (The BankAccount class could have
some "nontrivial" methods added that move funds around after verifying that
the money is there, but this simple example doesn't do that.)
Association
The Association class in §1.5 (class on p 16, example on p 15) is simply a
"pair", <key,value>, where we provide accessors for both fields but a
mutator only for the value. That is, we do not allow ourselves to change the
key (however, we can create a new Association object with a new key). We
also provide an Equals() method.
Note that there are two constructors for this class. Furthermore, the
single-parameter constructor calls the two-param constructor; if we
restructure the underlying implementation only the two-param constructor
needs to be rewritten.
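The constructor arrangement can be sketched in Java as follows. This is an illustration written for these notes and differs in detail from Bailey's Association class; in particular, the type parameters K and V and the accessor names are assumptions made for this example.

```java
// Sketch of a key-value pair with constructor chaining.
// The generic parameters K and V and the method names are assumptions.
class Association<K, V> {
    private final K _key;   // no mutator: the key never changes
    private V _value;

    // The two-parameter constructor does all the real work.
    public Association(K key, V value) {
        _key = key;
        _value = value;
    }

    // The one-parameter constructor delegates to the two-parameter one.
    public Association(K key) {
        this(key, null);
    }

    public K getKey() { return _key; }
    public V getValue() { return _value; }
    public void setValue(V value) { _value = value; }
}
```

If the internal representation changes, only the two-parameter constructor has to be edited; the one-parameter constructor keeps working as written.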
How does Association differ from Point?
Rectangle
Classes based on shapes form one of the most common examples of an object
hierarchy with polymorphism and inheritance. That is, there is a base class
Shape, and then child classes, say, Rectangle and Triangle. We can create
shapes as:
Shape s1 = new Rectangle(...);
Shape s2 = new Triangle(...);
and then draw the shape with
s1.draw();
What makes this work?
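One way to see what is going on is a small sketch in Java. Here draw() simply returns a text description rather than doing real graphics; that simplification, and the constructor parameters, are assumptions made for this example.

```java
// Sketch of a shape hierarchy. draw() returns a description instead of
// drawing on a window; that is a simplification for this example.
abstract class Shape {
    public abstract String draw();
}

class Rectangle extends Shape {
    private final int _width;
    private final int _height;

    public Rectangle(int width, int height) {
        _width = width;
        _height = height;
    }

    @Override
    public String draw() { return "rectangle " + _width + "x" + _height; }
}

class Triangle extends Shape {
    private final int _base;
    private final int _height;

    public Triangle(int base, int height) {
        _base = base;
        _height = height;
    }

    @Override
    public String draw() { return "triangle, base " + _base + ", height " + _height; }
}
```

The variable s1 has declared type Shape, but the object it refers to knows it is a Rectangle, so s1.draw() runs Rectangle's version. The choice is made at run time, based on the object rather than on the variable's type.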
But the bulk of Bailey's Rectangle example does not involve inheritance. The
primary goal of this section is to see how we might come up with a good interface
for class Rectangle.
Note the drawOn() precondition that the window is a valid one. Note also the
relatively rich set of operations. Note also that for left()
and width() the accessor and the
mutator have the same name! Java and C# distinguish between the two by the
presence of the parameter. Some people find this approach helpful; others
find it too confusing by half. The most common naming strategy is probably left() for the accessor and setLeft()
for the mutator.
Normally, drawing a Rectangle is a primitive operation in the graphics
library.
Interfaces
See section 1.8 starting at p 23. Bailey starts with interface Structure,
which at first appears to describe a basic list. Note, however, that
there is no mechanism to access the ith element of the Structure; it is
not a list because you cannot retrieve elements in list order.
We could then have class List extend Structure, and also class Set extend
Structure.
Vector
See lists.html#vector.
Introduction to C++
Here are a few notes on this: Intro
to C++
What about installing it?
Macs sometimes have Xcode. Or you can get it at https://developer.apple.com/xcode/
(or maybe the Apple App Store).
For Windows, you can install MS Visual Studio, or MinGW.
The link to the MSDNAA site for Visual Studio keeps changing; right now it
seems to be called Microsoft Imagine and is at docs.cs.luc.edu/syshandbook/academic-alliances-programs.html.
Be sure to click register the first time you connect.
Your account identifier is your Loyola email address, with the "@luc.edu".
Hangman
The Hangman example (with embedded class WordList) starts at page 18. What
is different about the WordList class? How do words get accessed?
This is in §1.6; part of the goal here is the example in §1.8 of an Interface.
On p. 20 is the basic interface of WordList as a standalone class. On pp
22-23, an interface Structure is
defined and WordList is then declared to implement
that interface:
public class WordList implements Structure
That's a Java/C# feature (the line above is Java syntax; C# writes it as class WordList : Structure); C++ doesn't quite have "implements".
A Java/C# class can extend
just one parent class, but can implement
multiple interfaces. In particular, a WordList could extend, say, StrList,
and also implement Structure.
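That combination can be sketched in Java as below. The Structure interface here is a cut-down stand-in with just two methods (Bailey's has more), and the body of StrList is invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// A cut-down stand-in for Bailey's Structure interface; the choice of
// methods here is an assumption made for this example.
interface Structure {
    int size();
    boolean isEmpty();
}

// Hypothetical parent class holding a list of strings.
class StrList {
    private final List<String> _items = new ArrayList<>();

    public void add(String s) { _items.add(s); }
    public int size() { return _items.size(); }
}

// One parent class (extends) plus one interface (implements).
class WordList extends StrList implements Structure {
    // size() is inherited from StrList and satisfies Structure.size();
    // only isEmpty() has to be supplied here.
    public boolean isEmpty() { return size() == 0; }
}
```

A WordList can now be used wherever a StrList is expected and also wherever a Structure is expected.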
C++ does in fact allow classes to extend from multiple parents; this is
called multiple inheritance. The general case is not
nearly as useful as one might think; most (almost all?) reasonable examples
of multiple inheritance involve cases where all but one of the inheritances
is really an "implement".
Big-O notation and Bailey Chapter 5: Analysis
See lists.html#bigO
Binary Search
See sorting.html#binsearch