The basic List interface resembles an array, in that one can access and update elements by specifying their position. The underlying implementation, however, may not be via an array (for example, linked lists), in which case accessing the Nth element may not be as fast as accessing a corresponding array element.

Most list interfaces also support some mechanism of automatic size expansion to accommodate new additions, reducing the risk of array-bounds overflow. However, attempting to access an element (by position) that is not there is always an error: retrieving item 100 of a list that contains items 0 through 37 can never end well.

Java supports the ArrayList class, and the List interface. We'll only touch on interfaces here.

To declare an ArrayList of strings, use

import java.util.*; // or: import java.util.ArrayList;

...

ArrayList<String> L = new ArrayList<String>();

The constructor call is the right-hand side; it takes no parameter. The
<String> is the "type parameter". ArrayList is a **generic**
class, as are most data-structure classes.

The list L created above is initially empty. We use L.add(str) to add entries:

L.add("apple");

L.add("banana");

L.add("cherry");

The current size of the list is available as L.size(); this would return 3 for the list above. The elements are at positions 0, 1 and 2.

We can now set and retrieve elements with L.set(i,val) and L.get(i), corresponding to A[i]=val and A[i], for an array A. In the above, L.get(0) returns "apple".

The capacity of the list is the size of the underlying memory allocation, which, for an ArrayList, is an array. It is 10 by default, though we could also have passed a numeric parameter to the constructor: new ArrayList<String>(100). Things get interesting when you keep using .add() until there is one more element than there is room for. A new, larger internal array is allocated, and the contents of the old array are copied over. The .add() operation then continues as if nothing unusual had happened.
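As a sketch of what happens inside .add() when the internal array is full, here is a minimal growable list. It assumes growth by doubling (java.util.ArrayList itself grows by roughly 1.5x), and the class name GrowableList is hypothetical.

```java
import java.util.Arrays;

// Minimal array-backed list that doubles its internal array when full.
// Assumption: doubling; java.util.ArrayList uses a different growth factor.
class GrowableList<T> {
    private Object[] elements;   // internal array; its length is the capacity
    private int size = 0;        // number of slots actually in use

    GrowableList(int initialCapacity) {
        elements = new Object[Math.max(1, initialCapacity)];
    }

    int size() { return size; }
    int capacity() { return elements.length; }

    void add(T value) {
        if (size == elements.length) {
            // Full: allocate a bigger array and copy the old contents over.
            elements = Arrays.copyOf(elements, 2 * elements.length);
        }
        elements[size++] = value;
    }

    @SuppressWarnings("unchecked")
    T get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return (T) elements[i];
    }
}

class GrowableListDemo {
    public static void main(String[] args) {
        GrowableList<String> list = new GrowableList<>(10);
        for (int i = 0; i < 11; i++) list.add("item" + i);
        System.out.println(list.size() + " " + list.capacity()); // prints "11 20"
    }
}
```

The copy is O(N), but it happens rarely enough that a long run of .add() calls stays cheap on average.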

To add an entire collection's worth of elements at once, use .addAll(); for an array A this is L.addAll(Arrays.asList(A)). We can also use .add(pos, val) to add the value at position pos, rather than at the end of the list. In this case, the value that was at position pos is moved to position pos+1, and so on; nothing is overwritten. Other useful ArrayList methods:

- L.contains(val) returns true if val occurs in L
- L.indexOf(val) finds position of first occurrence of val in L, or -1 if val is not there
- L.remove(n) removes value at position n
- L.remove(val) finds first occurrence of val and removes it
- L.subList(from, to) returns a list of L[from], L[from+1], ..., L[to-1]; this is a view backed by L, not an independent copy
- L.toArray() returns an array, with no excess capacity
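A short program exercising several of these methods may help; the class and method names below are only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayListMethodsDemo {
    // Builds [avocado, cherry, banana] using several of the methods listed above.
    static ArrayList<String> demo() {
        ArrayList<String> L = new ArrayList<String>();
        L.addAll(Arrays.asList("apple", "banana", "cherry", "banana"));
        L.add(1, "avocado");        // [apple, avocado, banana, cherry, banana]
        L.remove("banana");         // removes only the first "banana"
        L.remove(0);                // int argument, so this removes position 0 ("apple")
        return L;                   // [avocado, cherry, banana]
    }

    public static void main(String[] args) {
        ArrayList<String> L = demo();
        System.out.println(L.contains("cherry"));    // true
        System.out.println(L.indexOf("banana"));     // 2
        List<String> view = L.subList(0, 2);         // [avocado, cherry], a view backed by L
        Object[] arr = L.toArray();                  // length 3: no excess capacity
        System.out.println(view + " " + arr.length); // [avocado, cherry] 3
    }
}
```

Note the two remove() overloads: with an ArrayList<Integer>, L.remove(0) would still mean "remove position 0", so removing the value 0 needs L.remove(Integer.valueOf(0)).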

As a general rule, ArrayLists are best when you're not sure of the list length at the start of the program. Arrays work best when you know the length in advance.

Chapter 3 contains Bailey's Vector class, which represents a List of Objects implemented via an array. The main feature is that the vector can grow.

We'll start with the Vector interface on page 46.

Bailey's examples:

- Vector used for Wordlist
- Vector used for L-systems

There is an issue with Vectors of Objects: we don't really want Objects.
Look closely at the code on page 47; it contains a **cast**
to (String):

targetWord = (String) list.get(index);

The cast takes the Object returned by list.get(index) and, because it really is a String, lets the compiler treat it as one. The check happens at run time: if the object is not actually a String, Java throws a ClassCastException.

In Java we will usually use **generics**, e.g.
Vector<T>.

In the word-frequency example on page 49 (actually starting on p 48) there is

wordInfo = (Association) vocab.get(i);

and

vocabWord = (String) wordInfo.getKey();

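Adding to the middle of a vector: you need to move elements from right to left, shifting the last element over first, so that nothing is overwritten before it has been copied.
See the picture on Bailey p 52.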
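Here is a minimal sketch of that right-to-left shift on a plain int array. The helper insertAt is hypothetical and assumes the array already has a spare slot; a real Vector would first grow its array if it were full.

```java
import java.util.Arrays;

class InsertDemo {
    // Inserts value at position pos of a[0..count-1]; assumes a.length > count.
    static void insertAt(int[] a, int count, int pos, int value) {
        // Move from right to left: a[count-1] goes to a[count] first, so no
        // element is overwritten before it has been copied.
        for (int i = count; i > pos; i--) {
            a[i] = a[i - 1];
        }
        a[pos] = value;
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30, 40, 0};   // 4 elements in use, one spare slot
        insertAt(a, 4, 1, 15);
        System.out.println(Arrays.toString(a)); // [10, 15, 20, 30, 40]
    }
}
```

Shifting left to right instead would copy 20 into every later slot, giving [10, 15, 20, 20, 20].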

The demo program is listgrowth.cs, which is in C#.

We first create the list.

List<String> s = new List<String>();

We can examine s.Count to get the current length of the list, and s.Capacity to get the current length of the internal array in object s. Initially, both are 0.

If we add an item: s.Add("apple"), then s.Count = 1 and s.Capacity = 4.

If we add three more items, both s.Count and s.Capacity are 4.

If we then execute s.Add("banana"), s.Count is now 5 and s.Capacity is now 8.

If we then execute s.Add("cherry"), s.Count is 6 but s.Capacity is still 8; s.Capacity becomes 16 only when the ninth item is added.

- addOne(String s): to add one more string
- addSome(int count, String s): to add multiple copies of the string
- addArray(int count, String s): this works like addSome(count,s), but it puts all the new strings into a new ArrayList, and then adds them all at once to the original list.

Try:

- Use the inspector to inspect theList. What is the size of elementData? Where *is* elementData, by the way? Where did it come from?
- Same after addOne("apple")
- Same after addSome(9,"banana"). Why 9?
- Same after addOne("cherimoya")

Keep going until you can guess the expansion pattern for elementData.

- The number of moves to insert at a random place in the middle of a list of length N is, on average, N/2.
- The number of compares to search a list of length N for a random element that is in fact present is, on average, N/2.
- The number of compares to search a list of length N for an element not there is simply N.
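These counts are easy to check with a linear search that reports how many comparisons it made; the class SearchCount is a hypothetical illustration.

```java
class SearchCount {
    // Returns the number of equality comparisons a left-to-right linear search makes.
    static int comparisons(String[] a, String target) {
        int count = 0;
        for (String s : a) {
            count++;
            if (s.equals(target)) return count;  // found: stop early
        }
        return count;  // not found: every element was compared, so count == a.length
    }

    public static void main(String[] args) {
        String[] a = {"a", "b", "c", "d"};
        int total = 0;
        for (String s : a) total += comparisons(a, s);
        // Found cases cost 1+2+3+4 = 10 compares, an average of 2.5, about N/2.
        System.out.println(total / (double) a.length);  // 2.5
        System.out.println(comparisons(a, "z"));        // 4, i.e. N
    }
}
```

The exact average for a present element is (N+1)/2, which is "about N/2" for large N.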

Now suppose we want to insert N items into a list initially of length 0, perhaps searching the list each time in order to insert in alphabetical order. Each item's required position is more-or-less random, and so takes on average size()/2 moves. That is, to insert the 1st element takes 0/2 moves, the 2nd takes 1/2, the 3rd takes 2/2, the 4th takes 3/2, the 5th takes 4/2, etc. Adding all these up gives us a total "average" number of moves of

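$$\frac{1}{2} + \frac{2}{2} + \frac{3}{2} + \frac{4}{2} + \frac{5}{2} + \cdots + \frac{N-1}{2} = \frac{1}{2}\bigl(1 + 2 + 3 + \cdots + (N-1)\bigr)$$

$$= \frac{1}{2}\cdot\frac{N(N-1)}{2} = \frac{N(N-1)}{4}$$

Now, for large N this is approximately N²/4.

Both of these can be described as O(N²).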
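To see the N(N-1)/4 estimate in action, here is a sketch that builds a sorted array one insertion at a time (this is insertion sort) and counts the element moves. The class name and the choice of random input are assumptions for illustration.

```java
import java.util.Random;

class InsertionMoves {
    // Builds a sorted array by inserting values one at a time, shifting larger
    // elements right; returns the total number of element moves.
    static long buildSorted(int[] values) {
        int[] a = new int[values.length];
        long moves = 0;
        for (int count = 0; count < values.length; count++) {
            int v = values[count];
            int i = count;
            while (i > 0 && a[i - 1] > v) {   // shift right-to-left
                a[i] = a[i - 1];
                i--;
                moves++;
            }
            a[i] = v;
        }
        return moves;
    }

    public static void main(String[] args) {
        int n = 2000;
        int[] vals = new Random(42).ints(n).toArray();   // random order
        System.out.println(buildSorted(vals) + " vs N(N-1)/4 = " + (long) n * (n - 1) / 4);
    }
}
```

For random input the count should land near N(N-1)/4. Descending input is the worst case: every item goes to the front, giving exactly N(N-1)/2 moves.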

We would really like to be able to declare containers with a fixed type, where the type is supplied as parameter:

List<String> s = new List<String>();

If we use Bailey's Vector class, we would have pretty much the same
performance, but getting strings *out* of the Vector would always
require a cast:

String s = (String) vect.get(3);

A stack s supports two basic operations:

- s.push(A): adds data value A to the stack
- s.pop(): returns the most recently pushed value, and deletes it from the stack

Finding something to do with the stack is harder; why would you need that very specific last-in, first-out (LIFO) access? There are lots of examples from system design and programming-language design, but they tend not to be trivial. One straightforward example is to confirm that a line consisting of ()[]{} has all the braces in balance. The algorithm is as follows:

if you encounter an opening symbol, (, [, or {, push it.

if you encounter a closing symbol, ), ], or }, pop what is on the stack and verify the two correspond.

when you get to the end of the input, verify that the stack is empty.

Note that generally popping something off an empty stack is an error, so that you should check with isEmpty().
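Here is a sketch of the bracket-checking algorithm in Java. It uses java.util.ArrayDeque as the stack (its push, pop and isEmpty methods) rather than the Stack classes in these notes; the class name Brackets is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class Brackets {
    // Returns true if every (, [, { in line is closed by the matching ), ], } in LIFO order.
    static boolean balanced(String line) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : line.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c);
            } else if (c == ')' || c == ']' || c == '}') {
                if (stack.isEmpty()) return false;   // closing symbol with nothing open
                char open = stack.pop();
                if ((c == ')' && open != '(') || (c == ']' && open != '[') || (c == '}' && open != '{'))
                    return false;                    // mismatched pair
            }
            // any other character is ignored
        }
        return stack.isEmpty();  // leftover openers mean the line is unbalanced
    }

    public static void main(String[] args) {
        System.out.println(balanced("([]{()})"));  // true
        System.out.println(balanced("([)]"));      // false
    }
}
```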

using System.Collections.Generic;

class Stack {

private List<string> L = new List<string>(); // the list must be created before use

public void push(string s) {L.Add(s);}

public string pop() {string s = L[L.Count - 1]; L.RemoveAt(L.Count - 1); return s;}

public bool is_empty() {return L.Count == 0;}

}

In Java, with an ArrayList L:

push(e) corresponds to L.add(e),

is_empty() corresponds to L.size() == 0, and

pop() corresponds to {e = L.get(L.size()-1); L.remove(L.size()-1); return e;}

If all we do is add, then the growth strategy of doubling the internal space when necessary makes perfect sense.

But what happens if we will regularly grow lists to large size, and then
delete *most* of the entries? A list grown to have internal
capacity 1024 will retain that forever, even if we shrink down to just a
few elements.

One approach is to re-allocate to a *smaller* elements[] whenever
L.Count < L.Capacity/2, or something like that.
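Here is one minimal sketch of the shrinking idea, for a stack of Strings in a plain array. It halves the array when the stack is only a quarter full rather than half full; that threshold is an assumption, chosen so that a push right after a shrink does not immediately force another reallocation. The class name ShrinkingStack is hypothetical.

```java
import java.util.Arrays;

// Array-based stack that doubles when full and halves when only a quarter full.
class ShrinkingStack {
    private String[] elements = new String[1];
    private int count = 0;

    int size() { return count; }
    int capacity() { return elements.length; }

    void push(String s) {
        if (count == elements.length) elements = Arrays.copyOf(elements, 2 * elements.length);
        elements[count++] = s;
    }

    String pop() {
        if (count == 0) throw new IllegalStateException("pop from empty stack");
        String s = elements[--count];
        elements[count] = null;   // let the garbage collector reclaim the popped string
        if (elements.length > 1 && count <= elements.length / 4) {
            elements = Arrays.copyOf(elements, elements.length / 2);   // new array is half full
        }
        return s;
    }

    public static void main(String[] args) {
        ShrinkingStack st = new ShrinkingStack();
        for (int i = 0; i < 1024; i++) st.push("s" + i);
        for (int i = 0; i < 1021; i++) st.pop();
        System.out.println(st.size() + " " + st.capacity());   // prints "3 8"
    }
}
```

Shrinking as soon as the count drops to half would leave the new array completely full, so a single push would double it again.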

Morin in §2.6 (p 49) introduces what he calls a RootishArrayStack, which
is an array-based list with an efficient **delete**
operation. Here are the key facts (big-O notation is officially introduced
in the next section):

- The space used for n elements is n + O(√n)
- For any m add/remove operations, the time spent growing and shrinking is O(m)

The idea is to keep a list of arrays (an array of pointers to arrays). These sub-arrays have size 1, 2, 3, 4, etc., respectively. For 10 elements, the RootishArrayStack would have four arrays, and thus a capacity of 1+2+3+4.

If the RootishArrayStack has N elements in n arrays, then N ≃ n²/2; this follows because 1+2+...+n ≃ n²/2. When the RootishArrayStack needs to expand, it adds an array of size n+1 to the pool; since n ≃ √(2N), this is about √(2N). Thus, growth is "slower" than for C# Lists or Java ArrayLists, which grow by a constant factor. Also, when a new array is allocated for growth, the old arrays are kept, so no elements are copied.

The real advantage of the RootishArrayStack is for deletions. If the list shrinks so that the last sub-array is now empty, that sub-array and that sub-array only is discarded. This is a relatively efficient operation.
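The index arithmetic is the trickiest part of a RootishArrayStack. Below is a sketch of how list position i maps to a block and an offset within it, following the formula Morin gives for his RootishArrayStack (block b has size b+1, so blocks 0..b-1 hold b(b+1)/2 elements). This is only the indexing, not Morin's full class; the names are for illustration.

```java
class RootishIndex {
    // Which block holds list position i: the smallest b with (b+1)(b+2)/2 > i.
    static int i2b(int i) {
        double db = (-3.0 + Math.sqrt(9 + 8.0 * i)) / 2.0;
        return (int) Math.ceil(db);
    }

    // Offset of position i inside its block: subtract the b(b+1)/2 elements in earlier blocks.
    static int offset(int i) {
        int b = i2b(i);
        return i - b * (b + 1) / 2;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++)
            System.out.println(i + " -> block " + i2b(i) + ", offset " + offset(i));
    }
}
```

So get(i) is still O(1): one square root, then two array lookups.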

See Figure 5.1 on page 83.

Here are a few examples for an array-based Vector of length N, as in Bailey:

| operation | cost | provisos and notes |
|---|---|---|
| Inserting at the end of a Vector | O(1) | if no expansion is necessary |
| Inserting in the middle of a Vector | O(N) | N/2 moves on average |
| Searching a Vector | O(N) | N/2 comparisons on average if found |

Inserting and searching are both

Adding an element to a SetVector takes O(n) comparisons, because we have to make sure it isn't already there.

Now suppose we want to insert N items into a Vector initially of length 0, perhaps searching the list each time in order to insert in alphabetical order. Each item's required position is more-or-less random, and so takes on average size()/2 moves. That is, to insert the 1st element takes 0/2 moves, the 2nd takes 1/2, the 3rd takes 2/2, the 4th takes 3/2, the 5th takes 4/2, etc. Adding all these up gives us a total "average" number of moves of

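$$\frac{1}{2} + \frac{2}{2} + \frac{3}{2} + \frac{4}{2} + \frac{5}{2} + \cdots + \frac{N-1}{2} = \frac{1}{2}\bigl(1 + 2 + 3 + \cdots + (N-1)\bigr)$$

$$= \frac{1}{2}\cdot\frac{N(N-1)}{2} = \frac{N(N-1)}{4}$$

Now, for large N this is approximately N²/4.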

Building a list up by inserting each element at the front (or inserting each element at random) is O(n²).

Taking the union or intersection of two sets is O(n²).

Finding if a number is prime by checking every k < √n is O(√n).
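Here is a sketch of that trial-division test. One detail: the loop must also try k = √n itself, or perfect squares such as 49 would be reported prime, so the condition below is k*k <= n. The class name is hypothetical.

```java
class PrimeCheck {
    // Trial division: tests divisors k = 2, 3, ..., up to sqrt(n), so O(sqrt(n)) steps.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long k = 2; k * k <= n; k++) {   // k*k <= n avoids a floating-point sqrt
            if (n % k == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrime(97) + " " + isPrime(91));  // true false (91 = 7*13)
    }
}
```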

How hard is it to find the minimum of an array of length N? O(N)

How hard is it to find the

See sorting.html#binsearch for an analysis of

A function is said to be polynomial
if it is O(n^{k}) for some fixed k; quadratic growth is a special
case.

So far we've been looking mainly at running time. We can also consider
space needs. As an example, see the **Table of Factors**
example on Bailey page 88. Let us construct a table of all the k <= n, each with
a list of all the factors (prime or not) of k, and ask how much space
is needed. This turns out to be O(n log n). The argument here is a bit
mathematical; see Bailey. If the table length is n, then factor f can
appear no more than n/f times (once every fth line). Summing over all f, the
table has at most n/1 + n/2 + ... + n/n = n(1 + 1/2 + ... + 1/n) entries, and
the harmonic sum 1 + 1/2 + ... + 1/n is about ln n.

The running time to construct the table varies with how clever the
algorithm is; it can be

- O(n²) [check all i < k for divisibility]
- O(n^{3/2}) [check all i < √k]
- O(n log n) [Sieve of Eratosthenes]
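Here is a sketch of the sieve-style construction: instead of testing each k for divisors, each f is written into the rows of its multiples, so the work is exactly the n/1 + n/2 + ... + n/n sum from the space argument. The class name FactorTable is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class FactorTable {
    // table.get(k) lists every factor of k, for 1 <= k <= n (entry 0 is unused).
    // Each f is appended to the rows of its multiples f, 2f, 3f, ...
    // Total work is n/1 + n/2 + ... + n/n, which is O(n log n).
    static List<List<Integer>> build(int n) {
        List<List<Integer>> table = new ArrayList<>();
        for (int k = 0; k <= n; k++) table.add(new ArrayList<>());
        for (int f = 1; f <= n; f++) {
            for (int k = f; k <= n; k += f) {
                table.get(k).add(f);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(build(12).get(12));  // [1, 2, 3, 4, 6, 12]
    }
}
```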

Now suppose we want to search a large string for a specific character. How long should this take? Bailey has an example on p 90. The answer depends on whether we're concerned with the worst case or the average case (we are almost never interested in the best case). If the average case, then the answer typically depends on the probability distribution of the data.

- Bailey, chapter 9 section 4
- Morin, Chapter 3 (p 63) (we will look at singly-linked lists, or Morin's SLList)

Each linked-list block contains two pointers, one for data and one for the link. That's a 2× space overhead. For array-based lists, that would correspond to having each list have a Capacity that was double its Count. That's not necessarily bad, but the point is that linked lists have limited space efficiency. (They

Here is some code from the demo file Tlister.java

class TLinkedList<T> {

private T data;

private TLinkedList<T> next;

public TLinkedList(T d, TLinkedList<T> n) {data=d; next=n;}

public T first() {return data;}

public TLinkedList<T> rest() {return next;}

}


The interface is peculiar here; ignore that for now.

A program that uses this might be:

public static void main(String[] args) {

TLinkedList<String> slist = new TLinkedList<String>("apple", null);

slist = new TLinkedList<String>("banana", slist);

slist = new TLinkedList<String>("cherry", slist);

slist = new TLinkedList<String>("daikon", slist);

slist = new TLinkedList<String>("eggplant", slist);

slist = new TLinkedList<String>("fig", slist);

TLinkedList<String> p = slist;

while (p!= null) {

System.out.println(p.first());

p = p.rest();

}

}


This is not

class TLinkedList<T> {

class Cell<T> {

private T data;

private Cell<T> next;

public Cell(T d, Cell<T> n) {data=d; next=n;}

public T first() {return data;}

public Cell<T> rest() {return next;}

}

private Cell<T> head = null;

public void AddToFront(T element) {head = new Cell<T>(element, head);}

public boolean is_empty() {return head == null;}

public T First() {return head.first();}

public void DelFromFront() {head = head.rest();}

}


A slightly more complete Cell subclass is the following (changes in

public class Cell<T> {
private T data_;
private Cell<T> next_;
public Cell(T s, Cell<T> n) {data_ = s; next_ = n;}
public T data() {return data_;}
public Cell<T> next() {return next_;}
public void setData(T s) {data_ = s;}
public void setNext(Cell<T> c) {next_ = c;}
}

pop() corresponds to ...

In section 3.7 Bailey uses vectors/Mylists to implement an abstract Set. Note the more limited set of operations; there is no get() and no set().

add() now works very differently: add(E e) is basically if (!contains(e) ) add(e), where the second add(e) is Vector.add(e).

On the face of it, to form the union of two sets A and B of size N, we need about N² comparisons: for each element of B, we check whether it is already in A.

Later we'll make this faster with hashing. Brief summary: choose a relatively large M, maybe quite a bit larger than N. Define h(obj) = obj.hashCode() % M, adjusted to be nonnegative, since hashCode() may be negative. Now create a big array ht (for hash table) of size M, initially all nulls. For each a in A, do something with ht[h(a)] to mark the table. Then, for each b in B, if ht[h(b)] is still null, put b in; it's not a duplicate! If ht[h(b)] is already marked, then we have to check "the long way", but in general we save a great deal.
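Here is a sketch of that procedure, assuming that "marking" ht[h(a)] means storing a in a small bucket list there, so that the "long way" check only compares b against the few elements of A sharing its bucket. The class name HashUnion and this marking scheme are assumptions, not Bailey's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class HashUnion {
    // h(obj) = obj.hashCode() mod M, using floorMod because hashCode() may be negative.
    static int h(Object obj, int M) { return Math.floorMod(obj.hashCode(), M); }

    // Union of sets A and B, each assumed to contain no duplicates of its own.
    static List<String> union(List<String> A, List<String> B, int M) {
        List<List<String>> ht = new ArrayList<>();
        for (int i = 0; i < M; i++) ht.add(null);        // every slot starts out null
        for (String a : A) {                              // mark the table with A
            int i = h(a, M);
            if (ht.get(i) == null) ht.set(i, new LinkedList<>());
            ht.get(i).add(a);
        }
        List<String> result = new ArrayList<>(A);
        for (String b : B) {
            List<String> bucket = ht.get(h(b, M));
            // Null slot: b cannot be in A. Otherwise check "the long way",
            // but only against the elements of A that landed in the same slot.
            if (bucket == null || !bucket.contains(b)) result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(union(Arrays.asList("apple", "banana", "cherry"),
                                 Arrays.asList("banana", "date"), 101));
        // [apple, banana, cherry, date]
    }
}
```

With M much larger than N most slots hold at most one element, so the whole union takes roughly O(N) expected time instead of N².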

Is there an intersect option?

- Matrix(int height, int width)
- getWidth()
- getHeight()
- get(int row, int col)
- set(int row, int col, double val)

How should we proceed?

Here's a simpler problem: how should we implement a **Vector<T>**
class, where vector objects have a fixed length and are initialized to 0?
C# takes care of the latter for arrays, but List<T> objects do not automatically
have the right length. Also, ideally we'd like to "hide" the add()
operation, which can make a List<T> grow *longer* than we'd
like.

class Vector {

public Vector(int l) {...}

public int getlength() {...}

public double get(int i) {...}

public void set(int i, double val) {...}

}

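One way to fill in that outline in Java, as a sketch: back the Vector with a plain double[] rather than a List, since a Java array has a fixed length and its elements start at 0.0 automatically. The range check throwing IndexOutOfBoundsException is an assumption; the Matrix code below prints a warning instead.

```java
// Fixed-length vector of doubles: no add(), so it can never grow.
class Vector {
    private final double[] data;   // Java zero-initializes a new double[]

    public Vector(int l) { data = new double[l]; }

    public int getlength() { return data.length; }

    public double get(int i) { checkIndex(i); return data[i]; }

    public void set(int i, double val) { checkIndex(i); data[i] = val; }

    private void checkIndex(int i) {
        if (i < 0 || i >= data.length)
            throw new IndexOutOfBoundsException("Vector index " + i + " out of range");
    }
}

class VectorDemo {
    public static void main(String[] args) {
        Vector v = new Vector(3);
        v.set(1, 2.5);
        System.out.println(v.get(0) + " " + v.get(1));   // 0.0 2.5
    }
}
```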

Now let's return to the Matrix class. As in Vector, we will pre-allocate
space for all the elements. Here is some simple code to implement a matrix
class with TList objects (not yet converted to Java).

/**
 * Class Matrix is implemented by a TList of rows.
 */
class Matrix {
    private TList<TList<double>> m;   // list of lists: one TList per row
    private int height, width;

    /**
     * Constructor for objects of class Matrix
     */
    public Matrix(int h, int w) {
        height = h;
        width = w;
        m = new TList<TList<double>>(height);   // we must preallocate all the rows
        for (int i = 0; i < height; i++) {
            TList<double> theRow = new TList<double>(width);
            theRow.Fill(0.0);   // we must preallocate all the slots (columns) in each row
            m.Add(theRow);
        }
    }

    public int getwidth() {return width;}
    public int getheight() {return height;}

    // get value at (r, c), with range check
    public double get(int r, int c) {
        if (r < 0 || r >= height) {
            Console.WriteLine("Warning: Matrix.get() called with out-of-range row = " + r);
            return 0.0;
        }
        if (c < 0 || c >= width) {
            Console.WriteLine("Warning: Matrix.get() called with out-of-range column = " + c);
            return 0.0;
        }
        return m.get(r).get(c);
    }

    // set value at (r, c), with range check
    public void set(int r, int c, double val) {
        if (r < 0 || r >= height) {
            Console.WriteLine("Warning: Matrix.set() called with out-of-range row = " + r);
            return;
        }
        if (c < 0 || c >= width) {
            Console.WriteLine("Warning: Matrix.set() called with out-of-range column = " + c);
            return;
        }
        m.get(r).set(c, val);
    }
}

Things to note:

- We're using TList to build a 2-D structure.
- There's no analogue to TList.add(E e); we have to add entire rows or columns or else the Matrix will no longer be neatly rectangular. Note that we add new rows and columns "empty", that is, populated with nulls. (In the code above, there is no way to add a new row or column.)
- Because the generic class uses TLists, not arrays, we don't have any problem using the element type E directly throughout. When we created the Vector and MyList classes, we had that annoying need to use Object when creating arrays even when we wanted EltType.
- Matrix.print(int fieldwidth) is a handy way of generating output. Note the parameter. Because of the parameter, making this into ToString() is tricky. (Not shown above.)
- How do we know all the rows are the same length?

Also, linked lists use memory efficiently if you have a great many shorter lists. While the next_ fields require space, there are no "empty" slots as in an array-based stack. And no memory wasted due to list expansion.

These are

While the array implementation of a stack is quite fast, the linked list approach is equally straightforward. All we have to do is maintain a pointer to the head:

class stack<T> {

private Cell<T> head_;

public boolean is_empty() {return (head_ == null);}

public T pop() {T val = head_.data(); head_ = head_.next(); return val;}

public void push(T val) {head_ = new Cell<T>(val, head_);}

}

What would we need to do in C++ if we wanted to be sure we deleted a popped cell?

How would you sort a linked list? QuickSort is out.
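One common answer is merge sort, which only ever walks the list front to back and never needs access by position. The sketch below assumes a hypothetical Node class of ints with public data and next fields, a simplified stand-in for the Cell class above.

```java
class Node {
    int data;
    Node next;
    Node(int d, Node n) { data = d; next = n; }
}

class ListMergeSort {
    static Node sort(Node head) {
        if (head == null || head.next == null) return head;   // 0 or 1 cells: already sorted
        // Find the middle: slow advances one cell per step, fast advances two.
        Node slow = head, fast = head.next;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        Node second = slow.next;
        slow.next = null;                       // cut the list into two halves
        return merge(sort(head), sort(second));
    }

    // Merges two sorted lists by relinking their cells (plus one temporary dummy header).
    static Node merge(Node a, Node b) {
        Node dummy = new Node(0, null), tail = dummy;
        while (a != null && b != null) {
            if (a.data <= b.data) { tail.next = a; a = a.next; }
            else                  { tail.next = b; b = b.next; }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;        // append whatever remains
        return dummy.next;
    }

    public static void main(String[] args) {
        Node head = new Node(3, new Node(1, new Node(2, null)));
        for (Node p = sort(head); p != null; p = p.next) System.out.print(p.data + " ");
        System.out.println();   // 1 2 3
    }
}
```

This runs in O(n log n) time and, unlike the array version of merge sort, needs no extra array.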

"When in doubt, use a hash table"

- Brian Fitzpatrick, Google engineering manager and former Loyola undergrad


One way to search through a large number of values is to create a

Linked lists are particularly convenient for representing the buckets, as we will have a relatively large number of them, and most will be small.
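To make the bucket idea concrete, here is a minimal sketch of chained hashing with one java.util.LinkedList per bucket. It uses hash(s) = s.length(), the deliberately simple hash from the bucket-hashing example in these notes, and a table size of 11; both the table size and the class name LengthBuckets are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class LengthBuckets {
    // Chained hashing with hash(s) = s.length(); colliding strings share a bucket.
    static List<LinkedList<String>> build(String[] words, int tableSize) {
        List<LinkedList<String>> table = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) table.add(new LinkedList<>());
        for (String s : words) {
            table.get(s.length() % tableSize).add(s);
        }
        return table;
    }

    public static void main(String[] args) {
        String[] words = {"avocado", "banana", "canteloupe", "durian", "eggplant", "feijoa"};
        List<LinkedList<String>> table = build(words, 11);
        for (int i = 0; i < table.size(); i++)
            if (!table.get(i).isEmpty()) System.out.println(i + ": " + table.get(i));
        // 6: [banana, durian, feijoa]
        // 7: [avocado]
        // 8: [eggplant]
        // 10: [canteloupe]
    }
}
```

Half of the strings land in bucket 6, which shows why string length is a poor hash function: searching that bucket is a linear search again.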

What shall we use as a hash function? This comes up often, and a great number of standard data structures rely on having something available. Therefore, Java provides every object with a hashCode() method. It returns a 32-bit value.

Demo: what are hashcodes of

- int values
- "d"
- "A"
- " "
- "2"
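A small program for this demo might look like the following. The String.hashCode() formula in the comments is the one given in the Java API documentation; manualHash is a hypothetical helper that recomputes it by hand to show where the numbers come from.

```java
class HashCodeDemo {
    // Recomputes String.hashCode(): s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1],
    // with int overflow wrapping around exactly as in the library version.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) h = 31 * h + s.charAt(i);
        return h;
    }

    public static void main(String[] args) {
        System.out.println(Integer.valueOf(42).hashCode());  // 42: an int hashes to itself
        System.out.println("d".hashCode());    // 100: a one-character string hashes to its char code
        System.out.println("A".hashCode());    // 65
        System.out.println(" ".hashCode());    // 32
        System.out.println("2".hashCode());    // 50
        System.out.println("ab".hashCode());   // 3105 = 97*31 + 98
        System.out.println(manualHash("avocado") == "avocado".hashCode());  // true
    }
}
```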

Example: bucket hashing of

"avocado", "banana", "canteloupe", "durian", "eggplant", "feijoa",

where hash(s) = s.length().

Many classes choose to "tune" the standard hashCode() by providing their own version. Many data structures will simply assume that two objects with different hashcodes are unequal, so it is important when providing an overriding .equals() method to also provide .hashCode(). In lab 3, I provided equals() and hashCode() for class LinkedList<T>; for lab 2 I did this for StrList.

If you were to create a class with its own .equals(), but no .hashCode(), search might fail with some containers. Given a container of your class, Java might determine that there was no value in the container that had the same hashCode() value as the search target, and give up, even if there was in fact a value in the container that was .equals() to the search target.
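Here is a minimal sketch of doing it right, with a hypothetical Point class that overrides both methods; java.util.Objects.hash combines the fields so that equal points always get equal hash codes.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class that overrides both equals() and hashCode().
final class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Must agree with equals(): equal points produce equal hash codes.
    @Override
    public int hashCode() { return Objects.hash(x, y); }
}

class PointDemo {
    public static void main(String[] args) {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        // A different object with equal fields is found, because equals() and hashCode()
        // agree. Without the hashCode() override this would typically print false.
        System.out.println(set.contains(new Point(1, 2)));  // true
    }
}
```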

Mid-class exercise: call hashCode() on the following strings:

{ "avocado", "banana", "canteloupe", "durian", "eggplant", "feijoa", "guava", "hackberry", "iceberg", "jicama", "kale", "lime", "mango", "nectarine", "orange", "persimmon", "quince", "rutabega", "spinach", "tangerine" };

The above can be assigned to an array String[] A; this is done in hashFruit.java and hashStats.java (in hash).

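1. Do all of you get the same values for s.hashCode()? On linux, for "avocado" I get -622659773 and for "guava" I get 98705182. (Java's String.hashCode() is defined by a formula in the Java documentation, so it should give the same value on every platform; C# makes no such guarantee for string hash codes.)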

2. Now use hash(s), in the file above, and put the strings into htable. For what htablesize do you get buckets with "collisions": more than one string assigned to it? For what htablesize is this particular table collision-free?

3. Can you think of an orderly way of searching for the answer for #2?
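Here is one possible orderly search for #2, as a sketch: try table sizes in increasing order, starting at the number of strings (by the pigeonhole principle no smaller table can be collision-free), and stop at the first size with no collisions. The hash() helper uses Math.floorMod(s.hashCode(), size); that is an assumption, and if hashFruit.java computes hash(s) differently the sizes found will differ.

```java
class CollisionSearch {
    static int hash(String s, int size) { return Math.floorMod(s.hashCode(), size); }

    // True if no two strings land in the same bucket of a table of the given size.
    static boolean collisionFree(String[] A, int size) {
        boolean[] used = new boolean[size];
        for (String s : A) {
            int i = hash(s, size);
            if (used[i]) return false;
            used[i] = true;
        }
        return true;
    }

    // Smallest collision-free size up to maxSize, or -1 if there is none
    // (for example, if two strings have identical hashCode() values).
    static int smallestCollisionFree(String[] A, int maxSize) {
        for (int size = Math.max(1, A.length); size <= maxSize; size++)
            if (collisionFree(A, size)) return size;
        return -1;
    }

    public static void main(String[] args) {
        String[] A = { "avocado", "banana", "canteloupe", "durian", "eggplant", "feijoa",
            "guava", "hackberry", "iceberg", "jicama", "kale", "lime", "mango", "nectarine",
            "orange", "persimmon", "quince", "rutabega", "spinach", "tangerine" };
        System.out.println(smallestCollisionFree(A, 100000));
    }
}
```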

The hash table in hash.cs is not actually an object. What do we have to do to make it one? Perhaps htablesize could be a parameter to the constructor.

Another way to do hashing is so-called "open addressing" (here, linear probing): a data object **d**
is simply put into htable[hash(**d**)]. If that position is
taken, the next position is used, wrapping around to the start of the table if necessary. For this to work, we need to be sure
that htablesize is quite a bit larger than (e.g., at least double) the number of
elements added. Deletions require careful thought. See Bailey 15.4.1.
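Here is a sketch of linear probing for Strings, with add() and contains() only (deletion is left out, for the reasons above). The class name ProbingSet is hypothetical, and so is the rule of always keeping one slot empty, which guarantees that every probe sequence eventually stops.

```java
class ProbingSet {
    private final String[] htable;
    private int count = 0;

    ProbingSet(int htablesize) { htable = new String[htablesize]; }

    private int hash(String s) { return Math.floorMod(s.hashCode(), htable.length); }

    // Returns the slot holding s, or the first empty slot where s would go.
    private int findSlot(String s) {
        int i = hash(s);
        while (htable[i] != null && !htable[i].equals(s)) {
            i = (i + 1) % htable.length;   // taken by something else: try the next slot, wrapping around
        }
        return i;
    }

    boolean contains(String s) { return htable[findSlot(s)] != null; }

    void add(String s) {
        int i = findSlot(s);
        if (htable[i] != null) return;            // already present
        if (count == htable.length - 1)           // keep one slot empty so findSlot always stops
            throw new IllegalStateException("table too full");
        htable[i] = s;
        count++;
    }

    public static void main(String[] args) {
        ProbingSet p = new ProbingSet(8);
        p.add("a");   // hashCode 97, 97 mod 8 = 1
        p.add("i");   // hashCode 105, also 1 mod 8: collides, so it goes to slot 2
        System.out.println(p.contains("i") + " " + p.contains("q"));   // true false
    }
}
```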

This traversal is in no particular order!

A class based on this is in hashtable.java; note the print() method. This class uses the string type; there is also inthashclass.cs that uses int (yes, I should have made this use a generic type).

class StrSet {
private StrList sl;
public StrSet() {sl = new StrList(100);}
public boolean isMember(String s) {
for (int i = 0; i < sl.size(); i++) {
if (sl.get(i).equals(s)) return true; // compare contents with equals(), not references with ==
}
return false;
}
public void add(String s) {
if (isMember(s)) return;
sl.add(s);
}
}

But there is a problem here: the isMember() and add() methods are O(N). [why?]

Can we do better? Yes, with hashing.

To create a HashSet, we use a hash table as in hashtable.java. The code for this is in hashsetdemo.java.

class hashset {
private hashtable ht;
public hashset(int size) {ht = new hashtable(size);}
public boolean isMember(String s) {return ht.isMember(s);}
public void add(String s) {
if (isMember(s)) return;
ht.add(s);
}
public void print() {ht.print();}
}

To run this, it must be linked with hashclass.java.

To create a dictionary, we will use generic type parameters K for the key and V for the values. We will rewrite our hashtable class so that the Cell contains fields for the key (of type K) and value (of type V).

The interface will then be:

- V get (K key): returns the value corresponding to key, or else default(V) (generally null)
- void add(K key, V val): adds the new pair. Precondition: K is not already present
- void update(K key, V newval): like add, except that K may or may not be present.
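For comparison with dictionary.cs, here is a Java sketch of the same interface, using an array of linked-list buckets. Assumptions: a fixed table size, get() returning null for a missing key (Java's nearest analogue of default(V) for object types), and the class name ChainedDictionary, which is hypothetical.

```java
import java.util.Objects;

class ChainedDictionary<K, V> {
    // One linked-list cell: a key, its value, and the next cell in the bucket.
    private static class Cell<K, V> {
        final K key;
        V val;
        Cell<K, V> next;
        Cell(K k, V v, Cell<K, V> n) { key = k; val = v; next = n; }
    }

    private final Cell<K, V>[] htable;

    @SuppressWarnings("unchecked")
    ChainedDictionary(int htablesize) { htable = (Cell<K, V>[]) new Cell[htablesize]; }

    private int hash(K key) { return Math.floorMod(Objects.hashCode(key), htable.length); }

    private Cell<K, V> find(K key) {
        for (Cell<K, V> p = htable[hash(key)]; p != null; p = p.next)
            if (Objects.equals(p.key, key)) return p;
        return null;
    }

    // Returns the value for key, or null if key is absent.
    V get(K key) {
        Cell<K, V> p = find(key);
        return (p == null) ? null : p.val;
    }

    // Precondition: key is not already present (not checked here).
    void add(K key, V val) {
        int i = hash(key);
        htable[i] = new Cell<>(key, val, htable[i]);   // push onto the front of the bucket
    }

    // Like add, except that key may or may not already be present.
    void update(K key, V newval) {
        Cell<K, V> p = find(key);
        if (p != null) p.val = newval; else add(key, newval);
    }

    public static void main(String[] args) {
        ChainedDictionary<String, Integer> d = new ChainedDictionary<>(101);
        for (String w : "the cat and the hat".split(" ")) {
            Integer c = d.get(w);
            d.update(w, c == null ? 1 : c + 1);
        }
        System.out.println(d.get("the") + " " + d.get("dog"));   // 2 null
    }
}
```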

The dictionary example is at dictionary.cs. It contains a simple driver program.

Demo: use dictionary.cs and count the word occurrences in a paragraph pasted in from some other source (these notes, or else a paragraph from Bailey). If s is a long string holding the paragraph, use s.Split() to divide it into words. Extra features:

- Use s.Split(" .,;[]()\t".ToCharArray()) to split at other characters besides spaces
- Convert each word to lowercase
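The same demo in Java might look like the sketch below. It swaps in java.util.TreeMap for the course's dictionary class, so it illustrates the idea rather than dictionary.cs itself; the regular expression plays the role of the list of split characters, and the class name WordCount is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class WordCount {
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();       // TreeMap: keys print in sorted order
        for (String w : text.split("[\\s.,;\\[\\]()]+")) {    // split at whitespace and punctuation
            if (w.isEmpty()) continue;                        // a leading delimiter yields an empty first token
            counts.merge(w.toLowerCase(), 1, Integer::sum);   // add 1, starting from 0
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("The cat (the black one) sat; the end."));
        // {black=1, cat=1, end=1, one=1, sat=1, the=3}
    }
}
```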

The file dictionary.cs contains a full-fledged implementation of a dictionary type, complete with generics, support for "iterators", and the d[] notation.

This section is still in C#

I want this to work:

foreach (KeyValuePair<string,int> kvp in d)
    Console.WriteLine("{0}: {1}", kvp.Key, kvp.Value);

The hashtable is an array of linked lists; the linked-list cell type is

public class Cell {
    private K key_;
    private V val_;
    private Cell next_;
    public Cell(K k, V v, Cell n) {key_ = k; val_ = v; next_ = n;}
    public K getKey() {return key_;}
    public V getVal() {return val_;}
    public Cell next() {return next_;}
    public void setVal(V v) {val_ = v;}
    public void setNext(Cell c) {next_ = c;}
}

To start, I must have class dictionary implement the generic IEnumerable interface:

class dictionary<K,V> : System.Collections.Generic.IEnumerable<KeyValuePair<K,V>> {

Then I must implement the IEnumerable method. The exact method signature is as follows; note the return type:

IEnumerator<KeyValuePair<K,V>> IEnumerable<KeyValuePair<K,V>>.GetEnumerator() { return foonumerator(); }

What is up with foonumerator()? That's here:

IEnumerator<KeyValuePair<K,V>> foonumerator() {
    for (int i = 0; i < htablesize; i++) {
        Cell p = htable[i];
        while (p != null) {
            yield return new KeyValuePair<K,V>(p.getKey(), p.getVal());
            p = p.next();
        }
    }
    yield break;
}

Why didn't I just define this in IEnumerable, above? Because we also have to implement the non-generic IEnumerable.GetEnumerator(), and both versions can call foonumerator():

System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() { return foonumerator(); }

Otherwise I would have to type everything twice.

I figured this all out by reading the MSDN reference source code for Dictionary.cs.