Comp 271 Lab 5: sorted lists

Comp 271 lab 5 - sorted lists (implemented with trees!)

Goals

Searching a linked list (sorted or not) takes time O(N), but insertion (once found) takes time O(1). Searching a sorted array list takes time O(log N), but insertion so as to maintain the sorted order requires O(N). The goal here is to use a tree to implement a list-like interface so that both in-order insert and search take time O(log N). The data will be String. The class will be called StrList.

We will also make get(k) work in time O(log N).

The interface you are to implement for StrList is the following:

void insert(string s): inserts the string in order. You may assume no duplicates, though see below.
bool search(string s): searches for the given string.
string get(int k): returns the kth string, starting as usual at k=0.
void print(): print the list of all the strings, one per line.
void size(): total number of strings

Normally we think of a list as a linear structure. What does "linear" mean? Perhaps the most basic attribute of linear structure is the get(k) operation. However, the linear structure here is always sorted: as a result, we will not support add() or set(int k, string s) as these aren't consistent with ordered lists.

Note that implementing a list interface with a tree structure is a good example of the benefits of separating interface from structure.

Overview

Start with my StrList.java file (in StrList.zip).

If we want get(int k) to take time O(log N), we need to cache at each node the size of the subtree it represents. This will be maintained in each node's count_ field, which plays a completely different role from the count_ in the Map/Dictionary lab. This takes some modest effort to maintain. It represents the number of items in the node's left subtree; it is not something that can be set by the client programmer. Note that everything below except get() can be done without reference to count_.

int size(): to find the size, you can either do a traversal of the tree, counting the nodes, or else refer to root count_ (if root_ != null).
boolean search(String s): perform ordinary tree search. The public boolean search(String s) will call a recursive version search(String s, TreeNode t). The atomic cases are t==null (return false) and s==t.data() (return true); otherwise you search the left subtree if s<t.data and the right subtree otherwise. This should be identical to the previous lab's containsKey().
void insert(String s): mostly done, but the counts have to be fixed.
print(): this is basically my inTraverse() method.
String get(int k): To do this you will use the count_ field to recursively find the correct element.

The trickiest is probably get(int k). Finding the kth element is straightforward (sort of) if you do in-order traversal of the tree and stop at the kth element, but that is O(N). (Also, it's tricky to stop a recursive traversal before completion. Can you think of a good way to do this?)

As with insert() and search(), the one-parameter version will call a recursive two-parameter version get(k,t), where the second parameter is a TreeNode.

We want to check at most one node per level of the tree, which is O(log N) if the tree is even remotely balanced. The idea is to maintain, at each TreeNode t, a "cached" copy of the size of the subtree rooted by t; this is the contents of the count_ field. To maintain this size property, the value of count_ must be updated on every insertion within the subtree.

To find the kth element of the list, suppose we've recursively arrived at TreeNode t. Let lcount be the size of t.left() (that is, t.left().count(), unless t.left() is null, in which case lcount==0). Then:

if k < lcount, return get(k,t.left()): the kth element of t is the kth element of t.left().
if k==lcount, return t.data(): the kth element of t is t.data()
if k>lcount, return get(k- (lcount+1), t.right()) (lcount+1 is the number of elements in the subtree t before reaching the right subtree; the kth element of t is the (k-(lcount+1))th element of t.right()).

The hard part, actually, is making sure count_ is updated when new values are inserted into the tree. This is done for you by my rinsert() method for the comp<0 case, but you need to extend it to the comp>0 (t.right() subtree) case (where the //FIX THIS comments appear). If you do it correctly there, following the example of the left-subtree case, then the counts will be fixed all the way up the tree. (Why?). It is essential that rinsert() return true if a value was inserted, and false if a value was not inserted, so parent calls can tell whether they should update their counts.

You should fix the count-setting part of rinsert() before trying to implement get(k), because the latter depends on the former. You can check your counts by looking carefully at the output of inTraverse(). Here's what it should look like for slist2:

        antelope (1)
    eats (3)
        fresh (1)
grass, (7)
        plays (1)
    with (3)
        zebra (1)

Note that, at each node t, count_ represents the size of subtree t.

Testing

The test code provided in SListDriver.java includes insert1() and insert2(); the former creates an unbalanced tree and the latter creates a balanced tree. The inTraverse() method prints count_, and can be used to determine whether your count_ values are correct.

To test get(k), use code like this (from main()):

	for (int n = 0; n < slist.size(); n++) {
		System.out.printf("string %d is %s%n", n, slist.get(n));
	}

To test search, use the search() method below (and in the project file), and then search for a few strings:

	search("zebra",sl);
	search("aardvark",sl);
	search("grass,",sl);
	search("eats",sl);
	search("food",sl);

    public static void search(string s, strlist t) {
	if (t.search(s)) Console.WriteLine("string \"{0}\" was found", s);
	else Console.WriteLine("string \"{0}\" was NOT THERE", s);
    }

Repeated strings

Above I've tacitly assumed strings would be counted only once. If you want to allow multiple insertions of the same string each to count (this is optional), then use the multiplicity_ field to hold how many times a string has been inserted (note that multiplicity_ has nothing to do with count_; the former applies to that node's data only while the latter applies to the entire subtree). In the (comp==0) case of rinsert(), you will increment multiplicity_. The other places you need to take this into account are print(), where a string with count n should be printed n times, size(), where each string should be counted according to its multiplicity_ value, and Get(k), which should count a string appearing n times as n steps into the list. That is, the fifth string of {("and",1), ("be",3), ("cat",1), ("dog",2)} is "cat", and the fourth string is "be".

(If you implement repeated strings, you can easily implement a word-counting program that keeps its words in sorted order, as in the previous lab.)

To submit your project, create a zipfile and put it on Sakai.