Comp 388 lab 4 - sorted lists (implemented with trees)

Goals

Searching a linked list takes time O(N). Searching a sorted array list takes time O(log N), but insertion so as to maintain the sorted order requires O(N). The goal here is to use a tree to implement a list-like interface so that both in-order insert and search take time O(log N). The data will be string. The class will be called strlist.

We will also make Get(k) work in time O(log N).

The interface you are to implement for strlist is the following:

Normally we think of a list as a linear structure. What does "linear" mean? Perhaps the most basic attribute of linear structure is the get(k) operation. However, the linear structure here is always sorted: as a result, we will not support add() or set(int k, string s) as these aren't really consistent with ordered lists.

Note that implementing a list interface with a tree structure is a good example of the benefits of separating interface from structure.

Overview

Start with my strtree.cs file. In defining a strlist class, you can either have it contain a strtree as a member, or else rewrite the strtree class to be strlist.

If we want Get(int k) to take time O(log N), we need to cache at each node the size of the subtree it represents. This will be maintained in each node's count_ field. This takes some modest effort to maintain. Note that everything below except Get() can be done without reference to count_.

Get(int k) is probably the trickiest. Finding the kth element is straightforward (sort of) if you do in-order traversal of the tree and stop at the kth element, but that is O(N). We want to check at most one node per level of the tree, which is O(log N) if the tree is even remotely balanced.

As with insert() and search(), the one-parameter version will call a recursive two-parameter version Get(k,t), where the second parameter is a treenode. The idea is as follows. Let lcount be the size of t.left() (which may be null). Then:

The hard part, actually, is making sure count_ is updated when new values are inserted into the tree. This is done for you by my rinsert() method for the comp<0 case, but you need to extend it to the comp>0 (t.right() subtree) case (where the //FIX comments appear). If you do it correctly there, following the example of the left-subtree case, then the counts will be fixed all the way up the tree. (Why?). It is essential that rinsert() return true if a value was inserted, and false if a value was not inserted, so parent calls can tell whether they should update their counts.

You should fix the count-setting part of rinsert() before trying to implement Get(k), because the latter depends on the former. You can check your counts by looking carefully at the output of inTraverse().

Testing

The test code provided in strtree.cs includes insert1() and insert2(); the former creates an unbalanced tree and the latter creates a balanced tree. The inTraverse() method prints count_, and can be used to determine whether your count_ values are correct.

To test Get(k), try this

	for (int n = 0; n < tree.size(); n++) {
		Console.WriteLine("string {0} is {1}", n, tree.Get(n));
	}
To test search, write a short search() method below and then search for a few strings:
	search("zebra",sl);
	search("aardvark",sl);
	search("grass,",sl);
	search("eats",sl);
	search("food",sl);

    public static void search(string s, strlist t) {
	if (t.search(s)) Console.WriteLine("string \"{0}\" was found", s);
	else Console.WriteLine("string \"{0}\" was NOT THERE", s);
    }

Repeated strings

Above I've tacitly assumed strings would be counted only once. If you want to allow multiple insertions of the same string each to count (this is optional), then use the multiplicity_ field to hold how many times a string has been inserted (note that multiplicity_ has nothing to do with count_; the former applies to that node's data only while the latter applies to the entire subtree). In the (comp==0) case of rinsert(), you will increment multiplicity_. The other places you need to take this into account are print(), where a string with count n should be printed n times, size(), where each string should be counted according to its multiplicity_ value, and Get(k), which should count a string appearing n times as n steps into the list. That is, the fifth string of {("and",1), ("be",3), ("cat",1), ("dog",2)} is "cat", and the fourth string is "be".

To submit your project, create a zipfile and put it on Sakai or email it to me at pld@cs.luc.edu. Remember that if you email your assignment you must not include any *.exe files in the zipfile!