Comp 271 Lab 2: iterative mergesort

Comp 271 lab 2 - iterative mergesort (and C++ lists)

A Race of Sorts

Part 1 Overview: bottom-up mergesort

Bailey implements a recursive version of mergesort. You are to implement the non-recursive version.

I've provided you with an IntelliJ project named sorters (as sorters.zip). In the Sorters.java class in this project, I implement most of the sorting methods of Bailey chapter 6, each applied to a class IntList which is my own implementation of an ArrayList<int>.

In particular, recursive mergesort appears as IntList.rmsort(). This would sort an array of length 100 by:

dividing it into two pieces of size 50 each
dividing each of those into two pieces of size 25 each
dividing each of those into two pieces of sizes 13 and 12
...
(down to pieces of size 2)
divide those pieces into two pieces of size 1, which are automatically sorted.
Now do all the merges on the way back up

Iterative mergesort works differently. At any one time, you visualize the array as a sequence of "blocks", each of size a perfect power of two except for the final block. At the start, the blocksize is 1, and each block of length 1 is, of course, sorted. The process repeats with successive block sizes 2, 4, 8, 16, 32, ... until the blocksize becomes larger than the entire array size. On an array of size 100 you would now do the following:

merge each pair of consecutive blocks of length 1 into a block of length 2
merge each pair of consecutive blocks of length 2 into a block of length 4
merge each pair of consecutive blocks of length 4 into a block of length 8
...
as the last merge, we have one block of length 64, and a second block of length 100-64 = 36. Merge them.

I've supplied you with the following merging methods:

mergeblocks (int[] A, int[] T, int start, int blocksize)
mergeblocks2(int[] A, int[] T, int start, int block1size, int block2size)

T is the temp space, to be allocated at the start of imsort(). The first merger method, mergeblocks(), merges two equal-sized blocks; the second allows for the block sizes to be unequal. Use whichever you prefer. Both should handle having an undersized block as the last block of the array, but let me know if you have problems.

In class Sorters.java I've started the implementation of iterative mergesort in method imsort(). The code there is this:

    public void imsort() {
        int[] temp = null; // allocate temp; finish this!
        int blockLength = 1;
        while (blockLength < elements.length) {
            //mergepass(...)
            // swap elements and temp as follows:
            //int [] a = elements;    // this is just a pointer, not an allocation!
            //elements = temp;
            //temp = a;
            blockLength *= 2;
        }
    }

You will have to edit this as necessary, and also edit the mergepass() code. You should not change the mergeblocks() methods. Make sure you allocate temp!

My class SortTester runs several versions of sorting algorithms on the same data, and records the time. Currently it runs InsertionSort, recursive MergeSort (Sorters.rmsort()), and Java's built-in Collections.sort(). You are to complete the definition of iterative mergesort, imsort(), and test it for speed against recursive mergesort and Collections.sort(). You will need to comment out the InsertionSort test once you've got imsort() working, as you'll need to test it with sizes much too large for InsertionSort to handle in a reasonable time.

In my definition of class IntList, I've left out the expand() method (the topic of the previous lab), but you don't need that as you're not using add(). Instead, you will create IntLists of the desired size using the constructor,

IntList A = new IntList(100000);

and then use A.RandomFill() to create random entries. My code does that, and then makes copies of the data, so that different sorts all start with the same initial data. Currently the code runs InsertionSort on nums (this is the one you will comment out), recursive mergesort on nums1, and is set up to run iterative mergesort on nums2. After you run your sort method, you can check that it worked with IntList.checksort(). Right now, this fails for nums2, because imsort() isn't finished.

Timing Comparisons(updated!)

We've already compared selection sort and insertion sort.

For block sizes of 1,000,000 up to 10,000,000 or so (doubling each time; say 1,000,000; 2,000,000; 5,000,000; 10,000,000), tell me the relative times of iterative and recursive mergesort. You could add qsort2() if it works, or the built-in Java sort, but that's optional. Do the timing numbers seem consistent with O(n log(n))?

Second, find out how large an array you can sort with iterative mergesort in 10 seconds.

Make sure you use checksort() to verify that your sort implementation works. I will.

Part 2: implement StrList (from lab 1) in C++

Start with StrList.cpp (one file). Do the same thing to the add() method that you did in lab 1 in Java. Don't worry about fill().

There are two catches. First, you need to call delete [] oldelements (or whatever the name of the variable is) after you're done with it. C++ does not have garbage collection.

Second, what is the type of elements? It's declared to be an array of string. That's the same as pointer to string, or string*. A string* is the same as a pointer to an array of strings. If newelements and elements are two string*, then the assignment
elements = newelements;
causes elements to point to the same object as newelements; that is, to the new array. It is safe to call delete [] elements before this, as that simply erases the current contents of elements, as a pointer.

To submit your Java project, create a zipfile and email it to me at pld@cs.luc.edu, or use sakai. Alternatively, you can upload SortTester.java and Sorters.java individually (for this project I will need both).

For the C++ project, all I will need is the StrList.cpp file.