Bailey, Chapter 6, p 119

Morin, Chapter 11, p 225 (O(n log n) sorts only)

Suppose you have an array A of size N that is *sorted*, so i<j => A[i]<=A[j]. Then finding an element can be done in time O(log N).

Generally speaking, O(log N) means "growing only very slowly with N". Casually, O(log N) can be seen as "almost constant".

| N | log_2(N) |
|---|---|
| 100 | 7 |
| 1,000 | 10 |
| 100,000 | 17 |
| 1,000,000 | 20 |
| 1,000,000,000 | 30 |

It doesn't really matter what base we use; a change of base just introduces a constant of proportionality. It is often convenient for visualization, however, to use base 2.

To search for value X in log(N) time, we keep dividing the array in half:

```
lo = 0; hi = N-1;
while (lo < hi) {
    mid = (lo+hi)/2;
    if (X < A[mid])      hi = mid-1;   // search A[lo]...A[mid-1]
    else if (X > A[mid]) lo = mid+1;   // search A[mid+1]...A[hi]
    else lo = hi = mid;                // found
}
```

Suppose this is the array, and we are searching for X=11.

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 2 | 3 | 5 | 8 | 13 | 21 | 34 | 55 |

Initially we have lo=0 and hi=7, so mid=3. X>A[3], so we set lo=4.

Now lo=4 and hi=7, so mid=5. We have X<A[5], so hi=4, at which point the loop stops.

Now let us search for X=25:

lo=0, hi=7: mid=3, X>A[mid] so lo=mid+1

lo=4, hi=7: mid=5, X>A[mid] so lo=mid+1

lo=6, hi=7: mid=6, X<A[mid] so hi=mid-1

lo=6, hi=5

Note in this case the loop terminates with lo>hi.

It is important to understand why the number of times through the loop
here is log_{2}(N).

Sometimes it is helpful to introduce a **loop invariant** here: the statement that either X is to be found in the range A[lo]..A[hi], or else X is not present in A. With this in mind, we can eliminate the second comparison above (X>A[mid]).

This is one of relatively few elementary examples of a loop that is hard to write correctly without an invariant.

Another thing to keep in mind here is that lo≤mid<hi. However, lo==mid will occur if hi=lo+1. So in the following loop, we arrange for the search alternatives to be lo..mid and mid+1..hi. If we instead arrange for the search alternatives to be lo..mid-1 and mid..hi, then the loop can fail to terminate! That is, we can have hi=lo+1, so mid=lo, and so searching mid..hi is the same as searching lo..hi, and so we keep searching lo..hi forever.

```
lo = 0; hi = N-1;
while (lo < hi) {
    mid = (lo+hi)/2;
    // the ranges to be searched are lo..mid and mid+1..hi,
    // both of which are SMALLER than lo..hi
    if (X > A[mid]) lo = mid+1;   // search A[mid+1]...A[hi]
    else hi = mid;                // search A[lo]...A[mid]
}
```
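As a self-contained sketch of this invariant version (written in Java here; the class and method names are mine, not from the course code):

```java
public class BinarySearchDemo {
    // Invariant: if X occurs anywhere in the sorted array A, it occurs in A[lo..hi].
    // Returns an index holding X, or -1 if X is not present.
    public static int search(int[] A, int X) {
        int lo = 0, hi = A.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) / 2;        // lo <= mid < hi
            if (X > A[mid]) lo = mid + 1;   // search A[mid+1]..A[hi]
            else hi = mid;                  // search A[lo]..A[mid]
        }
        return (lo == hi && A[lo] == X) ? lo : -1;  // lo > hi only for an empty array
    }
}
```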

Bubble sort: on each pass through the array, we swap adjacent elements that are out of order. Note that the same value may be carried forward through several swaps in one pass. There is no good reason to believe bubble sort is reasonably efficient.
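A minimal bubble-sort sketch (Java; the class and method names are my own):

```java
public class BubbleSortDemo {
    // Repeatedly pass through the array, swapping adjacent out-of-order elements.
    // After pass p, the p largest values are in their final positions,
    // so each pass can stop one position earlier.
    public static void bsort(int[] data) {
        for (int limit = data.length - 1; limit > 0; limit--) {
            for (int i = 0; i < limit; i++) {
                if (data[i] > data[i+1]) {
                    int t = data[i]; data[i] = data[i+1]; data[i+1] = t;  // swap adjacent pair
                }
            }
        }
    }
}
```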

Selection sort: find the biggest element and move it to the last position (data[N-1]).

Then find the second biggest (that is, the biggest of positions 0 through N-2) and move it to position data[N-2], and so on.

Here is the code for this for StrList. Note the need to use string.Compare(s1,s2) rather than <.

```
public void ssort() {
    for (int i = 0; i < currsize-1; i++) {
        // find smallest of elements[i]..elements[currsize-1] and swap to position i
        int index_min = i;
        string curr_min_val = elements[i];
        for (int j = i+1; j < currsize; j++) {
            if (string.Compare(elements[j], curr_min_val) < 0) {
                curr_min_val = elements[j];
                index_min = j;
            }
        }
        swap(i, index_min);
    }
}
```

For the TList<T> version, we need two things. First, the TList<T> class must require that T implement the IComparable interface:

```
class TList<T> where T : IComparable<T>
```

Second, we then need to use CompareTo() in ssort():

```
public void ssort() {
    for (int i = 0; i < currsize-1; i++) {
        // find smallest of elements[i]..elements[currsize-1] and swap to position i
        int index_min = i;
        T curr_min_val = elements[i];
        for (int j = i+1; j < currsize; j++) {
            if (elements[j].CompareTo(curr_min_val) < 0) {
                curr_min_val = elements[j];
                index_min = j;
            }
        }
        swap(i, index_min);
    }
}
```

Demo: ssort.cs, which uses IntList.cs and comparisons using the < operator on ints.

```
public static void Main(string[] args) {
    if (args.Length > 0) {
        LISTSIZE = Convert.ToInt32(args[0]);
        Console.WriteLine("List size is {0}", LISTSIZE);
    }
    nums = new IntList(LISTSIZE);
    nums.RandomFill();
    nums.ssort();
}
```

In the ssort() method we use Stopwatch to record the time:

```
// selection sort
public void ssort() {
    Stopwatch s = new Stopwatch();   // needs "using System.Diagnostics;"
    s.Start();
    for (int i = 0; i < currsize-1; i++) {
        // find smallest of elements[i]..elements[currsize-1] and swap to position i
        int index_min = i;
        int curr_min_val = elements[i];
        for (int j = i+1; j < currsize; j++) {
            if (elements[j] < curr_min_val) {
                curr_min_val = elements[j];
                index_min = j;
            }
        }
        swap(i, index_min);
    }
    s.Stop();
    Console.WriteLine("sorting took {0} milliseconds", s.ElapsedMilliseconds);
}
```

Does the time appear to be quadratic?

Insertion sort, p 125

```
for (i = 0; i < N; i++)
    insert data[i] into the sorted portion data[0]..data[i-1]
```

Here it is interpreted in dance: https://www.youtube.com/watch?v=XaqR3G_NVoo
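The insertion step can be done by shifting: a Java sketch (names mine), in which larger sorted values move one place right until the gap for data[i] opens up.

```java
public class InsertionSortDemo {
    public static void isort(int[] data) {
        for (int i = 1; i < data.length; i++) {   // data[0..0] is trivially sorted
            int x = data[i];
            int j = i - 1;
            while (j >= 0 && data[j] > x) {
                data[j+1] = data[j];              // shift right to make room
                j--;
            }
            data[j+1] = x;                        // insert into the gap
        }
    }
}
```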

Mergesort: to get an estimate of the running time, we count comparisons: there are log(n) stages, and all the merges at each stage take O(n) together. Total: O(n log n)

One strategy: merge into a temp array T, then copy back into A.

Book's strategy: a little different; requires copying only half the array. How much does this slow things down? One way to investigate this would be to copy the entire array twice, to see the speedup.

Another strategy for reducing the copying of T is the "back-and-forth" method: merging from A into T at one stage, and then from T into A at the next. This is easier for the nonrecursive version.

Count comparisons: log(n) steps, and all the merges at each step take O(n) together (more precisely, we have to merge lists whose total length is n). Total: O(n log n)
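The merge-into-a-temp-array strategy can be sketched as follows (Java; names mine; this version copies the whole merged range back each time, rather than using the book's half-copy optimization):

```java
public class MergeSortDemo {
    // sort A[lo..hi] (inclusive), merging into T and then copying back into A
    static void msort(int[] A, int[] T, int lo, int hi) {
        if (lo >= hi) return;
        int mid = (lo + hi) / 2;
        msort(A, T, lo, mid);
        msort(A, T, mid + 1, hi);
        int i = lo, j = mid + 1, k = lo;
        while (i <= mid && j <= hi) T[k++] = (A[i] <= A[j]) ? A[i++] : A[j++];
        while (i <= mid) T[k++] = A[i++];         // leftovers from the left half
        while (j <= hi)  T[k++] = A[j++];         // leftovers from the right half
        for (k = lo; k <= hi; k++) A[k] = T[k];   // copy back into A
    }

    public static void msort(int[] A) {
        msort(A, new int[A.length], 0, A.length - 1);
    }
}
```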

IterativeMergeSort: how it works. Issues with merge.

The optimization to avoid copying

1. Change to RecursiveMergeSort to make it a little easier to use the merge code directly in IterativeMergeSort

2. Compare

```
T[k] = A[i];
k++;
i++;
```

with

```
T[k++] = A[i++];
```

demos: sorting/msort.cs (must be compiled with sorters.cs)

Iterative versus Recursive mergesort

Since the process of sorting numbers consists of moving each value to its ultimate location in the sorted array, we might make some progress toward a solution if we could move a single value to its ultimate location.

The idea that speed depends on getting values to their ultimate location early on is demonstrably false; see mergesort.

Nonetheless, the Quicksort algorithm is about as fast as sorting gets. Quicksort is like recursive mergesort in that we divide the data into two pieces and sort each piece. Unlike mergesort, the pieces may not be the same size. But even more unlike mergesort, the elements in the first piece are all less than the elements in the second piece, so once the pieces are sorted there is nothing more to do. There is no "merge" step.

The basic idea behind quicksort is straightforward:

- On a single pass of a section of the array (usually up from the left end and down from the right, until they meet), divide the array section into a "low subsection" and a "high subsection". Whenever you find a big element in the low subsection and a small element in the high subsection, swap them.
- Recursively sort those two subsections.

Here is the "simplest" partition strategy. **It has a flaw**.
We take a number called the pivotvalue, and divide A[left]...A[right] into
two sections, A[left]..A[mid] and A[mid+1]..A[right], so that the first
section contains values less than the pivotvalue and the second section
contains values greater than or equal to the pivotvalue. Here is the code:

```
private static int simple_partition(int [] A, int pivotvalue, int left, int right) {
    while (true) {
        while (left < right && pivotvalue <= A[right]) right--;
        // now left == right or A[right] < pivotvalue
        while (left < right && A[left] < pivotvalue) left++;
        // now left == right or pivotvalue <= A[left]
        if (left < right) swap(A, left, right);
        else if (A[right] < pivotvalue) return right+1;
        else return right;   // left == right
    }
}
```

This and other code can be found in qsort.cs.

One pass through the loop decrements **right** until it finds a value < pivotvalue, and increments **left** until it finds a value >= pivotvalue; the two values are then swapped. When **left** and **right** finally meet, say at **mid**, then if A[mid] < pivotvalue the dividing point is mid+1, and otherwise it is mid.

There is some slight trickiness when **left** and **right**
meet. If they meet at the end of the first inner while loop, we might have
pivotvalue <= A[right] or might not; this is reflected in the test at
the end.

Quicksort now looks like this:

```
private static void quickSortRecursive1(int [] A, int left, int right)
// pre: left <= right
// post: A[left..right] in ascending order
{
    int pivotindex;   // boundary returned by the partition
    if (left >= right) return;
    int pval = (A[left]+A[right])/2;
    pivotindex = simple_partition(A, pval, left, right);   /* 1 - place pivot */
    quickSortRecursive1(A, left, pivotindex-1);            /* 2 - sort small  */
    quickSortRecursive1(A, pivotindex, right);             /* 3 - sort large  */
    /* done! */
}
```

If simple_partition() returns **left**, then the second recursive call is quickSortRecursive1(A, left, right); that is, we have infinite-depth recursion. This will happen if, for example, pivotvalue equals the minimum value in the array segment. If all the values from A[left] to A[right] are equal, this will happen for any reasonable choice of pivotvalue.

Note that if simple_partition() should return **right**,
the two calls are qSR(A,left,right-1) and qSR(A,right,right); the
recursive subcalls are strictly shorter.

The usual way of fixing this is to pick a specific value **known
to be in the array** as the pivotvalue, and also to make sure
that, at the end, if the return value is mid, then A[mid] = pivotvalue.
This means the recursive calls are qSR(A,left,mid-1) and
qSR(A,mid+1,right); these are "safe" even if mid==left or mid==right.

The most common choice of pivotvalue is A[left]. However, if we want to choose A[i] as the pivotvalue for some i with left < i <= right (for example, a random i), we can first swap A[i] into position left, as in the commented-out lines in the code below.

```
private static int bailey_partition(int [] A, int left, int right) {
    // pre: left <= right
    // post: A[left] placed in the correct (returned) location

    // Random r = new Random();            // needs "using System;"
    // int index = r.Next(left, right+1);
    // swap(A, index, left);

    int pivotvalue = A[left];
    while (true) {
        // move right "pointer" toward left
        while (left < right && pivotvalue < A[right]) right--;
        if (left < right) swap(A, left++, right);
        else return left;                   // left == right == pivot location
        // now pivotvalue == A[right]
        // move left "pointer" toward right
        while (left < right && A[left] < pivotvalue) left++;
        if (left < right) swap(A, left, right--);   // afterwards, A[left] == pivotvalue again
        else return right;                  // left == right == pivot location
    }
}
```
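Putting the pieces together, here is a self-contained quicksort around a Bailey-style partition (a Java sketch with my own names; the partition body repeats the course code so the block runs on its own):

```java
public class QuickSortDemo {
    static void swap(int[] A, int i, int j) { int t = A[i]; A[i] = A[j]; A[j] = t; }

    // Bailey-style partition: the pivot value is A[left]; returns its final index,
    // so A[returned] == pivotvalue when the call completes.
    static int partition(int[] A, int left, int right) {
        int pivotvalue = A[left];
        while (true) {
            while (left < right && pivotvalue < A[right]) right--;
            if (left < right) swap(A, left++, right); else return left;
            while (left < right && A[left] < pivotvalue) left++;
            if (left < right) swap(A, left, right--); else return right;
        }
    }

    public static void qsort(int[] A, int left, int right) {
        if (left >= right) return;
        int mid = partition(A, left, right);   // A[mid] == pivotvalue
        qsort(A, left, mid - 1);               // safe even if mid == left
        qsort(A, mid + 1, right);              // safe even if mid == right
    }
}
```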

Horvick, part 1 p 111 (final page of part 1) has an even simpler strategy. The pivotIndex value is chosen randomly between left and right, inclusive, by the caller. (Horvick's code is for a generic type T; I've replaced that with int. To compare generic values one uses .CompareTo(); I've replaced that with <.)

```
private int horvick_partition(int [] items, int left, int right, int pivotIndex) {
    int pivotValue = items[pivotIndex];
    Swap(items, pivotIndex, right);
    int storeIndex = left;
    for (int i = left; i < right; i++) {
        if (items[i] < pivotValue) {
            Swap(items, i, storeIndex);
            storeIndex += 1;
        }
    }
    Swap(items, storeIndex, right);
    return storeIndex;
}
```

This partitions the array in a single upwards pass; however, there is a lot more swapping.
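As a runnable sketch (Java): Horvick's partition with a random-pivot driver. The driver is my addition, following the statement above that the caller picks pivotIndex randomly between left and right, inclusive.

```java
import java.util.Random;

public class HorvickDemo {
    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    // single upward pass; the pivot is parked at the right end, then swapped into place
    static int partition(int[] items, int left, int right, int pivotIndex) {
        int pivotValue = items[pivotIndex];
        swap(items, pivotIndex, right);
        int storeIndex = left;
        for (int i = left; i < right; i++) {
            if (items[i] < pivotValue) {
                swap(items, i, storeIndex);
                storeIndex += 1;
            }
        }
        swap(items, storeIndex, right);
        return storeIndex;
    }

    public static void qsort(int[] items, int left, int right, Random rand) {
        if (left >= right) return;
        int p = partition(items, left, right, left + rand.nextInt(right - left + 1));
        qsort(items, left, p - 1, rand);    // items[p] is the pivot, already placed
        qsort(items, p + 1, right, rand);
    }
}
```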

Consider the following array of data:

3 | 11 | 8 | 18 | 7 | 14 | 5 | 13

Suppose we use 11 as the pivot value in Bailey's method. We must swap 11 so it is A[left]:

11 | 3 | 8 | 18 | 7 | 14 | 5 | 13

Decrement right until it points to 5.

Swap 11 and 5: 5 3 8 18 7 14 11 13

Increment left until it points to 18.

Swap 18 and 11: 5 3 8 11 7 14 18 13

Decrement right until it points to 7.

Swap 11 and 7: 5 3 8 7 11 14 18 13

Now the array is divided into elements < 11, 11 itself, and elements > 11.

Suppose we wanted to use 10 as pivotValue. Because 10 is not present in the array, we can't actually use Bailey's method.

Now consider this array:

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 11 | 3 | 11 | 18 | 7 | 14 | 11 | 13 |

We start by decrementing right to point to the rightmost 11, at A[6]. We swap A[6] with A[0], which is a no-operation, and then increment left to 1.

Increment left until it points to A[2]; swap A[2] and A[6]; decrement right to 5.

Decrement right to A[4]; swap A[2] and A[4] to get 11 3 7 18 11 14 11 13; increment left to 3.

A[3] > pivotValue so swap A[3] and A[4], decrement right to 3:

11 | 3 | 7 | 11 | 18 | 14 | 11 | 13

left = right = 3, and A[3] = 11 = pivotValue; the partition returns 3.

Note that after the two sections are recursively sorted, the array becomes (3,7,11) 11 (11,13,14,18): values equal to the pivot can end up on both sides of it, but the final result is still sorted.

The quicksort algorithm in Morin's book uses a three-way partition, which returns a pair of indexes:

```
private static IntPair morin_partition(int[] A, int left, int right) {
    int pivot = A[left];   // Morin actually chose A[left + rand.Next(right-left+1)]
    int lo = left-1, j = left, hi = right+1;
    // invariant: A[left..lo] < pivot, A[lo+1..j-1] == pivot, A[hi..right] > pivot
    while (j < hi) {
        if (A[j] < pivot) {
            lo += 1; swap(A, j, lo);   // move to beginning of array
            j += 1;
        } else if (A[j] > pivot) {
            hi -= 1; swap(A, j, hi);   // move to end of array
        } else {
            j++;                       // keep in the middle
        }
    }
    return new IntPair(lo, hi);
}
```

Try this on an array with some duplicates. How about {11,3,11,18,7,14,11,13}?

```
11 3 11 18 7 14 11 13    (pivot = 11; lo = -1, j = 0, hi = 8)
3 11 11 18 7 14 11 13    (3 swapped to the front; lo = 0)
3 11 11 13 7 14 11 18    (18 swapped to the end; hi = 7)
3 11 11 11 7 14 13 18    (13 swapped to the end; hi = 6)
3 7 11 11 11 14 13 18    (7 swapped to the front; lo = 1)
```

The loop stops with (lo,hi) = (1,5): A[0..1] < 11, A[2..4] == 11, A[5..7] > 11.

There is also, as written, a swap of 14 with itself.

The Morin 3-category technique allows pivots not in the array, as long as the pivot is not less than all elements of the array or greater than all elements of the array. Why?

One drawback of the Morin partition is that it is a little harder to do manually, as j, lo and hi are all moving.
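A self-contained three-way quicksort built on the Morin partition (a Java sketch; I return the (lo,hi) pair as a two-element array instead of an IntPair class):

```java
public class MorinDemo {
    static void swap(int[] A, int i, int j) { int t = A[i]; A[i] = A[j]; A[j] = t; }

    // returns {lo, hi} with A[left..lo] < pivot, A[lo+1..hi-1] == pivot, A[hi..right] > pivot
    public static int[] partition3(int[] A, int left, int right) {
        int pivot = A[left];
        int lo = left - 1, j = left, hi = right + 1;
        while (j < hi) {
            if (A[j] < pivot)      { lo++; swap(A, j, lo); j++; }  // move to the front
            else if (A[j] > pivot) { hi--; swap(A, j, hi); }       // move to the end
            else                   { j++; }                        // stays in the middle
        }
        return new int[]{lo, hi};
    }

    public static void qsort3(int[] A, int left, int right) {
        if (left >= right) return;
        int[] p = partition3(A, left, right);
        qsort3(A, left, p[0]);     // sort the < section
        qsort3(A, p[1], right);    // sort the > section; the == section is already placed
    }
}
```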

What if we did the bucketing in low-to-high order? (least-significant to most-significant digit). That is, if the numbers are all < 1000, we sort first on the ones digit, then on the tens digit, and then on the hundreds digit.

Is this O(n)? (answer: no)
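A least-significant-digit-first sketch for non-negative values < 1000 (Java; names mine). The key requirement is that each bucketing pass be stable: buckets are appended to in array order and emptied in digit order.

```java
import java.util.ArrayList;
import java.util.List;

public class RadixSortDemo {
    public static void rsort(int[] data) {
        for (int divisor = 1; divisor <= 100; divisor *= 10) {   // ones, tens, hundreds
            List<List<Integer>> buckets = new ArrayList<>();
            for (int b = 0; b < 10; b++) buckets.add(new ArrayList<>());
            for (int x : data) buckets.get((x / divisor) % 10).add(x);  // stable append
            int k = 0;
            for (List<Integer> bucket : buckets)       // empty the buckets in digit order
                for (int x : bucket) data[k++] = x;
        }
    }
}
```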

Demo: print out comparisons, by setting debug=true.

Can we do better?

The standard O(N) strategy is based on the Quicksort partition: run the partition(A,0,N-1) method, and get back an index i. At that point we know A[i] is in the right place. If i is the middle position we are done; otherwise the median lies entirely within one of the two sections, so we need only recurse on that section.

One catch is that finding the median of A[0]..A[i-1] is not particularly helpful. We need to expand the recursive case a little: to find the Kth smallest element, for 0 ≤ K < N.

```
int findKth(A, left, right, K)
// find Kth smallest element of A[left]...A[right], starting at K=0
```

The algorithm then becomes:

```
int pivot = bailey_partition(A, left, right);
// now A[pivot] is the (pivot-left)th smallest of A[left]..A[right]
if (K == pivot-left) return A[pivot];
else if (K < pivot-left) {
    // Kth-smallest must be among A[left]..A[pivot-1]
    return findKth(A, left, pivot-1, K);
} else {
    return findKth(A, pivot+1, right, K-(pivot-left)-1);
}
```

See median.cs.

Demo: some cases to make sure the -1's, etc are sensible.
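For the demo, here is a self-contained Java sketch of findKth (the partition is the Bailey-style one from earlier; names mine). Note that findKth rearranges A as it goes, so each case below gets a fresh array; the cases exercise the boundary indexes K = 0 and K = N-1.

```java
public class MedianDemo {
    static void swap(int[] A, int i, int j) { int t = A[i]; A[i] = A[j]; A[j] = t; }

    // Bailey-style partition with pivot value A[left]; returns the pivot's final index
    static int partition(int[] A, int left, int right) {
        int pivotvalue = A[left];
        while (true) {
            while (left < right && pivotvalue < A[right]) right--;
            if (left < right) swap(A, left++, right); else return left;
            while (left < right && A[left] < pivotvalue) left++;
            if (left < right) swap(A, left, right--); else return right;
        }
    }

    // Kth smallest of A[left..right], counting from K = 0
    public static int findKth(int[] A, int left, int right, int K) {
        int pivot = partition(A, left, right);   // A[pivot] is the (pivot-left)th smallest
        if (K == pivot - left) return A[pivot];
        else if (K < pivot - left) return findKth(A, left, pivot - 1, K);
        else return findKth(A, pivot + 1, right, K - (pivot - left) - 1);
    }
}
```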