Divide-and-Conquer

Introduction

Problem: Find a number in a sorted array

Consider the problem that we need to find whether a specific number is in a sorted array. A naive approach is searching the array from the start to the end. The worst time would then be since its a linear search.

What if we always check the midpoint of the array and discarding the wrong half? Then the time would reduce to . We call this kind of sort the binary search. By dividing the problem by half at each step, we have reduced the time from linear to logarithmic.

Design Paradigms

The insertion sort we discussed before uses an incremental approach:

having sorted the subarray , we inserted into its proper place, yielding the sorted subarray .
the idea is to incrementally build up a solution to the problem.

An alternative design approach is the divide-and-conquer:

divide: break the problem into smaller subproblems, smaller instances of the original problem.
conquer: solve these problems recursively.
combine the solutions to subproblems into the solution for the original problems.

Merge Sort

The merge sort is a famous sorting algorithm that uses divide-and-conquer. The algorithm will

divide the -element sequence to be sorted into two subsequences of elements each, and then
conquer them by sort the two subsequences recursively using merge sort, and finally
combine them by merge the two subsequences into one to produce the final answer.

The recursion stops when the subarray has only 1 element. The key of this algorithm is the merging part. One of the tedious bit is copying elements between arrays.

First we assume subarrays and are sorted.
Then we copy these subarrays to new arrays and .
Both and contain an additional element at the end, call the sentinel, so we don’t have to check for end of array.
Merge and back into by comparing and . The pseudocode of merge is presented below:

The runtime of the merging is quite easy to determine. Since there are no nested loops and all loops are bounded directly by the index. The runtime will then be Next we check the correctness of the merge algorithm by loop invariant:

Loop invariant: at the start of the iteration of the last for loop,
- the subarray contains the smallest elements of and , in sorted order and
- and are the smallest elements of their arrays that have not been copied back to .
Initialization: the loop starts with , hence is empty and contains the smallest elements of . As , and are the smallest uncopied elements.
Maintenance: suppose . Then is the smallest element not copied back. contains the smallest elements, and after copying into , contains the smallest elements. Incrementing and reestablishes the loop condition. The argument for is similar.
Termination: at termination, . By the loop invariant, contains the smallest elements of and , in sorted order. That’s all elements in and apart from the two .

The Complete Algorithm

First we prove the correctness of this algorithm. Since the algorithm is recursive, we use the proof by induction to show its correctness.

Proof. We assume that MergeSort sorts correctly arrays of size and show that it sorts correctly an array of size . - Base case: . The algorithm will return at line 1 with the sorted array of a single element. - Inductive step: by the induction hypothesis lines 3 and 4 return two subarrays sorted correctly. We have already proved the Merge is correct hence after its execution the algorithm will return the array sorted.

Next we deal with the runtime of MergeSort. We assume for simplicity that is an exact power of 2 and use to denote the time for MergeSort to sort elements. Then we have the runtime for each line:

This yields a recurrence equation where depends on :

If , then , and the algorithm terminates in constant time
“The time for MergeSort to sort elements is twice the time for MergeSort to sort elements plus time for Merge.”
Otherwise we have , where
- is the time to divide into subproblems,
- is the time to solve subproblems each of size ,
- is the time to conquer (to combine the obtained sub-solutions),

To solve a recurrence equation like this, we have several methods:

Substitution method: guess a solution and verify using induction.
Draw a recursion tree, add times across the tree.
Use the Master Theorem to solve a general recurrence equation in the shape of

The recursion tree of the MergeSort: The tree obviously has a depth of . And we add the runtimes of the nodes at the same level to get the runtime for each level. It is easy to see that all levels except the leaf level have a runtime of . So the runtime is

Comparison with Insertion Sort

The MergeSort always runs in time . This is way better than the worst case and average case of for InsertionSort.
But it is worse than the best-case time of InsertionSort. So the InsertionSort might be faster if the array is almost sorted.
MergeSort needs more space than InsertionSort:
- MergeSort always stores elements outside the input.
- InsertionSort only needs additional space.
- We say that InsertionSort sorts in place.

Definition: In-place Sorting

A sorting algorithm sorts in place if it only uses additional space.

The Master Theorem

The master theorem provides a “cookbook” method for solving recurrences of the form where and .

is called the driving function, and
is called the master recurrence.

The master recurrence describes the running time of a divide-and-conquer algorithm that divides a problem of size into subproblems each of size , i.e., the algorithm solves each subproblem in time .

The driving function describes the cost of dividing the problem before the recursion, as well as the cost of combining the results together.

Important term: is called the watershed function.

The Master Theorem

Let and be constants, and let be non-negative for large enough . Then, the solution of the recurrence function defined over has the following asymptotic behavior:

If there exists a constant s.t. , then .
If there exists a constant s.t. , then .
If there exists a constant s.t. , and if additionally satisfies the regularity condition for some constant and all sufficiently large , then .

Caution

The master theorem is inexhaustive and does not apply to all possible recurrence equations. But in practice it does cover the vast majority of the recurrence equations we may face.

The master theorem allows us to state the master recurrence without floors and ceilings even when we don’t have problems of exactly the same size. e.g., The three cases can be understood intuitively as follows:

Case 1: the watershed function must grow polynomially faster than , by at least a factor for some constant .
Case 2: watershed and driving functions grow asymptotically nearly at the same rate (the growth are the same for ).
Case 3: the watershed function must grow polynomially slower than , by at least a factor for some constant , and the regularity condition must be satisfied.

MergeSort Example

The runtime of the MergeSort is Then we have It is easy to see that the case 2 hold by So the solution is

Further Examples 1

For the following recurrence equation we have The case 1 holds because So the solution is

Further Examples 2

For the following recurrence equation we have The case 3 holds because and So the solution is

Data Structures and Algorithms (H) Lecture 3 Notes

Divide-and-Conquer

Introduction

Problem: Find a number in a sorted array

Design Paradigms

Merge Sort

The Complete Algorithm

Comparison with Insertion Sort

The Master Theorem

MergeSort Example

Further Examples 1

Further Examples 2