Monday, April 6, 2009

Merge Sort

Leave a Comment

Abstract:

Merge Sort has several advantages on performance. This technique divides the whole data as like divided and conquer method but until two elements in a group. Then solves the divided parts and conquers them. The resulting efficiency of this method is that the division of problem is much less then the previous method, so the recursive call is also less. As a result we have less comparison and less computational time. Merge sort can be implemented in C to sort the file.

Introduction:


Merging is the process of combining two or more sorted files into a third sorted file. In most implementations it preserves the input order of equal elements in the sorted output. It is an example of the divide and conquer algorithmic paradigm. It is also called Divide and Conquer where we divide the problem into a number of sub-problems and then we conquer the sub-problems by solving them recursively. Merge sort can be implemented in C to sort the file. Divide the file into n subfiles of size 1 and merge adjacent pairs of files. We then have approximately n/2 files of size 2. Repeat this process until there is only one file remaining of size n.

Working Mechanism:

Theoretically, a merge sort works as follows:
If the list is of length 0 or 1, then it is already sorted. Otherwise:
Divide the unsorted list into two sublists of about half the size.
Sort each sub list recursively by re-applying merge sort.
Merge the two sublists back into one sorted list.

To sort A [p…r] Divide by splitting into two subarray A [p…q] and A [q+1…r], where q is the half way point of A[p…r].
Conquer by recursively sorting the two subarrays, A [p…q] and A [q+1…r].
Combine by merging the two sorted subarrays A [p…q] and A [q+1…r] to produce a simple sorted array A [p…r].


Example:


Original file:

5

2

4

7

1

3

2

6

Pass 1:

2

5

4

7

1

3

2

6

Pass 2:

2

4

5

7

1

2

3

6

Pass 3:

1

2

2

3

4

5

6

7

Sorted file:

1

2

2

3

4

5

6

7




Pseudocode:

Merge (A, p, q, r)
n1 ← q – p + 1
n2 ← r – q

Create arrays, L[1…n1+1] and R[1…n2+1]

for i ← 1 to n1,
do L[i] ← A[p+i-1]
for j ← 1 to n2,
do R[j] ← A[q+j]

L[n1+1] ← ∞
R[n2+1] ← ∞
i ← 1
j ← 1

for k ← p to r
do if L[i] ≤ R[j]
then A[k] ← L[i]
i ← i+1
else
A[k] ← R[j]
j ← j+1

Note: We constructed two auxiliary arrays namely L and R.


Implementation of the algorithm in C:

/* An example of routine that accepts two sorted arrays a and b of n1 and n2 elements, respectively, and merges them into a third array c containing n3 elements. */

void mergearr(int a[], int b[], int c[], int n1, int n2, int n3)
{
int appoint, bpoint, cpoint;
int alimit, blimit, climit;

alimit = n1-1;
blimit = n2-1;
climit = n3-1;
if (n1+n2 ! = n3)
{
printf(“array bounds incompatible/n”);
exit(1);
}
apoint = 0;
bpoint = 0;
for (cpoint = 0; appoint <= alimit && bpoint <= blimit; cpoint++) if(a[appoint] < style="font-weight: bold;">Analysis:

Merge sort is more efficient than other sort for some types of lists if the data to be sorted can only be efficiently accessed sequentially, merge sort is a stable sort as long as the merge operation is implemented properly. In sorting n items, merge sort has an average and worst-case performance of(n log n). If the running time of merge sort for a list of length n is T(n), then the recurrence T(n) = 2T(n/2) + n follows from the definition of the algorithm .
In the worst case, merge sort does approximately (n ⌈lg n⌉ - 2⌈lg n⌉ + 1) comparisons, which is between (n lg n - n + 1) and (n lg n + n + O (lg n)). In the worst case, merge sort does about 39% fewer comparisons than quick sort does in the average case; merge sort always makes fewer comparisons than quick sort, except in extremely rare cases, when they tie, where merge sort’s worst case is found simultaneously with quick sort’s best case.

Types of analysis:

Worst case:
T(n)=maximum time of algorithm of any input size of n.

Average case:
T(n)=expected time of algorithms overall inputs of size n.
Need assumptions of statistical distribution of inputs.

Best case:
Cheat with a slowly algorithm that works fast on some input.

Conclusions:

It is always O(nlogn), it is a very good alternative if the extra memory requirement is not a problem. Array elements with fast comparisons and slow copying seem to slightly penalize merge sort.
O(nlgn) grows more slowly than O(n2).
Therefore merge sort asymptotically beats insertion sort in the worst case.
In practice, merge sort beats insertion sort for n>30 or so.

0 comments :

Post a Comment