Smooth sorting


We continue to dive into a variety of heaps.

Today we’ll analyze an elegant sorting method that uses special heaps based on Leonardo numbers.

Many have heard of this sort, but few know exactly how it works. Today we will see that there is nothing complicated about it. The method was invented by the legendary Edsger Dijkstra. Besides his many brilliant contributions to the theory of algorithms, he is also the author of this witty remark: “It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.” I hope it is not blasphemy that the animations in this article were created with VBA :-)








Heap sort by itself is very good, since its time complexity is O(n log n) regardless of the data. Whatever the input array may be, heapsort never degrades to O(n²), which can happen, for example, with quicksort. The flip side of the coin is that sorting with a binary heap cannot be accelerated either: O(n) complexity cannot be expected from it (whereas quicksort, under favorable conditions, can reach such figures).

In general, the question on the agenda was: is it possible to contrive things so that the time complexity of sorting with a heap is, on the one hand, never worse than O(n log n), but in a favorable scenario (in particular, when an almost sorted array is processed) improves toward O(n)?

Edsger Dijkstra himself took up this question personally and found out that yes, it is possible.

It is assumed that readers of this article understand how heap sort works in general: they know what a sorting tree is and why the sift is needed. If you have gaps in this knowledge, I recommend reading the previous article before continuing.

What's wrong with a binary heap


Let's take a look at how heapsort handles an almost ordered array and see why this algorithm does not process such input any faster.


Click on the animation to go to the article “Sorting by the n-pyramid”.

The first thing that catches the eye is that during the sifts the maxima are constantly pushed toward the root of the heap, which corresponds to the first element of the array. If the input array is almost ordered, this only adds work for the algorithm: smaller elements will still first descend the tree, i.e. move closer to the end of the array, not to its beginning.

The second slowing factor, which is less obvious, is that the standard binary heap is always a balanced tree. For initially ordered data this plays a negative role. If the original array contains random data, the elements are distributed evenly across the balanced tree, and repeated sifts pass through all branches roughly the same number of times. For almost ordered data an unbalanced tree is preferable: then the part of the array corresponding to the longer branches of the tree is processed less often than the rest.
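To make this concrete, here is a small experiment of my own (not from the original article): a plain binary heapsort instrumented to count the swaps performed by the sift. The function name heapsort_count and the array sizes are my own choices. On an already sorted array the swap count stays in the same ballpark as on a shuffled one, which is exactly the problem described above:

```python
import random

def heapsort_count(lst):
    # plain binary-heap sort; returns the number of swaps made by the sifts
    swaps = 0
    def sift(root, end):
        nonlocal swaps
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and lst[child] < lst[child + 1]:
                child += 1                    # pick the larger of the two children
            if lst[root] >= lst[child]:
                break
            lst[root], lst[child] = lst[child], lst[root]
            swaps += 1
            root = child
    n = len(lst)
    for start in range(n // 2 - 1, -1, -1):   # build the max-heap
        sift(start, n)
    for end in range(n - 1, 0, -1):           # extract the maxima
        lst[0], lst[end] = lst[end], lst[0]
        sift(0, end)
    return swaps

random.seed(1)
ordered = list(range(1000))
shuffled = list(range(1000))
random.shuffle(shuffled)
print(heapsort_count(ordered), heapsort_count(shuffled))
```

On a typical run the two counts differ only modestly; the already ordered input buys almost nothing.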

Leonardo numbers


To solve both problems, Dijkstra proposed using special binary heaps based on Leonardo numbers.

Leonardo numbers are almost like Fibonacci numbers, but only better.
A series of Leonardo numbers is given recursively:

L(0) = 1
L(1) = 1
L(n) = L(n-1) + L(n-2) + 1

The first 20 Leonardo numbers:
1, 1, 3, 5, 9, 15, 25, 41, 67, 109, 177, 287, 465, 753, 1219, 1973, 3193, 5167, 8361, 13529
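The series is straightforward to generate from the recurrence; a minimal sketch (the function name leonardo is my own):

```python
def leonardo(n_terms):
    # first n_terms Leonardo numbers: L(0) = L(1) = 1, L(n) = L(n-1) + L(n-2) + 1
    nums = [1, 1]
    while len(nums) < n_terms:
        nums.append(nums[-1] + nums[-2] + 1)
    return nums[:n_terms]

print(leonardo(20))
# → [1, 1, 3, 5, 9, 15, 25, 41, 67, 109, 177, 287, 465, 753, 1219, 1973, 3193, 5167, 8361, 13529]
```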

Any positive integer can be represented as a sum of Leonardo numbers with distinct indices.

This is very useful in our case. An array of n elements cannot always be represented as a single Leonardo heap (n may not be a Leonardo number). But any array can always be divided into several subarrays corresponding to different Leonardo numbers, i.e. into heaps of different orders.

Here is an example of an array of 21 elements consisting of three Leonardo heaps. In each heap the number of nodes is some Leonardo number.
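As an illustration of such a decomposition (a sketch of my own, assuming a simple greedy strategy): take the largest Leonardo number that fits, then repeat on the remainder. Since the remainder after taking L(k) is at most L(k-1), the chosen indices strictly decrease and never repeat:

```python
def leonardo_numbers(hi):
    # Leonardo numbers not exceeding hi
    a, b = 1, 1
    nums = []
    while a <= hi:
        nums.append(a)
        a, b = b, a + b + 1
    return nums

def leonardo_decompose(n):
    # greedily take the largest Leonardo number that still fits;
    # the remainder after taking L(k) is at most L(k-1), so each
    # subsequent term has a strictly smaller index
    terms = []
    rest = n
    for num in reversed(leonardo_numbers(n)):
        if num <= rest:
            terms.append(num)
            rest -= num
    return terms

print(leonardo_decompose(21))    # → [15, 5, 1]
print(leonardo_decompose(100))   # → [67, 25, 5, 3]
```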


Important points to know:

  1. Each Leonardo heap is an unbalanced binary tree.
  2. The root of each heap is the last (not the first, as in a regular binary heap) element of the corresponding subarray.
  3. Any node together with all of its descendants is also a Leonardo heap, of a smaller order.

Build and dismantle heaps


In the recurrence formula for the Leonardo numbers,

L(n) = L(n-1) + L(n-2) + 1,

the “+ 1” at the end is very convenient.

Here is why. Suppose two adjacent subarrays in the array correspond to heaps built on two consecutive Leonardo numbers. Using the element that immediately follows these subarrays, the two heaps can be combined into one common heap corresponding to the next Leonardo number.



Walking through the elements of the array, we build up a collection of Leonardo heaps. If the current element can be used to combine the two rightmost heaps (which is possible if and only if those heaps correspond to two consecutive Leonardo numbers), we combine them. If combining is not possible, the current element simply forms a new heap of a single element, of order 1 (or order 0, if an order-1 heap is already at the end).
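This building rule can be simulated on the heap orders alone, without touching the data (a sketch of my own; heap_orders is a hypothetical helper). For an array of 21 elements it ends with three heaps of sizes 15, 5 and 1, as in the illustration above:

```python
def leonardo_numbers(hi):
    # Leonardo numbers not exceeding hi
    a, b = 1, 1
    nums = []
    while a <= hi:
        nums.append(a)
        a, b = b, a + b + 1
    return nums

def heap_orders(n):
    # track only the orders (indices of Leonardo numbers) of the heaps,
    # applying the merge rule from the text to each new element
    heap = []
    for _ in range(n):
        if len(heap) >= 2 and heap[-2] == heap[-1] + 1:
            heap.pop()          # the two rightmost heaps have consecutive orders:
            heap[-1] += 1       # merge them (plus the new element) into one heap
        elif heap and heap[-1] == 1:
            heap.append(0)      # an order-1 heap is at the end: start an order-0 heap
        else:
            heap.append(1)      # otherwise start an order-1 heap
    return heap

leo = leonardo_numbers(21)
orders = heap_orders(21)
print(orders, [leo[k] for k in orders])   # → [5, 3, 1] [15, 5, 1]
```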

At the second stage of the algorithm the reverse process takes place: we dismantle the heaps. When we remove the root of a heap, we are left with two smaller heaps corresponding to the two previous Leonardo numbers. This works because:

L(n) - 1 = L(n-1) + L(n-2)

The Fibonacci numbers lack this useful “+ 1”, which is why we do not use a Fibonacci-style heap.

Smoothsort


The final algorithm:

  • I. Build a collection of Leonardo heaps from the array, each heap being a sorting tree:
    • I.1. Iterate over the elements of the array from left to right.
    • I.2. Check whether the current element can be used to combine the two rightmost heaps in the existing collection of Leonardo heaps:
      • I.2.a. If yes, combine the two rightmost heaps into one, make the current element the root of the combined heap, and sift the combined heap.
      • I.2.b. If not, add the current element as a new heap (consisting of a single node for now) to the existing collection of Leonardo heaps.
  • II. Dismantle the collection of heaps, moving the maxima to the end of the array:
    • II.1. Thanks to the sifts, the roots of the heaps increase from left to right, so the root of the rightmost heap is the maximum of the unsorted part and already occupies its final position.
    • II.2. Exclude this root from the collection of heaps.
    • II.3. If the excluded root had children, its heap falls apart into two smaller heaps corresponding to the two previous Leonardo numbers.
    • II.4. Sift the newly exposed heaps if the order of the roots was disturbed.
    • II.5. After moving the maximum element to the end, the sorted part of the array has grown and the unsorted part has shrunk. Repeat steps II.1-II.4 for the remaining unsorted part of the array.



Python implementation example


import random

def smoothsort(lst):

    # Leonardo numbers not exceeding the length of the array
    leo_nums = leonardo_numbers(len(lst))

    # orders (indices of Leonardo numbers) of the current heaps, left to right
    heap = []

    # first stage: build the collection of heaps;
    # if the two rightmost heaps have consecutive orders,
    # the current element merges them into one larger heap,
    # otherwise it starts a new one-element heap
    for i in range(len(lst)):
        if len(heap) >= 2 and heap[-2] == heap[-1] + 1:
            heap.pop()
            heap[-1] += 1
        else:
            if len(heap) >= 1 and heap[-1] == 1:
                heap.append(0)
            else:
                heap.append(1)
        restore_heap(lst, i, heap, leo_nums)

    # second stage: dismantle the heaps from right to left;
    # the root at position i is the current maximum, and removing it
    # exposes two child heaps that must be restored
    for i in reversed(range(len(lst))):
        if heap[-1] < 2:
            heap.pop()
        else:
            k = heap.pop()
            t_r, k_r, t_l, k_l = get_child_trees(i, k, leo_nums)
            heap.append(k_l)
            restore_heap(lst, t_l, heap, leo_nums)
            heap.append(k_r)
            restore_heap(lst, t_r, heap, leo_nums)

# Leonardo numbers that do not exceed the given limit
def leonardo_numbers(hi):

    a, b = 1, 1
    numbers = []
    while a <= hi:
        numbers.append(a)
        a, b = b, a + b + 1
    return numbers

# restore the invariants after the root at position i appears or changes
def restore_heap(lst, i, heap, leo_nums):

    # move the root to the left along the roots of the heaps while the
    # previous root is larger than the current root and larger than both
    # of its children (right child root at i-1, left at i-1-L(k-2))
    current = len(heap) - 1
    k = heap[current]

    while current > 0:
        j = i - leo_nums[k]
        if (lst[j] > lst[i] and
            (k < 2 or (lst[j] > lst[i-1] and lst[j] > lst[i-1-leo_nums[k-2]]))):
            lst[i], lst[j] = lst[j], lst[i]
            i = j
            current -= 1
            k = heap[current]
        else:
            break

    # sift the root down inside its own heap
    while k >= 2:
        t_r, k_r, t_l, k_l = get_child_trees(i, k, leo_nums)
        if lst[i] < lst[t_r] or lst[i] < lst[t_l]:
            if lst[t_r] > lst[t_l]:
                lst[i], lst[t_r] = lst[t_r], lst[i]
                i, k = t_r, k_r
            else:
                lst[i], lst[t_l] = lst[t_l], lst[i]
                i, k = t_l, k_l
        else:
            break

# positions and orders of the two child heaps
# of the root at position i in a heap of order k
def get_child_trees(i, k, leo_nums):

    t_r, k_r = i - 1, k - 2
    t_l, k_l = t_r - leo_nums[k_r], k - 1
    return t_r, k_r, t_l, k_l

# demonstration on a shuffled array
def main(n):
    lst = list(range(n))
    random.shuffle(lst)
    print(lst)
    smoothsort(lst)
    print(lst)
    assert lst == sorted(lst)

if __name__ == '__main__':
    main(20)

Time complexity


If we feed an almost ordered array to the input, the visualization shows why such an array is processed much faster.



The savings come entirely from the sifts. On almost ordered data the sift does not sink deep into the tree, including during the second stage, as the heaps are gradually dismantled. On initially random data the sift is more expensive, since it often descends to the very last level of its heap.

Let's estimate the total time complexity.

At the first stage we iterate over the n elements, adding each one to the heaps already built to its left. Adding an element to the collection of heaps itself costs O(1), but then the affected heap must be sifted. On ordered data the sift is shallow and often costs O(1) per added element. On unordered data each addition costs O(log n), since due to randomness the sift often has to pass through the levels of the tree down to the very bottom.

Therefore, at the first stage the time complexity is:
for almost ordered data: O(n),
for random data: O(n log n).

The situation at the second stage is similar. After each exchange of the next maximum, the heap whose root it was must be sifted again, and the cost of the sift differs for ordered and disordered data in the same way.

At the second stage the time complexity is the same as at the first:
for almost ordered data: O(n),
for random data: O(n log n).

Adding up the time complexity of the two stages:
for almost ordered data: O(2n) = O(n),
for random data: O(2n log n) = O(n log n).

In general, the worst-case and average time complexity of smooth sorting is O(n log n).
Dijkstra, in calculations I will not bore you with, proved that the more ordered the incoming data, the more smoothly the best-case complexity approaches O(n). Hence the name: smooth sorting.

Extra memory complexity


To lay the data out into a collection of Leonardo heaps, it is enough to remember exactly which Leonardo numbers are in play at each step; the heap boundaries themselves can be recomputed from these numbers. The series grows very quickly, so even for very large arrays only a small set of Leonardo numbers is needed.
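To get a feel for just how small that set is, here is a quick count (my own sketch; the function name is hypothetical):

```python
def count_leonardo_upto(hi):
    # how many Leonardo numbers do not exceed hi
    a, b = 1, 1
    count = 0
    while a <= hi:
        count += 1
        a, b = b, a + b + 1
    return count

for n in (10**3, 10**6, 10**9):
    print(n, count_leonardo_upto(n))
# 1000 → 14, 10**6 → 28, 10**9 → 43: a few dozen numbers cover any realistic array
```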

Binomial heap sort


There is a tree structure very similar to the one we have just examined: the binomial heap. It, too, is a collection of heaps of different sizes, in each of which the number of nodes is a power of two. An array with any number of elements can be laid out into such a heap, since any natural number decomposes into a sum of distinct powers of two.

In principle, a smooth sort can also be built on binomial heaps:



Will it work faster? Hardly. The binomial heap is not binary, and in the previous article we found out that increasing the number of children does not speed the sift up but slows it down. You can also notice that the binomial heap has longer branches, so neighboring ordered regions of the array link up with each other somewhat more slowly.

It is not known whether Dijkstra ever considered the binomial heap as a possible basis for his algorithm. Be that as it may, the Leonardo heap is probably the better choice.

Trailer for the next episode


However, even if the binomial heap is not the best option for smooth sorting, it should not be discarded entirely.

If the binomial tree is slightly modified and traversed using completely different (very bold) ideas, we get an original and effective algorithm with advantages of its own. That is what we will talk about next time.


Click on the animation to go to the article about the next heap-based sort.

References


Smoothsort

Leonardo numbers

Binomial heap


Today’s smooth sorting has been added to the AlgoLab application, and as a bonus so has sorting with a binomial heap. So whoever wants to push data through the collections of heaps personally, update the Excel file with macros.
