Sorting with an n-ary heap


Heap sort (also known as pyramid sort) has been mentioned on Habr more than once, and always with a kind word, but the material has always been fairly well known. Everyone knows the ordinary binary heap, but the theory of algorithms also offers:

the n-ary heap; heaps based on Leonardo numbers; the treap (a hybrid of a heap and a binary search tree); the tournament mini-heap; the mirror (reverse) heap; the weak heap; the Young tableau; the binomial heap; and who knows what other heaps...

And over the years some of the brightest minds in computer science have proposed sorting algorithms built on these pyramidal structures. For anyone curious about what they came up with, we are starting a small series of articles devoted to sorting with these structures. The world of heaps is diverse, and I hope you will find it interesting.
This article was written with the support of EDISON Software (web development).
There is a class of algorithms known as selection sorts. The general idea is that the unsorted part of the array shrinks as we repeatedly find its maximum element and move it into a growing sorted area.



Ordinary selection sort is brute force: if we look for the maximum by simply scanning the array linearly, the time complexity of the algorithm stays at O(n²).

The heap


The most efficient way to work with the maxima of an array is to organize the data into a special tree structure known as a heap. This is a tree in which every parent node is no smaller than its child nodes.

Other names for the heap are the pyramid and the sorting tree.

Let's look at how easily, and almost for free, an array can be viewed as a tree.

Take the very first element of the array and treat it as the root of the tree, a node at level 1. The next 2 elements are nodes at level 2, the left and right children of the root. The next 4 elements are nodes at level 3, the children of the second and third elements of the array. The next 8 elements are nodes at level 4, children of the level-3 elements. And so on. In this image, the nodes of the binary tree are placed directly below the corresponding elements of the array:



More often, though, diagrams draw the tree unrolled like this:



Looking at it from this angle, it becomes clear why heap sort is also called pyramid sort. Although that is roughly the same as calling a chess bishop an "officer", a rook a "tower", and a queen a "queen", as the pieces are informally known in Russian.
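The level-by-level layout described above is easy to check with a few lines of Python (the sample array here is arbitrary, chosen only for illustration):

```python
# Interpreting an array as a binary tree: level k (counting from 1)
# occupies a slice of up to 2**(k-1) consecutive elements.
data = [7, 3, 9, 1, 4, 8, 6, 2]

levels = []
i, width = 0, 1
while i < len(data):
    levels.append(data[i:i + width])  # one tree level per slice
    i += width
    width *= 2                        # each level can hold twice as many nodes

for k, level in enumerate(levels, 1):
    print(f"level {k}:", level)
# level 1: [7]
# level 2: [3, 9]
# level 3: [1, 4, 8, 6]
# level 4: [2]
```

The last level may be only partially filled, as with the single element [2] here.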

The indices of the children of the i-th element are determined trivially (if the index of the first element of the array is taken to be 0, as is customary in most programming languages):

left child: 2 × i + 1,
right child: 2 × i + 2

(In the charts and animations I traditionally start array indexes at 1, where the formulas are slightly different: left child: 2 × i, right child: 2 × i + 1; but these are minor arithmetic nuances.)

If the child indices produced by these formulas fall outside the array, the i-th element has no children. It may also happen that the i-th element has a left child (which lands on the last element of an array with an even number of elements) but no right child.
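These index relations can be sanity-checked with a tiny helper (the function names here are illustrative and not part of the sort listings below):

```python
def children(i, size):
    """Existing child indices of node i in a 0-based binary heap of the given size."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < size]

def parent(i):
    """Parent index of node i (for i > 0)."""
    return (i - 1) // 2

print(children(0, 7))  # [1, 2] - the root's two children
print(children(2, 6))  # [5]    - a left child only (even-sized array)
print(children(3, 7))  # []     - a leaf: 2*3+1 = 7 falls outside the array
print(parent(6))       # 2
```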

So any array can easily be interpreted as a tree. However, it is not yet a heap, because in the array some child elements may be larger than their parents.

For our tree, built on top of the array, to become a heap, it must be properly sifted.

Sifting


The soul of heap sort is the sift.

Sifting an element means that if it is smaller than its descendants along an unbroken chain, the element must be moved as far down as possible, and the larger descendants must each be raised one level up.

The image shows the sift path for an element. Blue marks the element being sifted. Green marks the larger descendants down the branch; each of them will be raised one level, because they are larger than the blue node being sifted. The element from the topmost blue node itself moves into the place of the lowest descendant in the green chain.



The sift is needed to turn an ordinary tree into a sorting tree, and then to keep the tree in this sorted state.

In this image, the elements of the array have been redistributed so that it now forms a heap. Although the array is laid out as a sorting tree, it is not yet sorted (neither ascending nor descending); all we know is that every descendant in the tree is no larger than its parent. But as a consequence, the maximum element of the sorting tree always sits at the root, which is the key point.



Heap Sort :: Heapsort


The algorithm is actually simple:

  • Stage 1. Build a sorting tree from the entire array. To do this, pass over the elements from right to left (from the last to the first) and, if an element has children, sift it down.
  • Stage 2. Extract the maxima one by one. The root of the heap holds the current maximum; swap it with the last element of the unsorted part, which shrinks by one. The new root may violate the heap property, so sift it down through the remaining heap, restoring the sorting tree. Repeat until only one element is left, at which point the array is sorted in ascending order.




Python code for a classic heapsort implementation:

# Classic heapsort on a binary heap
def HeapSort(data):

    # Stage 1: build the heap from the whole array.
    # Walk from the last parent node down to the root
    # (the elements in the second half are leaves and need no sifting)
    for start in range((len(data) - 2) // 2, -1, -1):
        HeapSift(data, start, len(data) - 1) 

    # Stage 2: repeatedly take the maximum off the top of the heap
    # and move it to the end of the unsorted part.
    for end in range(len(data) - 1, 0, -1): 
        # The root holds the current maximum;
        # swap it with the last unsorted element
        data[end], data[0] = data[0], data[end]
        # The new root may violate the heap property,
        # so sift it down through the shrunken heap
        # (the sorted tail is excluded)
        HeapSift(data, 0, end - 1)
    return data

# Sift down: push the element at start down until the heap property holds
def HeapSift(data, start, end):

    # Start at the given node and descend along the larger children
    root = start 
    
    # Keep descending while the current node
    # still has at least one child inside the heap
    while True:

        child = root * 2 + 1 # left child
        # No children inside the heap - the sift is finished
        if child > end: break 

        # If a right child exists and is larger,
        # it is the one to compare against
        if child + 1 <= end and data[child] < data[child + 1]:
            child += 1

        # If the larger child exceeds the root,
        # swap them and continue sifting from the
        # child's position; otherwise we are done
        if data[root] < data[child]:
            data[root], data[child] = data[child], data[root]
            root = child
        else:
            break

Algorithm Complexity


What makes a plain heap good is that it does not need to be stored separately, unlike other kinds of trees (for example, a binary search tree over an array must be constructed before use). Any array already is a tree in which parents and children can be identified on the fly. The additional memory complexity is O(1): everything happens in place.

As for time complexity, it is determined by the sifting. A single sift takes O(log n). First we sift for n elements to build the initial heap from the array; this stage takes O(n log n). In the second stage, as we extract the n current maxima from the heap, we perform a single sift over the remaining unsorted part each time, so this stage also costs O(n log n).

Total time complexity: O(n log n) + O(n log n) = O(n log n).
Moreover, heapsort has neither degenerate nor best cases. Any array is processed at a decent speed, with no degradation and no records.

On average, heap sort is slightly slower than quicksort. But for quicksort one can construct a killer array that brings the computer to its knees, while for heapsort there is none.

Time complexity (quicksort shown for comparison):

             Worst        Average      Best
Heapsort     O(n log n)   O(n log n)   O(n log n)
Quicksort    O(n²)        O(n log n)   O(n)


Sorting with a ternary heap :: Ternary heapsort


Let's look at the ternary heap. Believe it or not, it differs from the binary one only in that parent nodes have at most three children instead of two. In a ternary heap, the three children of the i-th element are computed similarly (if the first element's index = 0):

left child: 3 × i + 1
middle child: 3 × i + 2
right child: 3 × i + 3

(If indexes start at 1, as in the animations in this article, the children of the i-th element are 3 × i − 1, 3 × i and 3 × i + 1.)
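The 0-based ternary formulas can be sanity-checked the same way as the binary ones (the helper name is illustrative):

```python
def ternary_children(i, size):
    """0-based child indices of node i in a ternary heap of the given size."""
    return [c for c in (3 * i + 1, 3 * i + 2, 3 * i + 3) if c < size]

print(ternary_children(0, 10))  # [1, 2, 3]
print(ternary_children(2, 10))  # [7, 8, 9]
print(ternary_children(3, 10))  # [] - node 3 is a leaf in a 10-element array
```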

Sorting process:



On the one hand, the number of levels in the tree shrinks noticeably compared to a binary heap, which means fewer swaps on average during a sift. On the other hand, finding the largest child requires more comparisons, since there are now three children instead of two. Overall, in terms of time complexity we win in one place and lose in another, and it comes out about the same. Data in a ternary heap sorts a little faster than in a binary one, but the speedup is very small. In all variations of heap sort, algorithm designers prefer the binary option, because the ternary one is supposedly harder to implement (although "harder" here means adding two or three extra lines to the algorithm), and the speed gain is minimal.

Sorting with an n-ary heap :: N-ary heapsort


Of course, there is no need to stop there: heap sort can be adapted to any number of children. Perhaps, if we keep increasing the number of children, the process can be sped up significantly?

For the i-th element of the array (counting indices from zero), its N children are computed very simply:

1st child: N × i + 1
2nd child: N × i + 2
3rd child: N × i + 3
...
Nth child: N × i + N

Python code for sorting with an N-ary heap:

# Heapsort on a heap where each node has N children
def NHeapSort(data):

    n = 3 # number of children per node

    # Stage 1: build the heap from the whole array.
    # Walk from the last parent node down to the root
    # (the remaining elements are leaves and need no sifting)
    for start in range((len(data) - 2) // n, -1, -1):
        NHeapSift(data, n, start, len(data) - 1) 

    # Stage 2: repeatedly take the maximum off the top of the heap
    # and move it to the end of the unsorted part.
    for end in range(len(data) - 1, 0, -1): 
        # The root holds the current maximum;
        # swap it with the last unsorted element
        data[end], data[0] = data[0], data[end]
        # The new root may violate the heap property,
        # so sift it down through the shrunken heap
        # (the sorted tail is excluded)
        NHeapSift(data, n, 0, end - 1)
    return data
    
# Sift down in a heap where each node has N children
def NHeapSift(data, n, start, end):
    
    # Start at the given node and descend along the largest children
    root = start 

    while True:
        
        # Index of the first (leftmost) child;
        # if it is outside the heap, the node is a leaf
        child = root * n + 1
        if child > end: 
            break 

        largest = child
        
        # Find the largest of the children
        for k in range(2, n + 1):
            current = root * n + k
            if current > end:
                break
                
            if data[current] > data[largest]:
                largest = current
        
        # If the largest child exceeds the root,
        # swap them and continue sifting from the
        # child's position; otherwise we are done
        if data[root] < data[largest]:
            data[root], data[largest] = data[largest], data[root]
            root = largest
        else:
            break

However, more does not mean better. Taken to the limit, with N children for an array of N elements, heap sort degenerates into ordinary selection sort. Moreover, into a worsened version of selection sort, because pointless moves are performed: the sift first puts the maximum at the first position of the array and only then sends it to the end (selection sort sends the maximum straight to the end).

If the ternary heap barely outpaces the binary one, a quaternary heap is already losing: finding the largest among several children becomes too expensive.
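To get a feel for this trade-off, here is a small self-contained experiment (separate from the listing above, with illustrative names) that counts element comparisons in an n-ary heapsort for several arities. Exact counts depend on the input, so no universal conclusion should be drawn from a single run:

```python
import random

def nheap_sort_count(data, n):
    """Sort data in place with an n-ary heap and return the number of
    element comparisons performed (a rough proxy for running time)."""
    comparisons = 0

    def sift(start, end):
        nonlocal comparisons
        root = start
        while True:
            first = root * n + 1          # leftmost child
            if first > end:
                return                    # the node is a leaf
            largest = first
            # compare the remaining children against the current largest
            for child in range(first + 1, min(first + n, end + 1)):
                comparisons += 1
                if data[child] > data[largest]:
                    largest = child
            comparisons += 1              # root vs largest child
            if data[root] < data[largest]:
                data[root], data[largest] = data[largest], data[root]
                root = largest
            else:
                return

    # stage 1: build the heap
    for start in range((len(data) - 2) // n, -1, -1):
        sift(start, len(data) - 1)
    # stage 2: extract the maxima
    for end in range(len(data) - 1, 0, -1):
        data[end], data[0] = data[0], data[end]
        sift(0, end - 1)
    return comparisons

random.seed(0)
sample = [random.random() for _ in range(10000)]
for n in (2, 3, 4, 8):
    copy = sample[:]
    print(f"n = {n}: {nheap_sort_count(copy, n)} comparisons")
```

On typical random inputs the binary and ternary counts come out close to each other, while large arities pay more and more for choosing the largest child.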

Next Series Trailer


So, the main drawback of the binary/ternary/n-ary heap is that its complexity cannot break below O(n log n). The way out of this dead end is to use more sophisticated varieties of heaps for sorting. In a week we will find out what Edsger Dijkstra has to say about this.



The AlgoLab application now includes sorting with an n-ary heap. To choose the number of children, specify a value for n in the comment on this sort's cell. The range of possible values is from 2 to 5 (larger values make no sense: for n >= 6 an animation with three levels of nesting is not guaranteed to fit on the screen at normal scale).
