Invert Sort

A programmer from India clearly shows Zig-Zag, Zig-Zig and Zig used in the SplaySort algorithm:


This season we are exploring a variety of heaps and how they can be used for sorting . However, this time we will take a step away from the main theme. Today's structure - splay tree - is not a bunch. But we need it to mentally prepare for the study of the next pile - next week there will be a lecture on sorting by a Cartesian tree.
EDISON Software - web-development
This article was prepared with the support of EDISON. One of the directions of our activity is automation of measurements and expert systems , due to which we carry out high-tech projects using a strict scientific approach. We love computer science! ;-)






Binary search tree sort


Splay tree is an improved binary search tree. First, let's remember how to sort using the usual “unimproved” binary tree.

As you well know, in a binary tree, any left child is less than the parent, any right child is no less (i.e. greater or equal) than the parent.


Sorting is generally straightforward:

  • Stage 1. Based on the array, build a binary search tree. We put the first element of the array to the root, compare the remaining elements first with the root, then, depending on the comparison, move down the left or right branches (and along the way we compare the array element with the existing tree nodes). In the end, the next element reaches the end of a branch and becomes a node itself.
  • 2. . , ( , ) . .

Building a tree is quite tolerable in terms of algorithmic complexity - on average, inserting a new node costs O (log n ) , so the time complexity of the first stage is O ( n log n ) .

But in the second stage, not everything is so rosy. A recursive tree walk can easily become a long journey through an extremely winding maze, and the time complexity often degrades to O ( n 2 ) .

Splay tree


To solve this problem, only some 35-37 years ago, two Scientist scientists Robert Tarjan and Daniel Slitor developed this tree structure. They suggested that for any operations (insert, search, delete) with any tree node, immediately rebalance the tree, making the node the root of the entire structure.


In the left photo is Robert Tarjan (first row, second right) in the company of the Matrix Architect and inventor of Pascal. In the right photo is Daniel Slitor.
Clicking on the image will open a full-format version.


In Russian, the unsuccessful name "expanding tree" stuck, less often - "oblique tree". Although if it were simply literally translated, then the “expanded tree” (or, even better, “turned”) sounds good and more accurately reflects the essence of this algorithmic structure. But that is.

In order to push the node to the root, special simple operations are used, the so-called turns:

Zig Turn


If the element you want to make a root is on the second level of nesting, then everything is extremely simple.

We denote this element like the X , and its parent (which is also the root of the tree) - as the P .
A , B , C are subtrees. How much is there in these subtrees nodes does not matter, only interested in the roots of these subtrees that have relations "parent-child" with X and P . If any of the subtrees are missing (that is, if X and P do not have any descendants), then this does not affect the order of actions.


To make X the root of the tree, you need to reverse the parent-child relationship between X and P , and also outweigh the subtree B - it was the right descendant of X , became the left descendant of P ( you don’t need to do anything with A and C ). And it's all! As a result of these simple manipulations, the structure, as it was a binary search tree, remained the same - the principle “the left child is less than the parent, the right child is greater or equal than the parent” will not be violated anywhere.

The previous image X is a left descendant of P . If X was a right descendant, you just need to mirror the situation:


If X has not the second level of difficulty, but the third, then everything is more complicated, but not by much.

If X has a parent P , which in turn has a parent G , then we have only two different situations:
1) if X is a descendant for P on the same side as P for G ;
2) if X and P are versatile descendants for their parents.

First, consider the first case.

ZigZig Turn


Let X be the left descendant for P , and P the left descendant for G (the option when X and P are simultaneously right descendants is solved similarly).


As you can see, it is necessary to change the relationship between the X , P and the G , as well as properly outweigh subtrees B and the C . And our binary search tree will not suffer from this.

ZigZag Turn


If X is the right descendant, and P is left (or X is left, and P is right, not the essence), then I hope you already understand everything from this picture:


If X in the tree is still at a lower level of nesting, then to raise it, you need to apply ZigZig or ZigZag (if necessary, do it several times) to that parent up the branch, which is at the third level of nesting.

Invert Sort :: Splay sort


Actually, here we perform the same points as in tree sorting - first we build a binary sort tree, then we go around it. But every time we insert a new knot into the tree - using ridges and zags, we make it the root.

At the first stage of winning, this does not give, inserting one node (taking into account that you have to zigzag to make it a root) costs on average O (log n ) - and thus the complexity of this stage is O ( n log n ) .

However, as a result, amazing transformations take place with the tree - it straightens and is kept in such a straightened state all the time. And this decisively accelerates the second stage (tree traversal), since the resulting final ladder is processed with O ( n ) complexity .

Thus, the total complexity of the algorithm (worst, medium, best) in time is O ( n log n ) .

In practice, this sorting is slower than MergeSort, QuickSort and other HeapSort, but it clearly demonstrates how you can speed up operations with binary search tree.

The moral of this fable is this: if you have to deal with a binary search tree - if possible, make it self-balancing. As a possible option - work with it as with a splay tree, i.e. in any action with any node in the tree, make this node the root. Of course, there are other alternative self-balancing search trees (red-black tree, AVL tree, etc.), which may be more preferable.

C code


not to look nervous
/*
 *------------------------------------------------------------
 *
 *      File..........: $RCSfile: splaysort.c,v $
 *      Revision......: $Revision: 1.2 $
 *      Author........: Gary Eddy (gary@cs.mu.OZ.AU)
 *			Alistair Moffat (alistair@cs.mu.OZ.AU)
 *      Date..........: $Date: 1995/09/07 06:19:17 $
 *
 *	Description:
 *		Sorts by repeated insertion into a splay tree.
 *		Insertions are done top down
 *
 *------------------------------------------------------------
 */

#include	<stdio.h>
#include	<stdlib.h>
#include	<malloc.h>
#include	<alloca.h>

char	*sort_name = "Splaysort";
char	*sort_version = "$Revision: 1.2 $";

/*
** Define DATAPTR for the 12 byte per record version of the program.
** Otherwise only 8 extra bytes per record are used and data
** references are done by indexing the data array.
** Different compiler/architecture combinations can cause wild
** variation in the ratio of speeds between these variations.
*/

#define DATAPTR

#ifdef DATAPTR
#define DATA(p) ((p)->data)
#else
#define DATA(p) (A+size*(p-T))
#endif

/*
** With the fast copy option enabled a more intelligent copy is used
** depending on the size of the items being copied.
** This approach adopted from the nqsort program of Bentley and McIlroy,
** see Software Practice and Experience, v23, n11, November 1993, 1249-65.
*/

#define FASTCOPY

#ifdef FASTCOPY
#define COPYCODE(TYPE, parmi, parmj, n) { \
        long i = (n) / sizeof (TYPE); \
        register TYPE *pi = (TYPE *) (parmi); \
        register TYPE *pj = (TYPE *) (parmj); \
        do { \
                *pi++ = *pj++;                    \
        } while (--i > 0);      \
}

void
copyfunc(char *d, char *s, int size, int copytype)
{
        if(copytype <= 1)
                COPYCODE(long, d, s, size)
        else
                COPYCODE(char, d, s, size)
	return;
}

#define COPY(d,s,size) \
        if (copytype == 0) { \
                *(long *)(d) = *(long *)(s); \
        } else \
                copyfunc(d, s, size, copytype)
#else
#define COPY(d,s,size)	memcpy(d,s,size)
#endif

typedef struct  node_rec node;

struct	node_rec {
	node	*left, *rght;
#ifdef DATAPTR
	char	*data;
#endif
};

/*
 *	sort():
 *		The entire sort code 
 *
 *	Accesses outside variables:	none
 *
 *	Return value...:		none
 */

void
sort(void *A, size_t n, size_t size, int (*cmp)(const void *, const void *))
{
	register node *next, *curr, *prnt;
	char 	*item;
	node	*l, *r, *ch;
	node	root, *T, *new, *end, *move;
#ifndef DATAPTR
	char	*p;
#endif

/*
** Determine which copying method will be used.
*/
#ifdef FASTCOPY
	int	copytype=((char *)A - (char *)0) % sizeof(long) ||
		size % sizeof(long) ? 2 : size == sizeof(long) ? 0 : 1;
#endif


	if(n < 2)
		return;
	if((T = new = (node *) malloc(sizeof(*T)*n)) == NULL) {
		fprintf(stderr, "Couldn't allocate space for structure\n");
	}
	/* 
	** Do the first insertion manually to avoid the empty tree
	** case in the main loop.
	*/
	curr = new++;
	item = A;
	curr->left = curr->rght = NULL;
#ifdef DATAPTR
	curr->data = item;
#endif
	item += size;
	end = T+n;
	/*
	** For each item move down the tree dividing it into
	** two subtrees, one containing items less than the new
	** element and the other those which are greater.
	** The pointers l and r refer to the greatest element in the
	** smaller subtree and the smallest element in the large
	** subtree respectively. During the splitting of the tree
	** l and r retain children that may be incorrect in the final
	** tree but these false links are cut at the end of the
	** insertion.
	*/

	for( ; new<end; ) {
		l = r = &root;
		while(curr != NULL) {
			if(cmp(item, DATA(curr)) < 0) {
				/* Left root case */
				if((ch = curr->left) == NULL) {
					r = r->left = curr;
					break;
				}
				/* Left zig-zig */
				else if(cmp(item, DATA(ch)) < 0) {
					curr->left = ch->rght;
					ch->rght = curr;
					r = r->left = ch;
					curr = ch->left;
				}
				/* Left zig-zag */
				else {
					r = r->left = curr;
					l = l->rght = ch;
					curr = ch->rght;
				}
			}
			else {
				/* Right root case */
				if((ch = curr->rght) == NULL) {
					l = l->rght = curr;
					break;
				}
				/* Right zig-zag */
				else if (cmp(item, DATA(ch)) < 0) {
					l = l->rght = curr;
					r = r->left = ch;
					curr = ch->left;
				}
				/* Right zig-zig */
				else {
					curr->rght = ch->left;
					ch->left = curr;
					l = l->rght = ch;
					curr = ch->rght;
				}
			}
		}
		/* Cut false links */
		r->left = l->rght = NULL;
		new->rght = root.left;
		new->left = root.rght;
#ifdef DATAPTR
		new->data = item;
#endif
		curr = new++;
		item += size;
	}
	/* Now copy all of the data back into the input array.
	** Uses an iterative destructive inorder traversal.
	** Last item inserted is the current root.
	*/
	prnt = NULL;
	move = T;
	while (1) {
		if ((next = curr->left) != NULL) {
			/* left subtree exists */
			curr->left = prnt;
			prnt = curr;
			curr = next;
		}
		else {
			next = curr->rght;
			curr->rght = move++;
			if (next != NULL) {
				/* and arrange for a visit */
				if((curr = next->left) != NULL) {
					next->left = prnt;
					prnt = next;
					continue;
				}
				else {
					curr = next;
					continue;
				}
			}
			/* no right subtree either, turn around*/
			if (prnt != NULL) {
				curr = prnt;
				prnt = prnt->left;
				curr->left = NULL;
				continue;
			}
			/* nothing left to be done */
			break;
		}
	}
#ifndef DATAPTR
	/*
	** Change the goes-to array in rght to a comes_from in left.
	** Note the kludge on pointers, where left points into the 
	** character array containing the elements.
	*/
	for(next = T, p = A; next < end; p += size, next++)
		next->rght->left = (node *)p;
	/* and use the `comes from' array to unscramble the permutation */
	item = (char *)malloc(size);
        for (next=T; next<end; next++) {
                char *datacurr, *dataleftcurr, *final;
                if (next->left == NULL)
                        continue;
                curr = next;
                final = datacurr = DATA(curr);
                COPY(item, datacurr, size);
                while ((char *)(curr->left) != final) {
                        dataleftcurr = (char *)(curr->left);
                        COPY(datacurr, dataleftcurr, size);
                        prnt = curr;
                        curr = T + (((char *)(curr->left)-A)/size);
                        prnt->left = NULL;
                        datacurr = dataleftcurr;
                }
                COPY(datacurr, item, size);
                curr->left = NULL;
        }
#else
	/* Change the goes-to array in rght to a comes_from in left */
	for(next = T; next < end; next++)
		next->rght->left = next;
	/* and use the `comes from' array to unscramble the permutation */
	/*
	** This 12 byte version uses the presence of the data pointer
	** by making it the flag for already placed items. This means
	** that left pointers can point to nodes, eliminating the
	** calculation to find the next node.
	*/
	item = (char *)malloc(size);
	for (next=T; next<end; next++) {
		char *datacurr, *dataleftcurr, *final;
		/* second condition guarantees at least one iteration
		** of while loop.
		*/
		if (DATA(next) == (char *) NULL || next->left == next)
			continue;
		final = datacurr = DATA(next);
		curr = next->left;
		COPY(item, datacurr, size);
		while ((dataleftcurr = DATA(curr)) != final) {
			COPY(datacurr, dataleftcurr, size);
			prnt = curr;
			curr = curr->left;
			DATA(prnt) = (char *) NULL;
			datacurr = dataleftcurr;
		}
		COPY(datacurr, item, size);
		DATA(curr) = (char *) NULL;
	}
#endif
	free(item);
	free(T);
} /* sort() */

Next Series Trailer


And we return to the heaps. The next little thing will be more interesting than the one that we examined today. This hybrid is a tree-like data structure that is both a heap and a binary search tree.



References


Binary search tree , Splay tree

Splay trees

Series Articles:



Sort added to AlgoLab. So, if you update the Excel file with macros, you can personally upset the binary search tree.

All Articles