Blitsort is an in-place stable adaptive rotate merge sort

Overview

Origin

Blitsort is a rotate merge sort based on quadsort.

Visualization

In the visualization below nine tests are performed on 256 elements.

  1. Random order
  2. Ascending order
  3. Ascending Saw
  4. Generic random order
  5. Descending order
  6. Descending Saw
  7. Random tail
  8. Random half
  9. Ascending tiles.

The upper half shows the swap memory of 32 elements, and the bottom half shows the main memory. Colors are used to differentiate various operations. Parity merges are in orange. Rotations are in yellow and violet.

blitsort benchmark

Quad swap

Blitsort uses the same quad swap as quadsort. It is a routine that creates blocks of 32 sorted elements using 31 comparisons for in-order and reverse-order data, and aproximately 160 comparisons for random data.

A block of 4 elements is created with a decision tree, a block of 4 is turned into a block of 8 with a binary decision tree, and four blocks of 8 are turned into a block of 32 using a parity merge.

Rotate merge sort

A rotate merge sort uses rotations to partition two sorted arrays until they're small enough to be merged using auxiliary memory. Blitsort does so by taking the center element of the first array, using a binary search to find all elements smaller than the center element in the second array, and performing an array rotation. It does so recursively until a partition becomes small enough to be merged.

Monobound binary search

Blitsort uses a monobound binary search, which is up to two times faster than the binary search in general use.

Trinity rotation

Blitsort uses a trinity rotation, a new and significantly faster array rotation algorithm.

Memory

By default blitsort uses 512 elements worth of stack memory.

The minimum memory requirement for blitsort is 32 elements of stack memory, it can be configured to use sqrt(n) memory.

Blitsort rotate merges recursively, requiring an additional log(n) memory. It's possible to make this O(1) through the implementation of a stack.

There is currently no clear consensus on what constitutes as an in-place sort, it boils down to what someone considers a small enough memory footprint to be considered negligable. This typically ranges from the size of a cache line to the size of the L1 cache.

Performance

Blitsort has exceptional performance due to the quad swap, monobound binary search, and trinity rotation. It is likely the fastest in-place stable sort written so far and is about 15% faster than octosort, which is a block merge sort.

Blitsort's performance is similar to that of quadsort as long as the auxiliary memory is greater or equal to the square root of the array being sorted, which comes out at 262,144 elements with the default stack of 512 elements. Performance on larger arrays will degrade slowly.

Data Types

Blitsort supports long doubles and 8, 16, 32, and 64 bit data types. By using 32 or 64 bit pointers it's possible to sort any other data type, like strings. Custom data sizes can be added in blitsort.h.

Interface

The interface is the same one as qsort, which is described in man qsort.

Big O

                 ┌───────────────────────┐┌───────────────────────┐
                 │comparisons            ││swap memory           
┌───────────────┐├───────┬───────┬───────┤├───────┬───────┬───────┤┌──────┐┌─────────┐┌─────────┐
│name           ││min    │avg    │max    ││min    │avg    │max    ││stable││partition││adaptive │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│blitsort       ││n      │n log n│n log n││1      │1      │1      ││yes   ││no       ││yes     
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│mergesort      ││n log n│n log n│n log n││n      │n      │n      ││yes   ││no       ││no      
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│timsort        ││n      │n log n│n log n││n      │n      │n      ││yes   ││no       ││yes     
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│quicksort      ││n      │n log n│n²     ││1      │1      │1      ││no    ││yes      ││no      
└───────────────┘└───────┴───────┴───────┘└───────┴───────┴───────┘└──────┘└─────────┘└─────────┘

Benchmark: blitsort vs std::stable_sort vs gfx::timsort

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) using the wolfsort benchmark. The source code was compiled using g++ -O3 -w -fpermissive bench.c. Stablesort is g++'s std:stable_sort function.

The graph shows the relative performance on 100,000 32 bit integers.

Graph

data table
Name Items Type Best Average Loops Samples Distribution
stablesort 100000 32 0.006048 0.006072 1 100 random order
blitsort 100000 32 0.005839 0.005869 1 100 random order
timsort 100000 32 0.007586 0.007613 1 100 random order
stablesort 100000 32 0.000658 0.000713 1 100 ascending order
blitsort 100000 32 0.000061 0.000063 1 100 ascending order
timsort 100000 32 0.000045 0.000045 1 100 ascending order
stablesort 100000 32 0.001345 0.001432 1 100 ascending saw
blitsort 100000 32 0.001046 0.001055 1 100 ascending saw
timsort 100000 32 0.000854 0.000860 1 100 ascending saw
stablesort 100000 32 0.003905 0.003925 1 100 generic order
blitsort 100000 32 0.003664 0.003679 1 100 generic order
timsort 100000 32 0.005589 0.005616 1 100 generic order
stablesort 100000 32 0.000895 0.000905 1 100 descending order
blitsort 100000 32 0.000048 0.000048 1 100 descending order
timsort 100000 32 0.000089 0.000092 1 100 descending order
stablesort 100000 32 0.001040 0.001053 1 100 descending saw
blitsort 100000 32 0.000626 0.000634 1 100 descending saw
timsort 100000 32 0.000455 0.000460 1 100 descending saw
stablesort 100000 32 0.002053 0.002113 1 100 random tail
blitsort 100000 32 0.001678 0.001688 1 100 random tail
timsort 100000 32 0.001997 0.002018 1 100 random tail
stablesort 100000 32 0.003523 0.003557 1 100 random half
blitsort 100000 32 0.003212 0.003224 1 100 random half
timsort 100000 32 0.004021 0.004041 1 100 random half
stablesort 100000 32 0.000971 0.000984 1 100 ascending tiles
blitsort 100000 32 0.000575 0.000586 1 100 ascending tiles
timsort 100000 32 0.000838 0.000864 1 100 ascending tiles

Benchmark: blitsort vs qsort (quicksort)

The following benchmark was on CYGWIN_NT-10.0-WOW gcc version 10.2.0. The source code was compiled using gcc -O3 bench.c.

The graph shows the relative performance on 100,000 32 bit integers.

Graph

data table
Name Items Type Best Average Compares Samples Distribution
qsort 100000 32 0.011868 0.011914 1732151 10 random order
blitsort 100000 32 0.010623 0.010844 1630132 10 random order
qsort 100000 32 0.000719 0.000733 300004 10 ascending order
blitsort 100000 32 0.000197 0.000197 99999 10 ascending order
qsort 100000 32 0.006692 0.006747 1723033 10 ascending saw
blitsort 100000 32 0.001893 0.001911 388188 10 ascending saw
qsort 100000 32 0.004165 0.004213 602517 10 generic order
blitsort 100000 32 0.007435 0.007485 1566304 10 generic order
qsort 100000 32 0.001029 0.001048 400015 10 descending order
blitsort 100000 32 0.000188 0.000192 99999 10 descending order
qsort 100000 32 0.006883 0.007062 1780288 10 descending saw
blitsort 100000 32 0.001925 0.001953 399927 10 descending saw
qsort 100000 32 0.008802 0.008875 1695201 10 random tail
blitsort 100000 32 0.003056 0.003072 571457 10 random tail
qsort 100000 32 0.010466 0.010562 1734815 10 random half
blitsort 100000 32 0.005813 0.005854 960136 10 random half
qsort 100000 32 0.000745 0.000753 184218 10 unstable
blitsort 100000 32 0.002868 0.002872 809499 10 ascending tiles
You might also like...
The merge() function is used for merging two halves

The merge() function is used for merging two halves. The merge(arr, l, m, r) is a key process that assumes that arr[l..m] and arr[m+1..r] are sorted and merges the two sorted sub-arrays into one. See the following C implementation for details.

Simple command line tool that processes image files using the FidelityFX Super Resolution (FSR) or Contrast Adaptive Sharpening (CAS) shader systems.

Simple command line tool that processes image files using the FidelityFX Super Resolution (FSR) or Contrast Adaptive Sharpening (CAS) shader systems.

around — adaptive rounding operation

around — adaptive rounding operation Attempts to perform nice rounding of a floating point number, like a human would do. Usage: around.h #include "ar

Adaptive Runtime AUTOSAR Linux Simulator

Adaptive-AUTOSAR Adaptive AUTOSAR is a simulated Adaptive Platform enviroment over Linux defined by AUTOSAR. The goal of this project is to implement

A composable container for Adaptive ROS 2 Node computations. Select between FPGA, CPU or GPU at run-time.

adaptive_component A composable stateless container for Adaptive ROS 2 Node computations. Select between FPGA, CPU or GPU at run-time. Nodes using har

BAAF-Net - Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021)
BAAF-Net - Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021)

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021) This repository is for BAAF-Net introduce

Bolt is a C++ template library optimized for GPUs. Bolt provides high-performance library implementations for common algorithms such as scan, reduce, transform, and sort.

Bolt is a C++ template library optimized for heterogeneous computing. Bolt is designed to provide high-performance library implementations for common

Pipy is a tiny, high performance, highly stable, programmable proxy.

Pipy Pipy is a tiny, high performance, highly stable, programmable proxy. Written in C++, built on top of Asio asynchronous I/O library, Pipy is extre

Fast comparison-based sort algorithm

nanosort Algorithm nanosort aims to be a fast comparison-based sorting algorithm, tuned for POD types of reasonably small sizes. nanosort implements a

Cetus is a high performance, stable, protocol aware proxy for MySQL Group Replication.

Introduction Cetus is a high performance, stable, protocol aware proxy for MySQL Group Replication. Getting started 1. Prerequisites cmake gcc glib2-d

Pipy is a tiny, high performance, highly stable, programmable proxy written in C++

Pipy is a tiny, high performance, highly stable, programmable proxy. Written in C++, built on top of Asio asynchronous I/O library, Pipy is extremely lightweight and fast, making it one of the best choices for service mesh sidecars.

SortNode is a JS binding for SORT: Simple, online, and real-time tracking of multiple objects in a video sequence.

SortNode is a JS binding for SORT: Simple, online, and real-time tracking of multiple objects in a video sequence.

alie, simplified Discord bot, that's it. As fast and stable as possible.

alie alie, simplified Discord bot, that's it. As fast and stable as possible. Requirements Linux-compatible OS (aka Linux distribution) A C compiler w

Quick sort is a highly efficient sorting algorithm and is based on partitioning of array of data into smaller arrays.
Quick sort is a highly efficient sorting algorithm and is based on partitioning of array of data into smaller arrays.

Quick sort is a highly efficient sorting algorithm and is based on partitioning of array of data into smaller arrays. A large array is partitioned into two arrays one of which holds values smaller than the specified value, say pivot, based on which the partition is made and another array holds values greater than the pivot value.

Kernel source for j7y17lte - the goal is to make it as closest to linux-stable sources as possible without breaking OneUI compatibility.

Linux kernel release 3.x http://kernel.org/ These are the release notes for Linux version 3. Read them carefully, as they tell you what this is al

Hand tuned sort routines for different C types

c 2008,2009,2021 Andrew I. Schein See LICENSE file. Update for 2021 This repo disappeared from the internet when Bitbucket turned off its mercurial se

This is a Program, to sort Arrays with the QuickSort Algorithm.

QuickSort This is a program, to sort arrays with the QuickSort Algorithm. The Algorithm is optimized to be quick, but it isn't the fastest. I have wri

A stable nginx module for SSL/TLS ja3 fingerprint, with high performance.

nginx-ssl-fingerprint A stable nginx module for SSL/TLS ja3 fingerprint, with high performance. Description This module adds new nginx variables for t

Comments
  • Problems when building blitsort

    Problems when building blitsort

    Hello!

    When building code I get lots of errors (1370 exactly). Most of them are "use of undeclared identifier" or "unknown type name". Any help here?.

    Also, what is supposed I have to pass in the last parameter of blitsort func (type CMPFUNC*)?

    Thanks in advance.

    opened by cristiansgc 3
  • Random int test record on mac

    Random int test record on mac

    I wrote a simple test program. The output revealed qsort is better than blitsort on ARM macOS when sort the random int. And on ARM macos, the size of long long and size of long double maybe is the same. I used gcc to compile the code, compiler said duplicated value 8byte. So I commented the line and compiled successfully. I don't know wether the qsort implementation in c is optimized on ARM Macos or not.I'm just a noob, the code maybe have some errors. You can point out my error. My operating system is bigsur 11.0 on mac. My code as follow:

    #include <stdlib.h> #include <stdio.h> #include <string.h> #include <sys/time.h> #include <time.h> #include <errno.h> #include <math.h>

    #include "blitsort.h"

    int cmp_int (const void * a, const void * b) { return ( (int)a - (int)b ); }

    long long utime() { struct timeval now_time; gettimeofday(&now_time, NULL); return now_time.tv_sec * 1000000LL + now_time.tv_usec; }

    int main(){ int a[100000],b[100000],c[100000]; int seed = time(NULL); srand(seed); long long assignment_start_one = utime(); for(int i = 0;i < 100000;i++){ a[i] = rand(); } for(int j = 0;j < 100000;j++){ b[j] = a[j]; c[j] = a[j]; } long long assignment_end_one = utime(); printf("Assignment time: %lld. \n",assignment_end_one - assignment_start_one); long long start = utime(); blitsort(b,100000,sizeof(int),cmp_int); long long end = utime(); long long total = end - start; long long start_qsort = utime(); qsort(c,100000,sizeof(int),cmp_int); long long end_qsort = utime(); long long total_qsort = end_qsort - start_qsort; printf("Blitsort time: %lld. \n",total); printf("Qsort time: %lld. \n",total_qsort); return 0; }

    opened by xerox51 2
  • Benchmarks with small number and

    Benchmarks with small number and "too large" number of items in the array

    Could you also add plots of a benchmark with different number of items in the array (as you did in benchmarks of Quadsort)?

    Especially including 4-items, 8-items, 16-items, 32-items, 230_000-items, 300_000-items, and 400_000-items array runs (i.e. very low number of items and then very large number considering its limited 512 stack).

    opened by dumblob 2
Owner
null
merge two sorted lists fast

Py Merge Merge sorted list faster than using list.sort or heapq.merge. import merge # create some sorted lists a = list(range(-100, 1700)) b = list(r

Earthly 10 Nov 21, 2021
Rajesh Kumar Sah 1 Nov 20, 2021
This is a Program, to sort Arrays with the QuickSort Algorithm.

QuickSort This is a program, to sort arrays with the QuickSort Algorithm. The Algorithm is optimized to be quick, but it isn't the fastest. I have wri

Niklas 1 Oct 29, 2021
A collection of multiple types of lists used during pentesting, collected in one place. List types include usernames, passwords, combos, wordlist and may more..

Access list is a collection of multiple types of lists used during pentesting, collected in one place, created by Undercode This list include a collec

UNDERCODE UTILITIES 10 Nov 21, 2022
Parallel algorithms (quick-sort, merge-sort , enumeration-sort) implemented by p-threads and CUDA

程序运行方式 一、编译程序,进入sort-project(cuda-sort-project),输入命令行 make 程序即可自动编译为可以执行文件sort(cudaSort)。 二、运行可执行程序,输入命令行 ./sort 或 ./cudaSort 三、删除程序 make clean 四、指定线程

Fu-Yun Wang 3 May 30, 2022
A stable adaptive partitioning comparison sort.

Intro This document describes a partitioning stable adaptive comparison-based sort named gridsort. Binary Cube Gridsort sorts data by storing data in

null 199 Jul 13, 2022
A stable adaptive branchless partitioning comparison sort.

Intro This document describes a partitioning stable adaptive comparison-based sort named fluxsort. Benchmarks and a visualization are available at the

null 189 Nov 8, 2022
Bitset Sort, a faster std::sort replacement.

Bitset Sort Bitset Sort is a variant of quick sort, specifically BlockQuickSort. Bitset Sort uses a carefully written partition function to let the co

null 65 Oct 15, 2022
imGuIZMO.quat is a ImGui widget: like a trackball it provides a way to rotate models, lights, or objects with mouse, and graphically visualize their position in space, also around any single axis (Shift/Ctrl/Alt/Super)

imGuIZMO.quat v3.0 imGuIZMO.quat is a ImGui widget: like a trackball it provides a way to rotate models, lights, or objects with mouse, and graphicall

Michele Morrone 271 Nov 19, 2022
merge two sorted lists fast

Py Merge Merge sorted list faster than using list.sort or heapq.merge. import merge # create some sorted lists a = list(range(-100, 1700)) b = list(r

Earthly 10 Nov 21, 2021