Sorting routine implementations in "template" C

Related tags

Utilities sort
Overview

sort.h

build status

Overview

sort.h is an implementation of a ton of sorting algorithms in C with a user-defined type that is provided at include time.

This means you don't have to pay the function call overhead of using a standard library routine. This also gives us the power of higher-level language generics.

In addition, you don't have to link in a library: the entirety of this sorting library is contained in the sort.h header file.

You get the choice of many sorting routines, including:

If you don't know which one to use, you should probably use Timsort.

If you have a lot data that is semi-structured, then you should definitely use Timsort.

If you have data that is really and truly random, quicksort is probably fastest.

Usage

To use this library, you need to do three things:

  • #define SORT_TYPE to be the type of the elements of the array you want to sort. (For pointers, you should declare this like: #define SORT_TYPE int*)
  • #define SORT_NAME to be a unique name that will be prepended to all the routines, i.e., #define SORT_NAME mine would give you routines named mine_heap_sort, and so forth.
  • #include "sort.h". Make sure that sort.h is in your include path.

Then, enjoy using the sorting routines.

Quick example:

#define SORT_NAME int64
#define SORT_TYPE int64_t
#define SORT_CMP(x, y) ((x) - (y))
#include "sort.h"

You would now have access to int64_quick_sort, int64_tim_sort, etc., which you can use like

/* Assumes you have some int64_t *arr or int64_t arr[128]; */
int64_quick_sort(arr, 128);

See demo.c for a more detailed example usage.

If you are going to use your own custom type, you must redefine SORT_CMP(x, y) with your comparison function, so that it returns a value less than zero if x < y, equal to zero if x == y, and greater than 0 if x > y.

The default just uses the builtin < operators:

#define SORT_CMP(x, y)  ((x) < (y) ? -1 : ((y) < (x) ? 1 : 0))

It is often just fine to just subtract the arguments as well (though this can cause some stability problems with floating-point types):

#define SORT_CMP(x, y) ((x) - (y))

You can also redefine TIM_SORT_STACK_SIZE (default 128) to control the size of the tim sort stack (which can be used to reduce memory). Reducing it too far can cause tim sort to overflow the stack though.

Speed of routines

The speed of each routine is highly dependent on your computer and the structure of your data.

If your data has a lot of partially sorted sequences, then Tim sort will beat the kilt off of anything else.

Timsort is not as good if memory movement is many orders of magnitude more expensive than comparisons (like, many more than for normal int and double). If so, then quick sort is probably your routine. On the other hand, Timsort does extremely well if the comparison operator is very expensive, since it strives hard to minimize comparisons.

Here is the output of demo.c, which will give you the timings for a run of 10,000 int64_ts on 2014-era MacBook Pro:

Running tests
stdlib qsort time:             1285.00 us per iteration
stdlib heapsort time:          2109.00 us per iteration
stdlib mergesort time:         1299.00 us per iteration
quick sort time:                579.00 us per iteration
selection sort time:         127176.00 us per iteration
merge sort time:                999.00 us per iteration
binary insertion sort time:   13443.00 us per iteration
heap sort time:                 592.00 us per iteration
shell sort time:               1054.00 us per iteration
tim sort time:                 1005.00 us per iteration
in-place merge sort time:       903.00 us per iteration
grail sort time:               1220.00 us per iteration
sqrt sort time:                1095.00 us per iteration

Quicksort is the winner here. Heapsort, in-place merge sort, and timsort also often tend to be quite fast.

Contributing

See CONTRIBUTING.md.

References

License

Available under the MIT License. See LICENSE.md for details.

Comments
  • Added Niklaus Wirth formulation of quicksort with some added twists. …

    Added Niklaus Wirth formulation of quicksort with some added twists. …

    …Changed makefile to unroll loops (Yes, it does make a difference).

    This version of quicksort has been my standby sorting algorithm for years. It is a bit more simplistic than the version already in sort.h, but performs well, especially with loop unrolling enabled.

    opened by olehjalmar 10
  • Gallop mode in timsort

    Gallop mode in timsort

    I haven't found the gallop mode code in timsort, which I believe is an important feature of it. Is there a particular reason for it? For example, did some tests demonstrate timsort without gallop is better? If not, do you accept PR related to that? Thanks in advance.

    opened by liwt31 8
  • Mergesort broken?

    Mergesort broken?

    Segfault when sorting more than 4M

    [email protected] ~/tmp $ cat test.c 
    #include <stdlib.h>
    #include <sys/stat.h>
    #define SORT_NAME s
    #define SORT_TYPE int
    #include "sort.h"
    
    int main(int argc, char** argv)
    {
        struct stat s;
        stat(argv[1], &s);
        FILE * f =  fopen(argv[1], "rb");
        int * d = malloc(s.st_size);
        size_t r = fread(d, 1, s.st_size, f);
        if (r % sizeof(int) != 0 || r != s.st_size) return -1;
        s_merge_sort(d, s.st_size/sizeof(int));
        free(d);
        printf("ok\n");
        return 0;
    }
    [email protected] ~/tmp $ gcc test.c 
    [email protected] ~/tmp $ dd if=/dev/urandom of=dump bs=1M count=3                                                                                                      
    3+0 records in
    3+0 records out
    3145728 bytes (3.1 MB) copied, 0.25936 s, 12.1 MB/s
    [email protected] ~/tmp $ ./a.out dump 
    ok
    [email protected] ~/tmp $ dd if=/dev/urandom of=dump bs=1M count=4
    4+0 records in
    4+0 records out
    4194304 bytes (4.2 MB) copied, 0.378901 s, 11.1 MB/s
    [email protected] ~/tmp $ ./a.out dump 
    Segmentation fault
    [email protected] ~/tmp $
    
    opened by yekm 8
  • timsort is broken

    timsort is broken

    If I print the sorted array, this is what I get: (reduced to 100 runs)

    [email protected]:~/Downloads/swenson-sort-e46b67f$ make && ./demo Running tests tim sort time: 18.00 us per iteration 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.034655 0.035228 0.055097 0.069287 0.069998 0.079968 0.144708 0.144813 0.156669 0.234791 0.251203 0.268880 0.303772 0.304815 0.313367 0.314424 0.321176 0.322220 0.323341 0.331830 0.373599 0.375300 0.406086 0.418413 0.461625 0.487840 0.495669 0.501077 0.501475 0.505309 0.512779 0.515011 0.536191 0.575012 0.639690 0.657350 0.694520 0.700650 0.710305 0.720733 0.721414 0.769201 0.814687 0.841742 0.848197 0.891333 0.905963 0.923393 0.933750 0.965645

    On an Ubuntu 10.10, amd64. Maybe a 64 bits issue?

    opened by kassoulet 8
  • Patch updating Makefile to -std=c11, including <inttypes.h>, and resolving conversion specifier warnings.

    Patch updating Makefile to -std=c11, including , and resolving conversion specifier warnings.

    Below is a short patch that resolves numerous warnings and updates the makefile to compile to the C standard that is only 8-years old as opposed to 30-years old. It also resolves the cast of int64_t values and the use of %lld by including inttypes.h and using the proper PRId64 conversion specifier. Additionally, warnings regarding signed and unsigned values used within a conditional expression are resolved via a cast to size_t.

    opened by drankinatty 6
  • Implement gallop mode in timsort (#49)

    Implement gallop mode in timsort (#49)

    This PR implements gallop mode in timsort according to the CPython code. Benchmark between this implementation and the original timsort (otim sort or osortertim_sort) is shown below (array size 1600, averaged over 1000 tests):

                 tim sort -- stable
                otim sort -- stable
    -------
    Running tests with random numbers:
    -------
    sort.h sortertim_sort         - ok,    61202.0 usec
    sort.h osortertim_sort        - ok,    50822.0 usec
    -------
    Running tests with same number:
    -------
    sort.h sortertim_sort         - ok,      511.0 usec
    sort.h osortertim_sort        - ok,      478.0 usec
    -------
    Running tests with sorted numbers:
    -------
    sort.h sortertim_sort         - ok,      560.0 usec
    sort.h osortertim_sort        - ok,      487.0 usec
    -------
    Running tests with sorted blocks of length 10:
    -------
    sort.h sortertim_sort         - ok,    41630.0 usec
    sort.h osortertim_sort        - ok,    42016.0 usec
    -------
    Running tests with sorted blocks of length 100:
    -------
    sort.h sortertim_sort         - ok,    13310.0 usec
    sort.h osortertim_sort        - ok,    14273.0 usec
    -------
    Running tests with sorted blocks of length 10000:
    -------
    sort.h sortertim_sort         - ok,      494.0 usec
    sort.h osortertim_sort        - ok,      479.0 usec
    -------
    Running tests with swapped size/2 pairs:
    -------
    sort.h sortertim_sort         - ok,     1012.0 usec
    sort.h osortertim_sort        - ok,     1617.0 usec
    -------
    Running tests with swapped size/8 pairs:
    -------
    sort.h sortertim_sort         - ok,      997.0 usec
    sort.h osortertim_sort        - ok,     1636.0 usec
    

    The overhead is significant for random numbers, while gain from nearly sorted numbers (last 2 tests) is pretty decent. If the size of the array is multiplied by 10 (to 16000), smaller overhead is expected:

                 tim sort -- stable
                otim sort -- stable
    -------
    Running tests with random numbers:
    -------
    sort.h sortertim_sort         - ok,   618967.0 usec
    sort.h osortertim_sort        - ok,   663787.0 usec
    -------
    Running tests with same number:
    -------
    sort.h sortertim_sort         - ok,     5541.0 usec
    sort.h osortertim_sort        - ok,     5740.0 usec
    -------
    Running tests with sorted numbers:
    -------
    sort.h sortertim_sort         - ok,     5491.0 usec
    sort.h osortertim_sort        - ok,     5549.0 usec
    -------
    Running tests with sorted blocks of length 10:
    -------
    sort.h sortertim_sort         - ok,   645361.0 usec
    sort.h osortertim_sort        - ok,   657122.0 usec
    -------
    Running tests with sorted blocks of length 100:
    -------
    sort.h sortertim_sort         - ok,   302787.0 usec
    sort.h osortertim_sort        - ok,   308961.0 usec
    -------
    Running tests with sorted blocks of length 10000:
    -------
    sort.h sortertim_sort         - ok,    18291.0 usec
    sort.h osortertim_sort        - ok,    21772.0 usec
    -------
    Running tests with swapped size/2 pairs:
    -------
    sort.h sortertim_sort         - ok,     6765.0 usec
    sort.h osortertim_sort        - ok,    14864.0 usec
    -------
    Running tests with swapped size/8 pairs:
    -------
    sort.h sortertim_sort         - ok,     6765.0 usec
    sort.h osortertim_sort        - ok,    14566.0 usec
    

    I don't quite understand why with larger array gallop mode enabled timsort performs better even in cases such as random numbers. Maybe some fluctuation in my environment? Still, the gallop mode proves to be a reasonable improvement.

    This cmp.c is the benchmark file I used, modified from stresstest.c. Put this sort.h and the original sort.h as osort.h in the same directory with cmp.c then it's ready to go.

    opened by liwt31 6
  • undefined reference to `__builtin_clzll'

    undefined reference to `__builtin_clzll'

    The head notes of GNU GetText show this. My apologies of this is not a swenson-sort issue.

    /*
     * Taken from https://github.com/swenson/sort
     * Revision: 05fd77bfec049ce8b7c408c4d3dd2d51ee061a15
     * Removed all code unrelated to Timsort and made minor adjustments for
     * cross-platform compatibility.
     */
    

    GNU GetText is failing to compile on 32-bit older systems. The error under GCC 3.3 is:

    $ cat ~/get-text.txt
    libtool: link: gcc -std=gnu99 -g2 -O2 -fPIC -pthread -Wl,-R -Wl,\$ORIGIN/../lib -Wl,-R -Wl,/usr/local/lib -Wl,--enable-new-dtags -o .libs/hello hello.o  -L/usr/local/lib ../lib/.libs/libtextstyle.so /usr/local/lib/libiconv.so -lm -ldl -lpthread -pthread -Wl,-rpath -Wl,/usr/local/lib
    ../lib/.libs/libtextstyle.so: undefined reference to `__builtin_clzll'
    collect2: ld returned 1 exit status
    make[4]: *** [hello] Error 1
    make[4]: Leaving directory `/home/test/gettext-0.20.1/libtextstyle/adhoc-tests'
    make[3]: *** [all-recursive] Error 1
    make[3]: Leaving directory `/home/test/gettext-0.20.1/libtextstyle'
    make[2]: *** [all] Error 2
    make[2]: Leaving directory `/home/test/gettext-0.20.1/libtextstyle'
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory `/home/test/gettext-0.20.1'
    make: *** [all] Error 2
    Failed to build GetText
    

    And:

    gettext-0.20.1$ grep -IR __builtin_clzll
    libtextstyle/lib/libxml/timsort.h:#define CLZ __builtin_clzll
    gettext-tools/gnulib-lib/libxml/timsort.h:#define CLZ __builtin_clzll
    gnulib-local/lib/libxml/timsort.h:#define CLZ __builtin_clzll
    

    And:

    gettext-0.20.1$ cat libtextstyle/lib/libxml/timsort.h
    ...
    
    #ifndef CLZ
    #ifdef __GNUC__
    #define CLZ __builtin_clzll
    #else
    
    static int clzll(uint64_t);
    
    /* adapted from Hacker's Delight */
    static int clzll(uint64_t x) {
      int n;
    
      if (x == 0) {
        return 64;
      }
    
      n = 0;
    
      if (x <= 0x00000000FFFFFFFFL) {
        n = n + 32;
        x = x << 32;
      }
    
      if (x <= 0x0000FFFFFFFFFFFFL) {
        n = n + 16;
        x = x << 16;
      }
    
      if (x <= 0x00FFFFFFFFFFFFFFL) {
        n = n + 8;
        x = x << 8;
      }
    
      if (x <= 0x0FFFFFFFFFFFFFFFL) {
        n = n + 4;
        x = x << 4;
      }
    
      if (x <= 0x3FFFFFFFFFFFFFFFL) {
        n = n + 2;
        x = x << 2;
      }
    
      if (x <= 0x7FFFFFFFFFFFFFFFL) {
        n = n + 1;
      }
    
      return n;
    }
    
    #define CLZ clzll
    #endif
    #endif
    

    It may be a good idea to perform an actual feature test for availability of __builtin_clzll.

    opened by noloader 5
  • Add quick sort manual tail-call optimization

    Add quick sort manual tail-call optimization

    Before, sorting evil.txt would have a massive 9,000+ stack size. Now it has a stack size of only 30 or so.

    We do this by, rather than making two recursive calls to quick sort, we only do a recursive call on the smaller half. For the larger half, we replace the left and right values in the current call and loop. This way, if a poor choice of pivot is made and, for example, we partition a block of size N into pieces 1 and N-2, we don't create a new stack entry for N-2, but instead reuse the current one.

    Fixes #62

    This also fixes a unary minus on an unsigned integer, which creates problems for some compilers (notably MSVC).

    opened by swenson 4
  • Small bitonic

    Small bitonic

    I have placed routines for sorting networks of length <= 16 in sort.h as well as the BITONIC_SORT routine. However, the BITONIC_SORT routine is no longer a "true" bitonic sort. It calls the optimal sorting networks for length <= 16 and BINARY_INSERTION_SORT for anything larger than that.

    opened by DrMarkS 4
  • Added bitonic sorter. Made bitonic sort the default routine for small…

    Added bitonic sorter. Made bitonic sort the default routine for small…

    … sort and modified all other sorting algorithms to use SMALL_SORT as their base case. The code base for bitonic sort is "autogenerated" from some python code that pulls optimal small sorting networks from https://pages.ripco.net/~jgamble/nw.html Because of this, and because the code got rather large it is a separate header file which is included by sort.h. I am seeing speedups of around 1.2X using bitonic sort of the default small sorting algorithm for anything with <= 16 elements. The default small sorter can be modified by adjusting the #defines SMALL_SORT and SMALL_SORT_BND

    opened by DrMarkS 3
  • Don't use 64-bit integers on 32-bit platforms

    Don't use 64-bit integers on 32-bit platforms

    It should probably work to globally replace

    • uint64_t with size_t and
    • int64_t with ptrdiff_t (or preferably size_t as well unless signed integers are really required)
    opened by nwellnhof 3
  • In-Place Timsort?

    In-Place Timsort?

    I was wondering if it would be possible to do an in-place Timsort. I was thinking something along the lines of doing the run finding stage at the beginning followed by an in-place mergesort.

    I've yet to write it up (I intend to), but I did some extensive tests using the sorts here against 1+ GB of data (same data for all tests - fetched once from /dev/urandom`). What I found, unsurprisingly, was that you can use multithreading to speed up the process (I suppose for large volumes you could even go hadoop-style Big Data). When multithreading, nested mergesorts were faster than any other sorts.

    What did surprise me, though, was that an in-place mergesort was the fastest. Timsort came second, and normal mergesort third, but still. It came to me, however, that the reason was due to cache locality.

    I was wondering though, if we could do the run finding from Timsort and then do an in-place mergesort, perhaps that would be the fastest? It could take advantage of cache locality AND pre-existing runs.

    potential research area 
    opened by haneefmubarak 2
Owner
Christopher Swenson
Christopher Swenson
A Template Engine for Modern C++

Inja is a template engine for modern C++, loosely inspired by jinja for python. It has an easy and yet powerful template syntax with all variables, lo

pantor 1.2k Nov 24, 2022
Windows kernel hacking framework, driver template, hypervisor and API written on C++

Windows kernel hacking framework, driver template, hypervisor and API written on C++

Александр 1.3k Nov 29, 2022
A universal type for non-type template parameters for C++20 or later.

uninttp A universal type for non-type template parameters for C++20 or later. Installation: uninttp (Universal Non-Type Template Parameters) is a head

null 16 Dec 24, 2021
Bolt is a C++ template library optimized for GPUs. Bolt provides high-performance library implementations for common algorithms such as scan, reduce, transform, and sort.

Bolt is a C++ template library optimized for heterogeneous computing. Bolt is designed to provide high-performance library implementations for common

null 358 Nov 14, 2022
Pool is C++17 memory pool template with different implementations(algorithms)

Object Pool Description Pool is C++17 object(memory) pool template with different implementations(algorithms) The classic object pool pattern is a sof

KoynovStas 1 Nov 18, 2022
This is a C plus plus coding template for Compitative programming. This template is very optimized for the Online Judgment

C-plusplus-compitative-Programming-Template Tech We Used C++ Features Easy to compile Easy debug facility Analysised and optimized base template Steps

Alan Binu 15 Jan 27, 2022
FSD-Template - A template UE4.25 project for BP modding.

FSD-Template Project generated by Archengius' UE4 Template Generator. Reflected C++ classes generated by CheatingMuppet & Archengius' UE4SS UHT Genera

null 16 Nov 22, 2022
OpenGL Template Engine - a C++ OpenGL graphics engine which aimed to be a simple startup template for 3D OpenGL projects.

OpenGL Template Engine is a C++ OpenGL graphics engine which aimed to be a simple startup template for 3D OpenGL projects. This is the template I personally use for my own projects and provides me with the general OpenGL 3D render setup with model import and UI.

Marcus Nesse Madland 2 May 16, 2022
A drop-in replacement for std::list with 293% faster insertion, 57% faster erasure, 17% faster iteration and 77% faster sorting on average. 20-24% speed increase in use-case testing.

plf_list A drop-in replacement for std::list with (on average): 293% faster insertion 57% faster erasure 17% faster iteration 77% faster sorting 70% f

Matt Bentley 117 Nov 8, 2022
libsais is a library for linear time suffix array and burrows wheeler transform construction based on induced sorting algorithm.

libsais libsais is a library for fast (see Benchmarks below) linear time suffix array and Burrows-Wheeler transform construction based on induced sort

Ilya Grebnov 113 Nov 29, 2022
Fast, differentiable sorting and ranking in PyTorch

Torchsort Fast, differentiable sorting and ranking in PyTorch. Pure PyTorch implementation of Fast Differentiable Sorting and Ranking (Blondel et al.)

Teddy Koker 651 Nov 25, 2022
Sorting data on a stack with a limited set of instructions

push_swap Sorting data on a stack with a limited set of instructions Final Score: 125/100 Average 100 set speed: 575 Actions (Minimum 490 / Maximum 61

null 14 Nov 22, 2022
Repository containing all the sorting algorithms.

Sorting-Algorithms Repository containing all the sorting algorithms. Bubble Sort Bubble Sort is a simple algorithm which is used to sort a given set o

Rimpi Rani Baruah 4 Nov 15, 2022
Quick sort is a highly efficient sorting algorithm and is based on partitioning of array of data into smaller arrays.

Quick sort is a highly efficient sorting algorithm and is based on partitioning of array of data into smaller arrays. A large array is partitioned into two arrays one of which holds values smaller than the specified value, say pivot, based on which the partition is made and another array holds values greater than the pivot value.

Lakshan Sandanayaka 2 Sep 21, 2021
A drop-in replacement for std::list with 293% faster insertion, 57% faster erasure, 17% faster iteration and 77% faster sorting on average. 20-24% speed increase in use-case testing.

plf::list A drop-in replacement for std::list with (on average): 293% faster insertion 57% faster erasure 17% faster iteration 77% faster sorting 70%

Matt Bentley 117 Nov 8, 2022
In this project, we implemented twelve different sorting algorithms.

C - Sorting algorithms & Big O In this project, we implemented twelve different sorting algorithms. Tests tests: Folder of test files. Provided by Alx

Nicholas M Mwanza 1 Oct 26, 2021
Some basic sorting algos

Sorting-Algos Some basic sorting algos HacktoberFest 2021 This repository consists of mezzo-level projects that undertake a simple task and perform it

Manthan Ghasadiya 9 Oct 13, 2022
Sorting algorithms & Big O

[![Contributors][contributors-shield]][contributors-url] [![Forks][forks-shield]][forks-url] [![Stargazers][stars-shield]][stars-url] Sorting algorith

Joseph Mahiuha 1 Nov 7, 2021
A redis module, similar to redis zset, but you can set multiple scores for each member to support multi-dimensional sorting

TairZset: Support multi-score sorting zset Introduction Chinese TairZset is a data structure developed based on the redis module. Compared with the na

Alibaba 58 Nov 21, 2022
Sorting algorithms & related tools for C++14

It would be nice if only one or two of the sorting methods would dominate all of the others, regardless of application or the computer being used. But

Morwenn 526 Nov 27, 2022