A C++ implementation of timsort

Overview

Latest Release Conan Package Pitchfork Layout

TimSort

A C++ implementation of TimSort, an O(n log n) stable sorting algorithm, ported from Python's and OpenJDK's.

See also the following links for a detailed description of TimSort:

This library requires at least C++11. If you need a C++98 version, you can check the 1.x.y branch of this repository.

According to the benchmarks, gfx::timsort is slower than std::sort() on randomized sequences, but faster on partially-sorted ones. It can be used as a drop-in replacement for std::stable_sort, with the difference that it can't fallback to a O(n log² n) algorithm when there isn't enough extra heap memory available.

gfx::timsort also has a few additional features and guarantees compared to std::stable_sort:

  • It can take a projection function after the comparison function. The support is a bit rougher than in the linked article or the C++20 standard library: unless std::invoke is available, only instances of types callable with parentheses can be used, there is no support for pointer to members.
  • It can be passed a range instead of a pair of iterators, in which case it will sort the whole range.
  • This implementation of timsort notably avoids using the postfix ++ or -- operators: only their prefix equivalents are used, which means that timsort will work even if the postfix operators are not present or return an incompatible type such as void.

Merging sorted ranges efficiently is an important part of the TimSort algorithm. This library exposes gfx::timmerge in the public API, a drop-in replacement for std::inplace_merge with the difference that it can't fallback to a O(n log n) algorithm when there isn't enough extra heap memory available. According to the benchmarks, gfx::timmerge is slower than std::inplace_merge on heavily/randomly overlapping subranges of simple elements, but it is faster for complex elements such as std::string and on sparsely overlapping subranges.

Just like gfx::timsort, gfx::timmerge can take a projection function and avoids using the postfix ++ and -- operators.

The list of available signatures is as follows (in namespace gfx):

// timsort

template <
    typename RandomAccessIterator,
    typename Compare = /* see below (1) */,
    typename Projection = /* see below (2) */
>
void timsort(RandomAccessIterator const first, RandomAccessIterator const last,
             Compare compare, Projection projection);

template <
    typename RandomAccessRange,
    typename Compare = /* see below (1) */,
    typename Projection = /* see below (2) */
>
void timsort(RandomAccessRange &range, Compare compare, Projection projection);

// timmerge

template <
    typename RandomAccessIterator,
    typename Compare = /* see below (1) */,
    typename Projection = /* see below (2) */
>
void timmerge(RandomAccessIterator first, RandomAccessIterator middle,
              RandomAccessIterator last, Compare compare, Projection projection);

In the signatures above:

  • (1) std::less specialization for the value_type of the passed range or iterator.
  • (2) Custom class equivalent to std::identity.

EXAMPLE

Example of using timsort with a comparison function and a projection function to sort a vector of strings by length:

#include <string>
#include <vector>
#include <gfx/timsort.hpp>

size_t len(const std::string& str) {
    return str.size();
}

// Sort a vector of strings by length
std::vector<std::string> collection = { /* ... */ };
gfx::timsort(collection, std::less<std::string>{}, &len);

INSTALLATION & COMPATIBILITY

Ubuntu builds status Windows builds status MacOS builds status

The library has been tested with the following compilers:

  • GCC 5.5
  • Clang 6
  • MSVC 2017

It should also work with more recent compilers, and most likely with some older compilers too. We used to guarantee support as far back as Clang 3.8, but the new continuous integration environment doesn't go that far back.

The library can be installed on the system via CMake with the following commands:

cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Release
cd build
make install

Alternatively the library is also available on conan-center-index and can be installed in your local Conan cache via the following command:

conan install timsort/2.1.0

DIAGNOSTICS & INFORMATION

The following configuration macros allow gfx::timsort and gfx::timmerge to emit diagnostics, which can be helpful to diagnose issues:

  • Defining GFX_TIMSORT_ENABLE_ASSERT inserts assertions in key locations in the algorithm to avoid logic errors.
  • Defining GFX_TIMSORT_ENABLE_AUDIT inserts assertions that verify pre- and postconditions. These verifications can slow the algorithm down significantly. Enable the audits only while testing or debugging.
  • Defining GFX_TIMSORT_ENABLE_LOG inserts logs in key locations, which allow to follow more closely the flow of the algorithm.

cpp-TimSort follows semantic versioning and provides the following macros to retrieve the current major, minor and patch versions:

GFX_TIMSORT_VERSION_MAJOR
GFX_TIMSORT_VERSION_MINOR
GFX_TIMSORT_VERSION_PATCH

TESTS

The tests are written with Catch2 and can be compiled with CMake and run through CTest.

When using the project's main CMakeLists.txt, the CMake variable BUILD_TESTING is ON by default unless the project is included as a subdirectory. The following CMake variables are available to change the way the tests are built with CMake:

  • GFX_TIMSORT_USE_VALGRIND: if ON, the tests will be run through Valgrind (OFF by default)
  • GFX_TIMSORT_SANITIZE: this variable takes a comma-separated list of sanitizers options to run the tests (empty by default)

BENCHMARKS

Benchmarks are available in the benchmarks subdirectory, and can be constructed directly by passing BUILD_BENCHMARKS=ON variable to CMake during the configuration step.

Example bench_sort output (timing scale: sec.):

c++ -v
Apple LLVM version 7.0.0 (clang-700.0.72)
Target: x86_64-apple-darwin14.5.0
Thread model: posix
c++ -I. -Wall -Wextra -g  -DNDEBUG -O2 -std=c++11 example/bench.cpp -o .bin/bench
./.bin/bench
RANDOMIZED SEQUENCE
[int]
size	100000
std::sort        0.695253
std::stable_sort 0.868916
timsort          1.255825
[std::string]
size	100000
std::sort        3.438217
std::stable_sort 4.122629
timsort          5.791845
REVERSED SEQUENCE
[int]
size	100000
std::sort        0.045461
std::stable_sort 0.575431
timsort          0.019139
[std::string]
size	100000
std::sort        0.586707
std::stable_sort 2.715778
timsort          0.345099
SORTED SEQUENCE
[int]
size	100000
std::sort        0.021876
std::stable_sort 0.087993
timsort          0.008042
[std::string]
size	100000
std::sort        0.402458
std::stable_sort 2.436326
timsort          0.298639

Example bench_merge output (timing scale: milliseconds; omitted detailed results for different middle iterator positions, reformatted to improve readability):

c++ -v
Using built-in specs.
...
Target: x86_64-pc-linux-gnu
...
gcc version 10.2.0 (GCC)
c++ -I ../include -Wall -Wextra -g -DNDEBUG -O2 -std=c++11 bench_merge.cpp -o bench_merge
./bench_merge
size	100000
element type\algorithm:      	std::inplace_merge	timmerge
RANDOMIZED SEQUENCE
[int] approx. average        	 33.404430        	 37.047990
[std::string] approx. average	324.964249        	210.297207
REVERSED SEQUENCE
[int] approx. average        	 11.441404        	  4.017482
[std::string] approx. average	305.649503        	114.773898
SORTED SEQUENCE
[int] approx. average        	  4.291098        	  0.105571
[std::string] approx. average	158.238114        	  0.273858

Detailed bench_merge results for different middle iterator positions can be found at https://github.com/timsort/cpp-TimSort/wiki/Benchmark-results

Issues
  • Expose gfx::timmerge in the public interface

    Expose gfx::timmerge in the public interface

    I've implemented a verification patch on top of this pull request: 0001-Verify-is_sorted-audit-off-by-default.patch.txt. But I am not sure it is worth commiting. If it is, please let me know whether to push a second commit into this pull request or create a separate future pull request.

    opened by vedgy 12
  • vector of pairs with reference

    vector of pairs with reference

    Hi, I seem to be having an issue with pairs. I start with a vector with 4 elements (1,a) (3,c) (4,d) (2,b) But then when I try to sort them I get 'd' being repeated and 'b' is gone (1,a) (2,d) (3,c) (4,d) This doesn't happen when the second value in the pair is a string (I want a string&). Here is the code I use

    int main() {
      typedef std::pair<int, std::string&> KV;
      std::vector<KV> keys;
      keys.reserve(8);
      std::vector<std::string> vals;
      vals.reserve(8);
      vals.push_back("a");
      keys.push_back(static_cast<KV>(KV(1, vals.back())));
      vals.push_back("c");
      keys.push_back(static_cast<KV>(KV(3, vals.back())));
      vals.push_back("d");
      keys.push_back(static_cast<KV>(KV(4, vals.back())));
      vals.push_back("b");
      keys.push_back(static_cast<KV>(KV(2, vals.back())));
      std::cout << std::endl;
      for (auto s : keys) {
        std::cout << "(" << s.first << "," << s.second << ") ";
      }
      std::cout << std::endl;
      timsort(keys.begin(),
                     keys.end(),
                     [](const KV& a, const KV& b) {
                       return a.first < b.first;
                     });
      for (auto s : keys) {
        std::cout << "(" << s.first << "," << s.second << ") ";
      }
      std::cout << std::endl;
      return 0;
    }
    
    bug 
    opened by matanmarkind 10
  • Recruiting new maintainers

    Recruiting new maintainers

    Because I am not using C++ in production anymore, I'd like to transfer this project to another organization with new maintainers.

    However, there're security incidents these days in which attackers inject malware to modules by new maintainers:

    • https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident

    So I'd like to review the org to be transferred carefully, so please tell me the details.

    opened by gfx 8
  • Proposal: allow use of

    Proposal: allow use of "tristate" compare functor

    Background:

    Like any well-behaved C++ template algorithm, gfx::timsort takes a comparison functor in the form of std::less(a, b) -- i.e. take two objects and return a bool indicating if a<b

    For most of the algorithm this works well, the exception being gallopLeft() with its calls to gt()/le() These have to be implemented by two calls to the comparison functor. For a simple value type like int the compiler will easily optimize this into a single branch. However for something more complicated (and more expensive to compare!) like std::string this can end up doing two comparisons of the same elements.

    For algorithms like this a qsort()-like comparison functor (i.e. strcmp()/memcmp()/std::string::compare()/etc) is actually more efficient since it gives you more than a simple bool result. Since the comparison functor gives you more information you can call it fewer times.

    Proposal:

    It's good that the standard interface of gfx::timsort matches std::stable_sort and takes a std::less-based function. However, since your code already abstracts these comparisons via the template <typename Value, typename LessFunction> class Compare helper class, why not provide an alternate function that takes a qsort-like functor instead? Just as a proof of concept I tried this:

    diff --git a/timsort.hpp b/timsort.hpp
    index 77dd7a2..e6230c0 100644
    --- a/timsort.hpp
    +++ b/timsort.hpp
    @@ -126,12 +126,46 @@ template <typename Value, typename LessFunction> class Compare {
         func_type less_;
     };
    
    -template <typename RandomAccessIterator, typename LessFunction> class TimSort {
    +template <typename Value, typename TristateFunction> class CompareFromTristate {
    +  public:
    +    typedef Value value_type;
    +    typedef TristateFunction func_type;
    +
    +    CompareFromTristate(TristateFunction f) : tristate_(f) {
    +    }
    +    CompareFromTristate(const CompareFromTristate<value_type, func_type> &other) : tristate_(other.tristate_) {
    +    }
    +
    +    bool lt(value_type x, value_type y) {
    +        return tristate_(x, y) < 0;
    +    }
    +    bool le(value_type x, value_type y) {
    +        return tristate_(x, y) <= 0;
    +    }
    +    bool gt(value_type x, value_type y) {
    +        return tristate_(x, y) > 0;
    +    }
    +    bool ge(value_type x, value_type y) {
    +        return tristate_(x, y) >= 0;
    +    }
    +
    +    // Allow this to be passed to APIs like std::lower_bound
    +    CompareFromTristate &less_function() {
    +        return *this;
    +    }
    +    bool operator()(value_type x, value_type y) {
    +        return lt(x, y);
    +    }
    +
    +  private:
    +    func_type tristate_;
    +};
    +
    +template <typename RandomAccessIterator, typename compare_t> class TimSort {
         typedef RandomAccessIterator iter_t;
         typedef typename std::iterator_traits<iter_t>::value_type value_t;
         typedef typename std::iterator_traits<iter_t>::reference ref_t;
         typedef typename std::iterator_traits<iter_t>::difference_type diff_t;
    -    typedef Compare<const value_t &, LessFunction> compare_t;
    
         static const int MIN_MERGE = 32;
    
    @@ -245,7 +279,7 @@ template <typename RandomAccessIterator, typename LessFunction> class TimSort {
             return n + r;
         }
    
    -    TimSort(compare_t c) : comp_(c), minGallop_(MIN_GALLOP) {
    +    TimSort(const compare_t& c) : comp_(c), minGallop_(MIN_GALLOP) {
         }
    
         void pushRun(iter_t const runBase, diff_t const runLen) {
    @@ -661,6 +695,7 @@ template <typename RandomAccessIterator, typename LessFunction> class TimSort {
    
         // the only interface is the friend timsort() function
         template <typename IterT, typename LessT> friend void timsort(IterT first, IterT last, LessT c);
    +    template <typename IterT, typename TristateT> friend void timsort_tristate(IterT first, IterT last, TristateT c);
     };
    
     template <typename RandomAccessIterator>
    @@ -671,7 +706,18 @@ inline void timsort(RandomAccessIterator const first, RandomAccessIterator const
    
     template <typename RandomAccessIterator, typename LessFunction>
     inline void timsort(RandomAccessIterator const first, RandomAccessIterator const last, LessFunction compare) {
    -    TimSort<RandomAccessIterator, LessFunction>::sort(first, last, compare);
    +    typedef typename std::iterator_traits<RandomAccessIterator>::value_type value_type;
    +    typedef typename gfx::Compare<const value_type &, LessFunction> compare_type;
    +    compare_type const compare_from_less(compare);
    +    TimSort<RandomAccessIterator, compare_type>::sort(first, last, compare_from_less);
    +}
    +
    +template <typename RandomAccessIterator, typename TristateFunction>
    +inline void timsort_tristate(RandomAccessIterator const first, RandomAccessIterator const last, TristateFunction compare_tristate) {
    +    typedef typename std::iterator_traits<RandomAccessIterator>::value_type value_type;
    +    typedef typename gfx::CompareFromTristate<const value_type &, TristateFunction> compare_type;
    +    compare_type const compare_from_tristate(compare_tristate);
    +    TimSort<RandomAccessIterator, compare_type>::sort(first, last, compare_from_tristate);
     }
    
     } // namespace gfx
    

    On a quick test of 10^7 integers from rand() (so lots of duplicates but no sorted runs in the input) this caused about a 1.1% drop in the number of comparisons needed... so for an "expensive to compare, cheap to move" data type you might get a ~1% improvement in overall sort speed (although I didn't do extensive benchmarking of this). Not a huge improvement, but it's something.

    enhancement design 
    opened by mitchblank 8
  • Valgrind reports issues when trying to sort std::deque

    Valgrind reports issues when trying to sort std::deque

    I am still using your timsort implementation in a project, but recently noticed that Valgrind sometimes complained when trying to sort an std::deque with timsort. Here is my reduced test case:

    std::deque<double> collection(35);
    std::iota(std::begin(collection), std::end(collection), -47.0);
    std::mt19937 engine(std::time(nullptr));
    std::shuffle(std::begin(collection), std::end(collection), engine);
    
    gfx::timsort(std::begin(collection), std::end(collection));
    assert(std::is_sorted(std::begin(collection), std::end(collection)));
    

    The actual test case is slightly more complicated than that since I am using Catch, but nothing really different. Unfortunately, Valgrind's error is way too big for me to include it here, but you can find the full error log here. Now, note that it is a log obtained with the test compiled by g++ in release mode, so it lacks some information about where exactly the error happens.

    I will try to find a sequence of numbers that makes Valgrind report errors for sure, and edit this issue when I have one. Currently I don't really have many ideas about what can cause the error. I got similar errors when trying to sort an std::deque with another sorting algorithm: apparently, std::deque is less cool than std::vector when it comes to accessing out-of-bound iterators (not even dereferencing them), so that might be the culprit.


    Update: some things in the original testcase such as the shuffling or the fact that I used double were a bit dumb/unneeded. I managed to find a sequence of integers that always triggers the Valgrind errors and cleaned the minimal testcase a bit:

    std::deque<int> collection = {
        15, 7, 16, 20, 25, 28, 13, 27, 34, 24, 19, 1,
        6, 30, 32, 29, 10, 9, 3, 31, 21, 26, 8, 2, 22,
        14, 4, 12, 5, 0, 23, 33, 11, 17, 18
    };
    
    gfx::timsort(std::begin(collection), std::end(collection));
    assert(std::is_sorted(std::begin(collection), std::end(collection)));
    

    Now that we have an initial sequence that always triggers the Valgrind error, here are the corresponding error logs for the program compiler in debug mode with clang++ and g++. Most of them seem to come from mergeHi but I also remember having seen errors coming from mergeAt and gallopRight in older logs.

    bug 
    opened by Morwenn 6
  • Remove

    Remove "Compare" class; just use LessFunc directly

    As discussed in issue #24, the implemetnations of Compare::le() and Compare::gt() were more expensive than they needed to be. As long as LessFunction provides the strict weak ordering that std::sort() requires, we can safely synthesize all four lt/gt/le/gt options with a single call.

    While the simple fix would be to just change their definitions, I went a bit further and removed the Compare class entirely. I don't really think that it provides much, but it does take space in the TimSort object whether it's needed or not -- this could be addressed by some application of the Empty Base Class optimization, but it's a bit awkward)

    I think it was probably done with the intention that having the comparator as a member of this for the core algorithm would be better because it would cut down on the number of function parameters. However, the normal way that the STL's sorting algorithms are implemented is to just pass the comparator down as a parameter to the functions so that the optimizer can just remove it when it's a pure functor. I think if anything forcing it to be an instance member makes this harder.

    I looked at some other tricks to add some syntactic sugar to keep the gt()/lt()/le()/ge() function names (either with some namespaced inline functions or even just with macros) but it just seemed ugly. So instead I just replaced all of the comaprisons with direct calls to the comparator functor.

    opened by mitchblank 5
  • Make timsort work with move-only types.

    Make timsort work with move-only types.

    In C++11, std::sort is supposed to be able to sort collections of mve-only types. This patch adds support for move-only types to timsort. Here is what it changes:

    • More GFX_TIMSORT_MOVE instead of simple copy-assignments.
    • Introduce GFX_TIMSORT_MOVE_RANGE to use the C++11 iterators version of std::move if required and us std::copy instead.
    • Introduce GFX_TIMSORT_MOVE_BACKWARD to use either std::move_backward or std::copy_backward.
    • I also fixed a mistake in an assert message, but that's trivial :p

    However, here is the big deal: I am not 100% sure that the algorithm never reads from a moved-from value. I read the algorithm and checked everywhere that whenever a value was moved-from the algorithm made sure that the iterators point to other non-moved-from values before being dereferenced, but this method has its limits. I also used a derived version of the benchmarks from my sorting library with a move-only type that keeps track of whether it has been moved-from and asserts if we try to read from it if it has been moved-from. Here is the type I used:

    template<typename T>
    struct move_only
    {
        // Constructors
    
        move_only() = delete;
        move_only(const move_only&) = delete;
    
        move_only(const T& value):
            can_read(true),
            value(value)
        {}
    
        move_only(move_only&& other):
            can_read(true),
            value(std::move(other.value))
        {
            if (not std::exchange(other.can_read, false))
            {
                std::cerr << "illegal read from a moved-from value\n";
                assert(false);
            }
        }
    
        // Assignment operators
    
        move_only& operator=(const move_only&) = delete;
    
        auto operator=(move_only&& other)
            -> move_only&
        {
            if (&other != this)
            {
                if (not std::exchange(other.can_read, false))
                {
                    std::cerr << "illegal read from a moved-from value\n";
                    assert(false);
                }
                can_read = true;
                value = std::move(other.value);
            }
            return *this;
        }
    
        // Whether the value can be read
        bool can_read;
        // Actual value
        T value;
    };
    

    The benchmark runs several times on arrays of one million element with several data patterns to cover common and not-so-common cases, including arrays filled with random values. After every call to a sort method, it also checks that the resulting array is indeed sorted.

    So, while I couldn't be 100% sure that it never reads from moved-from value (it would require a formal analysis), check the algorithm by hand and running all those tests make me pretty confident that it is safe :)

    I didn't write a proper test case though since all of your tests seem to work with C++03, and such a test would explicitly require C++11 support to run and I don't know whether it would integrate seamlessly in the testsuite.

    opened by Morwenn 5
  • Duplicate records after sort

    Duplicate records after sort

    I am investigating why after sorting some entries in my array get overwritten with duplicates. Basically, if I start out with an array of size 500k and no duplicates (well there maybe duplicates from a key perspective), it appears that after running I end up with the same array size, but some entries have been duplicated and others are gone.

    I caught the problem with a sanity check where I ran through the array checking that each element was greater than or equal to its predecessor. I have tried running in debug mode and did not see any assert failures.

    My use case is a little complicated, but not terribly. I have an array of plain old data objects and an array of ints which are to be a sorted index of the data in the data array. The int array is filled with the index of each data object in the main array. Using a custom compare I would like to sort the int array.

    It may still turn out to be a bug in my code, but the fact that it works with std::sort, std::stable_sort, and a version of SmoothSort made me think it worth asking if you think this may be a bug in the timsort impl.

    Below is some psuedo code.

    // this seems to work
    std::sort( index_array.begin(), index_array.end(), compareHelper(data_array));  
    
    // this seems to consistently run into problems
    gfx::timsort( index_array.begin(), index_array.end(), compareHelper(data_array)); 
    
    -------------------------------------
    class data {
        int  col1;
        int  col 2; 
        int  col 3;
    }
    
    std::vector< data >  data_array;
    std::vector< uint32_t >      index_array;
    
    insert_data ( data d )
    {
           data_array.push_back(d);
           index_array.push_back( data_array.size()-1);
    }
    
    struct  compareHelper
    {
       const std::vector< data >& mArray;
       compareHelper( const data_array& aArray )
          :  mArray( aArray) {};
    
       bool operator()(const uint32_t x, const uint32_t y) const
       {
          const data& a = mArray[x];
          const data& b = mArray[y];
    
          if (  a.col1 == b.col1 )
          {
             return a.col2 < b.col2;
          }
          return a.col1 < b.col1;
       };
    }
    
    bug 
    opened by milonesmith 5
  • Incorrect minRunLength() computing minRun to be too small

    Incorrect minRunLength() computing minRun to be too small

    The following line seems incorrect.

    Since MIN_MERGE is 32, the loop would end with n containing the 5 most significant bits instead of 6 most significant bits. Therefore, the function would return [0, 32], which is less than what Python is doing here

    The Python detailed description of Timsort in the readme here also corresponds to returning a size of [32, 64] when possible.

    opened by balajimalanji 4
  • Remove

    Remove "break_outer" variable

    Yes, "goto considered harmful" and all that. However, it really is the most direct way of breaking out of multiple levels of loops.

    Using a temporary boolean like this code had been doing actually requires more work at the assembly level. An if statement that just has a break or goto as its contents will likely compile down to a single branch instruction. However, if you have to do anything else in the body, you now have a code block that needs to be branched around in the false case. So in pseudo-assembly instead of:

      beq FOO
    

    you could have:

      bne FALSE
      mov %r1, TRUE
      jmp FOO
    FALSE:
    

    It's possible that the compiler's flowgraph optimizer will manage to pull out the temporary variable and do this transformation itself... but it's really asking a lot. I don't think it's any less readable to just change to a goto and express directly where we want controlflow to go

    opened by mitchblank 4
  • C++03/move-compliance checks

    C++03/move-compliance checks

    Additional checks on compiler move/type-trait compliance, check for move capability of element type (these checks are compile-time and should be optimized out by compiler), fallback to copy for non-move-capable types.

    opened by mattreecebentley 3
  • namespace gfx

    namespace gfx

    I just realized that the namespace gfx has two million hits on GitHub alone: it is most likely a common abbreviation for "graphics" and it's used in big engines (it notably appears in the Fuchsia source code).

    I'm not sure what to do about this. We could probably play it safe and use namespace timsort instead, which apparently doesn't exist. That would give a timsort::timsort(...) function, a timsort/timsort.hpp header to include and a timsort::timsort CMake target. It doesn't feel like the coolest solution, but it's probably pragmatic enough.

    design 
    opened by Morwenn 2
  • Replace std::vector for better speed; also fix clang C++11

    Replace std::vector for better speed; also fix clang C++11

    First, sorry that this PR contains more than one thing. I have a few other changes I want to propose for timsort and I'll try to keep those in standalone PRs. Because of the need for C++11 support to test this I ended up having to fix that bug as part of this performance work. I can break the libc++ changes out into a different PR if you'd rather merge that as a separate PR first.

    In my application I need to sort a vector of pointers. I was profiling timsort and was surprised at how much time was being spent in copy_to_tmp() This method was implemented as:

    tmp_.clear();
    tmp_.reserve(len);
    GFX_TIMSORT_MOVE_RANGE(begin, begin + len, std::back_inserter(tmp_));
    

    Unfortunately this performs badly on simple types like pointers. The GFX_TIMSORT_MOVE_RANGE() macro itself can reduce to a single memmove() but using std::back_inserter (which is required for move-only types) breaks this optimization. Instead of a nice SSE-optimized memory copy you end up with an element-by-element loop with a vector capacity check each time.

    As an experiment I did some metaprogramming so that TriviallyCopyable types would just do:

    tmp_.assign(begin, begin + len);
    

    This basically fixed the problem, but it's still not quite perfect, since the non-trivial case still is doing extra capacity checks that aren't required. I did a more aggressive fix that replaced the std::vector use entirely with a custom data structure (since we don't really need a general purpose vector here, just a managed array) which bought another couple percent in speed.

    First I had to make two preparatory changes, though:

    1. The C++11 support was broken on libc++ (which is the default STL for most clang installs, including Xcode) The changes @mattreecebentley made to auto-detect C++11 were mostly good but they specifically rejected anything with _LIBCPP_VERSION set. Not sure why.

    Rather than one large binary expression I instead used a larger (but hopefully easier to understand) ifdef decision tree. This can also be overridden on the command-line with -DGFX_TIMSORT_USE_CXX11=[0|1] As before we default to using C++11 constructs unless we see some evidence that it won't work. However we now let modern versions of clang/libc++ pass.

    1. The parts of the TimSort class that don't very based on LessFunction are now in their own set of classes such as TimSortState

    This is partially just a cleanup I needed to make some template metaprogramming less gross. However, it's a good idea in any case. It's not unusual for a program to need to sort a type of data multiple ways, which means expanding the TimSort<RandomAccessIterator,LessFunction> template multiple times. With this change, the compiler can reuse the expansion of TimSortState<RandomAccessIterator> between them. If nothing else, this should compile faster.

    Now with those things out of the way, I could get to the meat of the change: replacing the std::vector<value_t> tmp_; with a custom TimSortMergeSpace<> type.

    This class allocates itself like a vector, but the only "setter" is a move_in() method that replaces its contents via move construction (similar to what the std::back_inserter loop was doing) We don't construct elements before they're needed (even if we allocated them) so it will work even for non-default-constructable types. The move-loop is faster than before since we don't need to re-check for capacity at every insertion.

    However, on C++11 we do even better: we use template-specialization to provide an alternate implementation of this data type for types that pass std::is_trivially_copyable<>. The big advantage is that we can just use std::memcpy() to refill the merge buffer. The code is also simpler in general since we don't need to worry about construction/destruction of the buffer elements.

    Since a lot of the overall cost of the timsort algorithm is spent merging, making this data structure as fast as possible is important. This change makes sorting randomized sequences about 10% faster when working with trivially-copyable types.

    While I was there I also replaced the std::vector<run> pending_ with my own TimSortRunStack<> type. This doesn't have the same importance for performance, but it's another place where we don't really need the full STL vector support... just a simple resizable stack. Since I was replacing one vector I thought it was more consistent to just replace both. This also removes the header dependency on <vector>

    RESULTS: "make bench" on Xcode 10.1, Mac Pro:

    • RANDOMIZED SEQUENCE [int] size=100K Before: 0.851 After: 0.745

    • RANDOMIZED SEQUENCE [std::string] size=100K Before: 5.389 After: 3.735

    The improvement with int is due to the vector replacement. The bigger improvement with std::string is making C++11 work with the libc++ STL so that move optimizations get applied.

    opened by mitchblank 14
Releases(v2.1.0)
  • v2.1.0(Jan 18, 2021)

    This release is mostly the work of @vedgy, thanks a lot to him! It brings the following changes:

    • New: a new function, gfx::timmerge is now available. It implements the merge algorithm used by timsort, and can be used as a drop-in replacement for std::inplace_merge. The most notable difference with the standard library function is that it can't fallback to an O(n log n) merge algorithm when no extra memory is available. Just like gfx::timsort, it supports projections as well as iterators that don't implement postfix ++ and -- iterators.
    • New: a new configuration macro, GFX_TIMSORT_ENABLE_AUDIT, can be defined to ask gfx::timsort and gfx::timmerge to perform more expensive checks than the ones performed when defining GFX_TIMSORT_ENABLE_ASSERT. These audits are disabled by default and only meant for debugging purposes since they can slow down the algorithms significantly.
    • Tests and benchmarks were completed to include the new timmerge function.
    • The project's GitHub wiki now contains a page with additional benchmark results.
    Source code(tar.gz)
    Source code(zip)
  • v1.3.0(Jan 18, 2021)

    This release is mostly the work of @vedgy, thanks a lot to him! It brings the following new features:

    • A new function, gfx::timmerge is now available. It implements the merge algorithm used by timsort, and can be used as a drop-in replacement for std::inplace_merge. The most notable difference with the standard library function is that it can't fallback to an O(n log n) merge algorithm when no extra memory is available. Just like gfx::timsort, it supports projections as well as iterators that don't implement postfix ++ and -- iterators.
    • A new configuration macro, GFX_TIMSORT_ENABLE_AUDIT, can be defined to ask gfx::timsort and gfx::timmerge to perform more expensive checks than the ones performed when defining GFX_TIMSORT_ENABLE_ASSERT. These audits are disabled by default and only meant for debugging purposes since they can slow down the algorithms significantly.
    Source code(tar.gz)
    Source code(zip)
  • v2.0.2(Jan 12, 2021)

    Minor release, mostly motivated by tooling changes:

    • Made functions TimSort::gallopLeft() and TimSort::gallopRight() static (#35, thanks @vedgy).
    • Changed the Catch2 branch downloaded by the test suite from master to v2.x - the project's master branch doesn't exist anymore (#34, thanks @vedgy).
    • Moved the continuous integration from Travis CI to GitHub Actions. This change means that the oldest tested compilers are a bit more recent than they used to be because of the differences between the old and new virtual environments - gfx::timsort should still run perfectly on older compilers nonetheless. The combinations of configurations tested in continuous integration are different but roughly equivalent to the old ones in terms of coverage.
    Source code(tar.gz)
    Source code(zip)
  • v1.2.2(Jan 12, 2021)

    Minor release, mostly motivated by tooling changes:

    • Made functions TimSort::gallopLeft() and TimSort::gallopRight() static (#35, thanks @vedgy).
    • Moved the continuous integration from Travis CI to GitHub Actions. This change means that the oldest tested compilers are a bit more recent than they used to be because of the differences between the old and new virtual environments - gfx::timsort should still run perfectly on older compilers nonetheless. The combinations of configurations tested in continuous integration are different but roughly equivalent to the old ones in terms of coverage.
    • Added instructions to the README to install the library via Conan.
    Source code(tar.gz)
    Source code(zip)
  • v2.0.1(Oct 17, 2020)

    Minor release in order to make the small changes made up to now available:

    • Changed loop condition in minRunLength() to match the one in the original CPython implementation (issue #33, thanks @weeyizhi).
    • Fix the project version number in CMakelists.txt.
    • Change the Conan badge in the README to point to https://conan.io/.
    Source code(tar.gz)
    Source code(zip)
  • v1.2.1(Oct 17, 2020)

    Minor release in order to make the small changes made up to now available:

    • Changed loop condition in minRunLength() to match the one in the original CPython implementation (issue #33, thanks @weeyizhi).
    • Fix the project version number in CMakelists.txt.
    • Add link and instructions to get the 1.x version with Conan.
    Source code(tar.gz)
    Source code(zip)
  • v2.0.0(Dec 2, 2019)

    cpp-TimSort release 2.0.0 is C++11-only: some amount of C++03 support will be maintained in the branch 1.x.y if needed, but most relevant development will now happen in the 2.x.y branch. This versions branches off 1.2.0, so every item in the changelog below describes changes that happened since this specific release.

    C++11 doesn't bring many new features to the table for gfx::timsort, so most of the changes are linked to tooling:

    • New: gfx::timsort now accepts range arguments to sort a whole collection.
    • Move semantics are used unconditionally, which means that GFX_TIMSORT_USE_STD_MOVE doesn't do anything anymore, and that the code should be easier to read.
    • The tests now use Catch 2.x instead of 1.x, which allows for better tooling around them.
    • The tests in Debug mode are now run through Valgrind on OSX.
    • Tests have been added for generalized callables support.
    Source code(tar.gz)
    Source code(zip)
  • v1.2.0(Nov 29, 2019)

    Apparently I was lying about the previous release being the last C++03 one before C++11 for here is another one!

    This release includes the following changes:

    • When std::invoke is available (C++17 and later), it is used to invoke the comparison and projection functions. Therefore the following code is now valid:
      struct Person {
          std::string name;
          int age;
      };
      
      // Sort a vector of persons by age
      std::vector<Person> vec;
      // ... fill vec ...
      gfx::timsort(vec.begin(), vec.end(), std::less{}, &Person::age);
      
    • gfx::timsort now accepts iterators that don't have a conforming postfix operator++ or operator--, or that simply don't provide these operations. Only the prefix versions of these operators are used by the algorithm.
    • <windows.h> can now be included alongside <gfx/timsort.hpp> without triggering issues because of the min() and max() macros defined by the former.
    Source code(tar.gz)
    Source code(zip)
  • v1.1.0(Nov 17, 2019)

    This is the last proper C++03 release before moving to C++11: from now on development will mostly target the 2.x version. That said the 1.x branch won't go totally unmaintained either: bugs will still be fixed and new features can still be backported if they are easy enough to implement in C++03.

    Projections support

    gfx::timsort now supports projections. Originally introduced in the Adobe Source Libraries (ASL), then later picked up by Eric Niebler's for range-v3 before making it all the way to the Ranges TS and C++20, projections are transformations that algorithms apply before examining the values of elements; in timsort a projection is applied to each element to be compared prior to a comparison.

    Consider that you've got a collection of strings and want to sort it by string size instead of string contents. You could use a projection as follows:

    #include <string>
    #include <vector>
    #include <gfx/timsort.hpp>
    
    size_t len(const std::string& str) {
        return str.size();
    }
    
    // Sort a vector of strings by length
    std::vector<std::string> vec;
    // ... fill vec ...
    gfx::timsort(vec.begin(), vec.end(), std::less<size_t>(), &len);
    

    In C++20 projections (but also comparisons) can be pretty much anything satisfying the std::invocable concept, which means all function-like objects and even pointer to members. The support in gfx::timsort is a bit rougher and projections can only be instances of types callable with parentheses.

    Miscellaneous changes

    The following changes were also applied to the project:

    • Assertions with an assert(xx && yy); pattern were split into assert(xx); assert(yy); to provide more precise diagnostics when something fails.

    • When running the testsuite in debug mode through CMake, the flag -Og is now passed to GCC instead of -O0. It was always the intended behaviour but didn't work because of a case bug.

    • Some useless files and configuration were removed from the project in an attempt to clean it a bit.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Oct 18, 2019)

    Hi everyone, this release marks a change of maintainers since Goro Fuji (gfx) couldn't take care of the project anymore. I didn't want to let this project go unmaintained so I offered to take care of it.

    The interface of the algorithm remains the same as it was and still works with C++98 and newer C++ standards. However the tooling changed significantly, here is a brief list of overarching changes, which will be described with more detail later:

    • The project now has numbered releases following semantic versioning, this is logically version 1.0.0.
    • The project layout was changed to conform to the Pitchfork Layout.
    • The project now has first-class CMake support.
    • The test suite has been revamped.

    Bug fixes

    The following bugs were fixed:

    • Fixed support for move-only types.
    • Fixed iterator pointing one element before the beginning of the collection in mergeHi: this sometimes caused issues with std::deque or the MSVC STL debug iterators (issue #30).

    Improvements to timsort

    The following improvements were made to the code of timsort:

    • The algorithm now performs fewer comparisons overall (thanks @mitchblank, #24 #26).
    • Some copies/moves to the temporary buffer are now faster (thanks @mitchblank for the ideas in #25).
    • mergeLo and mergeHi are faster and don't allocate extra memory anymore when one of the runs contains only one element.
    • Move semantics support is properly recognized for more compilers.
    • Fixed warning -Winline with GCC.
    • Fixed warning C4244 with MSVC.
    • Fixed warnings about dead code.

    The following changes also affect timsort.hpp:

    • The macro GFX_TIMSORT_USE_STD_MOVE can be defined to 0 or 1 to disable or enable move semantics support in gfx::timsort. If this macro is not set, we try to enable move semantics when we detect compiler support.
    • The logs are enabled only when the macro GFX_TIMSORT_ENABLE_LOG is defined.
    • The assertions are enabled only when the macro GFX_TIMSORT_ENABLE_ASSERT is defined (they previously always fired).
    • Everything except the gfx::timsort function was placed in a pseudo-private gfx::detail:: namespace.

    Tooling

    The tooling was significantly changed prior to this release. One of the main changes was to remove the old project's Makefile and to introduce CMake support instead:

    • The new CMakeLists.txt exports the INTERFACE target gfx::timsort.
    • The tests can be built with -DBUILD_TESTING=ON and run as CTest tests.
    • The benchmarks can be built with -DBUILD_BENCHMARKS=ON.
    • The tests can be run through with Valgrind with -DGFX_TIMSORT_USE_VALGRIND=ON.
    • The tests can be run with sanitizers with -DGFX_TIMSORT_SANITIZE=<comma-separated-list-of-sanitizers>.
    • Both C++98 and C++11 tests are run with the appropriate standard.

    The tests were partly rewritten:

    • They now use Catch2 instead of Boost.Test.
    • They were separated in two files: cxx_98_tests.cpp and cxx_11_tests.cpp which should be compiled with C++98 and C++11 respectively.
    • The tests for move-only types where updated to detect more problematic scenarios like self-moves.

    The continuous integration was also revised:

    • It is now based on CMake & CTest instead of make.
    • Tests are now run on Linux (with GCC and Clang), OSX (with AppleClang) and Windows (with MSVC and MinGW-w64).
    • Some tests are run with either Valgrind, ASAN or UBSAN.

    Other miscellaneous changes:

    • The examples don't rely on Boost anymore.
    • The benchmark in examples/ was moved to benchmarks/.
    Source code(tar.gz)
    Source code(zip)
  • legacy(Sep 23, 2019)

    This release marks the last commit from the original project before a change of tooling and architecture. After this release, the raw Makefile support will be replaced with CMake support, Boost.Test with Catch2, and the project structure will change a bit.

    The algorithm interface itself will remain stable through the new releases: C++ code using gfx::timsort should continue to work.

    Source code(tar.gz)
    Source code(zip)
Owner
TimSort
Implementation of TImSort algorithm
TimSort
IntX is a C++11 port of IntX arbitrary precision Integer library with speed, about O(N * log N) multiplication/division algorithms implementation.

IntX IntX is a C++11 port of IntX arbitrary precision Integer library with speed, about O(N * log N) multiplication/division algorithms implementation

Telepati 9 Mar 9, 2022
A C++ implementation of the graph algorithm of finding all strongly connected components with DFS

SinkSCC A C++ implementation of the graph algorithm of finding all strongly connected components with DFS. Details Each SCC of a graph can be treated

Schrodinger ZHU Yifan 2 Nov 2, 2021
The Implementation of quadtree-based multi-thread tiled pyramid building algorithm.

tile-map-parallel-builder Quadtree-based multi-thread tiled pyramid building algorithm. Core Concept NOTE: The level is different from TMS zoom level.

Shepard 5 May 16, 2022
C implementation of a sudoku solver with backtracking algorithm

SUDOKU SOLVER Sudoku solver using backtracking algorithm Sudoku game To solve a sudoku, you need a sudoku. So i made a basic implmentation of one with

Jonas STIRNEMANN 2 Mar 23, 2022
C implementation of a classical prefix tree with a search function

C implementation of a classical prefix tree with a search function

Liz3 1 Nov 19, 2021
Standalone c++ implementation for computing Motif Adjacency Matrices of large directed networks, for 3-node graphlets and 4-node graphletsa containing a 4 edge loop.

Building Motif Adjacency Matrices This is an efficient C++ software for building Motif Adjacency Matrices (MAM) of networks, for a range of motifs/gra

null 5 Mar 19, 2022
Implementation of Dijkstra's algorithm for finding the shortest paths between nodes in a graph using Priority Queue data structure in C

Implementation of Dijkstra's algorithm for finding the shortest paths between nodes in a graph using Priority Queue data structure in C. File "airport

Artem Kolpakov 1 Jan 24, 2022
C implementation of the Landau-Vishkin algorithm

This repo implements the Landau-Vishkin algorithm to compute the edit distance between two strings. This is a fast method for highly similar strings.

Heng Li 36 Apr 22, 2022
In DFS-BFS Implementation In One Program Using Switch Case I am Using an Simple And Efficient Code of DFS-BFS Implementation.

DFS-BFS Implementation-In-One-Program-Using-Switch-Case-in-C Keywords : Depth First Search(DFS), Breadth First Search(BFS) In Depth First Search(DFS),

Rudra_deep 1 Nov 17, 2021
EASTL stands for Electronic Arts Standard Template Library. It is an extensive and robust implementation that has an emphasis on high performance.

EA Standard Template Library EASTL stands for Electronic Arts Standard Template Library. It is a C++ template library of containers, algorithms, and i

Electronic Arts 6.6k Aug 10, 2022
An Open Source Implementation of the Actor Model in C++

CAF: C++ Actor Framework CAF is an open source implementation of the actor model for C++ featuring lightweight & fast actor implementations, pattern m

CAF: C++ Actor Framework 2.8k Aug 3, 2022
an efficient feature complete C++ bittorrent implementation

libtorrent is an open source C++ library implementing the BitTorrent protocol, along with most popular extensions, making it suitable for real world d

Arvid Norberg 4k Aug 5, 2022
Concurrency Kit 2.1k Aug 2, 2022
An implementation of Actor, Publish-Subscribe, and CSP models in one rather small C++ framework. With performance, quality, and stability proved by years in the production.

What is SObjectizer? What distinguishes SObjectizer? SObjectizer is not like TBB, taskflow or HPX Show me the code! HelloWorld example Ping-Pong examp

Stiffstream 293 Jul 20, 2022
C++ implementation of a fast hash map and hash set using hopscotch hashing

A C++ implementation of a fast hash map and hash set using hopscotch hashing The hopscotch-map library is a C++ implementation of a fast hash map and

Thibaut Goetghebuer-Planchon 559 Aug 1, 2022
C++ implementation of a fast hash map and hash set using robin hood hashing

A C++ implementation of a fast hash map and hash set using robin hood hashing The robin-map library is a C++ implementation of a fast hash map and has

Thibaut Goetghebuer-Planchon 791 Aug 1, 2022
s2n : an implementation of the TLS/SSL protocols

s2n is a C99 implementation of the TLS/SSL protocols that is designed to be simple, small, fast, and with security as a priority. It is released and l

Amazon Web Services 4.1k Aug 3, 2022
A simple C++ 03/11/etc timer class for ~microsecond-precision cross-platform benchmarking. The implementation is as limited and as simple as possible to create the lowest amount of overhead.

plf_nanotimer A simple C++ 03/11/etc timer class for ~microsecond-precision cross-platform benchmarking. The implementation is as limited and simple a

Matt Bentley 89 Jul 15, 2022
C++ implementation of the Google logging module

Google Logging Library The Google Logging Library (glog) implements application-level logging. The library provides logging APIs based on C++-style st

Google 5.5k Aug 4, 2022
jemalloc websitejemalloc - General purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. [BSD] website

jemalloc is a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. jemalloc first came

jemalloc memory allocator 7.3k Aug 7, 2022
Single C file TLS 1.2/1.3 implementation, using tomcrypt as crypto library

TLSe Single C file TLS 1.3, 1.2, 1.1 and 1.0(without the weak ciphers) implementation, using libtomcrypt as crypto library. It also supports DTLS 1.2

Eduard Suica 451 Jul 28, 2022
MessagePack implementation for C and C++ / msgpack.org[C/C++]

msgpack for C/C++ It's like JSON but smaller and faster. Overview MessagePack is an efficient binary serialization format, which lets you exchange dat

MessagePack 2.5k Jul 31, 2022
Protocol Buffers implementation in C

Overview This is protobuf-c, a C implementation of the Google Protocol Buffers data serialization format. It includes libprotobuf-c, a pure C library

null 2.1k Aug 1, 2022
a small protobuf implementation in C

μpb - a small protobuf implementation in C Platform Build Status macOS ubuntu μpb (often written 'upb') is a small protobuf implementation written in

Protocol Buffers 1.3k Aug 9, 2022
Open h.265 video codec implementation.

libde265 - open h.265 codec implementation libde265 is an open source implementation of the h.265 video codec. It is written from scratch and has a pl

struktur AG 1.4k Aug 4, 2022
MicroPython - a lean and efficient Python implementation for microcontrollers and constrained systems

The MicroPython project This is the MicroPython project, which aims to put an implementation of Python 3.x on microcontrollers and small embedded syst

MicroPython 14.9k Aug 11, 2022
Implementation of python itertools and builtin iteration functions for C++17

CPPItertools Range-based for loop add-ons inspired by the Python builtins and itertools library. Like itertools and the Python3 builtins, this library

Ryan Haining 1.2k Aug 8, 2022
Minimal A* implementation in C. No dynamic memory allocation.

Micro A Star Path Finder This is a minimal A* path finder implementation in C, without any dynamic memory allocation. Usage The maximum size of the ma

Felipe da Silva 79 Aug 9, 2022