C++ header-only library with methods to efficiently encode/decode Morton codes in/from 2D/3D coordinates

Overview

Libmorton v0.2.7

Build Status license Donate

  • Libmorton is a C++ header-only library with methods to efficiently encode/decode 64, 32 and 16-bit Morton codes and coordinates, in 2D and 3D. Morton order is also known as Z-order or the Z-order curve.
  • Libmorton is a lightweight and portable library - the only dependencies are standard C++ headers. Architecture-specific optimizations are available.
  • More info and some benchmarks in these blogposts: Morton encoding, Libmorton and BMI2 instruction set

Usage

Include libmorton/morton.h. This will always have stub functions that point to the most efficient way to encode/decode Morton codes. If you want to test out alternative (and possibly slower) methods, you can find them in libmorton/morton2D.h and libmorton/morton3D.h. All libmorton functionality is in the libmorton namespace to avoid conflicts.

// ENCODING 2D / 3D morton codes, of length 32 and 64 bits
inline uint_fast32_t morton2D_32_encode(const uint_fast16_t x, const uint_fast16_t y);
inline uint_fast64_t morton2D_64_encode(const uint_fast32_t x, const uint_fast32_t y);
inline uint_fast32_t morton3D_32_encode(const uint_fast16_t x, const uint_fast16_t y, const uint_fast16_t z);
inline uint_fast64_t morton3D_64_encode(const uint_fast32_t x, const uint_fast32_t y, const uint_fast32_t z);
// DECODING 2D / 3D morton codes, of length 32 and 64 bits
inline void morton2D_32_decode(const uint_fast32_t morton, uint_fast16_t& x, uint_fast16_t& y);
inline void morton2D_64_decode(const uint_fast64_t morton, uint_fast32_t& x, uint_fast32_t& y);
inline void morton3D_32_decode(const uint_fast32_t morton, uint_fast16_t& x, uint_fast16_t& y, uint_fast16_t& z);
inline void morton3D_64_decode(const uint_fast64_t morton, uint_fast32_t& x, uint_fast32_t& y, uint_fast32_t& z);

Installation

No installation is required (just download the headers and include them), but I was informed libmorton is packaged for Microsoft's VCPKG system as well, if you want a more controlled environment to install C++ packages in.

Instruction sets

In the standard case, libmorton only uses operations that are supported on pretty much any CPU you can throw it at. If you know you're compiling for a specific architecture, you might gain a speed boost in encoding/decoding operations by enabling implementations for a specific instruction set. Libmorton ships with support for:

  • BMI2 instruction set: Intel: Haswell CPU's and newer. AMD: Ryzen CPU's and newer. Define __BMI2__ before including morton.h. This is definitely a faster method when compared to the standard case.
  • AVX512 instruction set (experimental): Intel Ice Lake CPU's and newer. Uses _mm512_bitshuffle_epi64_mask. Define __AVX512BITALG__ before including morton.h. For more info on performance, see this PR.

When using MSVC, these options can be found under Project Properties -> Code Generation -> Enable Enhanced Instruction set. When using GCC (version 9.0 or higher), you can use -march=haswell (or -march=znver2) for BMI2 support and -march=icelake-client for AVX512 support.

Testing

The test folder contains tools I use to test correctness and performance of the libmorton implementation. You can regard them as unit tests. This section is under heavy re-writing, but might contain some useful code for advanced usage.

You can build the test suite:

Citation

If you use libmorton in your published paper or work, please reference it, for example as follows:

@Misc{libmorton18,
author = "Jeroen Baert",
title = "Libmorton: C++ Morton Encoding/Decoding Library",
howpublished = "\url{https://github.com/Forceflow/libmorton}",
year = "2018"}

Publications / products that use libmorton

I'm always curious what libmorton ends up on. If you end up using it, send me an e-mail!

  • Thomas Bläsius, Tobias Friedrich et al, 2019. Efficiently Generating Geometric Inhomogeneous and Hyperbolic Random Graphs (link)
  • Alexander Dieckmann, Reinhard Klein, 2018. Hierarchical additive poisson disk sampling (link)
  • Sylvain Rousseau and Tamy Boubekeur, 2017. Fast lossy compression of 3D unit vector sets (PDF)
  • Jan Watter, 2018. Generation of complex numerical meshes using space-filling curves (PDF)
  • Esri
  • Cesium Ion
  • CLAIRE

Thanks / See ALso

  • To @gnzlbg and his Rust implementation bitwise for finding bugs in the Magicbits code
  • @kevinhartman made a C++14 library that supports N-dimensional morton codes morton-nd. He upstreamed a lot of fixes back to libmorton - thanks!
  • Everyone making comments and suggestions on the original blogpost
  • @Wunkolo for AVX512 implementation
  • Fabian Giesen's post on Morton Codes

Contributing

See Contributing.md

TODO

  • Add morton operations like described here
  • Write better test suite (with L1/L2 trashing, better tests, ...)
  • A better naming system for the functions, because m3D_e_sLUT_shifted? That escalated quickly.
Comments
  • Rework CMake scripts for install rules

    Rework CMake scripts for install rules

    Introduce CMake install rules, respecting GNUInstallDirs. Also prefer the standard BUILD_TESTING build option over LIBMORTON_BUILD_TESTS.

    Testing on Fedora 33, with CMake + Ninja :

    cmake -B build/ -S . -G Ninja -DCMAKE_INSTALL_PREFIX:PATH=$PWD/install && ninja -C build/ install
    [...]
    -- Installing: .../libmorton/install/include/libmorton/morton_common.h
    -- Installing: .../libmorton/install/include/libmorton/morton_AVX512BITALG.h
    -- Installing: .../libmorton/install/include/libmorton/morton_BMI.h
    -- Installing: .../libmorton/install/include/libmorton/morton2D_LUTs.h
    -- Installing: .../libmorton/install/include/libmorton/morton2D.h
    -- Installing: .../libmorton/install/include/libmorton/morton3D_LUTs.h
    -- Installing: .../libmorton/install/include/libmorton/morton3D.h
    -- Installing: .../libmorton/install/include/libmorton/morton.h
    -- Installing: .../libmorton/install/share/cmake/libmorton/libmortonTargets.cmake
    -- Installing: .../libmorton/install/share/cmake/libmorton/libmortonConfigVersion.cmake
    -- Installing: .../libmorton/install/share/cmake/libmorton/libmortonConfig.cmake
    -- Installing: .../libmorton/install/share/pkgconfig/libmorton.pc
    
    ninja -C build/ test
    [...]
    [0/1] Running tests...
    Test project .../libmorton/build
        Start 1: libmorton-test
    1/1 Test #1: libmorton-test ...................   Passed   66.82 sec
    
    100% tests passed, 0 tests failed out of 1
    

    Closes #53.

    opened by mablanchard 14
  • Is the magicbit version correct?

    Is the magicbit version correct?

    I have a morton code library in rust that just uses BMI2 and wanted to use magicbits when BMI2 was not available. For testing, I used an invariant of the form:

    fn invariant<T: Int>(v: T) {
      { // 2D
        let (x, y) = morton_decode_2d(v);
        let vs = morton_encode_2d(x, y);
        assert_eq!(v, vs);
      }
      { // 3D
        let (x, y, z) = morton_decode_3d(v);
        let vs = morton_encode_3d(x, y, z);
        assert_eq!(v, vs);
      }
    }
    

    for many integer ranges. I am testing for all u8, u16, [u16::max as u32, u16::max as u32 - 1000000], [u32::max - 1000000, u32::max], [u32::max as u64, u32::max as u64 - 1000000], [u64::max - 1000000, u64::max].

    The BMI2 version always passes all tests correctly so it must be at least be a bug with my implementation.

    I needed to change your magicbits implementation for the 2D case to:

    // 2D decode 32-bit:
                x = x & MORTON_MASK2D_U32[5];
                x = (x ^ (x >> 1)) & MORTON_MASK2D_U32[4];
                x = (x ^ (x >> 2)) & MORTON_MASK2D_U32[3];
                x = (x ^ (x >> 4)) & MORTON_MASK2D_U32[2];
                x = (x ^ (x >> 8)) & MORTON_MASK2D_U32[1];
                x = (x ^ (x >> 16)) & MORTON_MASK2D_U32[0];
    // 2D encode 32-bit:
                x = (x | x << 16) & MORTON_MASK2D_U32[1];
                x = (x | x << 8) & MORTON_MASK2D_U32[2];
                x = (x | x << 4) & MORTON_MASK2D_U32[3];
                x = (x | x << 2) & MORTON_MASK2D_U32[4];
                x = (x | x << 1) & MORTON_MASK2D_U32[5];
    // 2D decode 64-bit:
                x = x & MORTON_MASK2D_U64[5];
                x = (x ^ (x >> 1)) & MORTON_MASK2D_U64[4];
                x = (x ^ (x >> 2)) & MORTON_MASK2D_U64[3];
                x = (x ^ (x >> 4)) & MORTON_MASK2D_U64[2];
                x = (x ^ (x >> 8)) & MORTON_MASK2D_U64[1];
                x = (x ^ (x >> 16)) & MORTON_MASK2D_U64[0];
    // 2D encode 64-bit:
                x = (x | x << 32) & MORTON_MASK2D_U64[0];
                x = (x | x << 16) & MORTON_MASK2D_U64[1];
                x = (x | x << 8) & MORTON_MASK2D_U64[2];
                x = (x | x << 4) & MORTON_MASK2D_U64[3];
                x = (x | x << 2) & MORTON_MASK2D_U64[4];
                x = (x | x << 1) & MORTON_MASK2D_U64[5];
    

    to make it pass this test. I am still having trouble with the 3D version though...

    Just wanted to ping you on the issue that the magic bits implementation might not be 100% correct. It seems to work for small enough morton codes, but if you only need those you should probably be using u8 or u16 specific implementations.

    A good test is just to test it against the BMI2 version for the full u8, u16, and u32 integer ranges, and for some subranges of the u64 integer range (testing the whole u64 range takes N * 100 years on a 3Ghz CPU).

    If it turns out that this is not a bug in my code, but actually a problem with libmorton, you might want to repeat the benchmarks since if more operations are needed for correctness maybe BMI2 will compare better.

    bug 
    opened by gnzlbg 9
  • Use BMI2 pdep/pext instructions

    Use BMI2 pdep/pext instructions

    BMI2 provides parallel bit deposit/extract instructions that allow an efficient encoding/decoding of morton codes. Something like the following should do the trick. Basically in 3D one needs 3 calls to pdep/pext intrinsics to encode/decode morton codes.

    #include <array>
    #include <type_traits>
    #if defined(__BMI2__)
    #include <immintrin.h>
    #endif
    
    // wrappers around the bmi2 pdep/pext intrinsics
    namespace bmi2_detail {
    
    constexpr uint32_t pdep(uint32_t source, uint32_t mask) noexcept {
      return _pdep_u32(source, mask);
    }
    constexpr uint64_t pdep(uint64_t source, uint64_t mask) noexcept {
      return _pdep_u64(source, mask);
    }
    
    constexpr uint32_t pext(uint32_t source, uint32_t mask) noexcept {
      return _pext_u32(source, mask);
    }
    constexpr uint64_t pext(uint64_t source, uint64_t mask) noexcept {
      return _pext_u64(source, mask);
    }
    
    }  // namespace bmi2_detail
    
    /// Parallel Bits Deposit implementation w/o BMI2 intrinsics
    template <typename Integral>
    __attribute__((no_sanitize("integer"))) 
    constexpr Integral deposit_bits(Integral x, Integral mask) {
    #if !defined(__BMI2__)
      Integral res = 0;
      for (Integral bb = 1; mask != 0; bb += bb) {
        if (x & bb) { res |= mask & (-mask); }
        mask &= (mask - 1);
      }
      return res;
    #else
      return bmi2_detail::pdep(x, mask);
    #endif
    }
    
    /// Parallel Bits Extract implementation w/o BMI2 intrinsics
    template <typename Integral>
    __attribute__((no_sanitize("integer"))) 
    constexpr Integral extract_bits(Integral x, Integral mask) {
    #if !defined(__BMI2__)
      Integral res = 0;
      for (Integral bb = 1; mask != 0; bb += bb) {
        if (x & mask & -mask) { res |= bb; }
        mask &= (mask - 1);
      }
      return res;
    #else
      return bmi2_detail::pext(x, mask);
    #endif
    }
    
    // restrict to unsigned integer types
    template <typename T>
    using enable_if_unsigned_t = std::enable_if_t<std::is_unsigned<T>{}>;
    
    template <typename UInt, typename = enable_if_unsigned_t<UInt>>
    constexpr UInt encode(std::array<UInt, 1> xs) noexcept {
      return xs[0];
    }
    template <typename UInt, typename = enable_if_unsigned_t<UInt>>
    constexpr std::array<UInt, 1> decode(UInt code, std::array<UInt, 1>) noexcept {
      // I use std::array for decoding but using anything else should be trivial
      return {{code}};
    }
    
    template <typename UInt, typename = enable_if_unsigned_t<UInt>> 
    UInt encode(std::array<UInt, 2> xs) noexcept {
      return deposit_bits(xs[1], static_cast<UInt>(0xAAAAAAAAAAAAAAAA))
             | deposit_bits(xs[0], static_cast<UInt>(0x5555555555555555));
    }
    template <typename UInt,  typename = enable_if_unsigned_t<UInt>>
    std::array<UInt, 2> decode(UInt code, std::array<UInt, 2>) noexcept {
      return {{extract_bits(code, static_cast<UInt>(0x555555555555555)),
               extract_bits(code, static_cast<UInt>(0xAAAAAAAAAAAAAAAA))}};
    }
    
    template <typename UInt, typename = enable_if_unsigned_t<UInt>>
    UInt encode(std::array<UInt, 3> xs) noexcept {
      return deposit_bits(xs[2], static_cast<UInt>(0x4924924924924924))
             | deposit_bits(xs[1], static_cast<UInt>(0x2492492492492492))
             | deposit_bits(xs[0], static_cast<UInt>(0x9249249249249249));
    }
    template <typename UInt, enable_if_unsigned_t<UInt>>
    std::array<UInt, 3> decode(UInt code, std::array<UInt, 3>) noexcept {
      return {{extract_bits(code, static_cast<UInt>(0x9249249249249249)),
               extract_bits(code, static_cast<UInt>(0x2492492492492492)),
               extract_bits(code, static_cast<UInt>(0x4924924924924924))}};
    }
    
    enhancement 
    opened by gnzlbg 7
  • A _mm512_bitshuffle_epi64_mask method (vpshufbitqmb, AVX512_BITALG)

    A _mm512_bitshuffle_epi64_mask method (vpshufbitqmb, AVX512_BITALG)

    In anticipation of Icelake bringing AVX512 to consumer hardware, I've come up with an alternative to the pdep method for encoding a 32-bit 2D coordinate into a 64-bit morton code which utilizes the vpshufbitqmb instruction of the BITALG subset of AVX512. vpshufbitqmb allows building a new value bit by bit by indexing into a 64-bit lane(into an AVX512 k-register mask).

    It relies on interpreting two 32-bit X and Y coordinates as a continuous 64-bit value though. It can support much higher dimensions provided you zip(as in, feed in and interleave the leading bytes or so of the coordinates vector elements into each 64-bit lane before selecting bits) certain bytes of the input coordinates.

    There is no throughput or latency data at the moment for this instruction or any hardware to actually test this upon but the assembly looks very promising. Especially for any bulk-processing where all the constants are loaded outside of the loop.

    An illustrative example code of the two methods:

    #pragma pack(push, 1)
    union Coord2D
    {
    	std::uint32_t u32[2]; // x, y
    	std::uint64_t u64;
    };
    #pragma pack(pop)
    
    std::uint64_t Morton2DEncode( const Coord2D& coord )
    {
    	const __m512i CoordVec = _mm512_set1_epi64(coord.u64);
    	// Interleave bits from 32-bit X and Y coordinate
    	const __mmask64 Interleave = _mm512_bitshuffle_epi64_mask(
    		CoordVec,+
    		_mm512_set_epi16(
    			0x20'00 + 0x01'01 * 31, 0x20'00 + 0x01'01 * 30,
    			0x20'00 + 0x01'01 * 29, 0x20'00 + 0x01'01 * 28,
    			0x20'00 + 0x01'01 * 27, 0x20'00 + 0x01'01 * 26,
    			0x20'00 + 0x01'01 * 25, 0x20'00 + 0x01'01 * 24,
    			0x20'00 + 0x01'01 * 23, 0x20'00 + 0x01'01 * 22,
    			0x20'00 + 0x01'01 * 21, 0x20'00 + 0x01'01 * 20,
    			0x20'00 + 0x01'01 * 19, 0x20'00 + 0x01'01 * 18,
    			0x20'00 + 0x01'01 * 17, 0x20'00 + 0x01'01 * 16,
    			0x20'00 + 0x01'01 * 15, 0x20'00 + 0x01'01 * 14,
    			0x20'00 + 0x01'01 * 13, 0x20'00 + 0x01'01 * 12,
    			0x20'00 + 0x01'01 * 11, 0x20'00 + 0x01'01 * 10,
    			0x20'00 + 0x01'01 *  9, 0x20'00 + 0x01'01 *  8,
    			0x20'00 + 0x01'01 *  7, 0x20'00 + 0x01'01 *  6,
    			0x20'00 + 0x01'01 *  5, 0x20'00 + 0x01'01 *  4,
    			0x20'00 + 0x01'01 *  3, 0x20'00 + 0x01'01 *  2,
    			0x20'00 + 0x01'01 *  1, 0x20'00 + 0x01'01 *  0
    		)
    	);
    	return _cvtmask64_u64(Interleave);
    }
    
    std::uint64_t m2D_e_BMI(const Coord2D& coord)
    {
    	return _pdep_u64(static_cast<std::uint64_t>(coord.u32[0]), 0x5555555555555555)
    		| _pdep_u64(static_cast<std::uint64_t>(coord.u32[1]), 0xAAAAAAAAAAAAAAAA);
    }
    

    The generated assembly using gcc 9.1 with -Ofast -march=icelake-client:

    Morton2DEncode(Coord2D const&):
            vpbroadcastq    zmm0, QWORD PTR [rdi]
            vpshufbitqmb    k0, zmm0, ZMMWORD PTR .LC0[rip]
            kmovq   rax, k0
            vzeroupper
            ret
    m2D_e_BMI(Coord2D const&):
            mov     eax, DWORD PTR [rdi]
            movabs  rdx, 6148914691236517205
            pdep    rax, rax, rdx
            mov     edx, DWORD PTR [rdi+4]
            movabs  rcx, -6148914691236517206
            pdep    rdx, rdx, rcx
            or      rax, rdx
            ret
    

    I have verified the implementation with Intel's emulator and I'd love to update with some actual benchmarks once I get some Icelake hardware in my hands.

    What do you think?

    enhancement 
    opened by Wunkolo 6
  • 3D Morton for-loop encoding is broken

    3D Morton for-loop encoding is broken

    Hi

    In morton3D.h, lines 120 and 141, the shift variable should be initialized to 3_i, not 2_i, otherwise some bits are overwriting bits from the previous iterations.

    I made the same comment on the original blogpost: http://www.forceflow.be/2013/10/07/morton-encodingdecoding-through-bit-interleaving-implementations/#comment-7508

    Thanks for the lib ! Eric

    bug 
    opened by EBatut-ALG 6
  • Tests for correctness, help request

    Tests for correctness, help request

    Dear @Forceflow thank you very much for sharing your work, it is very appreciated.

    For my own applications I am developing a library similar to libmorton, but in pure Fortran. I am using a bit interleaving technique (see Stocco 2009) that I guess is similar to your (I have not yet studied your method).

    Anyhow, I am looking to your work for validating my implementation (at least for now, when I'll have validated my implementation, I'll try to improve the algorithm performances to match yours, maybe adopting your method) because I am almost sure that my implementation is bugged. Unfortunately, I really have not knowledge of C++ and a very very limited one of C. I tried to read the contents of your test subdirectory, but I cannot understand how you test for correctness of results. Have you hard-coded a list of expected-results into a tests-suite? In particular

    Can you give some hints (online-calculator, list of already-validated results, other-validated codes) on how validate my implementation?

    I can write my self a list of expected-validated results by means of the Morton definition, but if there are external resources ready to use, I really prefer to use them (for my laziness, but also for safety reasons).

    My best regards.

    opened by szaghi 5
  • Interpreting output of morton3D_64_encode

    Interpreting output of morton3D_64_encode

    Here is a demo that is just a loop over a 3x3 cube. I guess I expected the values 0-26 to come back in some shuffled version, but instead I got:

    0 1 8 2 3 10 16 17 24 4 5 12 6 7 14 20 21 28 32 33 40 34 35 42 48 49 56

    
    #include <iostream>
    
    #include "include/morton.h"
    
    int main()
    {
        for(unsigned int z = 0; z < 3; ++z)
        {
            for(unsigned int y = 0; y < 3; ++y)
            {
                for(unsigned int x = 0; x < 3; ++x)
                {
                    int index = morton3D_64_encode(x,y,z);
                    std::cout << "index: " << index << std::endl;
                }
            }
        }
    
      return 0;
    }
    

    Can you explain how to interpret these indices?

    Thanks!

    David

    question 
    opened by daviddoria 4
  • Compilation failure in VirtualBox when AVX2 is enabled

    Compilation failure in VirtualBox when AVX2 is enabled

    In multiple cases now I've seen my users have compilation issues when AVX2 is enabled in a VM but not BMI2.

    I understand that you used a check for __AVX2__ as a substitute for __BMI2__ on MSVC, but this breaks on *nix if AVX2 happens to be enabled without BMI2 (e.g. weird CPU flags or VM).

    The checks for AVX2 should likely be coupled with a check for _MSC_VER.

    https://github.com/pmmp/php-build-scripts/issues/109

    bug fixed 
    opened by dktapps 3
  • Fix compilation errors and add cmake scripts

    Fix compilation errors and add cmake scripts

    Hi Danger,

    This is a revised version of the pull request I made yesterday. I made the following changes:

    • Fix narrowing conversion errors.
    • Undefined function error due to not including cmath.
    • Suppress warnings of sign and unsigned integer comparisons.
    • Add cmake scripts for cross-platform builds.
    • Refactor findFirstSetBitZeroIdx function for MSVC & Win32 environment.

    In addition, I have found there exits a bug in encoding/deconding functions for 3D using ET in msys2 with mingw64 environment. For MSVC, it has no problem. I am now trying to figure out what causes the bug.

    Best, Sho

    opened by shohirose 3
  • _mm512_bitshuffle_epi64_mask implementation

    _mm512_bitshuffle_epi64_mask implementation

    Typing from an Icelake machine(i7-1065G7)! I can peck at this again, even if it ends up slower in the 2D case it'll still be interesting as it can also solve the more general case of trying to interleave the bits of three input variables in parallel possibly much faster than a series of pext/pdep.

    Here's measured latency data from uops.info for vpshufbitqmb.

    Instruction									Lat		TP			Uops	Ports
    VPSHUFBITQMB (K, K, XMM, M128)	AVX512EVEX	[1;8]	1.00 / 1.00	3		1*p0+1*p23+1*p5
    VPSHUFBITQMB (K, K, XMM, XMM)	AVX512EVEX	[1;8]	1.00 / 1.00	2		1*p0+1*p5
    VPSHUFBITQMB (K, K, YMM, M256)	AVX512EVEX	[1;8]	1.00 / 1.00	3		1*p0+1*p23+1*p5
    VPSHUFBITQMB (K, K, YMM, YMM)	AVX512EVEX	[1;8]	1.00 / 1.00	2		1*p0+1*p5
    VPSHUFBITQMB (K, K, ZMM, M512)	AVX512EVEX	[1;8]	1.00 / 1.00	3		1*p0+1*p23+1*p5
    VPSHUFBITQMB (K, K, ZMM, ZMM)	AVX512EVEX	[1;8]	1.00 / 1.00	2		1*p0+1*p5
    VPSHUFBITQMB (K, XMM, M128)		AVX512EVEX	6		1.00 / 1.00	3		1*p0+1*p23+1*p5
    VPSHUFBITQMB (K, XMM, XMM)		AVX512EVEX	6		1.00 / 1.00	2		1*p0+1*p5
    VPSHUFBITQMB (K, YMM, M256)		AVX512EVEX	6		1.00 / 1.00	3		1*p0+1*p23+1*p5
    VPSHUFBITQMB (K, YMM, YMM)		AVX512EVEX	6		1.00 / 1.00	2		1*p0+1*p5
    VPSHUFBITQMB (K, ZMM, M512)		AVX512EVEX	6		1.00 / 1.00	3		1*p0+1*p23+1*p5
    VPSHUFBITQMB (K, ZMM, ZMM)		AVX512EVEX	6		1.00 / 1.00	2		1*p0+1*p5			
    

    compared to pext

    Instruction						Lat		TP			Uops	Ports
    PEXT (R32, R32, M32)	BMI2	[3;8]	1.00 / 1.00	2		1*p1+1*p23
    PEXT (R32, R32, R32)	BMI2	3		1.00 / 1.00	1		1*p1
    PEXT (R64, R64, M64)	BMI2	[3;8]	1.00 / 1.00	2		1*p1+1*p23
    PEXT (R64, R64, R64)	BMI2	3		1.00 / 1.00	1		1*p1	
    

    and pdep

    Instruction						Lat		TP			Uops	Ports
    PDEP (R32, R32, M32)	BMI2	[3;8]	1.00 / 1.00	2	1*p1+1*p23
    PDEP (R32, R32, R32)	BMI2	3		1.00 / 1.00	1	1*p1
    PDEP (R64, R64, M64)	BMI2	[3;8]	1.00 / 1.00	2	1*p1+1*p23
    PDEP (R64, R64, R64)	BMI2	3		1.00 / 1.00	1	1*p1	
    
    opened by Wunkolo 3
  • Encode and decode mismatch

    Encode and decode mismatch

    Hi.

    This library is nice, however I've found an issue.

    Consider:

    template<typename T,typename U>
    ostream & operator << (ostream & os, const pair<T,U> &p)
    {
        os << "(" << p.first << "," << p.second << ")";
        return os;
    }
    
    int main()
    {
        uint_fast32_t i = 16, j = 16, x, y;
        morton2D_64_decode(morton2D_64_encode(i, j), x, y);
        cout << make_pair(i,j) << endl << make_pair(x,y)<<endl;
        return 0;
    }
    

    Compiling with G++ for C++11, I've got the following:

    g++ --std=c++11 morton.cpp

    (16,16)
    (4,4)
    

    Which is a mismatch between encode and decode.

    Thanks.

    bug 
    opened by expedit85 3
  • findFirstSetBitZeroIdx() does not work correctly for small morton code

    findFirstSetBitZeroIdx() does not work correctly for small morton code

    The function findFirstSetBitZeroIdx() does not work correctly for small (<= 4byte) morton values when compiling on 64-bit Linux using gcc or clang.

    const uint32_t x = 545658634;
    unsigned long c;
    findFirstSetBitZeroIdx(x, c); // Expected: c == 29
    

    The issue is that internally, the builtin function __builtin_clzll is used for all inputs, i.e. the input variable x is cast to unsigned long long. For the ID above, __builtin_clzll returns 34, which is correct if x were a 64-bit integer. But that result is substracted from sizeof(morton) * 8, which in this case is 32. In the end we have 32 - 34 -1, cast to unsigned long, which is an integer overflow.

    Possible fix:

    *firstbit_location = static_cast<unsigned long>((sizeof(unsigned long long) * 8) - __builtin_clzll(x) - 1);
    
    bug 
    opened by frank-aurich 1
  • Compilation failing on Mac

    Compilation failing on Mac

    My configuration: MacOS Mojave 10.14.6Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1 Apple LLVM version 10.0.1 (clang-1001.0.46.4) Target: x86_64-apple-darwin18.7.0 Thread model: posix InstalledDir: /Library/Developer/CommandLineTools/usr/bin

    Errors I am seeing: template<typename...T> ^ /Users/kiran/libmorton/libmorton/test/libmorton_test.h:98:107: error: expected expression return control_encode_impl<sizeof...(fields)>(std::numeric_limits<uint64_t>::digits / sizeof...(fields), { fields... }); ^ /Users/kiran/libmorton/libmorton/test/libmorton_test.h:102:118: error: a space is required between consecutive right angle brackets (use '> >') void control_decode_impl(uint64_t encoding, std::size_t bitCount, const std::valarray<std::reference_wrapper<uint64_t>>& fields) {

    bug 
    opened by kiranns 9
  • Morton Offsets

    Morton Offsets

    It is possible to quickly (that is, without going through a decode-modify-encode pipeline) compute neighbours of a Morton-encoded coordinate. The technique is mentioned on wikipedia.

    That could be useful to add, though I realise the description of this library only mentions encoding and decoding.

    enhancement 
    opened by IJzerbaard 2
  • A better naming system using tag dispatch

    A better naming system using tag dispatch

    Hi,

    You mentioned in TODO that a better naming system is required. I solved the issue by using a combination of tag dispatch, function overloading, and partial specialization of template classes. Please see the implementation in shohirose/morton for details.

    A brief overview of my idea is the following:

    namespace tag {
    
    // Tags representing each implementation
    struct bmi {};
    struct preshifted_lookup_table {};
    struct lookup_table {};
    
    } // namespace tag
    
    namespace detail {
    
    template <typename MortonCode, typename Coordinate, typename Tag>
    class morton2d {};
    
    // Partial specialization of morton2d for tag::bmi
    template <typename MortonCode, typename Coordinate>
    class morton2d<MortonCode, Coordinate, tag::bmi> {
     public:
      static MortonCode encode(const Coordinate x, const Coordinate y) noexcept {
        // ...
      }
      static void decode(const MortonCode m, Coordinate& x, Coordinate& y) noexcept {
        // ...
      }
    };
    
    // Partial specialization of morton2d for tag::preshifted_lookup_table
    template <typename MortonCode, typename Coordinate>
    class morton2d<MortonCode, Coordinate, tag::preshifted_lookup_table> {
      // ...
    };
    
    // Partial specialization of morton2d for tag::lookup_table
    template <typename MortonCode, typename Coordinate>
    class morton2d<MortonCode, Coordinate, tag::lookup_table> {
      // ...
    };
    
    } // namespace detail
    
    // Switch implementation by using a tag dispatch technique.
    template <typename Tag>
    inline uint_fast32_t encode(const uint_fast16_t x, const uint_fast16_t y, Tag) noexcept {
      return detail::morton2d<uint_fast32_t, uint_fast16_t, Tag>::encode(x, y);
    }
    
    template <typename Tag>
    inline void decode(const uint_fast32_t m, uint_fast16_t& x, uint_fast16_t& y, Tag) noexcept {
      detail::morton2d<uint_fast32_t, uint_fast16_t, Tag>::decode(x, y);
    }
    

    Then, we can specify an implementation through a uniform API:

    uint_fast16_t x = // ...
    uint_fast16_t y = // ...
    
    // Encode using BMI instruction set (if available)
    const auto m1 = encode(x, y, tag::bmi{});
    // Encode using preshifted lookup table
    const auto m2 = encode(x, y, tag::preshifted_lookup_table{});
    // Encode using lookup table
    const auto m3 = encode(x, y, tag::lookup_table{});
    

    I hope this might help you.

    enhancement 
    opened by shohirose 1
  • Creating Tables for N-Dimenstion

    Creating Tables for N-Dimenstion

    Hi,

    Sorry for opening such and issue. However, I was interested if its possible to generate Morton code for n-dimesnsions using the precomputed tables. That is, generating such tables for LUT. Cheers

    PS: I can't seems to find the coding and decoding version of naive for-loop implementation (as discussed in your blogs).

    enhancement 
    opened by Gillani0 3
Releases(v0.2.10)
  • v0.2.10(Apr 25, 2022)

    Changes

    • Removed MSVC dependency for compiling with AVX2 support (#71)
    • Cleaned up computations of checkbit in for-loop examples (#75)
    • Added Visual Studio 2022 solution for test suite
    • Test suite fixes for early termination functions
    Source code(tar.gz)
    Source code(zip)
  • v0.2.9(Feb 16, 2022)

    Changes

    • Added improved 2D encoding/decoding for 32-bit morton codes, based on this gist by @JarkkoPFC, which combines the magicbits method in the lower and upper part of a 64-bit code. These have been promoted to be the default method.
    • GCC compilation fixes
    • Test suite improvements
      • Colored output for correctness tests
      • Seperated performance testing from correctness testing
      • Added linear and random performance testing for 2D methods
    Source code(tar.gz)
    Source code(zip)
  • v0.2.8(Sep 18, 2021)

    Changes

    • Important: Include files have moved to \include\libmorton. Change your projects accordingly
    • Better CMake building for test suite
    • Added Github workflows for test suite
    • Use minimal windows headers for test suite building on MSVC
    • Bump Cmake requirement to >3.15.0
    Source code(tar.gz)
    Source code(zip)
  • v0.2.7(Feb 6, 2021)

  • v0.2.6(Nov 12, 2020)

    Changes

    • Fix compilation problems when both AVX512 and BMI2 instruction sets are available
    • Fix AVX512 include
    • Added Ryzen build to makefile
    Source code(tar.gz)
    Source code(zip)
  • v0.2.5(Aug 23, 2020)

  • v0.2.4(May 22, 2020)

  • v0.2.3(May 19, 2020)

  • v0.2.2(Mar 4, 2020)

  • v0.2.1(Oct 6, 2019)

  • v0.2(Apr 10, 2019)

Owner
Jeroen Baert
Computer scientist, engineer, graphics programmer.
Jeroen Baert
A C++ header-only library of statistical distribution functions.

StatsLib StatsLib is a templated C++ library of statistical distribution functions, featuring unique compile-time computing capabilities and seamless

Keith O'Hara 423 Jan 3, 2023
RcppFastFloat: Rcpp Bindings for the fastfloat C++ Header-Only Library

Converting ascii text into (floating-point) numeric values is a very common problem. The fast_float header-only C++ library by Daniel Lemire does this very well, and very fast at up to or over to 1 gigabyte per second as described in more detail in a recent arXiv paper.

Dirk Eddelbuettel 18 Nov 15, 2022
Header only FFT library

dj_fft: Header-only FFT library Details This repository provides a header-only library to compute fourier transforms in 1D, 2D, and 3D. Its goal is to

Jonathan Dupuy 134 Dec 29, 2022
C++ header-only fixed-point math library

fpm A C++ header-only fixed-point math library. "fpm" stands for "fixed-point math". It is designed to serve as a drop-in replacement for floating-poi

Mike Lankamp 392 Jan 7, 2023
Extremely simple yet powerful header-only C++ plotting library built on the popular matplotlib

matplotlib-cpp Welcome to matplotlib-cpp, possibly the simplest C++ plotting library. It is built to resemble the plotting API used by Matlab and matp

Benno Evers 3.6k Dec 30, 2022
A modern, C++20-native, single-file header-only dense 2D matrix library.

A modern, C++20-native, single-file header-only dense 2D matrix library. Contents Example usage creating matrices basic operations row, col, size, sha

feng wang 62 Dec 17, 2022
A header-only C++ library for large scale eigenvalue problems

NOTE: Spectra 1.0.0 is released, with a lot of API-breaking changes. Please see the migration guide for a smooth transition to the new version. NOTE:

Yixuan Qiu 609 Jan 2, 2023
libmpc++ is a C++ header-only library to solve linear and non-linear MPC

libmpc++ libmpc++ is a C++ library to solve linear and non-linear MPC. The library is written in modern C++17 and it is tested to work on Linux, macOS

Nicola Piccinelli 46 Dec 20, 2022
Header-only C++11 library to handle physical measures

cpp-measures Header-only C++11 library to handle physical measures License: This project is released under the Mozilla Public License 2.0. Purpose Thi

Carlo Milanesi 20 Jun 28, 2018
Header only, single file, simple and efficient C++ library to compute the signed distance function to a triangle mesh

TriangleMeshDistance Header only, single file, simple and efficient C++11 library to compute the signed distance function to a triangle mesh. The dist

Interactive Computer Graphics 100 Dec 28, 2022
A work-in-progress C++20/23 header-only maths library for game development, embedded, kernel and general-purpose that works in constant context.

kMath /kmæθ/ A work-in-progress general-purpose C++20/23 header-only maths library that works in constant context Abstract The kMath Project aims to p

The λ Project 13 Sep 5, 2022
A C++ header only library for decomposition of spectra into a sum of response functions whose weights are positive definite.

DecompLib A C++ header only library for decomposition of spectra into a sum of response functions whose weights are positive definite. Introduction Bu

James T. Matta 2 Jan 22, 2022
A matrix header-only library, uses graphs internally, helpful when your matrix is part of a simulation where it needs to grow many times (or auto expand)

GraphMat Header-only Library Matrix implemented as a graph, specially for the use case when it should be auto expanding at custom rate, specially in s

Aditya Gupta 3 Oct 25, 2021
linalg.h is a single header, public domain, short vector math library for C++

linalg.h linalg.h is a single header, public domain, short vector math library for C++. It is inspired by the syntax of popular shading and compute la

Sterling Orsten 758 Jan 7, 2023
MIRACL Cryptographic SDK: Multiprecision Integer and Rational Arithmetic Cryptographic Library is a C software library that is widely regarded by developers as the gold standard open source SDK for elliptic curve cryptography (ECC).

MIRACL What is MIRACL? Multiprecision Integer and Rational Arithmetic Cryptographic Library – the MIRACL Crypto SDK – is a C software library that is

MIRACL 527 Jan 7, 2023
A C library for statistical and scientific computing

Apophenia is an open statistical library for working with data sets and statistical or simulation models. It provides functions on the same level as t

null 186 Sep 11, 2022
P(R*_{3, 0, 1}) specialized SIMD Geometric Algebra Library

Klein ?? ?? Project Site ?? ?? Description Do you need to do any of the following? Quickly? Really quickly even? Projecting points onto lines, lines t

Jeremy Ong 635 Dec 30, 2022
LibTomMath is a free open source portable number theoretic multiple-precision integer library written entirely in C.

libtommath This is the git repository for LibTomMath, a free open source portable number theoretic multiple-precision integer (MPI) library written en

libtom 543 Dec 27, 2022
a lean linear math library, aimed at graphics programming. Supports vec3, vec4, mat4x4 and quaternions

linmath.h -- A small library for linear math as required for computer graphics linmath.h provides the most used types required for programming compute

datenwolf 729 Jan 9, 2023