Very Fast Non-Cryptographic Hash Function

Overview

KOMIHASH - Very Fast Hash Function

Introduction

The komihash() function available in the komihash.h file implements a very fast 64-bit hash function, mainly designed for hash-table uses; produces identical hashes on both big- and little-endian systems. Function's code is portable, scalar.

This function features both a high large-block hashing performance (27.7 GB/s on Ryzen 3700X) and a high hashing throughput of small messages (about 12 cycles/hash for 0-15-byte messages). Performance on 32-bit systems is, however, quite low. Also, large-block hashing performance on big-endian systems may be lower due to the need of byte-swapping.

It passes all SMHasher tests. Performance estimates on that page may be unreliable.

Technically, komihash is close to the class of hash functions like wyhash and CircleHash, which are, in turn, close to the lehmer64 PRNG. However, komihash is structurally different to them in that it accumulates the full 128-bit multiplication result without "compression" into a single 64-bit state variable. Thus komihash does not lose differentiation between consecutive states while others may. Another important difference in komihash is that it parses the input message without overlaps. While overlaps allow a function to have fewer code branches, they are considered "non-ideal", potentially causing collisions and seed value flaws. Beside that, komihash features a superior seed value handling and PerlinNoise hashing.

Note that this function is not cryptographically-secure: in open systems it should only be used with a secret seed, to minimize the chance of a collision attack.

Comparisons

These are the performance comparisons made and used by the author during the development of komihash.

LLVM clang-cl 8.0.1 64-bit, Windows 10, Ryzen 3700X (Zen2), 4.2 GHz

Compiler options: /Ox /arch:sse2; overhead: 1.8 cycles/h.

Hash function 0-15b, cycles/h 8-28b, cycles/h bulk, GB/s
komihash 2.8 11.3 17.4 27.7
wyhash_final3 13.4 17.8 29.7
XXH3_64 0.8.0 17.5 21.1 29.0
prvhash64m 4.1 19.9 26.1 4.1

Compiler options: /Ox -mavx2; overhead: 1.8 cycles/h.

Hash function 0-15b, cycles/h 8-28b, cycles/h bulk, GB/s
komihash 2.8 11.1 17.7 27.8
wyhash_final3 13.4 17.7 29.8
XXH3_64 0.8.0 17.7 21.3 61.0
prvhash64m 4.1 20.0 26.2 4.1

GCC 8.5.0 64-bit, CentOS 8, Xeon E-2176G (CoffeeLake), 4.5 GHz

Compiler options: -O3 -msse2; overhead: 5.8 cycles/h.

Hash function 0-15b, cycles/h 8-28b, cycles/h bulk, GB/s
komihash 2.8 18.5 22.4 24.7
wyhash_final3 14.9 19.5 29.8
XXH3_64 0.8.0 16.9 22.3 26.6
prvhash64m 4.1 23.2 27.8 4.3

Compiler options: -O3 -mavx2; overhead: 5.8 cycles/h.

Hash function 0-15b, cycles/h 8-28b, cycles/h bulk, GB/s
komihash 2.8 16.6 21.2 24.7
wyhash_final3 15.4 19.0 30.1
XXH3_64 0.8.0 18.8 23.4 38.0
prvhash64m 4.1 21.7 27.1 4.4

LLVM clang 12.0.1 64-bit, CentOS 8, Xeon E-2176G (CoffeeLake), 4.5 GHz

Compiler options: -O3 -mavx2; overhead: 5.3 cycles/h.

Hash function 0-15b, cycles/h 8-28b, cycles/h bulk, GB/s
komihash 2.8 18.1 22.3 23.5
wyhash_final3 14.0 18.7 28.4
XXH3_64 0.8.0 18.0 29.3 51.0
prvhash64m 4.1 27.0 29.9 4.3

ICC 19.0 64-bit, Windows 10, Core i7-7700K (KabyLake), 4.5 GHz

Compiler options: /O3 /QxSSE2; overhead: 5.9 cycles/h.

Hash function 0-15b, cycles/h 8-28b, cycles/h bulk, GB/s
komihash 2.8 20.1 23.6 18.4
wyhash_final3 19.2 24.5 20.0
XXH3_64 0.8.0 19.9 25.8 28.0
prvhash64m 4.1 25.5 32.4 3.2

Apple clang 12.0.0 64-bit, macOS 12.0.1, Apple M1, 3.5 GHz

Compiler options: -O3; overhead: unestimatable.

Hash function 0-15b, cycles/h 8-28b, cycles/h bulk, GB/s
komihash 2.8 10.1 11.4 23.5
wyhash_final3 7.9 8.1 26.1
XXH3_64 0.8.0 8.2 8.2 30.5
prvhash64m 4.1 12.9 16.8 3.5

Notes: XXH3_64 is unseeded (seeded variant is 1 cycle/h higher). bulk is 256000 bytes (this means it is mainly a cache-bound performance). GB/s should not be misinterpreted as GiB/s. cycles/h means processor clock ticks per hash value, including overhead. Measurement error is approximately 3%.

Averages over all measurements (overhead excluded)

Hash function 0-15b, cycles/h 8-28b, cycles/h
komihash 2.8 11.3 15.7
wyhash_final3 10.3 14.1
XXH3_64 0.8.0 12.9 17.9
prvhash64m 4.1 17.7 22.8

The following methodology was used to obtain the cycles/h values:

	const uint64_t rc = 1ULL << 26;
	const int minl = 8; const int maxl = 28;
	volatile uint64_t msg[ 8 ] = { 0 };
	uint64_t v = 0;

	const TClock t1( CSystem :: getClock() );

	for( int k = minl; k <= maxl; k++ )
	{
		volatile size_t msgl = k;
		volatile uint64_t sd = k + 1;

		for( uint64_t i = 0; i < rc; i++ )
		{
			v ^= komihash( (uint8_t*) &msg, msgl, sd );
//			v ^= wyhash( (uint8_t*) &msg, msgl, sd, _wyp );
//			v ^= XXH3_64bits( (uint8_t*) &msg, msgl );
//			v ^= msg[ 0 ]; // Used to estimate the overhead.
			msg[ 0 ]++;
		}
	}

	printf( "%016llx\n", v );
	printf( "%.1f\n", CSystem :: getClockDiffSec( t1 ) * 4.2e9 /
		( rc * ( maxl - minl + 1 ))); // 4.5 on Xeon, 4.5 on i7700K, 3.5 on M1

Discussion

You may wonder, why komihash does not include a quite common ^MsgLen XOR instruction at some place in the code? The main reason is that due to the way komihash parses the input message such instruction is not necessary. Another reason is that for a non-cryptographic hash function such instruction provides no additional security: while it may seem that such instruction protects from simple "state XORing" collision attacks, in practice it offers no protection, if one considers how powerful SAT solvers are: in a matter of seconds they can "forge" a preimage that produces a required hash value. It is also important to note that in such "fast" hash functions like komihash the input message has complete control over the state variables.

Is 128-bit version of this hash function planned? Most probably, it is not. While such version may be reasonable for data structure compatibility reasons, there is no much practical sense to use 128-bit hashes at a local level: a reliable 64-bit hash allows one to have 2.1 billion diverse binary objects (e.g. files in a file system, or entries in a hash-table) without collisions, on average. On the other hand, on a worldwide scale, having 128-bit hashes is clearly not enough considering the number of existing digital devices and the number of diverse binary objects (e.g. files, and records in databases) on each of them.

A similarly efficient streamed version of komihash is doable given a serious interest in one is expressed.

An opinion on the "bulk" performance of "fast" hash functions: in most practical situations, when processor's total memory bandwidth is limited to e.g. 41 GB/s, a "bulk" single-threaded hashing performance on the order of 30 GB/s is excessive considering memory bandwidth has to be spread over multiple cores. So, practically, such "fast" hash function, working on a high-load 8-core server, rarely receives more than 8 GB/s of bandwidth. Another factor worth mentioning is that a server rarely has more than 10 Gb/s network connectivity, thus further reducing practical hashing performance of incoming data. The same applies to disk system's throughput, if on-disk data is not already in memory.

Other

This function is named the way it is named is to honor Komi Republic (located in Russia), native to the author.

Issues
  • random string benckmark will be better indicator of speed

    random string benckmark will be better indicator of speed

    Hi, I benchmark komihash with my benckmark suite: https://github.com/wangyi-fudan/wyhash/blob/master/benchmark.cpp

    It seems that random string hash has a big gap using gcc 9.3 and ubuntu 20.04 @ 11th generation i3

    /usr/share/dict/words |hash function |short hash/us |bulk_256B GB/s |bulk_64KB GB/s | |---- |---- |---- |---- | |wyhash |460.71 |23.90 |28.74 | |komihash |104.08 |15.82 |23.16 |

    opened by wangyi-fudan 9
  • Undefined behavior?

    Undefined behavior?

    I am trying to understand what happens when I hash an 8 byte message.

    As far as I can tell, kh_lpu64ec_l3 accesses memory outside my message. Is that correct? If yes, then that is undefined behavior and can lead to an IllegalAccess exception.

    If I have no need for cryptographic properties, what is the reason for the elaborate "final byte" processing? What harm would come from padding with all zeros or all ones or any other constant?

    opened by jsyjr 2
Owner
Aleksey Vaneev
Aleksey Vaneev
C++ implementation of a fast hash map and hash set using hopscotch hashing

A C++ implementation of a fast hash map and hash set using hopscotch hashing The hopscotch-map library is a C++ implementation of a fast hash map and

Thibaut Goetghebuer-Planchon 551 Jun 17, 2022
C++ implementation of a fast hash map and hash set using robin hood hashing

A C++ implementation of a fast hash map and hash set using robin hood hashing The robin-map library is a C++ implementation of a fast hash map and has

Thibaut Goetghebuer-Planchon 768 Jun 19, 2022
Explore building a hash table with two different hash functions that balances chain length

hash-duo Explore building a hash table with two different hash functions that balances chain length. There is a really cool article called Don't Throw

Terence Parr 3 May 7, 2022
C++ hash map and hash set which preserve the order of insertion

C++ hash map and hash set which preserves the order of insertion The ordered-map library provides a hash map and a hash set which preserve the order o

Thibaut Goetghebuer-Planchon 350 Jun 26, 2022
A benchmark for hash tables and hash functions in C++, evaluate on different data as comprehensively as possible

Hash Table Benchmark This is yet another benchmark for hash tables(hash maps) with different hash functions in C++, attempting to evaluate the perform

Ren Zibei 2 Mar 14, 2022
A simple single header 64 and 32 bit hash function using only add, sub, ror, and xor.

K-HASH A simple single header 64 bit hash function using only add, sub, ror, and xor. This a just general-purpose hash function for like making hash m

null 68 Jan 12, 2022
Provides very lightweight outcome and result (non-Boost edition)

master branch develop branch CTest dashboard: https://my.cdash.org/index.php?project=Boost.Outcome All tests passing source tarballs: https://github.c

Niall Douglas 546 Jun 20, 2022
A fast and efficient non-iterating hashmap library

libhash: a fast and efficient non-iterating hashmap library Libhash is a fast and efficient non-iterating hashmap library Usage Usage is easy and simp

Oğuzhan Eroğlu 5 May 29, 2022
A family of header-only, very fast and memory-friendly hashmap and btree containers.

The Parallel Hashmap Overview This repository aims to provide a set of excellent hash map implementations, as well as a btree alternative to std::map

Gregory Popovitch 1.5k Jun 27, 2022
Simple hash table implemented in C

Simple hash table written in C. To go with my article How to implement a hash table (in C). This is a learning exercise, not a battle-tested data stru

Ben Hoyt 72 Jun 25, 2022
simple hash table linear probing method can expand/reduce

hash table a simple c hash table implementation based on https://benhoyt.com/writings/hash-table-in-c/ project can store different data types (data al

Fabio Murer 1 Oct 3, 2021
HashTableBenchmark - A simple cross-platform speed & memory-efficiency benchmark for the most common hash-table implementations in the C++ world

Hash-Tables Benchmarks This repository contains a bunch of extendable benchmarks mostly for following containers: std:unordered_map from STL. google::

Unum 7 Jun 15, 2022
A collection of hash tables for parallel programming, including lock-free, wait-free tables.

Hatrack Hash tables for parallel programming This project consisists of fast hash tables suitable for parallel programming, including multiple lock-fr

null 55 Jun 10, 2022
Well Factored, Non-Recursive, General & Generic BSTs in ANSI C

This one set of files implements well factored/layered binary search trees (BSTs) with 5 balancing schemes (none, avl, red-black, L(k), and splay) and

null 2 Dec 7, 2021
Rajesh Kumar Sah 1 Nov 20, 2021
A 3GPP R16 compliant open source 5G core UPF (User Plane Function).

OpenUPF A 3GPP R16 compliant open source UPF. The OpenUPF is an open source project for 5th generation (5G) mobile core networks User Plane Function.

openupf 67 Jun 3, 2022
A tool to collect the exceptions that can reach a C++ function

Exceptions Reporter This tool tries to answer this r/cpp question for a tool to find out, for a given function in my code base, which exceptions it ma

Niels Lohmann 18 Nov 8, 2021
Simple C++ code to benchmark fast division algorithms

fast_division Simple C++ code to benchmark fast division algorithms relying on constant divisors. The code is a companion to the paper Integer Divisio

Daniel Lemire 35 Apr 19, 2022