FastDynamicCast - Fast dynamic cast in C++ for MSVC, outperforming the regular dynamic cast by up to 25 times

Overview

Fast dynamic cast

This is a single header, dynamic cast implementation which outperforms the regular dynamic_cast by up to 25 times.

Works on MSVC 2013 or newer.

Performance

All performance tests ran on 2 Xeons E5-2623. I ran each iteration count multiple times, and took the average.

The solution used to measure the performance can be found in dcast_performance.

Notice that all performance tests are more or less best-case scenarios which only perform dynamic_cast (see How it works).

Simple class hierarchy

Complex class hierarchy

Speedup vs regular dynamic_cast

Usage

Copy the fast_dynamic_cast.h header to your project, and change your dynamic_cast calls to fast_dynamic_cast, e.g.:

A a;
B& a_as_b = a;
A& a_as_a = fast_dynamic_cast<A&>(a_as_b);

The syntax is identical to the regular dynamic cast. Theres also a fast_dynamic_pointer_cast available, which behaves identical to std::dynamic_pointer_cast.

The dynamic cast implementation only works on MSVC 2013 or newer. For all other compilers, it will fallback to the default dynamic cast.

If you want to use it in a multithreaded environment, change DCAST_MULTITHREADED to 1.

How it works

The dynamic cast has two tasks to do:

  • Ensure the object is related and castable to the requested type
  • Move the pointer so it points to the requested class block.

The fast dynamic cast is built on top on the regular dynamic cast. It improves performance by observing that, after we ensured the class is convertible, we can cache the offset required to shift the pointer.

For example, assuming we have a class A and B, whereas B derives from A, and we want to cast an A pointer to a B pointer. The first time this cast is invoked, the fast dynamic cast uses the regular dynamic cast to retrieve the casted pointer.

Lets say the dynamic cast shifted the A pointer by +8, then the fast dynamic cast stores this offset. The next time the cast is invoked, the incoming pointer is just shifted by +8 and returned, which is much faster and avoids the dynamic_cast call.

However, there are two cases where this approach cannot be used:

  • The pointer passed in is not actually castable
  • The pointer is already shifted

Case 2 can occur, if we for example call fast_dynamic_cast<A*>(B*), but the passed pointer points to a derived C class (whereas C derives from B). This is a problem, since the offset from C -> A is different than the offset from B -> A.

The fast dynamic cast solves both problems by additionally also storing the VTable pointer in the cache. When an object is casted, the fast dynamic cast only detects a cache hit if the vtable pointer matches, and if not, performs the regular dynamic cast.

In theory, casting many different objects to the same superclass would then be even less performant than the regular dynamic cast. However, in practice, the cache usage is over 95%, and thus the fast dynamic cast is up to a order of magnitude faster than the regular dynamic cast.

You might also like...
This is a curve topology verification tool based on Fast Linking Numbers for Topology Verification of Loopy Structures.
This is a curve topology verification tool based on Fast Linking Numbers for Topology Verification of Loopy Structures.

Fast Linking Numbers This tool, called verifycurves, takes input models that consist of closed-loop curves, and outputs a topology certificate as a .t

merge two sorted lists fast

Py Merge Merge sorted list faster than using list.sort or heapq.merge. import merge # create some sorted lists a = list(range(-100, 1700)) b = list(r

A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

moodycamel::ConcurrentQueue An industrial-strength lock-free queue for C++. Note: If all you need is a single-producer, single-consumer queue, I have

C++ implementation of a fast hash map and hash set using hopscotch hashing

A C++ implementation of a fast hash map and hash set using hopscotch hashing The hopscotch-map library is a C++ implementation of a fast hash map and

A family of header-only, very fast and memory-friendly hashmap and btree containers.
A family of header-only, very fast and memory-friendly hashmap and btree containers.

The Parallel Hashmap Overview This repository aims to provide a set of excellent hash map implementations, as well as a btree alternative to std::map

🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of bil

Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20

➵ robin_hood unordered map & set robin_hood::unordered_map and robin_hood::unordered_set is a platform independent replacement for std::unordered_map

C++ implementation of a fast hash map and hash set using robin hood hashing

A C++ implementation of a fast hash map and hash set using robin hood hashing The robin-map library is a C++ implementation of a fast hash map and has

A fast Python Common substrings of multiple strings library with C++ implementation

A fast Python Common substrings of multiple strings library with C++ implementation Having a bunch of strings, can I print some substrings which appea

Comments
  • Testing in mac

    Testing in mac

    ~/test/FastDynamicCast/dcast_performance [master] $ g++ --version
    Configured with: --prefix=/Applications/Xcode.app/Contents/Developer//usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/c++/4.2.1
    Apple LLVM version 8.0.0 (clang-800.0.42.1)
    Target: x86_64-apple-darwin16.4.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
    ~/test/FastDynamicCast/dcast_performance [master] $ g++ main.cpp --std=c++11  && ./a.out
    main.cpp:64:13: warning: backslash and newline separated by space [-Wbackslash-newline-escape]
      //      | \
                ^
    1 warning generated.
    Measure Regular dynamic cast simple: 62 ms
    Measure Fast dynamic cast simple: 67 ms
    Measure Regular dynamic cast complex: 171 ms
    Measure Fast dynamic cast complex: 180 ms
    Measure Threaded regular dynamic cast complex: 190 ms
    Measure Threaded fast dynamic cast complex: 182 ms
    ~/test/FastDynamicCast/dcast_performance [master] $ g++ main.cpp --std=c++11 -O2 && ./a.out
    main.cpp:64:13: warning: backslash and newline separated by space [-Wbackslash-newline-escape]
      //      | \
                ^
    1 warning generated.
    Measure Regular dynamic cast simple: 36 ms
    Measure Fast dynamic cast simple: 39 ms
    Measure Regular dynamic cast complex: 81 ms
    Measure Fast dynamic cast complex: 76 ms
    Measure Threaded regular dynamic cast complex: 91 ms
    Measure Threaded fast dynamic cast complex: 84 ms
    
    opened by ghost 2
  • Slight optimisation on cache hit without match.

    Slight optimisation on cache hit without match.

    Actually, on cache hit, there is a new pointless dynamic_cast done if ptr is not an instance of required class.

    It would be slightly faster to return directly nullptr if there is a cache hit but if this_vtable doesn't match with the stored one.

    opened by Aracthor 2
  • Comparison to memoized_cast

    Comparison to memoized_cast

    Can you elaborate on how fast dynamic cast compares to memoized_cast of Mach7? You can find an overview of it in section 5.5 here. Also it would be great if you can include memoized_cast in your performance comparisons.

    A different name would probably help too as Gibbs and Stroustrup had a paper Fast Dynamic Casting that exploits a different idea for improving speed of dynamic_cast.

    opened by solodon4 3
Owner
tobspr
Full Stack Web (Game) Development. shapez.io, YORG.io, YORG.io 3, schoolbreak.io, Graphics Engineering, and more ...
tobspr
Perception-Aware Trajectory Planner in Dynamic Environments

PANTHER: Perception-Aware Trajectory Planner in Dynamic Environments Citation When using PANTHER, please cite PANTHER: Perception-Aware Trajectory Pla

MIT Aerospace Controls Laboratory 113 Dec 27, 2022
Simple Useful Libraries: C++17/20 header-only dynamic bitset

dynamic_bitset Simple Useful Libraries: C++17/20 header-only dynamic bitset Requirements To use this dynamic bitset, you will need a C++17 (or later)

Maxime Pinard 118 Dec 12, 2022
Build a tree-sitter dynamic module

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! I should clarify that this module is NOT a standalone tree-sitter module. It is supo

Yuan Fu 53 Jan 5, 2023
An intrusive C++17 implementation of a Red-Black-Tree, a Weight Balanced Tree, a Dynamic Segment Tree and much more!

This is Ygg (short for Yggdrasil), a C++17 implementation of several intrusive data structures: several balanced binary search trees: a red-black Tree

Lukas Barth 98 Dec 25, 2022
Implementing dynamic lists in C

Dynamic-Lists Implementing dynamic lists in C <begginer level> DESCRIPTION: This header file <list.h> implements python-like lists and it's functions.

null 2 Feb 2, 2022
C++ library to create dynamic structures in plain memory of shared-memory segments

Ищи описание на хабре @mrlolthe1st. #define _CRT_SECURE_NO_WARNINGS #include "shared_structures.h" #include <iostream> #include <fstream> #include <ca

Александр Новожилов 4 Mar 30, 2022
A fast hash map/hash table (whatever you want to call it) for the C programming language.

C HashMap A fast hash map/hash table (whatever you want to call it) for the C programming language. It can associate a key with a pointer or integer v

Mashpoe 74 Dec 27, 2022
Simple C++ code to benchmark fast division algorithms

fast_division Simple C++ code to benchmark fast division algorithms relying on constant divisors. The code is a companion to the paper Integer Divisio

Daniel Lemire 39 Dec 27, 2022
This project Orchid-Fst implements a fast text string dictionary search data structure: Finite state transducer (short for FST) in c++ language.This FST C++ open source project has much significant advantages.

Orchid-Fst 1. Project Overview This project Orchid-Fst implements a fast text string dictionary search data structure: Finite state transducer , which

Bin Ding 10 Oct 18, 2022
Another neofetch-like utility but this time it's fast.

SystemFetch Another neofetch-like utility but this time it's fast. Example Speed Here is a table of the time it took to execute all of these programs,

YSU 10 Jul 22, 2021