THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.

Overview

About CUB

CUB provides state-of-the-art, reusable software components for every layer of the CUDA programming model:

[Figure: Orientation of collective primitives within the CUDA software stack]

CUB is included in the NVIDIA HPC SDK and the CUDA Toolkit.

We recommend the CUB Project Website for further information and examples.



A Simple Example

#include <cub/cub.cuh>

// Block-sorting CUDA kernel
__global__ void BlockSortKernel(int *d_in, int *d_out)
{
     using namespace cub;

     // Specialize BlockRadixSort, BlockLoad, and BlockStore for 128 threads
     // owning 16 integer items each
     typedef BlockRadixSort<int, 128, 16>                     BlockRadixSort;
     typedef BlockLoad<int, 128, 16, BLOCK_LOAD_TRANSPOSE>   BlockLoad;
     typedef BlockStore<int, 128, 16, BLOCK_STORE_TRANSPOSE> BlockStore;

     // Allocate shared memory
     __shared__ union {
         typename BlockRadixSort::TempStorage  sort;
         typename BlockLoad::TempStorage       load;
         typename BlockStore::TempStorage      store;
     } temp_storage;

     int block_offset = blockIdx.x * (128 * 16);   // Offset of this block's segment

     // Obtain a segment of 2048 consecutive keys that are blocked across threads
     int thread_keys[16];
     BlockLoad(temp_storage.load).Load(d_in + block_offset, thread_keys);
     __syncthreads();

     // Collectively sort the keys
     BlockRadixSort(temp_storage.sort).Sort(thread_keys);
     __syncthreads();

     // Store the sorted segment
     BlockStore(temp_storage.store).Store(d_out + block_offset, thread_keys);
}

Each thread block uses cub::BlockRadixSort to collectively sort its own input segment. The class is specialized by the data type being sorted, by the number of threads per block, by the number of keys per thread, and implicitly by the targeted compilation architecture.
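As a rough host-side model of what the example kernel computes (useful, for instance, for checking its output), the sketch below sorts each 2048-key tile independently with std::sort. It does not use CUB and does not model radix sort itself. The names ReferenceBlockSort, kBlockThreads, kItemsPerThread and kTileSize are illustrative only. It assumes the input length is a multiple of 128 * 16, because the kernel loads full tiles without bounds checks.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Tile geometry matching the example kernel: 128 threads x 16 keys each.
constexpr int kBlockThreads   = 128;
constexpr int kItemsPerThread = 16;
constexpr int kTileSize       = kBlockThreads * kItemsPerThread;  // 2048 keys

// Hypothetical host-side reference for BlockSortKernel: every tile of
// kTileSize keys is sorted on its own, and keys never move between tiles.
// Any trailing partial tile is left untouched.
std::vector<int> ReferenceBlockSort(const std::vector<int>& in)
{
    std::vector<int> out(in);
    const std::size_t num_blocks = out.size() / kTileSize;  // one CUDA block per tile
    for (std::size_t block = 0; block < num_blocks; ++block)
    {
        // Same arithmetic as `block_offset = blockIdx.x * (128 * 16)` in the kernel
        const std::ptrdiff_t block_offset =
            static_cast<std::ptrdiff_t>(block) * kTileSize;
        std::sort(out.begin() + block_offset,
                  out.begin() + block_offset + kTileSize);
    }
    return out;
}
```

The result is sorted only within each tile. Keys never cross block boundaries, so producing a globally sorted array would need a device-wide primitive such as cub::DeviceRadixSort instead.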

The cub::BlockLoad and cub::BlockStore classes are similarly specialized. Furthermore, to provide coalesced accesses to device memory, these primitives are configured to access memory using a striped access pattern (where consecutive threads simultaneously access consecutive items). BlockLoad then transposes the keys into a blocked arrangement of elements across threads, and BlockStore performs the inverse transpose before writing.
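The striped-versus-blocked distinction can be illustrated on the host. The following sketch is not CUB's implementation (the real BlockLoad and BlockStore may lay out shared memory differently), and the function names StripedIndex, BlockedIndex and StripedToBlocked are hypothetical. It computes the global index each thread touches under both arrangements and simulates the load-side transpose through a shared buffer.

```cpp
#include <vector>

// Blocked arrangement: thread `tid` owns a contiguous run of
// items_per_thread keys, so its item `item` lives at this global index.
int BlockedIndex(int tid, int item, int items_per_thread)
{
    return tid * items_per_thread + item;
}

// Striped arrangement: on load step `item`, consecutive threads touch
// consecutive addresses, which is what allows coalesced memory access.
int StripedIndex(int tid, int item, int block_threads)
{
    return item * block_threads + tid;
}

// Simulates a transposing load. Assumes tile.size() equals
// block_threads * items_per_thread. Returns each thread's registers
// in blocked order.
std::vector<std::vector<int>> StripedToBlocked(const std::vector<int>& tile,
                                               int block_threads,
                                               int items_per_thread)
{
    using Regs = std::vector<std::vector<int>>;

    // Step 1: coalesced striped load into per-thread registers.
    Regs striped(block_threads, std::vector<int>(items_per_thread));
    for (int tid = 0; tid < block_threads; ++tid)
        for (int i = 0; i < items_per_thread; ++i)
            striped[tid][i] = tile[StripedIndex(tid, i, block_threads)];

    // Step 2: each thread scatters its registers into a shared tile.
    std::vector<int> shared(tile.size());
    for (int tid = 0; tid < block_threads; ++tid)
        for (int i = 0; i < items_per_thread; ++i)
            shared[StripedIndex(tid, i, block_threads)] = striped[tid][i];

    // (On the GPU, a __syncthreads() barrier separates scatter and gather.)

    // Step 3: each thread gathers a contiguous blocked run.
    Regs blocked(block_threads, std::vector<int>(items_per_thread));
    for (int tid = 0; tid < block_threads; ++tid)
        for (int i = 0; i < items_per_thread; ++i)
            blocked[tid][i] = shared[BlockedIndex(tid, i, items_per_thread)];
    return blocked;
}
```

With 4 threads and 3 items each, thread 0's striped loads touch indices 0, 4 and 8, while after the transpose it holds the blocked run 0, 1, 2.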

Once specialized, these classes expose opaque TempStorage member types. The thread block uses these storage types to statically allocate the union of shared memory needed by the thread block. (Alternatively, these storage types could be aliased to global memory allocations.)
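The load, sort and store phases use their scratch space at different times. That is why a union works: the phases can share one allocation sized to the largest member rather than the sum of all three. The sketch below uses made-up stand-in structs to show the saving. They are not the real TempStorage types, whose sizes depend on the specialization.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the opaque TempStorage types; the sizes are
// invented for illustration only.
struct SortStorage  { int  ranks[128 * 16]; };       // 8192 bytes
struct LoadStorage  { int  exchange[128 * 16 + 64]; }; // 8448 bytes
struct StoreStorage { char exchange[100]; };         // 100 bytes

// Same pattern as the kernel's __shared__ union: the members overlay the
// same bytes, so the union is as large as its largest member.
union TempStorage
{
    SortStorage  sort;
    LoadStorage  load;
    StoreStorage store;
};

constexpr std::size_t kSeparateBytes =
    sizeof(SortStorage) + sizeof(LoadStorage) + sizeof(StoreStorage);
constexpr std::size_t kUnionBytes = sizeof(TempStorage);
```

Because the phases share the same bytes, one phase must finish with the storage before the next reuses it. This is why the example kernel places a __syncthreads() barrier after the load and after the sort.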



Supported Compilers

CUB is regularly tested using the specified versions of the following compilers. Unsupported versions may emit deprecation warnings, which can be silenced by defining CUB_IGNORE_DEPRECATED_COMPILER during compilation.

  • NVCC 11.0+
  • NVC++ 20.9+
  • GCC 5+
  • Clang 7+
  • MSVC 2019+ (19.20/16.0/14.20)



Releases

CUB is distributed with the NVIDIA HPC SDK and the CUDA Toolkit in addition to GitHub.

See the changelog for details about specific releases.

CUB Release               Included In
1.11.0
1.10.0                    NVIDIA HPC SDK 20.9
1.9.10-1                  NVIDIA HPC SDK 20.7 & CUDA Toolkit 11.1
1.9.10                    NVIDIA HPC SDK 20.5
1.9.9                     CUDA Toolkit 11.0
1.9.8-1                   NVIDIA HPC SDK 20.3
1.9.8                     CUDA Toolkit 11.0 Early Access
1.8.0
1.7.5                     Thrust 1.9.2
1.7.4                     Thrust 1.9.1-2
1.7.3
1.7.2
1.7.1
1.7.0                     Thrust 1.9.0-5
1.6.4
1.6.3
1.6.2 (previously 1.5.5)
1.6.1 (previously 1.5.4)
1.6.0 (previously 1.5.3)
1.5.2
1.5.1
1.5.0
1.4.1
1.4.0
1.3.2
1.3.1
1.3.0
1.2.3
1.2.2
1.2.0
1.1.1
1.0.2
1.0.1
0.9.4
0.9.2
0.9.1
0.9.0



Development Process

CUB uses the CMake build system to build unit tests, examples, and header tests. To build CUB as a developer, follow this recipe:

# Clone CUB repo from github:
git clone https://github.com/NVIDIA/cub.git
cd cub

# Create build directory:
mkdir build
cd build

# Configure -- use one of the following:
cmake ..   # Command line interface.
ccmake ..  # ncurses GUI (Linux only)
cmake-gui  # Graphical UI, set source/build directories in the app

# Build:
cmake --build . -j <num jobs>   # invokes make (or ninja, etc.)

# Run tests and examples:
ctest

By default, the C++14 standard is targeted, but this can be changed in CMake. More information on configuring your CUB build and creating a pull request is found in CONTRIBUTING.md.



Open Source License

CUB is available under the "New BSD" open-source license:

Copyright (c) 2010-2011, Duane Merrill.  All rights reserved.
Copyright (c) 2011-2018, NVIDIA CORPORATION.  All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
   *  Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
   *  Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
   *  Neither the name of the NVIDIA CORPORATION nor the
      names of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Owner: NVIDIA Research Projects