Tiny CUDA Neural Networks

Overview

This is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning fast "fully fused" multi-layer perceptron as well as support for various advanced input encodings, losses, and optimizers.

This framework powers the following publication:

Real-time Neural Radiance Caching for Path Tracing
Thomas Müller, Fabrice Rousselle, Jan Novák, Alexander Keller
To appear: ACM Transactions on Graphics (SIGGRAPH) 2021

GTC talk

For business inquiries, please contact [email protected].
For press and other inquiries, please contact Hector Marinez at [email protected].

Performance

Figure: Fully fused networks vs. TensorFlow v2.5.0 w/ XLA. Measured on multi-layer perceptrons 64 (solid line) and 128 (dashed line) neurons wide on an RTX 3090. Generated by benchmarks/bench_ours.cu and benchmarks/bench_tensorflow.py.

License and Citation

This framework is licensed under the BSD 3-clause license. Please see LICENSE.txt for details.

If you use it in your research, we would appreciate a citation via

@misc{tiny-cuda-nn,
    Author = {Thomas M\"uller},
    Year = {2021},
    Note = {https://github.com/nvlabs/tiny-cuda-nn},
    Title = {Tiny {CUDA} Neural Network Framework}
}

Special thanks go to the NRC authors for helpful discussions and to Nikolaus Binder for providing part of the infrastructure of this framework, as well as for help with utilizing TensorCores from within CUDA.

Usage

Tiny CUDA neural networks have a simple C++/CUDA API:

#include <tiny-cuda-nn/common.h>

// Configure the model
nlohmann::json config = {
	{"loss", {
		{"otype", "L2"}
	}},
	{"optimizer", {
		{"otype", "Adam"},
		{"learning_rate", 1e-3},
	}},
	{"encoding", {
		{"otype", "OneBlob"},
		{"n_bins", 32},
	}},
	{"network", {
		{"otype", "FullyFusedMLP"},
		{"n_neurons", 64},
		{"n_hidden_layers", 5},
		{"activation", "ReLU"},
		{"output_activation", "None"},
	}},
};

using namespace tcnn;

auto [loss, optimizer, network, trainer] =
	create_from_config(n_input_dims_to_encode, n_input_dims_to_pass_through, n_output_dims, config);

// Train the model
GPUMatrix<float, MatrixLayout::ColumnMajor> training_batch_inputs(n_input_dims, batch_size);
GPUMatrix<float, MatrixLayout::ColumnMajor> training_batch_targets(n_output_dims, batch_size);

for (int i = 0; i < n_training_steps; ++i) {
	generate_training_batch(&training_batch_inputs, &training_batch_targets); // <-- your code

	float loss;
	trainer->training_step(nullptr, training_batch_inputs, training_batch_targets, &loss);
	std::cout << "iteration=" << i << " loss=" << loss << std::endl;
}

// Use the model
GPUMatrix<float, MatrixLayout::ColumnMajor> inference_inputs(n_input_dims, batch_size);
generate_inputs(&inference_inputs); // <-- your code

GPUMatrix<float, MatrixLayout::ColumnMajor> inference_outputs(n_output_dims, batch_size);
network->inference(nullptr, inference_inputs, inference_outputs);
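
To inspect the results on the host, the inference outputs can be copied back with the CUDA runtime. A minimal sketch, assuming GPUMatrix exposes its device data pointer via data() (check the GPUMatrix interface in the headers); <vector> and <iostream> are also required:

// Copy inference results back to the host for inspection.
std::vector<float> host_outputs(n_output_dims * batch_size);
cudaMemcpy(
	host_outputs.data(),
	inference_outputs.data(),
	host_outputs.size() * sizeof(float),
	cudaMemcpyDeviceToHost
);
std::cout << "output[0] = " << host_outputs[0] << std::endl;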

Example: learning a 2D image

We provide a sample application where an image function (x,y) -> (R,G,B) is learned. It can be run via

tiny-cuda-nn/build> ./mlp_learning_an_image ../data/images/albert.exr ../data/config.json

producing an image every 1000 training steps. Each 1000 steps should take roughly 0.8 seconds with the default configuration on an RTX 3090.

Figure: learned image after 1,000 steps (left), learned image after 10,000 steps (middle), and the reference image (right).

Requirements

  • CUDA v11.2 or higher.
  • CMake v3.17 or higher.
  • A C++14 capable compiler.
  • A high-end NVIDIA GPU that supports TensorCores and has a large amount of shared memory. The framework was tested primarily with an RTX 3090.
    • Ampere GeForce GPUs: compiles out of the box.
    • Ampere A100: requires changing CMAKE_CUDA_ARCHITECTURES to 80 in CMakeLists.txt.
    • Turing GPUs: requires changing CMAKE_CUDA_ARCHITECTURES to 75 in CMakeLists.txt as well as changing SmArch in include/tiny-cuda-nn/cutlass_matmul.h to cutlass::arch::Sm75 (see the configure-time alternative sketched below).
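
If editing CMakeLists.txt is inconvenient, the target architecture can often be overridden at configure time instead; a sketch, assuming the project does not hard-code its own value:

tiny-cuda-nn/build> cmake .. -DCMAKE_CUDA_ARCHITECTURES=75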

Compilation

Begin by cloning this repository and all its submodules using the following command:

> git clone --recursive https://github.com/nvlabs/tiny-cuda-nn
> cd tiny-cuda-nn
tiny-cuda-nn>

Then, use CMake to generate build files:

tiny-cuda-nn> mkdir build
tiny-cuda-nn> cd build
tiny-cuda-nn/build> cmake ..

Then compile, depending on your operating system. On Windows, open tiny-cuda-nn/build/tiny-cuda-nn.sln in Visual Studio and click the "Build" button. On Linux, you can compile with

tiny-cuda-nn/build> make -j

Components

The following is a summary of all components of this framework that are currently released. Please consult the JSON documentation for how to configure them.

Networks

  • Fully fused MLP (src/fully_fused_mlp.cu): lightning-fast implementation of small multi-layer perceptrons (MLPs).
  • CUTLASS MLP (src/cutlass_mlp.cu): MLP based on CUTLASS' GEMM routines. Slower than the fully fused MLP, but handles larger networks and is still reasonably fast.
  • CUTLASS ResNet (src/cutlass_resnet.cu): fully connected residual network based on CUTLASS' GEMM routines.

Input encodings

  • Identity (include/tiny-cuda-nn/encodings/identity.h): leaves values untouched.
  • OneBlob (include/tiny-cuda-nn/encodings/oneblob.h): from Neural Importance Sampling [Müller et al. 2019] and Neural Control Variates [Müller et al. 2020].
  • Frequency (include/tiny-cuda-nn/encodings/frequency.h): from NeRF [Mildenhall et al. 2020].
  • NRC (include/tiny-cuda-nn/encodings/nrc.h): combined OneBlob and frequency encoding used in Neural Radiance Caching [Müller et al. 2021].

Losses

  • L2 (include/tiny-cuda-nn/losses/l2.h): standard L2 loss.
  • Relative L2 (include/tiny-cuda-nn/losses/relative_l2.h): relative L2 loss normalized by the network prediction [Lehtinen et al. 2018].
  • Relative L2 Luminance (include/tiny-cuda-nn/losses/relative_l2_luminance.h): same as above, but normalized by the luminance of the network prediction. Only applicable when the network prediction is RGB. Used in Neural Radiance Caching [Müller et al. 2021].
  • Cross Entropy (include/tiny-cuda-nn/losses/cross_entropy.h): standard cross-entropy loss. Only applicable when the network prediction is a PDF.
  • Variance (include/tiny-cuda-nn/losses/variance_is.h): standard variance loss. Only applicable when the network prediction is a PDF.

Optimizers

  • Adam (include/tiny-cuda-nn/optimizers/adam.h): implementation of Adam [Kingma and Ba 2014], generalized to AdaBound [Luo et al. 2019].
  • Novograd (include/tiny-cuda-nn/optimizers/novograd.h): implementation of Novograd [Ginsburg et al. 2019].
  • SGD (include/tiny-cuda-nn/optimizers/sgd.h): standard stochastic gradient descent (SGD).
  • Shampoo (include/tiny-cuda-nn/optimizers/shampoo.h): implementation of the second-order Shampoo optimizer [Gupta et al. 2018] with home-grown optimizations as well as those by Anil et al. [2020].
  • Average (include/tiny-cuda-nn/optimizers/average.h): wraps another optimizer and computes a linear average of the weights over the last N iterations. The average is used for inference only (it does not feed back into training).
  • Batched (include/tiny-cuda-nn/optimizers/batched.h): wraps another optimizer, invoking the nested optimizer once every N steps on the averaged gradient. This has the same effect as increasing the batch size but requires only a constant amount of memory.
  • EMA (include/tiny-cuda-nn/optimizers/ema.h): wraps another optimizer and computes an exponential moving average of the weights. The average is used for inference only (it does not feed back into training).
  • Exponential Decay (include/tiny-cuda-nn/optimizers/exponential_decay.h): wraps another optimizer and performs piecewise-constant exponential learning-rate decay.
  • Lookahead (include/tiny-cuda-nn/optimizers/lookahead.h): wraps another optimizer, implementing the lookahead algorithm [Zhang et al. 2019].
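
To illustrate how these components compose, here is a sketch of a config that selects the CUTLASS MLP with a Frequency encoding, the relative L2 loss, and Adam wrapped in piecewise-constant exponential learning-rate decay. The decay wrapper's parameter names are assumptions for illustration; consult the JSON documentation for the exact spelling.

nlohmann::json config = {
	{"loss", {
		{"otype", "RelativeL2"}
	}},
	{"optimizer", {
		{"otype", "ExponentialDecay"},
		// Assumed parameter names; check the JSON documentation.
		{"decay_interval", 10000},
		{"decay_base", 0.33},
		{"nested", {
			{"otype", "Adam"},
			{"learning_rate", 1e-3},
		}},
	}},
	{"encoding", {
		{"otype", "Frequency"},
		{"n_frequencies", 12},
	}},
	{"network", {
		{"otype", "CutlassMLP"},
		{"n_neurons", 128},
		{"n_hidden_layers", 4},
		{"activation", "ReLU"},
		{"output_activation", "None"},
	}},
};
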
Issues
  • Underlying buffer has been detached

    I followed

    conda create -n dmodel python=3.9
    activate dmodel
    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
    pip install ninja imageio PyOpenGL glfw xatlas gdown
    pip install git+https://github.com/NVlabs/nvdiffrast/
    pip install --global-option="--no-networks" git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
    imageio_download_bin freeimage
    

    on Windows 10.

    pip install --global-option="--no-networks" git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch

    threw

                    instantiation of "decltype(auto) std::_Get_unwrapped(_Iter &&) [with _Iter=nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>> *const &]"
        e:/VS/VC/Tools/MSVC/14.29.30133/include\xmemory(1703): here
                    instantiation of "std::_Alloc_ptr_t<_Alloc> std::_Uninitialized_move(_InIt, _InIt, std::_Alloc_ptr_t<_Alloc>, _Alloc &) [with _InIt=nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>> *, _Alloc=std::_Rebind_alloc_t<std::allocator<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>>, nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>>]"
        e:/VS/VC/Tools/MSVC/14.29.30133/include\vector(1651): here
                    instantiation of "void std::vector<_Ty, _Alloc>::_Umove_if_noexcept1(std::vector<_Ty, _Alloc>::pointer, std::vector<_Ty, _Alloc>::pointer, std::vector<_Ty, _Alloc>::pointer, std::true_type) [with _Ty=nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>, _Alloc=std::allocator<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>>]"
        e:/VS/VC/Tools/MSVC/14.29.30133/include\vector(1662): here
                    instantiation of "void std::vector<_Ty, _Alloc>::_Umove_if_noexcept(std::vector<_Ty, _Alloc>::pointer, std::vector<_Ty, _Alloc>::pointer, std::vector<_Ty, _Alloc>::pointer) [with _Ty=nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>, _Alloc=std::allocator<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>>]"
        e:/VS/VC/Tools/MSVC/14.29.30133/include\vector(1297): here
                    instantiation of "void std::vector<_Ty, _Alloc>::_Reallocate_exactly(std::vector<_Ty, _Alloc>::size_type) [with _Ty=nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>, _Alloc=std::allocator<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>>]"
        e:/VS/VC/Tools/MSVC/14.29.30133/include\vector(1363): here
                    instantiation of "void std::vector<_Ty, _Alloc>::reserve(std::vector<_Ty, _Alloc>::size_type) [with _Ty=nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>, _Alloc=std::allocator<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>>]"
        E:/Temp/pip-req-build-dx4hpd_b/dependencies\json/json.hpp(18616): here
                    instantiation of "void nlohmann::basic_json<ObjectType, ArrayType, StringType, BooleanType, NumberIntegerType, NumberUnsignedType, NumberFloatType, AllocatorType, JSONSerializer, BinaryType>::json_value::destroy(nlohmann::basic_json<ObjectType, ArrayType, StringType, BooleanType, NumberIntegerType, NumberUnsignedType, NumberFloatType, AllocatorType, JSONSerializer, BinaryType>::value_t) [with ObjectType=std::map, ArrayType=std::vector, StringType=std::string, BooleanType=__nv_bool, NumberIntegerType=int64_t, NumberUnsignedType=uint64_t, NumberFloatType=double, AllocatorType=std::allocator, JSONSerializer=nlohmann::adl_serializer, BinaryType=std::vector<uint8_t, std::allocator<uint8_t>>]"
        E:/Temp/pip-req-build-dx4hpd_b/dependencies\json/json.hpp(19828): here
                    instantiation of "nlohmann::basic_json<ObjectType, ArrayType, StringType, BooleanType, NumberIntegerType, NumberUnsignedType, NumberFloatType, AllocatorType, JSONSerializer, BinaryType>::~basic_json() [with ObjectType=std::map, ArrayType=std::vector, StringType=std::string, BooleanType=__nv_bool, NumberIntegerType=int64_t, NumberUnsignedType=uint64_t, NumberFloatType=double, AllocatorType=std::allocator, JSONSerializer=nlohmann::adl_serializer, BinaryType=std::vector<uint8_t, std::allocator<uint8_t>>]"
        E:/Temp/pip-req-build-dx4hpd_b/dependencies\json/json.hpp(20679): here
    
    ...
    
        e:/VS/VC/Tools/MSVC/14.29.30133/include\xutility(124): error: expected a "("
                  detected during:
                    instantiation of "void *std::_Voidify_iter(_Iter) [with _Iter=std::vector<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>, std::allocator<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>>> *]"
        e:/VS/VC/Tools/MSVC/14.29.30133/include\xmemory(681): here
                    instantiation of "void std::_Default_allocator_traits<_Alloc>::construct(_Alloc &, _Objty *, _Types &&...) [with _Alloc=std::allocator<std::vector<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>, std::allocator<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>>>>, _Objty=std::vector<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>, std::allocator<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>>>, _Types=<>]"
        E:/Temp/pip-req-build-dx4hpd_b/dependencies\json/json.hpp(18440): here
                    instantiation of "T *nlohmann::basic_json<ObjectType, ArrayType, StringType, BooleanType, NumberIntegerType, NumberUnsignedType, NumberFloatType, AllocatorType, JSONSerializer, BinaryType>::create<T,Args...>(Args &&...) [with ObjectType=std::map, ArrayType=std::vector, StringType=std::string, BooleanType=__nv_bool, NumberIntegerType=int64_t, NumberUnsignedType=uint64_t, NumberFloatType=double, AllocatorType=std::allocator, JSONSerializer=nlohmann::adl_serializer, BinaryType=std::vector<uint8_t, std::allocator<uint8_t>>, T=std::vector<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>, std::allocator<nlohmann::basic_json<std::map, std::vector, std::string, __nv_bool, int64_t, uint64_t, double, std::allocator, nlohmann::adl_serializer, std::vector<uint8_t, std::allocator<uint8_t>>>>>, Args=<>]"
        E:/Temp/pip-req-build-dx4hpd_b/dependencies\json/json.hpp(18517): here
                    instantiation of "nlohmann::basic_json<ObjectType, ArrayType, StringType, BooleanType, NumberIntegerType, NumberUnsignedType, NumberFloatType, AllocatorType, JSONSerializer, BinaryType>::json_value::json_value(nlohmann::basic_json<ObjectType, ArrayType, StringType, BooleanType, NumberIntegerType, NumberUnsignedType, NumberFloatType, AllocatorType, JSONSerializer, BinaryType>::value_t) [with ObjectType=std::map, ArrayType=std::vector, StringType=std::string, BooleanType=__nv_bool, NumberIntegerType=int64_t, NumberUnsignedType=uint64_t, NumberFloatType=double, AllocatorType=std::allocator, JSONSerializer=nlohmann::adl_serializer, BinaryType=std::vector<uint8_t, std::allocator<uint8_t>>]"
        E:/Temp/pip-req-build-dx4hpd_b/dependencies\json/json.hpp(18947): here
                    instantiation of "nlohmann::basic_json<ObjectType, ArrayType, StringType, BooleanType, NumberIntegerType, NumberUnsignedType, NumberFloatType, AllocatorType, JSONSerializer, BinaryType>::basic_json(nlohmann::basic_json<ObjectType, ArrayType, StringType, BooleanType, NumberIntegerType, NumberUnsignedType, NumberFloatType, AllocatorType, JSONSerializer, BinaryType>::value_t) [with ObjectType=std::map, ArrayType=std::vector, StringType=std::string, BooleanType=__nv_bool, NumberIntegerType=int64_t, NumberUnsignedType=uint64_t, NumberFloatType=double, AllocatorType=std::allocator, JSONSerializer=nlohmann::adl_serializer, BinaryType=std::vector<uint8_t, std::allocator<uint8_t>>]"
        E:/Temp/pip-req-build-dx4hpd_b/dependencies\json/json.hpp(18971): here
                    instantiation of "nlohmann::basic_json<ObjectType, ArrayType, StringType, BooleanType, NumberIntegerType, NumberUnsignedType, NumberFloatType, AllocatorType, JSONSerializer, BinaryType>::basic_json(std::nullptr_t) [with ObjectType=std::map, ArrayType=std::vector, StringType=std::string, BooleanType=__nv_bool, NumberIntegerType=int64_t, NumberUnsignedType=uint64_t, NumberFloatType=double, AllocatorType=std::allocator, JSONSerializer=nlohmann::adl_serializer, BinaryType=std::vector<uint8_t, std::allocator<uint8_t>>]"
        E:/Temp/pip-req-build-dx4hpd_b/dependencies\json/json.hpp(24402): here
                    instantiation of "nlohmann::basic_json<ObjectType, ArrayType, StringType, BooleanType, NumberIntegerType, NumberUnsignedType, NumberFloatType, AllocatorType, JSONSerializer, BinaryType> nlohmann::basic_json<ObjectType, ArrayType, StringType, BooleanType, NumberIntegerType, NumberUnsignedType, NumberFloatType, AllocatorType, JSONSerializer, BinaryType>::parse(IteratorType, IteratorType, nlohmann::basic_json<ObjectType, ArrayType, StringType, BooleanType, NumberIntegerType, NumberUnsignedType, NumberFloatType, AllocatorType, JSONSerializer, BinaryType>::parser_callback_t, __nv_bool, __nv_bool) [with ObjectType=std::map, ArrayType=std::vector, StringType=std::string, BooleanType=__nv_bool, NumberIntegerType=int64_t, NumberUnsignedType=uint64_t, NumberFloatType=double, AllocatorType=std::allocator, JSONSerializer=nlohmann::adl_serializer, BinaryType=std::vector<uint8_t, std::allocator<uint8_t>>, IteratorType=const char *]"
        E:/Temp/pip-req-build-dx4hpd_b/dependencies\json/json.hpp(26513): here
    
        Error limit reached.
        100 errors detected in the compilation of "E:/Temp/pip-req-build-dx4hpd_b/src/cpp_api.cu".
        Compilation terminated.
        cpp_api.cu
        [5/5] cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -IE:\Temp\pip-req-build-dx4hpd_b/include -IE:\Temp\pip-req-build-dx4hpd_b/dependencies -IE:\Temp\pip-req-build-dx4hpd_b/dependencies/cutlass/include -IE:\Temp\pip-req-build-dx4hpd_b/dependencies/cutlass/tools/util/include -IE:\miniconda\envs\dmodel\lib\site-packages\torch\include -IE:\miniconda\envs\dmodel\lib\site-packages\torch\include\torch\csrc\api\include -IE:\miniconda\envs\dmodel\lib\site-packages\torch\include\TH -IE:\miniconda\envs\dmodel\lib\site-packages\torch\include\THC "-IE:\Eigene Programme\Cuda\include" -IE:\miniconda\envs\dmodel\include -IE:\miniconda\envs\dmodel\Include "-IE:\VS\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-IE:\VS\VC\Tools\MSVC\14.29.30133\include" "-IE:\WK\NETFXSDK\4.8\include\um" "-IE:\Windows Kits\10\include\10.0.19041.0\ucrt" "-IE:\Windows Kits\10\include\10.0.19041.0\shared" "-IE:\Windows Kits\10\include\10.0.19041.0\um" "-IE:\Windows Kits\10\include\10.0.19041.0\winrt" "-IE:\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c E:\Temp\pip-req-build-dx4hpd_b\bindings\torch\tinycudann\bindings.cpp /FoE:\Temp\pip-req-build-dx4hpd_b\bindings\torch\build\temp.win-amd64-3.9\Release\tinycudann/bindings.obj /std:c++14 -DTCNN_MIN_GPU_ARCH=52 -DTCNN_NO_NETWORKS -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
       
       ...
        Note: including file: E:\Temp\pip-req-build-dx4hpd_b/dependencies\json/json.hpp
        Note: including file:  E:\VS\VC\Tools\MSVC\14.29.30133\include\cassert
        Note: including file:   E:\Windows Kits\10\include\10.0.19041.0\ucrt\assert.h
        Note: including file: E:\Temp\pip-req-build-dx4hpd_b/dependencies\pybind11_json/pybind11_json.hpp
        Note: including file: E:\Temp\pip-req-build-dx4hpd_b/include\tiny-cuda-nn/cpp_api.h
        ninja: build stopped: subcommand failed.
        Traceback (most recent call last):
          File "E:\miniconda\envs\dmodel\lib\site-packages\torch\utils\cpp_extension.py", line 1740, in _run_ninja_build
            subprocess.run(
          File "E:\miniconda\envs\dmodel\lib\subprocess.py", line 528, in run
            raise CalledProcessError(retcode, process.args,
        subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
    
        The above exception was the direct cause of the following exception:
    
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "E:\Temp\pip-req-build-dx4hpd_b\bindings/torch\setup.py", line 117, in <module>
            setup(
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\__init__.py", line 87, in setup
            return distutils.core.setup(**attrs)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\core.py", line 148, in setup
            return run_commands(dist)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\core.py", line 163, in run_commands
            dist.run_commands()
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\dist.py", line 967, in run_commands
            self.run_command(cmd)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\dist.py", line 1214, in run_command
            super().run_command(command)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\dist.py", line 986, in run_command
            cmd_obj.run()
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\command\install.py", line 68, in run
            return orig.install.run(self)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\command\install.py", line 664, in run
            self.run_command('build')
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\cmd.py", line 313, in run_command
            self.distribution.run_command(command)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\dist.py", line 1214, in run_command
            super().run_command(command)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\dist.py", line 986, in run_command
            cmd_obj.run()
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\command\build.py", line 135, in run
            self.run_command(cmd_name)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\cmd.py", line 313, in run_command
            self.distribution.run_command(command)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\dist.py", line 1214, in run_command
            super().run_command(command)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\dist.py", line 986, in run_command
            cmd_obj.run()
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\command\build_ext.py", line 79, in run
            _build_ext.run(self)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 339, in run
            self.build_extensions()
          File "E:\miniconda\envs\dmodel\lib\site-packages\torch\utils\cpp_extension.py", line 741, in build_extensions
            build_ext.build_extensions(self)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 448, in build_extensions
            self._build_extensions_serial()
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 473, in _build_extensions_serial
            self.build_extension(ext)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\command\build_ext.py", line 202, in build_extension
            _build_ext.build_extension(self, ext)
          File "E:\miniconda\envs\dmodel\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 528, in build_extension
            objects = self.compiler.compile(sources,
          File "E:\miniconda\envs\dmodel\lib\site-packages\torch\utils\cpp_extension.py", line 714, in win_wrap_ninja_compile
            _write_ninja_file_and_compile_objects(
          File "E:\miniconda\envs\dmodel\lib\site-packages\torch\utils\cpp_extension.py", line 1419, in _write_ninja_file_and_compile_objects
            _run_ninja_build(
          File "E:\miniconda\envs\dmodel\lib\site-packages\torch\utils\cpp_extension.py", line 1756, in _run_ninja_build
            raise RuntimeError(message) from e
        RuntimeError: Error compiling objects for extension
        Error in atexit._run_exitfuncs:
        Traceback (most recent call last):
          File "E:\miniconda\envs\dmodel\lib\site-packages\colorama\ansitowin32.py", line 59, in closed
            return stream.closed
        ValueError: underlying buffer has been detached
        ----------------------------------------
    ERROR: Command errored out with exit status 1: 'E:\miniconda\envs\dmodel\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'E:\\Temp\\pip-req-build-dx4hpd_b\\bindings/torch\\setup.py'"'"'; __file__='"'"'E:\\Temp\\pip-req-build-dx4hpd_b\\bindings/torch\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --no-networks install --record 'E:\Temp\pip-record-xxbieno2\install-record.txt' --single-version-externally-managed --compile --install-headers 'E:\miniconda\envs\dmodel\Include\tinycudann' Check the logs for full command output.
    
    
    opened by ErfolgreichCharismatisch 12
  • Got cutlass error: Error Internal at: 363, when trying to run samples/mlp_learning_an_image_pytorch.py

    Hi, thank you for your pytorch extension! When I tried to run samples/mlp_learning_an_image_pytorch.py, I got an error message:

    Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
    Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
    NetworkWithInputEncoding(n_input_dims=2, n_output_dims=3, seed=1337, dtype=torch.float32, hyperparams={'encoding': {'base_resolution': 16, 'interpolation': 'Linear', 'log2_hashmap_size': 15, 'n_features_per_level': 2, 'n_levels': 16, 'otype': 'Grid', 'per_level_scale': 1.5, 'type': 'Hash'}, 'network': {'activation': 'ReLU', 'n_hidden_layers': 2, 'n_neurons': 64, 'otype': 'CutlassMLP', 'output_activation': 'None'}, 'otype': 'NetworkWithInputEncoding'})
    Writing 'reference.jpg'... done.
    Beginning optimization with 10000000 training steps.
    samples/mlp_learning_an_image_pytorch.py:74: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
      xs = xs * torch.tensor([shape[1], shape[0]], device=xs.device).float()
    samples/mlp_learning_an_image_pytorch.py:74: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      xs = xs * torch.tensor([shape[1], shape[0]], device=xs.device).float()
    Got cutlass error: Error Internal at: 363

    Maybe there is something wrong with my environment?

    My environment: Ubuntu 20.04.4 LTS, GeForce GTX 1080 Ti, CUDA 11.0 / Driver Version: 470.86, pytorch 1.7.1+cu110, cmake 3.22.2. I installed tinycudann by running python setup.py install.

    Thank you

    opened by OctoberKat 10
  • build failing

    Build is failing

    [ 5%] Building CUDA object src/CMakeFiles/tiny-cuda-nn.dir/common.cu.o
    nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
    make[2]: *** [src/CMakeFiles/tiny-cuda-nn.dir/build.make:76: src/CMakeFiles/tiny-cuda-nn.dir/common.cu.o] Error 1
    make[1]: *** [CMakeFiles/Makefile2:134: src/CMakeFiles/tiny-cuda-nn.dir/all] Error 2
    make: *** [Makefile:91: all] Error 2

    opened by rathken 9
  • Any plans for double backward / second-order gradients? I.e., backward-for-backward functions.

    Hi, First of all, thanks for the great repo! I've already built a project based on tcnn and found it extremely helpful.

    However, during usage I found that since the backward functions are implemented in C++, they are not trackable by pytorch, causing autograd.grad(..., create_graph=True) to fail to generate grad_fn for grads (i.e. second-order gradients).

    This functionality is helpful when training with losses that are related to first-order gradients. For example, when training an SDF MLP, typically an eikonal loss is used, which is applied to dy_dx (nablas) of the network. To achieve this, d(dy_dx)_dparam is needed. Ref: https://arxiv.org/abs/2002.10099

    Currently I'm writing custom backward_backward functions on top of tcnn's grid.h and fully_fused_mlp.cu, but it would be really nice if this could be officially supported. :smile:

    BR, Ventus


    🎉🎉🎉 UPDATE: to all people who reach here

    For now, partial support for double backward, and only for grid encodings, is implemented within the tiny-cuda-nn repo.

    An example usage script can be found here.

    For implementation details, please check the original PR #69 .
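
    For context, here is a minimal sketch of the pattern this enables. It uses standard PyTorch autograd; the grid-encoding hyperparameters follow the config printed elsewhere on this page, and the plain linear head is only a stand-in for a full SDF network:

    import torch
    import tinycudann as tcnn

    # Hash-grid encoding; per the update above, double backward is currently
    # supported (partially) only for grid encodings.
    encoding = tcnn.Encoding(n_input_dims=3, encoding_config={
    	"otype": "Grid", "type": "Hash", "n_levels": 16, "n_features_per_level": 2,
    	"log2_hashmap_size": 15, "base_resolution": 16, "per_level_scale": 1.5,
    })
    sdf_head = torch.nn.Linear(encoding.n_output_dims, 1).cuda()  # stand-in SDF head

    x = torch.rand(128, 3, device="cuda", requires_grad=True)
    sdf = sdf_head(encoding(x).float())

    # First-order gradient w.r.t. the input, kept differentiable via create_graph=True.
    nablas = torch.autograd.grad(sdf.sum(), x, create_graph=True)[0]

    # Eikonal loss on the gradient norm; backpropagating through it requires
    # backward-of-backward support in the encoding.
    eikonal_loss = ((nablas.norm(dim=-1) - 1.0) ** 2).mean()
    eikonal_loss.backward()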

    opened by ventusff 8
  • Do you plan to have a python wrapper for the fully fused MLP?

    Hi, I am not an expert on cuda coding but have more experience with pytorch/tensorflow... Do you have any plans to provide this code with a python (more specifically pytorch) wrapper? Or would it be possible to point to the location of the forward/backward functions of this MLP implementation, so that we can potentially incorporate it into other python code?

    Thanks a lot
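
    For people who reach here later: a PyTorch extension now lives in bindings/torch, as referenced in other issues on this page. A minimal usage sketch, with module and option names taken from the logs above:

    import torch
    import tinycudann as tcnn

    # Encoding and MLP fused into a single differentiable torch module.
    model = tcnn.NetworkWithInputEncoding(
    	n_input_dims=2,
    	n_output_dims=3,
    	encoding_config={"otype": "Frequency", "n_frequencies": 12},
    	network_config={
    		"otype": "FullyFusedMLP",
    		"activation": "ReLU",
    		"output_activation": "None",
    		"n_neurons": 64,
    		"n_hidden_layers": 2,
    	},
    )

    x = torch.rand(1024, 2, device="cuda")
    y = model(x)                # forward pass
    y.float().sum().backward()  # backward pass; gradients land in model.params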

    opened by MultiPath 8
  • Loading weights into TinyCUDA

    Hi! I'm very excited by TinyCUDA and I'd like to test it out for an inference task on a pre-trained model. I have the network weights as a .npy file and I'd ideally like to load them into the fully fused MLP. From a quick scan of the codebase it looks like there isn't any way to load pre-computed model weights (please correct me if I'm wrong). Do you have any advice on how I could go about accomplishing this?
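
    One possible route is via the PyTorch bindings, sketched here under the assumption that the module exposes its flattened parameter vector as model.params (verify against bindings/torch/tinycudann/modules.py); note that the fully fused MLP stores its weights padded and in its own layout, so the .npy file must match that layout:

    import numpy as np
    import torch
    import tinycudann as tcnn

    model = tcnn.Network(n_input_dims=3, n_output_dims=1, network_config={
    	"otype": "FullyFusedMLP",
    	"activation": "ReLU",
    	"output_activation": "None",
    	"n_neurons": 64,
    	"n_hidden_layers": 2,
    })

    # Pre-trained weights, flattened into the layout tcnn expects.
    weights = np.load("weights.npy")
    assert weights.size == model.params.numel(), "size/layout mismatch"

    with torch.no_grad():
    	model.params.copy_(torch.from_numpy(weights).to(model.params))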

    opened by ZakSingh 7
  • CUTLASS Error when output size of network > 16

    There appears to be a bug when the output size of a network is greater than 16. This appears to be related to the padding of the output, as it jumps to size 32. I am mostly testing this using instant-ngp by setting the n_output_dims of the density network to something larger. Here is a more complete log, obtained by replacing the exit() with an assert(false):

    python: /home/dronelab/instant-ngp/dependencies/tiny-cuda-nn/include/tiny-cuda-nn/cutlass_matmul.h:363: 
    void tcnn::fc_multiply_impl(cudaStream_t, const typename Gemm::Arguments&) 
    [with Gemm = cutlass::gemm::device::Gemm<
    cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::arch::OpClassTensorOp, cutlass::arch::Sm80, 
    cutlass::gemm::GemmShape<128, 32, 32>, cutlass::gemm::GemmShape<32, 32, 32>, cutlass::gemm::GemmShape<16, 8, 8>,
    tcnn::ActivationEpilogue<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, cutlass::FloatRoundStyle::round_to_nearest>, 
    cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<>, 2, 8, 8, false, cutlass::arch::OpMultiplyAdd, false>; 
    cudaStream_t = CUstream_st*; 
    typename Gemm::Arguments = cutlass::gemm::device::Gemm<cutlass::half_t, cutlass::layout::RowMajor, 
    cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::arch::OpClassTensorOp, cutlass::arch::Sm80, 
    cutlass::gemm::GemmShape<128, 32, 32>, cutlass::gemm::GemmShape<32, 32, 32>, cutlass::gemm::GemmShape<16, 8, 8>, 
    tcnn::ActivationEpilogue<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, cutlass::FloatRoundStyle::round_to_nearest>, cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<>, 2, 8, 8, false, cutlass::arch::OpMultiplyAdd, false>::Arguments]: Assertion `false' failed.
    
    opened by half-potato 6
  • Win10 / VS 2019 build error: nvcc.exe (...) exited with code 1

    Hi, I am having an issue with Win10 / VS 2019. I run cmake and all is fine, except for one warning. Then I open tiny-cuda-nn.sln and run Build Solution. It exits with multiple errors (cmake output at the bottom). Any hints? I am not a Windows person; development-wise my knowledge is scarce.

    EDIT: Just double checked, the issue only seems to happen on master. Downloading the release zip and building that works just fine.

    VS Error:

    Severity	Code	Description	Project	File	Line	Suppression State
    Error	MSB3721	The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc.exe" -gencode=arch=compute_86,code=\"compute_86,compute_86\" -gencode=arch=compute_86,code=\"sm_86,compute_86\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" -x cu   -I"C:\Users\rootkid\Documents\Unreal Projects\laif5\external\tiny-cuda-nn\include" -I"C:\Users\rootkid\Documents\Unreal Projects\laif5\external\tiny-cuda-nn\dependencies" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include"     --keep-dir x64\Debug  -maxrregcount=0  --machine 64 --compile -cudart static --extended-lambda --expt-relaxed-constexpr -std=c++14 -Xcompiler="/EHsc -Zi -Ob0" -g  -D_WINDOWS -DTCNN_MIN_GPU_ARCH=86 -DTCNN_SHAMPOO -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W1 /nologo /Od /Fd"C:\Users\rootkid\Documents\Unreal Projects\laif5\external\tiny-cuda-nn\src\Debug\tiny-cuda-nn.pdb" /FS /Zi /RTC1 /MDd /GR" -o tiny-cuda-nn.dir\Debug\common.obj "C:\Users\rootkid\Documents\Unreal Projects\laif5\external\tiny-cuda-nn\src\common.cu"" exited with code 1.	tiny-cuda-nn	C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Microsoft\VC\v160\BuildCustomizations\CUDA 11.6.targets	790	
    

    cmake:

    -- Selecting Windows SDK version 10.0.17763.0 to target Windows 10.0.19043.
    -- The CUDA compiler identification is NVIDIA 11.6.55
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.6/bin/nvcc.exe - skipped
    -- Detecting CUDA compile features
    -- Detecting CUDA compile features - done
    -- Targeting GPU architectures: 86
    CMake Warning (dev) at CMakeLists.txt:120 (set):
      Cannot set "TCNN_DEFINITIONS": current scope has no parent.
    This warning is for project developers.  Use -Wno-dev to suppress it.
    
    -- Configuring done
    -- Generating done
    -- Build files have been written to: C:/Users/rootkid/Documents/Unreal Projects/laif5/external/tiny-cuda-nn
    
    opened by kommander 6
  • Compiler errors: common_device.h(75): error: more than one conversion function from "tcnn::network_precision_t" to a built-in type applies

    tiny-cuda-nn/include/tiny-cuda-nn/common_device.h(75): error: more than one conversion function from "tcnn::network_precision_t" to a built-in type applies:
        function "__half::operator float() const"
        function "__half::operator short() const"
        function "__half::operator unsigned short() const"
        function "__half::operator int() const"
        function "__half::operator unsigned int() const"
        function "__half::operator long long() const"
        function "__half::operator unsigned long long() const"
        function "__half::operator __nv_bool() const"
    detected during:
        instantiation of "void tcnn::warp_activation<T,fragment_t>(tcnn::Activation, const fragment_t &, fragment_t &) [with T=tcnn::network_precision_t, fragment_t=tcnn::vector_fragment_t<tcnn::network_precision_t, 8U>]" (245): here
        instantiation of "void tcnn::kernel_activation(uint32_t, tcnn::Activation, const T *, T *) [with T=tcnn::network_precision_t, N=8U]" (287): here
        instantiation of "void tcnn::activation_gpu(cudaStream_t, uint32_t, tcnn::Activation, const T *, T *) [with T=tcnn::network_precision_t]"

    Environment: Ubuntu 18.04, GTX 1080, g++ 9.4.0, CUDA 11.0.

    opened by dumpinfo 6
  • Errors when including from .cpp files

    Hey! When I include tiny-cuda-nn/common.h from a .cpp file, some errors occur. A sample main.cpp looks like:

    #include <tiny-cuda-nn/common.h>
    #include <iostream>
    #include <cuda.h>
    #include <cuda_fp16.h>
    #include <cuda_runtime.h>
    #include <cuda_device_runtime_api.h>
    
    int main(){
    	int a = 3;
    	std::cout << a << std::endl;
    	return 0;
    }
    
    In file included from /home/zhn/testCode_new/cuda/CUDACPP/tiny-cuda-nn/include/tiny-cuda-nn/common.h:38,
                     from /home/zhn/testCode_new/cuda/CUDACPP/tiny-cuda-nn/main.cpp:1:
    /home/zhn/testCode_new/cuda/CUDACPP/tiny-cuda-nn/include/tiny-cuda-nn/cpp_api.h:74:25: error: ‘cudaStream_t’ has not been declared
       74 |  virtual void inference(cudaStream_t stream, uint32_t n_elements, const float* input, void* output, void* params) = 0;
          |                         ^~~~~~~~~~~~
    /home/zhn/testCode_new/cuda/CUDACPP/tiny-cuda-nn/include/tiny-cuda-nn/cpp_api.h:75:26: error: ‘cudaStream_t’ has not been declared
       75 |  virtual Context forward(cudaStream_t stream, uint32_t n_elements, const float* input, void* output, void* params, bool prepare_input_gradients) = 0;
          |                          ^~~~~~~~~~~~
    /home/zhn/testCode_new/cuda/CUDACPP/tiny-cuda-nn/include/tiny-cuda-nn/cpp_api.h:76:24: error: ‘cudaStream_t’ has not been declared
       76 |  virtual void backward(cudaStream_t stream, const Context& ctx, uint32_t n_elements, float* dL_dinput, const void* dL_doutput, void* dL_dparams, const float* input, const void* output, const void* params) = 0;
          |                        ^~~~~~~~~~~~
    /home/zhn/testCode_new/cuda/CUDACPP/tiny-cuda-nn/include/tiny-cuda-nn/cpp_api.h:77:39: error: ‘cudaStream_t’ has not been declared
       77 |  virtual void backward_backward_input(cudaStream_t stream, const Context& ctx, uint32_t n_elements, const float* dL_ddLdinput, const float* input, const void* dL_doutput, void* dL_dparams, void* dL_ddLdoutput, float* dL_dinput, const void* params) = 0;
          |                                       ^~~~~~~~~~~~
    make[2]: *** [CMakeFiles/testcpp1.dir/build.make:76:CMakeFiles/testcpp1.dir/main.cpp.o] error 1
    make[1]: *** [CMakeFiles/Makefile2:129:CMakeFiles/testcpp1.dir/all] error 2
    make: *** [Makefile:91:all] error 2
    

    Here I added the following

    add_executable(testcpp1 main.cpp)
    target_compile_features(testcpp1 PRIVATE cxx_std_14)
    target_link_libraries(testcpp1 tiny-cuda-nn) # PUBLIC ${CUDA_LIBRARIES} 
    

    commands to the original CMakeLists.txt.

    Of course, when the file is named main.cu, it compiles fine. Can you help me? Thank you!!!!
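
    (A note for readers hitting this: the errors say that cpp_api.h sees no declaration of cudaStream_t, and the sample main.cpp includes tiny-cuda-nn/common.h before the CUDA runtime headers. Reordering the includes so the CUDA runtime comes first may help; a sketch:)

    #include <cuda_runtime.h>        // declares cudaStream_t
    #include <tiny-cuda-nn/common.h> // now sees the CUDA runtime types
    #include <iostream>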

    opened by ZHN2ZHN 5
  • Trouble installing tiny-cuda-nn

    I'm trying to set up tinycudann for nvdiffrec. Whenever I try running this command from the readme: pip install --global-option="--no-networks" git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch it spits out ERROR: Command errored out with exit status 1. I tried building it manually, but it just fails to build and then spits out the same error. I also tried the command below and get RuntimeError: Error compiling objects for extension (log: fully_fused_mlp.cu.txt). I also tried to cmake it, no luck.

    CUDA 11.3, Python 3.9, Windows 10, CMake 3.23.2, Visual Studio 2019 Community, Git 2.36.1.

    opened by Askejm 5
  • RuntimeError: DifferentiableObject::backward_backward_input_impl: not implemented error (doutput)

    I met the following error when modifying NeuS with tinycudann:

      File "exp_runner.py", line 397, in <module>
        runner.train()
      File "exp_runner.py", line 148, in train
        loss.backward()
      File "/home/nerf/.conda/envs/neus/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/nerf/.conda/envs/neus/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
        allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
      File "/home/nerf/.conda/envs/neus/lib/python3.7/site-packages/torch/autograd/function.py", line 253, in apply
        return user_fn(self, *args)
      File "/home/nerf/.conda/envs/neus/lib/python3.7/site-packages/tinycudann-1.6-py3.7-linux-x86_64.egg/tinycudann/modules.py", line 88, in backward
        doutput
    RuntimeError: DifferentiableObject::backward_backward_input_impl: not implemented error
    

    Here is the code:

    # models/fields.py for NeuS
    class SDFNetwork(nn.Module):
        def __init__(self):
           xxxx
            self.sdf_lin_net = tcnn.Network(
                n_input_dims    = (2*multires+1)*d_in,
                n_output_dims   = d_out, # out_dim, d_out
                network_config  = {
                    "otype": "FullyFusedMLP", # FullyFusedMLP | CutlassMLP 
                    "activation": "Softplus",
                    "output_activation": "None",
                    "n_neurons": d_hidden,
                    "n_hidden_layers": n_layers,
                },
            )
        def forward(self, inputs):
            inputs = inputs * self.scale
            if self.embed_fn_fine is not None:
                inputs = self.embed_fn_fine(inputs)
    
            x = inputs
            y = self.sdf_lin_net(x)
            return y
    
    # bindings/torch/tinycudnn/modules.py
    	@staticmethod
    	def backward(ctx, dinput_grad, dweight_grad):
    		# NOTE: currently support:
    		#       ✓   d(dL_dinput)_d(dL_doutput)  doutput_grad
    		#       ✓   d(dL_dinput)_d(params)      weight_grad
    		#       ✓   d(dL_dinput)_d(input)       input_grad
    		#       x   d(dL_dparam)_d(...)
    		input, params, doutput = ctx.saved_tensors
    		# assert dweight_grad is None, "currently do not support 2nd-order gradients from gradient of grid"
    		with torch.enable_grad():
    			# NOTE: preserves requires_grad info (this function is in no_grad() context by default when invoking loss.backward())
    			doutput = doutput * ctx.ctx_fwd.loss_scale
    		with torch.no_grad():
    			print("ctx.ctx_fwd.native_ctx =", ctx.ctx_fwd.native_ctx)
    			print("input =", input.size(), input, input.dtype)
    			print("params =", params.size(), params, params.dtype)
    			print("dinput_grad =", dinput_grad.size(), dinput_grad, dinput_grad.dtype)
    			print("doutput =", doutput.size(), doutput, doutput.dtype)
    			doutput_grad, weight_grad, input_grad = ctx.ctx_fwd.native_tcnn_module.bwd_bwd_input(
    				ctx.ctx_fwd.native_ctx,
    				input,
    				params,
    				dinput_grad,
    				doutput
    			)
    			# NOTE: be cautious when multiplying and dividing loss_scale
    			#       doutput_grad uses dinput_grad
    			#       weight_grad  uses dinput_grad * doutput
    			#       input_grad   uses dinput_grad * doutput
    			weight_grad = None if weight_grad is None else (weight_grad / ctx.ctx_fwd.loss_scale)
    			input_grad = None if input_grad is None else (input_grad / ctx.ctx_fwd.loss_scale)
    
    		# ctx_fwd,   doutput,      input,      params,      output
    		return None, doutput_grad, input_grad, weight_grad, None
    
    

    How to fix it ?

    I tried https://github.com/NVlabs/tiny-cuda-nn/issues/89, but it does not work for me!

    opened by coder4869 1
  • Error while installing python extension

    Windows 10, Visual Studio 2019 version 16.11.17, Anaconda 3, Python 3.9.12, CUDA 11.6, torch 1.12.0+cu116, cmake 3.22.0-rc2, GPU: RTX 6000.

    tiny-cuda-nn itself compiled normally, but the pytorch extension failed to build.

    When running python setup.py install, an error was reported in format.h:

    Building PyTorch extension for tiny-cuda-nn version 1.6
    Targeting compute capability 75
    running install
    running bdist_egg
    running egg_info
    writing tinycudann.egg-info\PKG-INFO
    writing dependency_links to tinycudann.egg-info\dependency_links.txt
    writing top-level names to tinycudann.egg-info\top_level.txt
    reading manifest file 'tinycudann.egg-info\SOURCES.txt'
    writing manifest file 'tinycudann.egg-info\SOURCES.txt'
    installing library code to build\bdist.win-amd64\egg
    running install_lib
    running build_py
    running build_ext
    building 'tinycudann_bindings._C' extension
    Emitting ninja build file E:\Downloads\tiny-cuda-nn\bindings\torch\build\temp.win-amd64-3.9\Release\build.ninja...
    Compiling objects...
    Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
    [1/7] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\Downloads\tiny-cuda-nn\bindings\torch\build\src/common.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IE:\Downloads\tiny-cuda-nn/include -IE:\Downloads\tiny-cuda-nn/dependencies -IE:\Downloads\tiny-cuda-nn/dependencies/cutlass/include -IE:\Downloads\tiny-cuda-nn/dependencies/cutlass/tools/util/include -IE:\Downloads\tiny-cuda-nn/dependencies/fmt/include -IE:\Anaconda3\envs\dmodel\lib\site-packages\torch\include -IE:\Anaconda3\envs\dmodel\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Anaconda3\envs\dmodel\lib\site-packages\torch\include\TH -IE:\Anaconda3\envs\dmodel\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -IE:\Anaconda3\envs\dmodel\include -IE:\Anaconda3\envs\dmodel\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c E:\Downloads\tiny-cuda-nn\src\common.cu -o E:\Downloads\tiny-cuda-nn\bindings\torch\build\src/common.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -std=c++14 --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 -DTCNN_MIN_GPU_ARCH=75 -DFMT_HEADER_ONLY=1 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
    FAILED: E:/Downloads/tiny-cuda-nn/bindings/torch/build/src/common.obj 
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\Downloads\tiny-cuda-nn\bindings\torch\build\src/common.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IE:\Downloads\tiny-cuda-nn/include -IE:\Downloads\tiny-cuda-nn/dependencies -IE:\Downloads\tiny-cuda-nn/dependencies/cutlass/include -IE:\Downloads\tiny-cuda-nn/dependencies/cutlass/tools/util/include -IE:\Downloads\tiny-cuda-nn/dependencies/fmt/include -IE:\Anaconda3\envs\dmodel\lib\site-packages\torch\include -IE:\Anaconda3\envs\dmodel\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Anaconda3\envs\dmodel\lib\site-packages\torch\include\TH -IE:\Anaconda3\envs\dmodel\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -IE:\Anaconda3\envs\dmodel\include -IE:\Anaconda3\envs\dmodel\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c E:\Downloads\tiny-cuda-nn\src\common.cu -o E:\Downloads\tiny-cuda-nn\bindings\torch\build\src/common.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -std=c++14 --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 -DTCNN_MIN_GPU_ARCH=75 -DFMT_HEADER_ONLY=1 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
    common.cu
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
    common.cu
    E:\Downloads\tiny-cuda-nn\dependencies\fmt\include\fmt\format.h(2478): error: too many recursive substitutions of function template signatures
              detected during:
                processing of template argument list for "fmt::v9::detail::has_isfinite" 
    (3177): here
                instantiation of "fmt::v9::detail::isfinite" 
    (3177): here
                processing of template argument list for "fmt::v9::detail::has_isfinite" 
    (3177): here
                instantiation of "fmt::v9::detail::isfinite" 
    (3177): here
                processing of template argument list for "fmt::v9::detail::has_isfinite" 
    (3177): here
                [ 397 instantiation contexts not shown ]
                instantiation of "auto fmt::v9::detail::write(OutputIt, T, fmt::v9::basic_format_specs<Char>, fmt::v9::detail::locale_ref)->OutputIt [with Char=char, OutputIt=fmt::v9::appender, T=float, <unnamed>=0]" 
    (3217): here
                instantiation of "auto fmt::v9::detail::write<Char,OutputIt,T,<unnamed>>(OutputIt, T)->OutputIt [with Char=char, OutputIt=fmt::v9::appender, T=float, <unnamed>=0]" 
    (3351): here
                instantiation of "auto fmt::v9::detail::default_arg_formatter<Char>::operator()(T)->fmt::v9::detail::default_arg_formatter<Char>::iterator [with Char=char, T=float]" 
    E:/Downloads/tiny-cuda-nn/dependencies/fmt/include\fmt/core.h(1644): here
                instantiation of "auto fmt::v9::visit_format_arg(Visitor &&, const fmt::v9::basic_format_arg<Context> &)->decltype((<expression>)) [with Visitor=fmt::v9::detail::default_arg_formatter<char>, Context=fmt::v9::format_context]" 
    (4055): here
                instantiation of "void fmt::v9::detail::vformat_to(fmt::v9::detail::buffer<Char> &, fmt::v9::basic_string_view<Char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::detail::buffer_appender<fmt::v9::type_identity_t<Char>>, fmt::v9::type_identity_t<Char>>>, fmt::v9::detail::locale_ref) [with Char=char]" 
    E:\Downloads\tiny-cuda-nn\dependencies\fmt\include\fmt\format-inl.h(1472): here
    
    E:\Downloads\tiny-cuda-nn\dependencies\fmt\include\fmt\format.h(2475): error: duplicate base class name
              detected during:
                instantiation of class "fmt::v9::detail::has_isfinite<T, Enable> [with T=float, Enable=void]" 
    (3177): here
                instantiation of "fmt::v9::detail::isfinite" 
    (3177): here
                processing of template argument list for "fmt::v9::detail::has_isfinite" 
    (3177): here
                instantiation of "fmt::v9::detail::isfinite" 
    (3177): here
                processing of template argument list for "fmt::v9::detail::has_isfinite" 
    (3177): here
                [ 395 instantiation contexts not shown ]
                instantiation of "auto fmt::v9::detail::write(OutputIt, T, fmt::v9::basic_format_specs<Char>, fmt::v9::detail::locale_ref)->OutputIt [with Char=char, OutputIt=fmt::v9::appender, T=float, <unnamed>=0]" 
    (3217): here
                instantiation of "auto fmt::v9::detail::write<Char,OutputIt,T,<unnamed>>(OutputIt, T)->OutputIt [with Char=char, OutputIt=fmt::v9::appender, T=float, <unnamed>=0]" 
    (3351): here
                instantiation of "auto fmt::v9::detail::default_arg_formatter<Char>::operator()(T)->fmt::v9::detail::default_arg_formatter<Char>::iterator [with Char=char, T=float]" 
    E:/Downloads/tiny-cuda-nn/dependencies/fmt/include\fmt/core.h(1644): here
                instantiation of "auto fmt::v9::visit_format_arg(Visitor &&, const fmt::v9::basic_format_arg<Context> &)->decltype((<expression>)) [with Visitor=fmt::v9::detail::default_arg_formatter<char>, Context=fmt::v9::format_context]" 
    (4055): here
                instantiation of "void fmt::v9::detail::vformat_to(fmt::v9::detail::buffer<Char> &, fmt::v9::basic_string_view<Char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::detail::buffer_appender<fmt::v9::type_identity_t<Char>>, fmt::v9::type_identity_t<Char>>>, fmt::v9::detail::locale_ref) [with Char=char]" 
    E:\Downloads\tiny-cuda-nn\dependencies\fmt\include\fmt\format-inl.h(1472): here
    
    E:\Downloads\tiny-cuda-nn\dependencies\fmt\include\fmt\format.h(2475): error: duplicate base class name
              detected during:
                instantiation of class "fmt::v9::detail::has_isfinite<T, Enable> [with T=float, Enable=void]" 
    (3177): here
                instantiation of "fmt::v9::detail::isfinite" 
    (3177): here
                processing of template argument list for "fmt::v9::detail::has_isfinite" 
    (3177): here
                instantiation of "fmt::v9::detail::isfinite" 
    (3177): here
                processing of template argument list for "fmt::v9::detail::has_isfinite" 
    (3177): here
                [ 393 instantiation contexts not shown ]
                instantiation of "auto fmt::v9::detail::write(OutputIt, T, fmt::v9::basic_format_specs<Char>, fmt::v9::detail::locale_ref)->OutputIt [with Char=char, OutputIt=fmt::v9::appender, T=float, <unnamed>=0]" 
    (3217): here
                instantiation of "auto fmt::v9::detail::write<Char,OutputIt,T,<unnamed>>(OutputIt, T)->OutputIt [with Char=char, OutputIt=fmt::v9::appender, T=float, <unnamed>=0]" 
    (3351): here
                instantiation of "auto fmt::v9::detail::default_arg_formatter<Char>::operator()(T)->fmt::v9::detail::default_arg_formatter<Char>::iterator [with Char=char, T=float]" 
    E:/Downloads/tiny-cuda-nn/dependencies/fmt/include\fmt/core.h(1644): here
                instantiation of "auto fmt::v9::visit_format_arg(Visitor &&, const fmt::v9::basic_format_arg<Context> &)->decltype((<expression>)) [with Visitor=fmt::v9::detail::default_arg_formatter<char>, Context=fmt::v9::format_context]" 
    (4055): here
                instantiation of "void fmt::v9::detail::vformat_to(fmt::v9::detail::buffer<Char> &, fmt::v9::basic_string_view<Char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::detail::buffer_appender<fmt::v9::type_identity_t<Char>>, fmt::v9::type_identity_t<Char>>>, fmt::v9::detail::locale_ref) [with Char=char]" 
    E:\Downloads\tiny-cuda-nn\dependencies\fmt\include\fmt\format-inl.h(1472): here
    ...
    More errors
    ...
    
    Error limit reached.
    100 errors detected in the compilation of "E:/Downloads/tiny-cuda-nn/src/fully_fused_mlp.cu".
    Compilation terminated.
    fully_fused_mlp.cu
    ninja: build stopped: subcommand failed.
    
    opened by bacTlink 5
  • Decouple inference and training.

    Decouple inference and training.

    Hi, I'm playing with the mlp_learning_an_image.cu file in the samples directory; it simply trains an MLP and performs inference on it. I want to save the trained model and then load it back (i.e. I want to decouple training from inference). Can you please share some insights on how I can save the model into a file and then load it at inference time? Thanks
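    One possible approach is a minimal sketch built on the Trainer::serialize / Trainer::deserialize snapshot API described in the v1.1 release notes below; the file name here is illustrative:

    #include <fstream>

    // Save the trained model (and, optionally, the optimizer state) to disk:
    std::ofstream out{"model.msgpack", std::ios::out | std::ios::binary};
    json::to_msgpack(trainer->serialize(), out);

    // At inference time, reconstruct the model from the same config, then restore it:
    std::ifstream in{"model.msgpack", std::ios::in | std::ios::binary};
    trainer->deserialize(json::from_msgpack(in));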

    opened by husnainmubarikIntel 0
  • Do you plan to add a TensorFlow wrapper?

    Do you plan to add a TensorFlow wrapper?

    First of all, thanks for all the great efforts you have put into this repository!

    While trying to apply instant-ngp to another project, I found that the other project is implemented in TensorFlow, as are many repositories built on top of the original NeRF implementation.

    Do you have any plans for providing a TensorFlow wrapper? Or would there be a simpler way to apply instant-ngp to a TensorFlow-based code?

    Thanks in advance!

    opened by totolacky 0
  • HashGrid Initialization.

    HashGrid Initialization.

    Hi, I want to change the initialization scheme to be compatible with a subsequent module. But when I tried different initializations such as U(-1, 1), the model became unable to converge. I only changed the function initialize_params() in grid.h, and I wonder if I should modify the gradient accordingly?

    opened by DouYishun 0
Releases(v1.5)
  • v1.5(Apr 22, 2022)

    Changes Since Last Release

    • Encodings and neural networks in tiny-cuda-nn now share the same generic API for differentiable objects. This simplifies implementations significantly.
      • As part of this generalization, encodings and neural networks can now take and produce both row- and column-major matrices (i.e. both AoS and SoA data). Additionally, input data may be strided arbitrarily, which permits slicing of input matrices without copying (see the sketch after this list).
    • Added GridEncoding support for double-backward, which is useful for e.g. eikonal supervision (courtesy of @ventusff).
    • Dropped the dependency on PyEXR / tinyexr in the sample applications (using imageio / stb_image instead).
    • Fixed many bugs, added several performance improvements, and improved compatibility with older GPUs.
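    To illustrate the arbitrary striding mentioned above, a minimal sketch (the layout argument of the GPUMatrixDynamic constructor and the slice_cols method are assumptions, not confirmed API):

    using namespace tcnn;

    // A dynamically laid-out input matrix holding a full batch.
    GPUMatrixDynamic<float> inputs(n_input_dims, batch_size, RM);

    // Hypothetical strided view of the first half of the batch; no data is copied.
    auto first_half = inputs.slice_cols(0, batch_size / 2);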
    Source code(tar.gz)
    Source code(zip)
  • v1.4(Feb 14, 2022)

    Changes Since Last Release

    Major Changes

    • Added a PyTorch extension for using tiny-cuda-nn from within Python.
      • This functionality is considered to be in a "beta" state. Please do report any issues you come across!
      • See this section of the README for installation/usage instructions.
      • Caveat: the overheads of Python/PyTorch can be extensive. For example, the bundled mlp_learning_an_image example is ~2x slower through PyTorch than native CUDA. (This is still faster than implementing everything from scratch in Python, but something to be aware of.)
    • Significantly reduced memory usage (sometimes 3x lower)
      • Added a GPU memory arena that permits efficient, stream-ordered allocation and de-allocation of temporary buffers. This circumvents the need for pre-allocation, resulting in often 3x lower memory consumption.
      • The memory arena uses the GPU's virtual memory mapper to get its performance without invalidating pointers or shuffling memory around.
    • All neural networks in tiny-cuda-nn now additionally support a row-major input memory layout. This affords higher performance and lower memory usage in cases where a transposition would otherwise be required.
      • GridEncoding naturally outputs row-major data and is thus sped up by ~20% when followed by a neural network.
    • tiny-cuda-nn now runs on older GPUs, down to compute capability 3.7.

    Minor Changes

    • Sped up the input gradient computation of GridEncoding by ~3x.
    • Sped up SyncedMultiStream.
    • Fixed incorrect gradients of SphericalHarmonicsEncoding.
    • Fixed incorrect gradients of GridEncoding when max_level arguments were provided or Interpolation::Nearest was used.
    Source code(tar.gz)
    Source code(zip)
  • v1.3(Jan 14, 2022)

    Changes Since Last Release

    Major Changes

    • Adds a new encoding: GridEncoding (see the example configuration after this list)
    • tiny-cuda-nn now runs on CUDA 10.2 (previously required CUDA 11 and higher)
    • tiny-cuda-nn now only requires C++14 (previously C++17)
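    For reference, a sketch of a GridEncoding configuration in the JSON style used by the other encodings (the parameter names and values are illustrative assumptions, not authoritative defaults):

    nlohmann::json encoding_config = {
        {"otype", "Grid"},
        {"type", "Hash"},             // backing storage; "Dense" and "Tiled" are assumed alternatives
        {"n_levels", 16},             // number of resolution levels
        {"n_features_per_level", 2},  // feature dimensions per level
        {"log2_hashmap_size", 19},    // log2 of the hash table size per level
        {"base_resolution", 16},      // coarsest grid resolution
        {"per_level_scale", 2.0},     // resolution growth factor between levels
    };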

    Minor Changes

    • This repository now supports continuous integration builds through GitHub Actions.
    • Added support for 16-neuron-wide FullyFusedMLP networks
    • Added support for nesting of SyncedMultiStream
    Source code(tar.gz)
    Source code(zip)
  • v1.2(Dec 15, 2021)

    Changes Since Last Release

    Major Changes

    • Adds three new encodings: (i) TriangleWave, (ii) SphericalHarmonics, (iii) Composite
    • Pitched pointers are now used to parameterize inputs and outputs of all encodings.
      • This feature enables a new Composite encoding that can apply basic encodings to different subsets of input dimensions.
      • This also removes the distinction between "encoded" and "passthrough" dimensions. The old behavior of passing through certain dimensions can be achieved by composing with the Identity encoding (see the sketch after this list).
    • tiny-cuda-nn no longer depends on cuRAND and instead uses an implementation of the PCG32 random number generator (derived from https://github.com/wjakob/pcg32) for all randomness.
    • Activation code has been centralized within and across CUTLASS components. All neural network implementations now support all activation functions (except for the ResNet, which still only supports ReLU activations in its hidden layers).
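    For example, a minimal sketch of a Composite configuration that one-blob-encodes the first three input dimensions and passes the rest through unchanged (the exact values are illustrative):

    nlohmann::json encoding_config = {
        {"otype", "Composite"},
        {"nested", nlohmann::json::array({
            {{"otype", "OneBlob"}, {"n_dims_to_encode", 3}, {"n_bins", 32}},
            {{"otype", "Identity"}}, // applied to all remaining dimensions
        })},
    };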

    Minor Changes

    • Installed GPUs are now automatically detected and targeted by CMake.
    • Samples and benchmarks can now be disabled when tiny-cuda-nn is used as a submodule.
    • The required CUDA version has been relaxed. Future plans include compatibility with CUDA 10.2.
    Source code(tar.gz)
    Source code(zip)
  • v1.1(Oct 30, 2021)

    Changes Since Last Release

    Major Changes

    • tiny-cuda-nn now supports saving and loading snapshots via Trainer::serialize and Trainer::deserialize. These functions produce a nlohmann::json object containing the trained parameters of the model as well as, optionally, the state of the optimizer (to support continued training).

    The intended way to efficiently store the resulting json blob to disk is:

    #include <fstream> // for std::ofstream

    std::ofstream f("checkpoint.msgpack", std::ios::out | std::ios::binary);
    json::to_msgpack(trainer->serialize(), f);
    

    and to load it again:

    #include <fstream> // for std::ifstream

    std::ifstream f("checkpoint.msgpack", std::ios::in | std::ios::binary);
    trainer->deserialize(json::from_msgpack(f));
    
    • tiny-cuda-nn now supports L1-type losses. Four new losses were added: L1, Relative L1, MAPE (Mean Absolute Percentage Error), and SMAPE (Symmetric Mean Absolute Percentage Error).
    • GPUMatrix has been made much less verbose. Column-major matrices now have the type GPUMatrix<T> and row-major matrices GPUMatrix<T, RM>. We also introduced a dynamically laid out matrix type: GPUMatrixDynamic<T>. As a result, the API for dynamically laid out network outputs is now simplified.
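    Concretely, a minimal sketch of the less verbose matrix types (the layout argument of the GPUMatrixDynamic constructor is an assumption):

    using namespace tcnn;

    GPUMatrix<float> a(n_dims, batch_size);             // column-major by default
    GPUMatrix<float, RM> b(n_dims, batch_size);         // row-major
    GPUMatrixDynamic<float> c(n_dims, batch_size, RM);  // layout chosen at runtime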

    Minor Changes

    • Extends the functionality of Network/NetworkWithInputEncoding to support features such as extraction of neuron activations or gradients of the output w.r.t. the input.
    • Added Squareplus and Softplus activations to FullyFusedMLP.
    • CMake now automatically detects the GPU architecture of the system, simplifying the compilation process for Turing and A100 GPUs (see the updated README.md).
    • Removed data_factor from all losses. To achieve the same behavior, please wrap existing losses in a helper class.
    Source code(tar.gz)
    Source code(zip)
Owner
NVIDIA Research Projects