The Eclipse Deeplearning4J (DL4J) ecosystem is a set of projects intended to support all the needs of a JVM-based deep learning application

Overview


The Eclipse Deeplearning4J (DL4J) ecosystem is a set of projects intended to support all the needs of a JVM-based deep learning application. This means starting with the raw data, loading and preprocessing it from wherever it lives and whatever format it is in, through to building and tuning a wide variety of simple and complex deep learning networks.

Because Deeplearning4J runs on the JVM, you can use it with a wide variety of JVM-based languages other than Java, such as Scala, Kotlin, Clojure and many more.

The DL4J stack comprises:

  • DL4J: High-level API to build MultiLayerNetworks and ComputationGraphs with a variety of layers, including custom ones. Supports importing Keras models from h5 (including tf.keras models, as of 1.0.0-beta7) and distributed training on Apache Spark
  • ND4J: General purpose linear algebra library with over 500 mathematical, linear algebra and deep learning operations. ND4J is based on the highly-optimized C++ codebase LibND4J that provides CPU (AVX2/512) and GPU (CUDA) support and acceleration by libraries such as OpenBLAS, OneDNN (MKL-DNN), cuDNN, cuBLAS, etc
  • SameDiff: Part of the ND4J library, SameDiff is our automatic differentiation / deep learning framework. SameDiff uses a graph-based (define-then-run) approach, similar to TensorFlow graph mode. Eager execution (as in TensorFlow 2.x eager mode / PyTorch) is planned. SameDiff supports importing TensorFlow frozen model format .pb (protobuf) models. Import for ONNX, TensorFlow SavedModel and Keras models is planned. Deeplearning4j also has full SameDiff support for easily writing custom layers and loss functions.
  • DataVec: ETL for machine learning data in a wide variety of formats and files (HDFS, Spark, Images, Video, Audio, CSV, Excel etc)
  • LibND4J: C++ library that underpins everything. For more information on how the JVM accesses native arrays and operations, refer to JavaCPP
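As a small taste of SameDiff's define-then-run style described above, here is a minimal sketch (the variable names and shapes are our own, chosen purely for illustration):

```java
import java.util.Collections;

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// Define the graph first (define-then-run, as in TensorFlow graph mode)
SameDiff sd = SameDiff.create();
SDVariable x = sd.placeHolder("x", DataType.FLOAT, -1, 2); // shape [minibatch, 2]
SDVariable w = sd.var("w", Nd4j.ones(DataType.FLOAT, 2, 1));
SDVariable y = x.mmul(w); // y = x * w

// Then execute it with concrete data bound to the placeholder
INDArray input = Nd4j.create(new float[][]{{1f, 2f}, {3f, 4f}});
INDArray out = y.eval(Collections.singletonMap("x", input)); // 2x1 column of row sums
System.out.println(out);
```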

All projects in the DL4J ecosystem support Windows, Linux and macOS. Hardware support includes CUDA GPUs (10.0, 10.1, 10.2; not on macOS), x86 CPU (x86_64, avx2, avx512), ARM CPU (arm, arm64, armhf) and PowerPC (ppc64le).

Community Support

For support with the project, please visit https://community.konduit.ai/

Using Eclipse Deeplearning4J in your project

Deeplearning4J has quite a few dependencies. For this reason, we only support usage with a build tool.

<dependencies>
  <dependency>
      <groupId>org.deeplearning4j</groupId>
      <artifactId>deeplearning4j-core</artifactId>
      <version>1.0.0-M1.1</version>
  </dependency>
  <dependency>
      <groupId>org.nd4j</groupId>
      <artifactId>nd4j-native-platform</artifactId>
      <version>1.0.0-M1.1</version>
  </dependency>
</dependencies>

Add these dependencies to your pom.xml file to use Deeplearning4J with the CPU backend. A full standalone project example is available in the example repository, if you want to start a new Maven project from scratch.

A taste of code

Deeplearning4J offers a very high-level API for defining even complex neural networks. The following example shows how LeNet, a convolutional neural network, is defined in DL4J.

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(seed)
                .l2(0.0005)
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(1e-3))
                .list()
                .layer(new ConvolutionLayer.Builder(5, 5)
                        .stride(1,1)
                        .nOut(20)
                        .activation(Activation.IDENTITY)
                        .build())
                .layer(new SubsamplingLayer.Builder(PoolingType.MAX)
                        .kernelSize(2,2)
                        .stride(2,2)
                        .build())
                .layer(new ConvolutionLayer.Builder(5, 5)
                        .stride(1,1)
                        .nOut(50)
                        .activation(Activation.IDENTITY)
                        .build())
                .layer(new SubsamplingLayer.Builder(PoolingType.MAX)
                        .kernelSize(2,2)
                        .stride(2,2)
                        .build())
                .layer(new DenseLayer.Builder().activation(Activation.RELU)
                        .nOut(500).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(outputNum)
                        .activation(Activation.SOFTMAX)
                        .build())
                .setInputType(InputType.convolutionalFlat(28,28,1))
                .build();
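To actually train this configuration, it is wrapped in a MultiLayerNetwork and fitted on a DataSetIterator. A minimal sketch, assuming the deeplearning4j-datasets module is on the classpath (the batch size, seed and epoch count here are arbitrary choices, not from the original example):

```java
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// `conf` is the MultiLayerConfiguration built above
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(100)); // log score every 100 iterations

// MNIST training data: batch size 64, training split, seed 123
DataSetIterator mnistTrain = new MnistDataSetIterator(64, true, 123);
for (int epoch = 0; epoch < 3; epoch++) {
    model.fit(mnistTrain);
}
```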

Documentation, Guides and Tutorials

You can find the official documentation for Deeplearning4J and the other libraries of its ecosystem at http://deeplearning4j.konduit.ai/.

Want some examples?

We have a separate repository with various examples available: https://github.com/eclipse/deeplearning4j-examples

Building from source

It is preferred to use the official pre-compiled releases (see above). But if you want to build from source, first take a look at the prerequisites for building from source here: https://deeplearning4j.konduit.ai/multi-project/how-to-guides/build-from-source.

To build everything, we can use commands like

./change-cuda-versions.sh x.x
./change-scala-versions.sh 2.xx
./change-spark-versions.sh x
mvn clean install -Dmaven.test.skip -Dlibnd4j.cuda=x.x -Dlibnd4j.compute=xx

or

mvn -B -V -U clean install -pl <modules> -Dlibnd4j.platform=linux-x86_64 -Dlibnd4j.chip=cuda -Dlibnd4j.cuda=11.0 -Dlibnd4j.compute=<your GPU CC> -Djavacpp.platform=linux-x86_64 -Dmaven.test.skip=true
An example of a GPU "CC" or compute capability is 61 for a Titan X Pascal.

License

Apache License 2.0

Commercial Support

Deeplearning4J is actively developed by the team at Konduit K.K.

If you need any commercial support, feel free to reach out to us at [email protected]

Comments
  • [WIP] Keras upgrades


    Work in progress...

    Upgrades to deeplearning4j-keras to be a little better structured and expand the API from Keras to DL4J. Encourages hijacking model methods rather than implementing an actual Keras backend for efficiency and performance.

    Main goals of this PR include:

    • expanding Keras to better support DL4J
    • model saving methods via save_model
    • supporting Keras functional API
    opened by crockpotveggies 103
  • Implement new UI functionality using Play framework


    _WIP DO NOT MERGE_

    Play framework UI: builds upon earlier StatsListener and StatsStorage work implemented here: https://github.com/deeplearning4j/deeplearning4j/pull/2143

    opened by AlexDBlack 90
  • Fix RBMs and AE


    • Setup vb params to persist and be updated when in pretraining mode. It was skipping the update part
    • Added flag for pretraining to configuration at layer level and set trigger to turn off after layer pretrains. LayerUpdater will skip vb params when running outside pretrain. In previous setup, backprop was hard coded to true in many cases when setting params or gradients and it would skip vb (visual bias) during pretrain phase. In this change, getting the count for params or gradients or updating them will take vb into account. It will just not have any changes applied in the updater when it is not in pretrain mode.
    • HiddenUnit is the activation in RBM - added backpropGradient and derivative for hidden unit in RBM to account for this fact
    • RBM needed a reverse sign on application of gradients for the step function
    • Deprecated unused code in RBM and cleaned up functions in AE that appeared out of date
    • Expanded RBM tests and fixed gradient checks
    opened by nyghtowl 86
  • "A fatal error has been detected by the Java Runtime Environment" when running ParagraphVectors.inferVector(), 1.0.0-alpha

    Issue Description

    I submitted this issue before for DL4J v0.8.0, and thought it was resolved after upgrading to 1.0.0-alpha. However, when I built a new ParagraphVectors model and called the method inferVector() to infer a batch of new texts, the error came back again. The information about the issue is as follows:

    I'm running DL4J on my personal laptop, within Eclipse IDE. If I saved the ParagraphVectors model to a file and then loaded the model from the same file to call ParagraphVectors.inferVector, I received the error message of "A fatal error has been detected by the Java Runtime Environment". One error report is in attachment.

    I noticed that this issue appears to be more likely to happen when the new text is a (slightly) longer sentence. The data for training the model and the new texts are in Simplified Chinese, all properly processed before using DL4J.

    The code snippet causing this issue is as follows, within a next() function of a DataSetIterator:

            for(int j=0; j<report.size(); j++){
                String stc = report.get(j);
                // this is where the problem is
                // m_SWV is loaded from a saved model, and proper TokenizerFactory has been set
                INDArray vector = ((ParagraphVectors)m_SWV).inferVector(stc);  
    
                features.put(new INDArrayIndex[]{NDArrayIndex.point(i), NDArrayIndex.all(), NDArrayIndex.point(j)}, vector);
                temp[1] = j;
                featuresMask.putScalar(temp, 1.0); 
            }
    

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j 1.0.0-alpha
    • platform information (OS, etc): DELL Inspiron 15 laptop with Windows 8 as OS
    • Java version: jdk1.8.0_60

    hs_err_pid4712_jdk1.8_60.log

    Bug Release Burndown 
    opened by xinxu75 85
  • Word2Vec/ParagraphVectors/DeepWalk Spark


    WIP; DO NOT MERGE;

    Word2Vec/ParagraphVectors/DeepWalk implementation for Spark, using VoidParameterServer available in ND4j

    Do not merge before this: https://github.com/deeplearning4j/nd4j/pull/1551

    opened by raver119 82
  • DL4J Hanging after "Loaded [JCublasBackend] backend"

    Hi,

    We are running some DL4J code as part of a wider system. This code runs fine on an Alienware development PC with CUDA 9.1 on Ubuntu, run from Eclipse.

    However, when we package this application and run it on a RHEL ppc64le server with CUDA 9.1, we see that ND4J is not doing anything after the following output:

    2309 [pool-8-thread-1] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend

    I have verified we are running the latest NVIDIA drivers and CUDA 9.1 is installed successfully. Below is the output from running the CUDA 9.1 sample deviceQuery, which lists the GPU devices:

     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 4 CUDA Capable device(s)
    
    Device 0: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   2 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 1: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   3 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 2: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   6 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 3: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   7 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    > Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU1) : Yes
    > Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU2) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU3) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU0) : Yes
    > Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU2) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU3) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU0) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU1) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU3) : Yes
    > Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU0) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU1) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU2) : Yes
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 4
    Result = PASS
    
    

    Can someone please help us with diagnosing this issue? It seems CUDA is installed correctly but DL4J is not producing any output and the following Java code is just hanging when calling Nd4j.create() for the first time:

    ...
    Nd4j.create()
    ...
    

    Note that this same code works fine on the AlienWare on Ubuntu 64 bit.

    Aha! Link: https://skymindai.aha.io/features/ND4J-143

    DevOps ND4J 
    opened by madhs53 79
  • Feature Request: Add Support for Apple Silicon M1


    Issue Description

    The new Apple Silicon M1 processor yields a javacpp.platform of macosx-arm64. These artifacts aren't available in the Maven Central repository, which causes builds and IDEs on this new hardware to error/complain.

    See these two forum topics for more information: https://community.konduit.ai/t/support-for-apple-silicon-m1/1168 https://community.konduit.ai/t/compiling-on-arm/283

    Expected behavior: prebuilt jars for macosx-arm64 should exist in maven central repo

    Enhancement ARM 
    opened by bpossolo 77
  • Convert Mat image to INDArray, When trying to convert Mat image to INDArray it is returning me INDArray null


    I have this code and I do not understand why my INDArray image is returning null when I try to convert a Mat to INDArray. I am using Android Studio 3.0.1.

    //************************* Digit classification *******************************************************************
            for (int i = 0; i < rects.size() ; i++) {
                Rect rect = rects.get(i);
                digit = inverted.submat(rect.y, rect.y + rect.height, rect.x, rect.x + rect.width);
                Imgproc.resize(digit, digit, new Size(28, 28));
    
                    NativeImageLoader nativeImageLoader = new NativeImageLoader(digit.height(), digit.width(), digit.channels());//Use the nativeImageLoader to convert to numerical matrix
                    INDArray image = nativeImageLoader.asMatrix(digit);//put image into INDArray
    
                System.out.println("carregar modelo matrixes  " + image);
     }
    

    output: carregar modelo matrixes NULL

    Bug Enhancement DataVec / ETL 
    opened by AILTON091 76
  • Add CenterLossOutputLayer for efficient training


    Work in progress...

    Center loss has proven to be more efficient than triplet loss, and it enables classifier training which is also more speedy than triplets.

    @AlexDBlack can you take a look at CenterLossParamInitializer and confirm it's on the right track? Also, should we just specify numClasses in layer conf? Let's keep discussion in Gitter :)

    opened by crockpotveggies 65
  • Can not run CUDA example on Jetson TX1


    Issue Description

    deeplearning4jtest-1.0/bin/deeplearning4jtest 10000 10
    09:07:35.540 [main] INFO deeplearning4jtest.CSVExample - Build model....
    09:07:35.652 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
    Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.nd4j.jita.concurrency.CudaAffinityManager.getNumberOfDevices(CudaAffinityManager.java:173)
        at org.nd4j.jita.constant.ConstantProtector.purgeProtector(ConstantProtector.java:36)
        at org.nd4j.jita.constant.ConstantProtector.(ConstantProtector.java:29)
        at org.nd4j.jita.constant.ConstantProtector.(ConstantProtector.java:19)
        at org.nd4j.jita.constant.ProtectedCudaConstantHandler.(ProtectedCudaConstantHandler.java:45)
        at org.nd4j.jita.constant.CudaConstantHandler.(CudaConstantHandler.java:17)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5753)
        at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5694)
        at org.nd4j.linalg.factory.Nd4j.(Nd4j.java:184)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration$Builder.seed(NeuralNetConfiguration.java:677)
        at deeplearning4jtest.CSVExample.main(CSVExample.java:54)
    Caused by: java.lang.RuntimeException: ND4J is probably missing dependencies. For more information, please refer to: http://nd4j.org/getstarted.html
        at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:51)
        at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:19)
        ... 13 more
    Caused by: java.lang.UnsatisfiedLinkError: no jnind4jcuda in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:963)
        at org.bytedeco.javacpp.Loader.load(Loader.java:764)
        at org.bytedeco.javacpp.Loader.load(Loader.java:671)
        at org.nd4j.nativeblas.Nd4jCuda.(Nd4jCuda.java:10)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.bytedeco.javacpp.Loader.load(Loader.java:726)
        at org.bytedeco.javacpp.Loader.load(Loader.java:671)
        at org.nd4j.nativeblas.Nd4jCuda$NativeOps.(Nd4jCuda.java:62)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:29)
        ... 14 more
    Caused by: java.lang.UnsatisfiedLinkError: no nd4jcuda in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:963)
        at org.bytedeco.javacpp.Loader.load(Loader.java:752)
        ... 24 more

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j version - 0.8.0
    • platform information (OS, etc) - Ubuntu 16.04, arm64, Jetson TX1
    • CUDA version, if used - 8.0
    • NVIDIA driver version, if in use -

    Contributing

    If you'd like to help us fix the issue by contributing some code, but would like guidance or help in doing so, please mention it! - I could help, if I can.

    DevOps 
    opened by gospodinbodurov 60
  • libopenblas_nolapack.so.0: cannot open shared object file: No such file or directory


    Hello,

    I've just tried to run my application on beta2 and I've got the follow exception: Caused by: java.lang.UnsatisfiedLinkError: /app/.javacpp/cache/openblas-0.3.0-1.4.2-linux-x86_64.jar/org/bytedeco/javacpp/linux-x86_64/libjniopenblas_nolapack.so: libopenblas_nolapack.so.0: cannot open shared object file: No such file or directory

    You can find full stacktrace here - https://gist.github.com/sergmain/0685cda1456721595637def8ca347662

    A few days ago, I opened the issue https://github.com/deeplearning4j/deeplearning4j/issues/6083. Since then, the issue was fixed and beta2 was released.

    I rolled back to beta and my application started to work.

    There is a stub project for reproducing this problem on Heroku: https://github.com/sergmain/dl4j-uber-jar. It doesn't contain an actual Keras model, but you can use any.

    Summary: beta - working; beta2 - not working. Target OS: Heroku's PaaS. The target platform for DL4J is specified in /.mvn/jvm.config

    Question ND4J 
    opened by sergmain 59
  • Fixes #9869 linear layer equivalencies


    What changes were proposed in this pull request?

    Fixes linear layer equivalencies, adds test.

    How was this patch tested?

    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

    Quick checklist

    The following checklist helps ensure your PR is complete:

    • [x] Eclipse Contributor Agreement signed, and signed commits - see IP Requirements page for details
    • [x] Reviewed the Contributing Guidelines and followed the steps within.
    • [x] Created tests for any significant new code additions.
    • [x] Relevant tests for your changes are passing.
    opened by agibsonccc 0
  • Training using sd.nn.linear and sd.nn.reluLayer doesn't succeed


    Issue Description

    • expected behavior: The expressions input.mmul(weights).add(bias) and sd.nn.linear(input, weights, bias) should be equivalent.
    • encountered behavior: In the code below, training will only succeed when the first variant is used: variant 1 achieves 100% accuracy, while variant 2 doesn't get better than random guessing (often even 0% accuracy).

    I also added a third variant below, which uses sd.nn.reluLayer(input, weights, bias). Although this is not equivalent to the other two variants (it additionally has a ReLU activation function) it should nonetheless allow learning the task with high accuracy, but it doesn't (note that the weights are initialized all positive, so the ReLU should not make a difference).

    Caveat: I could only test this with M2.1 due to issue #9862, which is fixed but not yet in SNAPSHOT.

    int batchSize = 32;
    int modelDim = 10;
    
    SameDiff sd = SameDiff.create();
    
    SDVariable features = sd.placeHolder("features", FLOAT, batchSize, modelDim);
    SDVariable labels = sd.placeHolder("labels", FLOAT, batchSize, modelDim);
    SDVariable weights = sd.var("weights", new OneInitScheme('c'), FLOAT, modelDim, modelDim);
    SDVariable bias = sd.zero("bias", modelDim);
    // SDVariable predictions = features.mmul(weights).add("predictions", bias);         // <<< variant 1 (works)
    SDVariable predictions = sd.nn.linear("predictions", features, weights, bias);       // <<< variant 2 (doesn't work)
    // SDVariable predictions = sd.nn.reluLayer("predictions", features, weights, bias); // <<< variant 3 (doesn't work)
    sd.loss.meanSquaredError("loss", labels, predictions, null);
    
    TrainingConfig config = new TrainingConfig.Builder()
            .updater(new Adam(0.1))
            .dataSetFeatureMapping("features")
            .dataSetLabelMapping("labels")
            .build();
    sd.setTrainingConfig(config);
    
    // the task is to reconstruct the one-hot encoded input
    DataSetIterator iterator = new ReconstructionDataSetIterator(new RandomDataSetIterator(100, new long[]{batchSize, modelDim}, new long[]{}, ONE_HOT, ZEROS));
    
    sd.fit(iterator, 10);
    
    Evaluation evaluation = new Evaluation();
    sd.evaluate(iterator, "predictions", evaluation);
    System.out.println(evaluation.stats());
    

    Version Information

    • Deeplearning4j version: 1.0.0-M2.1
    • Platform information (OS, etc): Linux Mint 21
    • CUDA version, if used: N/A
    • NVIDIA driver version, if in use: N/A
    opened by CompilerCrash 2
  • Python4J Further Performance Optimizations

    Python4J Further Performance Optimizations

    Follow up ticket for review findings from: https://github.com/eclipse/deeplearning4j/issues/9595#issuecomment-1146085342

    • PythonTypes still initializes an array of types for each conversion; this was one of the biggest performance bottlenecks, which I removed in my version by introducing static fields for these types (they are immutable anyhow)
    • UncheckedPythonInterpreter should not always store retrieved variables in a map. First, this introduces a memory leak because the map has no eviction; second, it slows down the retrieval path in cases where caching is not needed. It would be better to introduce a CachedPythonInterpreter (as a wrapper delegate) as a separate class, so that callers can opt in to caching, or to leave caching entirely in the hands of the client that calls the interpreter. Creating a map entry and a Pair object is also garbage-collector overhead (further away from zero-allocation principles). It is also unclear whether sharing variable instances between interpreters is thread-safe, because the ConcurrentHashMap is static instead of living inside the ThreadLocal.
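    The wrapper-delegate idea above could be sketched roughly as follows. The interface and class names here are hypothetical; Python4J's real interpreter API differs:

    ```java
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;

    public class CachingSketch {
        // Hypothetical minimal interpreter interface, standing in for Python4J's API.
        interface Interpreter {
            Object getVariable(String name);
        }

        // Opt-in caching as a wrapper delegate: callers that do not need caching
        // use the plain interpreter and pay no map lookups or allocations.
        static final class CachedInterpreter implements Interpreter {
            private final Interpreter delegate;
            // Per-instance cache (not static), so entries are never shared across interpreters.
            private final Map<String, Object> cache = new ConcurrentHashMap<>();

            CachedInterpreter(Interpreter delegate) { this.delegate = delegate; }

            @Override
            public Object getVariable(String name) {
                return cache.computeIfAbsent(name, delegate::getVariable);
            }

            // Explicit eviction hook avoids the unbounded-growth leak described above.
            void evict(String name) { cache.remove(name); }
        }

        public static void main(String[] args) {
            AtomicInteger calls = new AtomicInteger();
            Interpreter slow = name -> { calls.incrementAndGet(); return name.toUpperCase(); };
            CachedInterpreter cached = new CachedInterpreter(slow);
            System.out.println(cached.getVariable("x")); // first call hits the delegate
            System.out.println(cached.getVariable("x")); // second call is served from the cache
            System.out.println("delegate calls: " + calls.get());
        }
    }
    ```

    With this split, UncheckedPythonInterpreter stays allocation-free on the retrieval path and caching becomes an explicit client decision.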
    opened by subes 0
  • Support for using sd.grad output as an intermediate variable

    Support for using sd.grad output as an intermediate variable

    Issue Description

    Currently, it is impossible to use the output of sd.grad as a variable for further computations. Consider the following class:

    package com.valb3r.idr.networks;
    
    import org.nd4j.autodiff.samediff.SDVariable;
    import org.nd4j.autodiff.samediff.SameDiff;
    import org.nd4j.linalg.api.buffer.DataType;
    import org.nd4j.weightinit.impl.XavierInitScheme;
    
    public class Issue {
    
        public static void main(String[] args) {
            SameDiff sd = SameDiff.create();
            //Create input and label variables
            SDVariable sdfPoint = sd.placeHolder("point", DataType.FLOAT, -1, 3); //Shape: [?, 3]
            SDVariable ray = sd.placeHolder("ray", DataType.FLOAT, -1, 3); //Shape: [?, 3]
            SDVariable expectedColor = sd.placeHolder("expected-color", DataType.FLOAT, -1, 3); //Shape: [?, 3]
    
            SDVariable sdfInput = denseLayer(sd, 10, 3, sdfPoint);
            SDVariable sdf = denseLayer(sd, 3, 10, sdfInput);
            sdf.markAsLoss();
    
            SDVariable idrRenderGradient = sd.grad(sdfPoint.name());
            SDVariable dotGrad = idrRenderGradient.dot(ray); // org.nd4j.autodiff.util.SameDiffUtils.validateDifferentialFunctionSameDiff(SameDiffUtils.java:134)
    
            sd.loss().meanSquaredError(expectedColor, dotGrad, null);
        }
    
        private static SDVariable denseLayer(SameDiff sd, int nOut, int nIn, SDVariable input) {
            SDVariable w = sd.var(input.name() + "-w1", new XavierInitScheme('c', nIn, nOut), DataType.FLOAT, nIn, nOut);
            SDVariable b = sd.zero(input.name() + "-b1", 1, nOut);
            SDVariable z = input.mmul(w).add(b);
            return sd.nn().tanh(z);
        }
    }
    

    The variable idrRenderGradient is expected to hold the gradient with respect to sdfPoint and to be usable in the computation graph, but unfortunately this is not the case: the line SDVariable dotGrad = idrRenderGradient.dot(ray); throws an exception:

    Exception in thread "main" java.lang.IllegalStateException
    	at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:253)
    	at org.nd4j.autodiff.util.SameDiffUtils.validateDifferentialFunctionSameDiff(SameDiffUtils.java:134)
    	at org.nd4j.linalg.api.ops.BaseReduceOp.<init>(BaseReduceOp.java:85)
    	at org.nd4j.linalg.api.ops.BaseReduceOp.<init>(BaseReduceOp.java:114)
    

    For self-contained minimum reproducible example, please see: https://github.com/valb3r/same-diff/blob/master/src/main/java/com/valb3r/idr/networks/Issue.java

    For more details on discussion, please see: https://community.konduit.ai/t/using-gradient-as-an-intermediate-sdvariable/1890

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j version: 1.0.0-M1.1, 1.0.0-M2
    • Platform information: MacOS, Apple Silicon, CPU
    opened by valb3r 0
  • Feature Request: Export Keras Model

    Feature Request: Export Keras Model

    Issue Description

    Currently we can import a model from an HDF5 Keras model file. It would be helpful if we could also go the other way: exporting a DL4J MultiLayerNetwork or ComputationGraph to a Keras-compatible HDF5 file.

    Version Information

    1.0.0-M2

    opened by bnnthang 2
  • Different PCA.pca_factor results switching between CPU and GPU

    Different PCA.pca_factor results switching between CPU and GPU

    Issue Description

    I'm getting different PCA.pca_factor results when switching between the CPU and GPU libraries. The GPU results are correct. No code changes are made during the switch.

    Version Information

    • Deeplearning4j version: 1.0.0-M1.1
    • Platform: Windows 11
    • CUDA version: 11.2
    • NVIDIA driver version: 512.59
    • CPU: AMD Ryzen 7 3800X, 3901 MHz, 8 cores, 16 logical processors
    • GPU: NVIDIA 2080Ti

        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-cuda-11.2</artifactId>  <!--using nd4j-native gives wrong answer -->
            <version>${dl4j-master.version}</version>
        </dependency>
    
        <dependency>
            <groupId>org.bytedeco</groupId>
            <artifactId>cuda</artifactId>
            <version>11.2-8.1-1.5.5</version>
            <classifier>windows-x86_64-redist</classifier>
        </dependency>
    
    opened by ebremer 1