Overview

The Eclipse Deeplearning4J (DL4J) ecosystem is a set of projects intended to support all the needs of a JVM-based deep learning application. This means starting with raw data, loading and preprocessing it from whatever format and location it is in, through to building and tuning a wide variety of simple and complex deep learning networks.

Because Deeplearning4J runs on the JVM, you can use it with a wide variety of JVM-based languages other than Java, such as Scala, Kotlin, Clojure and many more.

The DL4J stack comprises:

  • DL4J: High-level API to build MultiLayerNetworks and ComputationGraphs with a variety of layers, including custom ones. Supports importing Keras models from h5, including tf.keras models (as of 1.0.0-beta7), and also supports distributed training on Apache Spark.
  • ND4J: General-purpose linear algebra library with over 500 mathematical, linear algebra and deep learning operations. ND4J is based on the highly optimized C++ codebase LibND4J, which provides CPU (AVX2/512) and GPU (CUDA) support and acceleration by libraries such as OpenBLAS, OneDNN (MKL-DNN), cuDNN, cuBLAS, etc.
  • SameDiff: Part of the ND4J library, SameDiff is our automatic differentiation / deep learning framework. SameDiff uses a graph-based (define-then-run) approach, similar to TensorFlow graph mode; eager execution (as in TensorFlow 2.x / PyTorch) is planned. SameDiff supports importing TensorFlow frozen model format .pb (protobuf) models; import for ONNX, TensorFlow SavedModel and Keras models is planned. Deeplearning4j also has full SameDiff support for easily writing custom layers and loss functions. A minimal define-then-run sketch follows this list.
  • DataVec: ETL for machine learning data in a wide variety of formats and files (HDFS, Spark, images, video, audio, CSV, Excel, etc.).
  • LibND4J: C++ library that underpins everything. For more information on how the JVM accesses native arrays and operations, refer to JavaCPP.
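
To illustrate the define-then-run style mentioned in the SameDiff bullet above, here is a minimal toy sketch (not taken from the examples repository): a placeholder is fed through a single dense layer, and the graph only executes when an output is requested.

import java.util.Collections;

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.weightinit.impl.XavierInitScheme;

public class SameDiffSketch {
    public static void main(String[] args) {
        // Define the graph first: one dense layer (3 inputs -> 2 outputs) with tanh activation.
        SameDiff sd = SameDiff.create();
        SDVariable x = sd.placeHolder("x", DataType.FLOAT, -1, 3);
        SDVariable w = sd.var("w", new XavierInitScheme('c', 3, 2), DataType.FLOAT, 3, 2);
        SDVariable b = sd.zero("b", 1, 2);
        SDVariable out = sd.nn().tanh("out", x.mmul(w).add(b));

        // Then run it: nothing executes until an output is actually requested.
        System.out.println(sd.outputSingle(
                Collections.singletonMap("x", Nd4j.rand(DataType.FLOAT, 4, 3)),
                "out"));
    }
}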

All projects in the DL4J ecosystem support Windows, Linux and macOS. Hardware support includes CUDA GPUs (10.0, 10.1, 10.2, except on macOS), x86 CPUs (x86_64, avx2, avx512), ARM CPUs (arm, arm64, armhf) and PowerPC (ppc64le).

Community Support

For support with the project, please head over to the community forum at https://community.konduit.ai/

Using Eclipse Deeplearning4J in your project

Deeplearning4J has quite a few dependencies. For this reason, we only support usage with a build tool.

<dependencies>
  <dependency>
      <groupId>org.deeplearning4j</groupId>
      <artifactId>deeplearning4j-core</artifactId>
      <version>1.0.0-M1.1</version>
  </dependency>
  <dependency>
      <groupId>org.nd4j</groupId>
      <artifactId>nd4j-native-platform</artifactId>
      <version>1.0.0-M1.1</version>
  </dependency>
</dependencies>

Add these dependencies to your pom.xml file to use Deeplearning4J with the CPU backend. A full standalone project example is available in the example repository, if you want to start a new Maven project from scratch.
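
If the dependencies resolve correctly, a short ND4J smoke test should print a 2x2 array and log which backend was loaded (for example "Loaded [CpuBackend] backend" with nd4j-native-platform on the classpath). A minimal sketch; the class name is arbitrary:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class BackendCheck {
    public static void main(String[] args) {
        // Creating an array forces the ND4J backend to initialize its native libraries.
        INDArray ones = Nd4j.ones(2, 2);
        System.out.println(ones);
    }
}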

A taste of code

Deeplearning4J offers a very high-level API for defining even complex neural networks. The following example code shows how LeNet, a convolutional neural network, is defined in DL4J (the seed and outputNum variables are assumed to be defined elsewhere).

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(seed)
                .l2(0.0005)
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(1e-3))
                .list()
                .layer(new ConvolutionLayer.Builder(5, 5)
                        .stride(1,1)
                        .nOut(20)
                        .activation(Activation.IDENTITY)
                        .build())
                .layer(new SubsamplingLayer.Builder(PoolingType.MAX)
                        .kernelSize(2,2)
                        .stride(2,2)
                        .build())
                .layer(new ConvolutionLayer.Builder(5, 5)
                        .stride(1,1)
                        .nOut(50)
                        .activation(Activation.IDENTITY)
                        .build())
                .layer(new SubsamplingLayer.Builder(PoolingType.MAX)
                        .kernelSize(2,2)
                        .stride(2,2)
                        .build())
                .layer(new DenseLayer.Builder().activation(Activation.RELU)
                        .nOut(500).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(outputNum)
                        .activation(Activation.SOFTMAX)
                        .build())
                .setInputType(InputType.convolutionalFlat(28,28,1))
                .build();
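
The configuration above only describes the network. A minimal training sketch continuing that snippet, assuming MultiLayerNetwork and ScoreIterationListener from deeplearning4j-core and the MnistDataSetIterator helper from DL4J's datasets module; the batch size, listener frequency and epoch count are arbitrary:

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(100));    // log the score every 100 iterations

DataSetIterator mnistTrain = new MnistDataSetIterator(64, true, 12345);
DataSetIterator mnistTest = new MnistDataSetIterator(64, false, 12345);

model.fit(mnistTrain, 10);                              // train for 10 epochs
System.out.println(model.evaluate(mnistTest).stats());  // accuracy, precision, recall, F1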

Documentation, Guides and Tutorials

You can find the official documentation for Deeplearning4J and the other libraries of its ecosystem at http://deeplearning4j.konduit.ai/.

Want some examples?

We have a separate repository with various examples available: https://github.com/eclipse/deeplearning4j-examples

Building from source

It is preferred to use the official pre-compiled releases (see above). But if you want to build from source, first take a look at the prerequisites for building from source here: https://deeplearning4j.konduit.ai/multi-project/how-to-guides/build-from-source.

To build everything, we can use commands like

./change-cuda-versions.sh x.x
./change-scala-versions.sh 2.xx
./change-spark-versions.sh x
mvn clean install -Dmaven.test.skip -Dlibnd4j.cuda=x.x -Dlibnd4j.compute=xx

or

mvn -B -V -U clean install -pl <modules> -Dlibnd4j.platform=linux-x86_64 -Dlibnd4j.chip=cuda -Dlibnd4j.cuda=11.0 -Dlibnd4j.compute=<your GPU CC> -Djavacpp.platform=linux-x86_64 -Dmaven.test.skip=true

An example of a GPU "CC", or compute capability, is 61 for a Titan X Pascal.
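
For example, for the Titan X Pascal mentioned above (CC 61) and CUDA 11.0, the placeholders might be filled in roughly as follows (a sketch only; the exact module list and CUDA/Scala/Spark versions depend on your checkout and hardware):

./change-cuda-versions.sh 11.0
mvn clean install -Dmaven.test.skip=true -Dlibnd4j.chip=cuda -Dlibnd4j.cuda=11.0 -Dlibnd4j.compute=61 -Djavacpp.platform=linux-x86_64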

License

Apache License 2.0

Commercial Support

Deeplearning4J is actively developed by the team at Konduit K.K.

If you need any commercial support, feel free to reach out to us at [email protected]

Issues
  • [WIP] Keras upgrades

    Work in progress...

    Upgrades to deeplearning4j-keras to be a little better structured and expand the API from Keras to DL4J. Encourages hijacking model methods rather than implementing an actual Keras backend for efficiency and performance.

    Main goals of this PR include:

    • expanding Keras to better support DL4J
    • model saving methods via save_model
    • supporting Keras functional API
    opened by crockpotveggies 103
  • Implement new UI functionality using Play framework

    _WIP DO NOT MERGE_

    Play framework UI: builds upon earlier StatsListener and StatsStorage work implemented here: https://github.com/deeplearning4j/deeplearning4j/pull/2143

    opened by AlexDBlack 90
  • Fix RBMs and AE

    • Setup vb params to persist and be updated when in pretraining mode. It was skipping the update part
    • Added flag for pretraining to configuration at layer level and set trigger to turn off after layer pretrains. LayerUpdater will skip vb params when running outside pretrain. In previous setup, backprop was hard coded to true in many cases when setting params or gradients and it would skip vb (visual bias) during pretrain phase. In this change, getting the count for params or gradients or updating them will take vb into account. It will just not have any changes applied in the updater when it is not in pretrain mode.
    • HiddenUnit is the activation in RBM - added backpropGradient and derivative for hidden unit in RBM to account for this fact
    • RBM needed a reverse sign on application of gradients for the step function
    • Deprecated unused code in RBM and cleaned up functions in AE that appeared out of date
    • Expanded RBM tests and fixed gradient checks
    opened by nyghtowl 86
  • Word2Vec/ParagraphVectors/DeepWalk Spark

    WIP; DO NOT MERGE;

    Word2Vec/ParagraphVectors/DeepWalk implementation for Spark, using VoidParameterServer available in ND4j

    Do not merge before this: https://github.com/deeplearning4j/nd4j/pull/1551

    opened by raver119 82
  • DL4J Hanging after "Loaded [JCublasBackend] backend"

    Hi,

    We are running some DL4J code as part of a wider system. This code runs fine on an Alienware development PC with CUDA 9.1 on Ubuntu, run from Eclipse.

    However, when we package this application and run it on a RHEL ppc64le server with CUDA 9.1, we see that ND4J is not doing anything after the following output:

    2309 [pool-8-thread-1] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend

    I have verified we are running the latest NVIDIA drivers and CUDA 9.1 is installed successfully. Below is the output from running the CUDA 9.1 sample deviceQuery, which lists the GPU devices:

     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 4 CUDA Capable device(s)
    
    Device 0: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   2 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 1: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   3 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 2: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   6 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    Device 3: "Tesla P100-SXM2-16GB"
      CUDA Driver Version / Runtime Version          9.1 / 9.1
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16276 MBytes (17066885120 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1481 MHz (1.48 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   7 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    > Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU1) : Yes
    > Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU2) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU3) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU0) : Yes
    > Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU2) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU3) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU0) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU1) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU3) : Yes
    > Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU0) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU1) : No
    > Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU2) : Yes
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 4
    Result = PASS
    
    

    Can someone please help us with diagnosing this issue? It seems CUDA is installed correctly, but DL4J is not producing any output and the following Java code just hangs when calling Nd4j.create() for the first time:

    ...
    Nd4j.create()
    ...
    

    Note that this same code works fine on the Alienware on Ubuntu 64-bit.

    Aha! Link: https://skymindai.aha.io/features/ND4J-143

    DevOps ND4J 
    opened by madhs53 77
  • Convert Mat image to INDArray, When trying to convert Mat image to INDArray it is returning me INDArray null

    I have this code and I do not understand why my INDArray image is null when I try to convert a Mat to an INDArray. I am using Android Studio 3.0.1.

    //************************* Digit classification *******************************************************************
            for (int i = 0; i < rects.size() ; i++) {
                Rect rect = rects.get(i);
                digit = inverted.submat(rect.y, rect.y + rect.height, rect.x, rect.x + rect.width);
                Imgproc.resize(digit, digit, new Size(28, 28));
    
                    NativeImageLoader nativeImageLoader = new NativeImageLoader(digit.height(), digit.width(), digit.channels());//Use the nativeImageLoader to convert to numerical matrix
                    INDArray image = nativeImageLoader.asMatrix(digit);//put image into INDArray
    
                System.out.println("carregar modelo matrixes  " + image);
     }
    

    output: carregar modelo matrixes NULL

    Bug Enhancement DataVec / ETL 
    opened by AILTON091 76
  • Add CenterLossOutputLayer for efficient training

    Work in progress...

    Center loss has proven to be more efficient than triplet loss, and it enables classifier training that is also faster than training with triplets.

    @AlexDBlack can you take a look at CenterLossParamInitializer and confirm it's on the right track? Also, should we just specify numClasses in layer conf? Let's keep discussion in Gitter :)

    opened by crockpotveggies 65
  • Can not run CUDA example on Jetson TX1

    Issue Description

    deeplearning4jtest-1.0/bin/deeplearning4jtest 10000 10
    09:07:35.540 [main] INFO deeplearning4jtest.CSVExample - Build model....
    09:07:35.652 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
    Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.nd4j.jita.concurrency.CudaAffinityManager.getNumberOfDevices(CudaAffinityManager.java:173)
        at org.nd4j.jita.constant.ConstantProtector.purgeProtector(ConstantProtector.java:36)
        at org.nd4j.jita.constant.ConstantProtector.(ConstantProtector.java:29)
        at org.nd4j.jita.constant.ConstantProtector.(ConstantProtector.java:19)
        at org.nd4j.jita.constant.ProtectedCudaConstantHandler.(ProtectedCudaConstantHandler.java:45)
        at org.nd4j.jita.constant.CudaConstantHandler.(CudaConstantHandler.java:17)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5753)
        at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5694)
        at org.nd4j.linalg.factory.Nd4j.(Nd4j.java:184)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration$Builder.seed(NeuralNetConfiguration.java:677)
        at deeplearning4jtest.CSVExample.main(CSVExample.java:54)
    Caused by: java.lang.RuntimeException: ND4J is probably missing dependencies. For more information, please refer to: http://nd4j.org/getstarted.html
        at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:51)
        at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:19)
        ... 13 more
    Caused by: java.lang.UnsatisfiedLinkError: no jnind4jcuda in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:963)
        at org.bytedeco.javacpp.Loader.load(Loader.java:764)
        at org.bytedeco.javacpp.Loader.load(Loader.java:671)
        at org.nd4j.nativeblas.Nd4jCuda.(Nd4jCuda.java:10)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.bytedeco.javacpp.Loader.load(Loader.java:726)
        at org.bytedeco.javacpp.Loader.load(Loader.java:671)
        at org.nd4j.nativeblas.Nd4jCuda$NativeOps.(Nd4jCuda.java:62)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:29)
        ... 14 more
    Caused by: java.lang.UnsatisfiedLinkError: no nd4jcuda in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:963)
        at org.bytedeco.javacpp.Loader.load(Loader.java:752)
        ... 24 more

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j version - 0.8.0
    • platform information (OS, etc) - Ubuntu 16.04, arm64, Jetson TX1
    • CUDA version, if used - 8.0
    • NVIDIA driver version, if in use -

    Contributing

    If you'd like to help us fix the issue by contributing some code, but would like guidance or help in doing so, please mention it! - I could help, if I can.

    DevOps 
    opened by gospodinbodurov 60
  • libopenblas_nolapack.so.0: cannot open shared object file: No such file or directory

    Hello,

    I've just tried to run my application on beta2 and I got the following exception: Caused by: java.lang.UnsatisfiedLinkError: /app/.javacpp/cache/openblas-0.3.0-1.4.2-linux-x86_64.jar/org/bytedeco/javacpp/linux-x86_64/libjniopenblas_nolapack.so: libopenblas_nolapack.so.0: cannot open shared object file: No such file or directory

    You can find full stacktrace here - https://gist.github.com/sergmain/0685cda1456721595637def8ca347662

    A few days ago, I opened issue https://github.com/deeplearning4j/deeplearning4j/issues/6083 Since then, that issue was fixed and beta2 was released.

    I rolled back to beta and my application started to work.

    There is a stub project for reproducing this problem on Heroku: https://github.com/sergmain/dl4j-uber-jar It doesn't contain an actual Keras model, but you can use any.

    Summary: beta - working; beta2 - not working. Target OS: Heroku's PaaS. The target platform for DL4J is specified in /.mvn/jvm.config

    Question ND4J 
    opened by sergmain 59
  • Deeplearning4j memory leakage

    I deployed an application using Deeplearning4j; it starts with 8 GB of memory usage and after about 48 h it runs out of memory (32 GB). In a loop it creates a new instance of MultiLayerNetwork, does some calculations, then drops it. It ends up with the Java process consuming the entire memory; the leak is somehow outside the Java heap.

    I took and modified your official example UCISequenceClassificationExample to reproduce the error. The following example will end up consuming all memory (depends on RAM and CPU) in 40+ hours.

    Also I took screenshots of average memory usage of this example, and it shows the process uses 200mb more memory after 1 hour of run.

    Deeplearning4j 0.8-SNAPSHOT CentOS 7 x64 2x Intel Xeon E5-2420 32Gb RAM Blas vendor MKL

    px

    NetLeak.txt

    Bug 
    opened by SergeyZYX 59
  • Major refactor of Keras model import to support Functional API -> ComputationGraph

    Near rewrite of Keras model import module (which retains critical backwards compatibility). Witness:

    • KerasLayer class for processing Keras layer configurations and building DL4J Layers
    • KerasModel class for processing Keras model configurations and building ComputationGraphs (for Functional API) and MultiLayerNetworks (for Keras Sequential)
    • KerasModelImport class for reading Keras model archive files (HDF5 archives for models and weights, text files for JSON configurations)

    Backwards compatibility is maintained via the high-level, static model and configuration import functions in Model and ModelConfiguration, respectively. These work exactly as before but have been deprecated so we can remove them in some future release.

    The main new feature is support for the Keras Functional API Model, which maps to the DL4J ComputationGraph. We can also now import Keras loss functions (from the training configuration) and add a corresponding loss layer. We also added a few additional layers. A minimal usage sketch of the import entry points follows below.

    Perhaps most important, new code is much more modular and readable and should be much easier to maintain, debug, and develop. Everything is pretty thoroughly commented and documented, including TODOs.
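
    For reference, a minimal usage sketch of the import entry points described above, as they are exposed in current DL4J (the .h5 file paths below are hypothetical):

    import org.deeplearning4j.nn.graph.ComputationGraph;
    import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

    // Keras Functional API model -> DL4J ComputationGraph
    ComputationGraph graph = KerasModelImport.importKerasModelAndWeights("model_functional.h5");

    // Keras Sequential model -> DL4J MultiLayerNetwork
    MultiLayerNetwork net = KerasModelImport.importKerasSequentialModelAndWeights("model_sequential.h5");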

    opened by turambar 56
  • Adds graalvm support for import cache, optimized helpers in libnd4j

    What changes were proposed in this pull request?

    1. Consolidate all classgraph usages into one static Kotlin object called ClassGraphHolder. This object also has utility methods for saving and loading the classgraph configuration.
    2. Make this all accessible and reloadable by reading and writing from JSON
    3. Add more ifdefs for related helpers and builds. (Please fill in changes proposed in this fix)

    How was this patch tested?

    Repeated builds. (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

    Quick checklist

    The following checklist helps ensure your PR is complete:

    • [ X] Eclipse Contributor Agreement signed, and signed commits - see IP Requirements page for details
    • [ X] Reviewed the Contributing Guidelines and followed the steps within.
    • [ X] Created tests for any significant new code additions.
    • [ X] Relevant tests for your changes are passing.
    opened by agibsonccc 0
  • Misleading error messages in InputType class

    Issue Description

    Here are the lines 318 .. 346 taken out of org.deeplearning4j.nn.conf.inputs.InputType class:

            public InputTypeConvolutional(@JsonProperty("height") long height, @JsonProperty("width") long width,
                                          @JsonProperty("channels") long channels, @JsonProperty("format") CNN2DFormat format) {
                if(height <= 0) {
                    OneTimeLogger.warn(log,"Assigning height of 0. Normally this is not valid. Exceptions for this are generally related" +
                            "to model import and unknown dimensions");
                }
    
                if(width <= 0) {
                    OneTimeLogger.warn(log,"Assigning height of 0. Normally this is not valid. Exceptions for this are generally related" +
                            "to model import and unknown dimensions");
                }
    
                if(width <= 0) {
                    OneTimeLogger.warn(log,"Assigning width of 0. Normally this is not valid. Exceptions for this are generally related" +
                            "to model import and unknown dimensions");
                }
    
                if(channels <= 0) {
                    OneTimeLogger.warn(log,"Assigning width of 0. Normally this is not valid. Exceptions for this are generally related" +
                            "to model import and unknown dimensions");
                }
    
    
                this.height = height;
                this.width = width;
                this.channels = channels;
                if(format != null)
                    this.format = format;
            }
    

    Out of these 4 error messages, 2 (numbers 2 and 4) are obviously wrong (width mistaken for height, channels mistaken for width). Also, the check on the width value is made twice (checks number 2 and 3), and unfortunately the correct one (number 3) cannot get called.

    Version Information

    • Deeplearning4j version 1.0.0-M2
    opened by hbitteur 0
  • musl libc & Alpine based Docker images compatibility

    DeepLearning4J is incompatible with musl libc & Alpine based Docker images

    I tried running a DL4J application inside a Docker container that is based on an Alpine Linux Image (eclipse-temurin:8-alpine). When starting the application, I get a fatal error. The Java VM dies.

    Expected behavior: I can run a DL4J application inside a musl libc based Linux Docker container like Alpine.
    Encountered behavior: When I run my DL4J application in a Docker container that uses Alpine Linux as its base image, I get a fatal error.

    14:45:20.692 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [CpuBackend] backend
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00000000000021c6, pid=8, tid=0x00007f3c87a8cb38
    #
    # JRE version: OpenJDK Runtime Environment (8.0_332-b09) (build 1.8.0_332-b09)
    # Java VM: OpenJDK 64-Bit Server VM (25.332-b09 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # C  0x00000000000021c6
    #
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /opt/dl4j/hs_err_pid8.log
    #
    # If you would like to submit a bug report, please visit:
    #   https://github.com/adoptium/adoptium-support/issues
    #
    Aborted
    

    I attached the file hs_err_pid8.log

    One suspicious log entry is:

    R12=0x00007f3c65aaa558: <offset 0x23e558> in /root/.javacpp/cache/example-1.0.0-SNAPSHOT-shaded.jar/org/bytedeco/openblas/linux-x86_64/libquadmath.so.0 at 0x00007f3c6586c000
    R13=0x00007f3c8891a880: <offset 0x96880> in /lib/ld-musl-x86_64.so.1 at 0x00007f3c88884000
    R14=0x00007f3c87a8cb38 is an unknown value
    

    I guess /root/.javacpp/cache/example-1.0.0-SNAPSHOT-shaded.jar/org/bytedeco/openblas/linux-x86_64/libquadmath.so.0 is linked against glibc, but on Alpine Linux there is no glibc. There is gcompat, but that doesn't fix this issue.

    I ran:

    objdump -x /root/.javacpp/cache/example-1.0.0-SNAPSHOT-shaded.jar/org/bytedeco/openblas/linux-x86_64/libquadmath.so.0
    

    and it shows:

    Version References:
      required from libm.so.6:
        0x09691a75 0x00 05 GLIBC_2.2.5
      required from libc.so.6:
        0x0d696914 0x00 08 GLIBC_2.4
        0x0d696913 0x00 07 GLIBC_2.3
        0x09691a75 0x00 06 GLIBC_2.2.5
        0x06969190 0x00 04 GLIBC_2.10
    

    So I guess these libraries were actually compiled and linked with glibc.

    Version Information

    • Deeplearning4j version: 1.0.0-M2
    • Platform information (OS, etc): Alpine Linux v3.15.4, libc:musl - unknown musl - unknown

    Additional Information

    I attached an example project (example.zip) and corresponding Dockerfiles to replicate the problem. There is a README.md inside with instructions.

    Thanks in advance.

    Regards, Andreas

    opened by schipplock 0
  • Python4J Further Performance Optimizations

    Follow up ticket for review findings from: https://github.com/eclipse/deeplearning4j/issues/9595#issuecomment-1146085342

    • PythonTypes still initializes an array of types for each conversion; this was one of the biggest performance bottlenecks that I removed in my version by introducing static fields for these (they are immutable anyhow)
    • UncheckedPythonInterpreter should not store retrieved variables in a map always. First this introduces a memory leak because the map has no eviction, second this slows down the path for retrievals where caching is not needed. Would be better if a CachedPythonInterpreter (as a wrapper delegate) is introduced as a separate class to opt-in to caching or to leave the caching in the hands of client that calls the interpreter. Creating an entry in the map and creating a Pair object is also garbage collector overhead (further away from zero-allocation principles). Also dunno if it is thread-safe to share variable instances between interpreters because the ConcurrentHashMap is static instead of inside the ThreadLocal.
    opened by subes 0
  • Support for using sd.grad output as an intermediate variable

    Issue Description

    Currently, it is impossible to use the output of sd.grad as a variable for further computations. Consider the following class:

    package com.valb3r.idr.networks;
    
    import org.nd4j.autodiff.samediff.SDVariable;
    import org.nd4j.autodiff.samediff.SameDiff;
    import org.nd4j.linalg.api.buffer.DataType;
    import org.nd4j.weightinit.impl.XavierInitScheme;
    
    public class Issue {
    
        public static void main(String[] args) {
            SameDiff sd = SameDiff.create();
            //Create input and label variables
            SDVariable sdfPoint = sd.placeHolder("point", DataType.FLOAT, -1, 3); //Shape: [?, 3]
            SDVariable ray = sd.placeHolder("ray", DataType.FLOAT, -1, 3); //Shape: [?, 3]
            SDVariable expectedColor = sd.placeHolder("expected-color", DataType.FLOAT, -1, 3); //Shape: [?, 3]
    
            SDVariable sdfInput = denseLayer(sd, 10, 3, sdfPoint);
            SDVariable sdf = denseLayer(sd, 3, 10, sdfInput);
            sdf.markAsLoss();
    
            SDVariable idrRenderGradient = sd.grad(sdfPoint.name());
            SDVariable dotGrad = idrRenderGradient.dot(ray); // org.nd4j.autodiff.util.SameDiffUtils.validateDifferentialFunctionSameDiff(SameDiffUtils.java:134)
    
            sd.loss().meanSquaredError(expectedColor, dotGrad, null);
        }
    
        private static SDVariable denseLayer(SameDiff sd, int nOut, int nIn, SDVariable input) {
            SDVariable w = sd.var(input.name() + "-w1", new XavierInitScheme('c', nIn, nOut), DataType.FLOAT, nIn, nOut);
            SDVariable b = sd.zero(input.name() + "-b1", 1, nOut);
            SDVariable z = input.mmul(w).add(b);
            return sd.nn().tanh(z);
        }
    }
    

    The variable idrRenderGradient is expected to be the gradient of the sdf variable and should be usable in the computation graph, but unfortunately that is not the case; the line SDVariable dotGrad = idrRenderGradient.dot(ray); throws an exception:

    Exception in thread "main" java.lang.IllegalStateException
    	at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:253)
    	at org.nd4j.autodiff.util.SameDiffUtils.validateDifferentialFunctionSameDiff(SameDiffUtils.java:134)
    	at org.nd4j.linalg.api.ops.BaseReduceOp.<init>(BaseReduceOp.java:85)
    	at org.nd4j.linalg.api.ops.BaseReduceOp.<init>(BaseReduceOp.java:114)
    

    For self-contained minimum reproducible example, please see: https://github.com/valb3r/same-diff/blob/master/src/main/java/com/valb3r/idr/networks/Issue.java

    For more details on discussion, please see: https://community.konduit.ai/t/using-gradient-as-an-intermediate-sdvariable/1890

    Version Information

    Please indicate relevant versions, including, if relevant:

    • Deeplearning4j version: 1.0.0-M1.1, 1.0.0-M2
    • Platform information: MacOS, Apple Silicon, CPU
    opened by valb3r 0
  • Feature Request: Export Keras Model

    Issue Description

    Currently we can import a model from an HDF5 Keras model file. It would be helpful if we could also go the other way around - exporting a DL4J MultiLayerNetwork or ComputationGraph to an HDF5 Keras-compatible file.

    Version Information

    1.0.0-M2

    opened by bnnthang 2