Plaidml - PlaidML is a framework for making deep learning work everywhere.

Overview

A platform for making deep learning work everywhere.

To Our Users

First off, we’d like to thank you for choosing PlaidML. Whether you’re a new user or a multi-year veteran, we greatly appreciate you for the time you’ve spent tinkering around with our source code, sending us feedback, and improving our codebase. PlaidML would truly not be the same without you.

The feedback we have received from our users indicates an ever-increasing need for performance, programmability, and portability. During the past few months, we have been restructuring PlaidML to address those needs. Below is a summary of the biggest changes:

  • We’ve adopted MLIR, an extensible compiler infrastructure that has gained industry-wide adoption since its release in early 2019. MLIR makes it easier both to integrate new software and hardware into our compiler stack and to write optimizations for our compiler.
  • We’ve worked extensively on Stripe, our low-level intermediate representation within PlaidML. Stripe contains optimizations that greatly improve the performance of our compiler. While our work on Stripe began before we decided to use MLIR, we are in the process of fully integrating Stripe into MLIR.
  • We created our C++/Python embedded domain-specific language (EDSL) to improve the programmability of PlaidML.

Today, we’re announcing a new branch of PlaidML — plaidml-v1. This will act as our development branch going forward and will allow us to more rapidly prototype the changes we’re making without breaking our existing user base. As a precaution, please note that certain features, tests, and hardware targets may be broken in plaidml-v1.

You can continue to use code on the master branch or from our releases on PyPI. For your convenience, the contents of our master branch will be released as version 0.7.0. We are keeping the master branch of PlaidML stable and maintaining it until plaidml-v1 is ready for production.

If you’d like to try out some of PlaidML’s newer performance improvements, you can try running PlaidML with the environment variable PLAIDML_USE_STRIPE=1. This will act as a precursor to the changes you’ll be seeing in plaidml-v1, and we’re excited to hear your feedback on Stripe.
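
As a minimal sketch of opting in from Python, the variable can be placed in the environment before Keras is imported. The `KERAS_BACKEND` value is the one used in the Hello VGG example in this README; the helper name `plaidml_env` is hypothetical and not part of PlaidML, and setting the variable before the import is an assumption about when it is read.

```python
import os


def plaidml_env(use_stripe=True, base=None):
    """Return a copy of an environment mapping with PlaidML settings applied.

    Hypothetical helper for illustration; it is not a PlaidML API.
    """
    env = dict(os.environ if base is None else base)
    # Select PlaidML as the Keras backend (same value as in Hello VGG).
    env["KERAS_BACKEND"] = "plaidml.keras.backend"
    if use_stripe:
        # Opt in to the Stripe path described in the paragraph above.
        env["PLAIDML_USE_STRIPE"] = "1"
    else:
        env.pop("PLAIDML_USE_STRIPE", None)
    return env


# Apply to the current process; do this before `import keras`.
os.environ.update(plaidml_env())
```

The same effect can be had from a shell by exporting `PLAIDML_USE_STRIPE=1` before launching the script.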

Your support means a lot to us. Thank you for being understanding of our new development process during this new and exciting time for deep learning compilers.


PlaidML is an advanced and portable tensor compiler for enabling deep learning on laptops, embedded devices, or other devices where the available computing hardware is not well supported or the available software stack contains unpalatable license restrictions.

PlaidML sits underneath common machine learning frameworks, enabling users to access any hardware supported by PlaidML. PlaidML supports Keras, ONNX, and nGraph.

As a component within the nGraph Compiler stack, PlaidML further extends the capabilities of specialized deep-learning hardware (especially GPUs) and makes it both easier and faster to use subgraph-level optimizations that would otherwise be bounded by the compute limitations of the device.

As a component under Keras, PlaidML can accelerate training workloads with customized or automatically generated Tile code. It works especially well on GPUs and, on NVIDIA hardware, does not require CUDA/cuDNN while still achieving comparable performance.

PlaidML works on all major operating systems: Linux, macOS, and Windows.

If you are using a hardware target not supported by PlaidML by default, such as Clover, check out the instructions at building PlaidML to build a custom configuration to support your hardware.

Prerequisites

  • Python (v2 supported, v3 recommended)
  • OpenCL 1.2 or greater
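
One way to check the OpenCL prerequisite is to read the platform version string that `clinfo` prints and compare it with 1.2. The sketch below assumes the string has the form `OpenCL <major>.<minor> <vendor info>`; the two sample strings are taken from the clinfo reports quoted in the issues further down this page, and the function names are hypothetical.

```python
import re


def opencl_version(version_string):
    """Parse 'OpenCL <major>.<minor> <vendor info>' into (major, minor)."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("not an OpenCL version string: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def meets_prerequisite(version_string, minimum=(1, 2)):
    """True if the reported version is at least `minimum` (1.2 by default)."""
    return opencl_version(version_string) >= minimum


print(meets_prerequisite("OpenCL 2.1 AMD-APP (2766.5)"))
print(meets_prerequisite("OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82"))
```

Tuples compare element by element, so `(2, 1) >= (1, 2)` holds without any extra arithmetic.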

Quick Start

See the troubleshooting section for solutions to common issues.

virtualenv plaidml
source plaidml/bin/activate
pip install plaidml-keras plaidbench

Choose which accelerator you'd like to use (many computers, especially laptops, have multiple):

plaidml-setup

Next, try benchmarking MobileNet inference performance:

plaidbench keras mobilenet

Or, try training MobileNet:

plaidbench --batch-size 16 keras --train mobilenet

Installation Instructions

We support a variety of operating systems and installation methods.

Demos and Related Projects

Plaidbench

Plaidbench is a performance testing suite designed to help users compare the performance of different cards and different frameworks.

Hello VGG

One of the great things about Keras is how easy it is to play with state-of-the-art networks. Here's all the code you need to run VGG-19:

#!/usr/bin/env python

import numpy as np
import os
import time

os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

import keras
import keras.applications as kapp
from keras.datasets import cifar10

(x_train, y_train_cats), (x_test, y_test_cats) = cifar10.load_data()
batch_size = 8
x_train = x_train[:batch_size]
x_train = np.repeat(np.repeat(x_train, 7, axis=1), 7, axis=2)
model = kapp.VGG19()
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])

print("Running initial batch (compiling tile program)")
y = model.predict(x=x_train, batch_size=batch_size)

# Now start the clock and run 10 batches
print("Timing inference...")
start = time.time()
for i in range(10):
    y = model.predict(x=x_train, batch_size=batch_size)
print("Ran in {} seconds".format(time.time() - start))
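
The script runs one untimed batch before starting the clock, so that the one-off compilation of the Tile program does not distort the measurement. That warm-up-then-time pattern can be sketched on its own as follows; `time_after_warmup` is a hypothetical helper, and the lambda stands in for a real `model.predict` call.

```python
import time


def time_after_warmup(run_batch, runs=10):
    """Call run_batch once untimed, then return the seconds taken by `runs` calls."""
    run_batch()  # warm-up: any one-off compilation cost lands here
    start = time.perf_counter()
    for _ in range(runs):
        run_batch()
    return time.perf_counter() - start


calls = []
elapsed = time_after_warmup(lambda: calls.append(1), runs=10)
print("{} calls, {:.6f} s timed".format(len(calls), elapsed))
```

With the Hello VGG script, `run_batch` could be `lambda: model.predict(x=x_train, batch_size=batch_size)`.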

Reporting Issues

Either open a ticket on GitHub or join our Slack channel (#plaidml).

CI & Validation

Validated Hardware

A comprehensive set of tests for each release is run against the hardware targets listed below.

  • AMD

    • R9 Nano
    • RX 480
    • Vega 10
  • Intel

    • HD4000
    • HD Graphics 505
  • NVIDIA

    • K80
    • GT 640M
    • GTX 1050
    • GTX 1070

Validated Networks

We support all of the Keras application networks from current versions of 2.x. Validated networks are tested for performance and correctness as part of our continuous integration system.

  • CNNs

    • Inception v3
    • ResNet50
    • VGG19
    • Xception
    • MobileNet
    • DenseNet
    • ShuffleNet
  • LSTM

    • examples/imdb_lstm.py (from keras)

Issues

  • [macOS] model.fit() loss: nan

    Ran mnist_cnn.py from keras/examples after adding plaidml as the backend. This issue affects many others, but this is the simplest example.

    Will run fine for a while, then loss will hit nan and acc will plummet until it hits 0, where it stays.

    Andys-iMac-2:examples andy$ python mnist_cnn.py
    x_train shape: (60000, 28, 28, 1)
    60000 train samples
    10000 test samples
    INFO:plaidml:Opening device "amd_radeon_pro_580_compute_engine.0
    Train on 60000 samples, validate on 10000 samples
    Epoch 1/12
    59776/60000 [============================>.] - ETA: 0s - loss: 0.3177 - acc: 0.9025
    INFO:plaidml:Analyzing Ops: 85 of 285 operations complete
    60000/60000 [==============================] - 27s - loss: 0.3172 - acc: 0.9026 - val_loss: 0.2699 - val_acc: 0.9217
    Epoch 2/12
    60000/60000 [==============================] - 18s - loss: 0.1104 - acc: 0.9666 - val_loss: 0.2247 - val_acc: 0.9308
    Epoch 3/12
    60000/60000 [==============================] - 19s - loss: nan - acc: 0.5408 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 4/12
    60000/60000 [==============================] - 19s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 5/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 6/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 7/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 8/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 9/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 10/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 11/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 12/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Test loss: nan
    Test accuracy: 0.0
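
When narrowing down a report like this, it can help to find the exact epoch at which the loss first turns into nan. A minimal sketch, assuming the per-epoch losses are available as a list of floats (for example from `history.history["loss"]` after `model.fit`); `first_nan_epoch` is a hypothetical helper and the sample values mirror the log above.

```python
import math


def first_nan_epoch(losses):
    """Return the 1-based epoch at which the loss first became NaN, else None."""
    for epoch, loss in enumerate(losses, start=1):
        if math.isnan(loss):
            return epoch
    return None


print(first_nan_epoch([0.3172, 0.1104, float("nan"), float("nan")]))  # 3
```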

    opened by andyoneal 28
  • trying to implement ReflectionPadding2D

    finally I implemented it in one op for B,H,W,C

    class ReflectionPadding2D(PMLTile.Operation):
        def __init__(self, input, h_pad, w_pad):
            if K.image_data_format() == 'channels_last':
                if input.shape.ndims == 4:
                    H, W = input.shape.dims[1:3]
                    if (type(H) == int and h_pad >= H) or \
                       (type(W) == int and w_pad >= W):
                        raise ValueError("Paddings must be less than dimensions.")
                    c = """ function (I[B, H, W, C] ) -> (O) {{
                            WE = W + {w_pad}*2;
                            HE = H + {h_pad}*2;
                        """.format(h_pad=h_pad, w_pad=w_pad)
                    if w_pad > 0:
                        c += """
                            LEFT_PAD [b, h, w , c : B, H, WE, C ] = =(I[b, h, {w_pad}-w,            c]), w < {w_pad} ;
                            HCENTER  [b, h, w , c : B, H, WE, C ] = =(I[b, h, w-{w_pad},            c]), w < W+{w_pad}-1 ;
                            RIGHT_PAD[b, h, w , c : B, H, WE, C ] = =(I[b, h, 2*W - (w-{w_pad}) -2, c]);
                            LCR = LEFT_PAD+HCENTER+RIGHT_PAD;
                        """.format(h_pad=h_pad, w_pad=w_pad)
                    else:
                        c += "LCR = I;"
                    if h_pad > 0:
                        c += """
                            TOP_PAD   [b, h, w , c : B, HE, WE, C ] = =(LCR[b, {h_pad}-h,            w, c]), h < {h_pad};
                            VCENTER   [b, h, w , c : B, HE, WE, C ] = =(LCR[b, h-{h_pad},            w, c]), h < H+{h_pad}-1 ;
                            BOTTOM_PAD[b, h, w , c : B, HE, WE, C ] = =(LCR[b, 2*H - (h-{h_pad}) -2, w, c]);
                            TVB = TOP_PAD+VCENTER+BOTTOM_PAD;
                        """.format(h_pad=h_pad, w_pad=w_pad)
                    else:
                        c += "TVB = LCR;"
                    c += "O = TVB; }"
                    inp_dims = input.shape.dims
                    out_dims = (inp_dims[0], inp_dims[1]+h_pad*2, inp_dims[2]+w_pad*2, inp_dims[3])
                else:
                    raise NotImplementedError
            else:
                raise NotImplementedError
            super(ReflectionPadding2D, self).__init__(c, [('I', input) ],
                    [('O', PMLTile.Shape(input.shape.dtype, out_dims ) )])
    

    also I implemented it via slice and concat but I suppose it will consume more VRAM for this? or am I wrong??

    class ReflectionPadding2D():
        def __init__(self, h_pad, w_pad):
            self.h_pad, self.w_pad = h_pad, w_pad
        def __call__(self, inp):
            h_pad, w_pad = self.h_pad, self.w_pad
            if K.image_data_format() == 'channels_last':
                if inp.shape.ndims == 4:
                    w = K.concatenate ([ inp[:,:,w_pad:0:-1,:],
                                         inp,
                                         inp[:,:,-2:-w_pad-2:-1,:] ], axis=2 )
                    h = K.concatenate ([ w[:,h_pad:0:-1,:,:],
                                         w,
                                         w[:,-2:-h_pad-2:-1,:,:] ], axis=1 )
                    return h
                else:
                    raise NotImplementedError
            else:
                raise NotImplementedError
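
To sanity-check the index arithmetic used by both versions, the same reflection can be written for a single axis in plain Python. This is only a sketch: `reflect_index` follows the three index expressions from the Tile code (`pad - w`, `w - pad`, `2*W - (w - pad) - 2`) using ordinary branches rather than summed contractions, and `reflect_pad_sliced` mirrors the slice-and-concatenate variant. All function names here are hypothetical.

```python
def reflect_index(out_idx, size, pad):
    """Map an index on the padded axis back to an index on the source axis."""
    if not 0 <= pad < size:
        raise ValueError("Paddings must be less than dimensions.")
    if out_idx < pad:
        return pad - out_idx                  # leading border: I[pad - w]
    if out_idx < size + pad:
        return out_idx - pad                  # centre: I[w - pad]
    return 2 * size - (out_idx - pad) - 2     # trailing border


def reflect_pad_indexed(row, pad):
    """Reflection-pad a list using the per-element index mapping."""
    size = len(row)
    return [row[reflect_index(i, size, pad)] for i in range(size + 2 * pad)]


def reflect_pad_sliced(row, pad):
    """Reflection-pad a list with the same slices as the concat version."""
    return row[pad:0:-1] + row + row[-2:-pad - 2:-1]


row = [1, 2, 3, 4]
print(reflect_pad_indexed(row, 2))  # [3, 2, 1, 2, 3, 4, 3, 2]
print(reflect_pad_sliced(row, 2))   # [3, 2, 1, 2, 3, 4, 3, 2]
```

Both functions agree for every padding from 0 up to `len(row) - 1`, which is the range the `ValueError` check in the first class allows.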
    
    needs integration 
    opened by iperov 27
  • Memory error on Vega 10

    Hi, I am trying PlaidML on an AMD Vega 10 (gfx900).

    I get the following error:

    [email protected]:~/biswa/plaidbench$ python plaidbench.py mobilenet
    Using PlaidML backend.
    INFO:plaidml:Initializing device gfx900.0: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Initializing device gfx900.1: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Initializing device gfx900.2: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Initializing device gfx900.3: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Opening device "gfx900.3": "Advanced Micro Devices, Inc. gfx900"

    Model loaded.
    Compiling and running initial batch, batch_size=1
    Warmup
    Memory access fault by GPU node-7 on address 0x4408bd6000. Reason: Page not present or supervisor privilege.
    Aborted (core dumped)

    Any idea how to resolve this?

    Thanks, Biswa

    opened by biswagsingh 26
  • plaidml.exceptions.PlaidMLError: Could not find PlaidML configuration file:

    plaidml.exceptions.PlaidMLError: Could not find PlaidML configuration file: "experimental.json".

    Traceback (most recent call last):
      File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1264.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 193, in run_module_as_main
        "main", mod_spec)
      File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1264.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 85, in run_code
        exec(code, run_globals)
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\Scripts\plaidml-setup.exe_main.py", line 5, in
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\plaidml_init.py", line 50, in
        import plaidml.settings
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\plaidml\settings.py", line 33, in
        _setup_config('PLAIDML_EXPERIMENTAL_CONFIG', 'experimental.json')
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\plaidml\settings.py", line 30, in _setup_config
        'Could not find PlaidML configuration file: "{}".'.format(filename))
    plaidml.exceptions.PlaidMLError: Could not find PlaidML configuration file: "experimental.json".

    opened by Duddino 25
  • "CL_OUT_OF_HOST_MEMORY" error when command "plaidml-setup"

    Hello again, I'm experiencing a new issue with the 0.6.0 rc1 version of plaidml. Using 0.5 led to this issue: https://github.com/plaidml/plaidml/issues/73. Any luck solving it?

    opened by iamkucuk 23
  • Feature request - port to Python 3.6

    I've got PlaidML running on my AMD Bonaire on Arch Linux with Python 2.7 in a Conda environment. Every other Python package I have runs with 3.6 and my goal is to keep it that way. ;-)

    There doesn't seem to even be a pip package for 3.6, so the pip install -U plaidml-keras fails with Python 3.6. If you can post build-from-GitHub-source instructions, I can make a local package and install it.

    P.S.: Let me know if you want Arch setup instructions for AMD GPUs. Most of it is on the Arch User Repository wiki but I've got some scripts that do the work.

    P.P.S.: Benchmark results

    Using PlaidML backend.
    INFO:plaidml:Initializing device bonaire.0: "Bonaire", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Opening device "bonaire.0": "Advanced Micro Devices, Inc. Bonaire"
    Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5
    16793600/17225924 [============================>.] - ETA: 0s 
    Model loaded.
    Compiling and running initial batch, batch_size=1
    Warmup
    Doing the main timing
    Example finished, elapsed: 6.821215868 (compile), 15.0223557949 (execution)
    
    opened by znmeb 21
  • Mac+AMD: AMD not detected and Intel uses too high of a work group

    iMac 2017 with a Radeon Pro 580 and a Core i5-7600K. Compiled and installed PlaidML from source. Installed via the pip wheel.

    Ran plaidml-setup:

    PlaidML Setup (0.0.0.dev0)

    Thanks for using PlaidML!

    Some Notes:

    • Bugs and other issues: https://github.com/plaidml/plaidml
    • Questions: https://stackoverflow.com/questions/tagged/plaidml
    • Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
    • PlaidML is licensed under the GNU AGPLv3

    Default Config Devices: No devices.

    Experimental Config Devices: intel(r)_core(tm)i5-7600k_cpu@_3.80ghz.0 : Intel Intel(R) Core(TM) i5-7600K CPU @ 3.80GHz

    Using experimental devices can cause poor performance, crashes, and other nastiness. Enable experimental device support? (y,n)[n]:y

    PlaidML sends anonymous usage statistics to help guide improvements. We'd love your help making it better.

    Enable telemetry reporting? (y,n)[y]:y

    Almost done. Multiplying some matrices...
    Tile code:
      function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
    ERROR:plaidml:OpenCL: [CL_INVALID_WORK_GROUP_SIZE] : OpenCL Error : clEnqueueNDRangeKernel failed: total work group size (32) is greater than the device can support (1) (cb=12)
    Whew. That worked.

    Save settings to /Users/andy/.plaidml? (y,n)[y]:y
    Success!
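
For readers unfamiliar with Tile, the one-line program in the setup output is a matrix multiplication: `+(...)` sums the product over the index `z`, which appears on the right-hand side but not in the output. A plain-Python sketch of the same computation, written with explicit loops purely for illustration (`tile_matmul` is a hypothetical name, and this is not how PlaidML executes the kernel):

```python
def tile_matmul(B, C):
    """A[x, y : X, Y] = +(B[x, z] * C[z, y]) written as explicit loops."""
    X, Z = len(B), len(B[0])
    Y = len(C[0])
    if len(C) != Z:
        raise ValueError("inner dimensions differ")
    A = [[0] * Y for _ in range(X)]
    for x in range(X):
        for y in range(Y):
            for z in range(Z):  # the summed (contracted) index
                A[x][y] += B[x][z] * C[z][y]
    return A


print(tile_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```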

    Should a gpu be detected at this point? Is there somewhere I can lower total work group size manually?

    New to submitting git issues. Sorry if I'm missing anything.

    opened by andyoneal 19
  • PlaidML Setup Issue Windows

    Hi, Running plaidml-setup gives me the following:

    PlaidML Setup (0.3.5)

    Thanks for using PlaidML!

    Some Notes:

    • Bugs and other issues: https://github.com/plaidml/plaidml
    • Questions: https://stackoverflow.com/questions/tagged/plaidml
    • Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
    • PlaidML is licensed under the GNU AGPLv3

    No OpenCL devices found. Check driver installation. Read the helpful, easy driver installation instructions from our README: http://github.com/plaidml/plaidml

    This is the output from clinfo:

    Number of platforms: 1
      Platform Profile: FULL_PROFILE
      Platform Version: OpenCL 2.1 AMD-APP (2766.5)
      Platform Name: AMD Accelerated Parallel Processing
      Platform Vendor: Advanced Micro Devices, Inc.
      Platform Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices

      Platform Name: AMD Accelerated Parallel Processing
    Number of devices: 1
      Device Type: CL_DEVICE_TYPE_GPU
      Vendor ID: 1002h
      Board name: Radeon RX 580 Series
      Device Topology: PCI[ B#1, D#0, F#0 ]
      Max compute units: 36
      Max work items dimensions: 3
        Max work items[0]: 1024
        Max work items[1]: 1024
        Max work items[2]: 1024
      Max work group size: 256
      Preferred vector width char: 4
      Preferred vector width short: 2
      Preferred vector width int: 1
      Preferred vector width long: 1
      Preferred vector width float: 1
      Preferred vector width double: 1
      Native vector width char: 4
      Native vector width short: 2
      Native vector width int: 1
      Native vector width long: 1
      Native vector width float: 1
      Native vector width double: 1
      Max clock frequency: 1340Mhz
      Address bits: 64
      Max memory allocation: 4244635648
      Image support: Yes
      Max number of images read arguments: 128
      Max number of images write arguments: 64
      Max image 2D width: 16384
      Max image 2D height: 16384
      Max image 3D width: 2048
      Max image 3D height: 2048
      Max image 3D depth: 2048
      Max samplers within kernel: 16
      Max size of kernel argument: 1024
      Alignment (bits) of base address: 2048
      Minimum alignment (bytes) for any datatype: 128
      Single precision floating point capability
        Denorms: No
        Quiet NaNs: Yes
        Round to nearest even: Yes
        Round to zero: Yes
        Round to +ve and infinity: Yes
        IEEE754-2008 fused multiply-add: Yes
      Cache type: Read/Write
      Cache line size: 64
      Cache size: 16384
      Global memory size: 8589934592
      Constant buffer size: 4244635648
      Max number of constant args: 8
      Local memory type: Scratchpad
      Local memory size: 32768
      Max pipe arguments: 16
      Max pipe active reservations: 16
      Max pipe packet size: 4244635648
      Max global variable size: 3820172032
      Max global variable preferred total size: 8589934592
      Max read/write image args: 64
      Max on device events: 1024
      Queue on device max size: 8388608
      Max on device queues: 1
      Queue on device preferred size: 262144
      SVM capabilities:
        Coarse grain buffer: Yes
        Fine grain buffer: Yes
        Fine grain system: No
        Atomics: No
      Preferred platform atomic alignment: 0
      Preferred global atomic alignment: 0
      Preferred local atomic alignment: 0
      Kernel Preferred work group size multiple: 64
      Error correction support: 0
      Unified memory for Host and Device: 0
      Profiling timer resolution: 1
      Device endianess: Little
      Available: Yes
      Compiler available: Yes
      Execution capabilities:
        Execute OpenCL kernels: Yes
        Execute native function: No
      Queue on Host properties:
        Out-of-Order: No
        Profiling : Yes
      Queue on Device properties:
        Out-of-Order: Yes
        Profiling : Yes
      Platform ID: 00007FFEC2C66FD0
      Name: Ellesmere
      Vendor: Advanced Micro Devices, Inc.
      Device OpenCL C version: OpenCL C 2.0
      Driver version: 2766.5
      Profile: FULL_PROFILE
      Version: OpenCL 2.0 AMD-APP (2766.5)
      Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_planar_yuv

    Shouldn't it be working? I just switched to a new computer, so I used to use NVIDIA with CUDA. Any help is appreciated!

    Note: I do have the most recent AMD driver installed.

    opened by YutaTakano 16
  • could not broadcast input array from shape (3,2048) into shape (6144)

    I just installed plaidml and i tried to run this example:

    #!/usr/bin/env python
    
    import plaidml.keras
    plaidml.keras.install_backend() 
    
    import numpy as np
    import matplotlib.pyplot as plt
    from keras.models import Sequential
    from keras.layers.core import Dense, Activation, Dropout
    from keras.datasets import mnist
    from keras.utils import np_utils
    
    # fix a random seed for reproducibility
    np.random.seed(9)
    
    # user inputs
    nb_epoch = 25
    num_classes = 10
    batch_size = 128
    train_size = 60000
    test_size = 10000
    v_length = 784
    
    # split the mnist data into train and test
    (trainData, trainLabels), (testData, testLabels) = mnist.load_data()
    
    
    # reshape the dataset
    trainData = trainData.reshape(train_size, v_length)
    testData = testData.reshape(test_size, v_length)
    trainData = trainData.astype("float32")
    testData = testData.astype("float32")
    trainData /= 255
    testData /= 255
    
    
    # convert class vectors to binary class matrices --> one-hot encoding
    mTrainLabels = np_utils.to_categorical(trainLabels, num_classes)
    mTestLabels = np_utils.to_categorical(testLabels, num_classes)
    
    # create the model
    model = Sequential()
    model.add(Dense(512, input_shape=(784,)))
    model.add(Activation("relu"))
    model.add(Dropout(0.2))
    model.add(Dense(256))
    model.add(Activation("relu"))
    model.add(Dropout(0.2))
    model.add(Dense(num_classes))
    model.add(Activation("softmax"))
    
    # summarize the model
    model.summary()
    
    # compile the model
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    
    # fit the model
    history = model.fit(trainData,
                        mTrainLabels,
                        validation_data=(testData, mTestLabels),
                        batch_size=batch_size,
                        nb_epoch=nb_epoch,
                        verbose=2)
    
    # print the history keys
    
    
    # evaluate the model
    scores = model.evaluate(testData, mTestLabels, verbose=0)
    
    # history plot for accuracy
    plt.plot(history.history["acc"])
    plt.plot(history.history["val_acc"])
    plt.title("Model Accuracy")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend(["train", "test"], loc="upper left")
    plt.show()
    
    # history plot for loss
    plt.plot(history.history["loss"])
    plt.plot(history.history["val_loss"])
    plt.title("Model Loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend(["train", "test"], loc="upper left")
    plt.show()
    
    

    and I got this error

    could not broadcast input array from shape (3,2048) into shape (6144)

    Then I tried running Hello VGG example from plaidml github page and I got the same error.

    I am using plaidml 0.3.4 on ubuntu in virtualenv and I am trying to run this code on rx 480.

    Tnx for help.

    opened by leon3428 16
  • plaidml.exceptions.Unknown: Duplicate updates

    Setup:

    sudo apt-get install clinfo
    clinfo [sees 1080ti]
    sudo pip install -U plaidml-keras
    plaidml-setup
    [insert before keras import:]
    import plaidml.keras
    plaidml.keras.install_backend()
    

    But, intermediate problem:

     ImportError: No module named plaidml.keras
    $ which python
    /home/phobrain/anaconda2/bin//python
    

    Fix:

    sys.path.append('/usr/local/lib/python2.7/dist-packages/')
    import plaidml.keras
    plaidml.keras.install_backend()
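
An `ImportError` like the one above usually means the package was installed for a different interpreter than the one running the script. Before editing `sys.path`, it can be worth checking which interpreter is active and whether it can see the package at all. The sketch below uses `importlib.util.find_spec`, which requires Python 3.4 or later and so would not run on the Python 2.7 setup in this report; `locate_module` is a hypothetical helper.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised, for example, when the parent package of a dotted name is missing.
        return None
    if spec is None:
        return None
    return spec.origin


print("interpreter:", sys.executable)
print("plaidml found at:", locate_module("plaidml"))
```

If the second line prints `None`, installing with `python -m pip install plaidml-keras` ties the install to the interpreter that will run the script.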
    

    'Real' issue being reported:

    File "siaconv.py", line 919, in doit
      epochs=epochs)
    File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
      return func(*args, **kwargs)
    File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1926, in fit_generator
      self._make_train_function()
    File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 967, in _make_train_function
      **self._function_kwargs)
    File "/usr/local/lib/python2.7/dist-packages/plaidml/keras/backend.py", line 1718, in function
      return _Function(inputs, outputs, updates, name)
    File "/usr/local/lib/python2.7/dist-packages/plaidml/keras/backend.py", line 931, in init
      c.add_update(_plaidml_val(var), _plaidml_val(newval))
    File "/usr/local/lib/python2.7/dist-packages/plaidml/init.py", line 1289, in add_update
      _lib().plaidml_add_composer_update(self, dest, src)
    File "/usr/local/lib/python2.7/dist-packages/plaidml/init.py", line 674, in _check_err
      self.raise_last_status()
    File "/usr/local/lib/python2.7/dist-packages/plaidml/library.py", line 136, in raise_last_status
      raise self.last_status()
    plaidml.exceptions.Unknown: Duplicate updates

    model.fit_generator(
            myGen('data', tr_pairs, tr_y, batch_size, True),
            (len(tr_pairs)-1) // batch_size,
            validation_data=myGen('valid', te_pairs, te_y, batch_size, True),
            validation_steps=1,
            max_queue_size=2,
            workers=1,
            epochs=epochs)
    

    Net:

    KERNEL_INIT = 'glorot_normal'
    
        seq.add(Dense(dense_size, input_shape=input_shape,
                    activation='relu', kernel_initializer=KERNEL_INIT))
        seq.add(BatchNormalization())
        seq.add(Dense((dense_size*2)//3,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dropout(0.1, seed=SEED))
        seq.add(Dense(dense_size//4,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dense((dense_size*2)//3,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dense(dense_size,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dense(512,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(256,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(128,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(256,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(128,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
    
    opened by phobrain 16
  • Plaidml not detecting Mali-T628 on ARM

    Hi,

    I've built plaidml 0.3.5 for use on an Odroid XU4 with a Mali-T628 GPU running Debian Stretch. I managed to install the wheel, but when I run plaidml-setup, I get:

    "No supported devices found. Run 'clinfo' and file an issue containing the full output."

    However, with plaidml 0.3.0rc1, the latest version available with pip install plaidml, my devices can be configured and two Mali-T628 devices are reported. "experimental.json" appears quite similar in both cases.

    Any clue what I may have done wrong building plaidml (I used bazel 0.18.1 with --config linux_arm_32v7), or what change might explain 0.3.5 not recognizing my devices where 0.3.0rc1 did?

    Thanks

    Here's my clinfo report:

    Number of platforms 1 Platform Name ARM Platform Platform Vendor ARM Platform Version OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82 Platform Profile FULL_PROFILE Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory Platform Extensions function suffix ARM

      Platform Name                                   ARM Platform
      Number of devices                               2
      Device Name                                     Mali-T628
      Device Vendor                                   ARM
      Device Vendor ID                                0x6200010
      Device Version                                  OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82
      Driver Version                                  1.2
      Device OpenCL C Version                         OpenCL C 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82
      Device Type                                     GPU
      Device Profile                                  FULL_PROFILE
      Max compute units                               4
      Max clock frequency                             600MHz
      Device Partition                                (core)
        Max number of sub-devices                     0
        Supported partition types                     None
      Max work item dimensions                        3
      Max work item sizes                             256x256x256
      Max work group size                             256
      Preferred work group size multiple              4
      Preferred / native vector sizes
        char                                          16 / 16
        short                                         8 / 8
        int                                           4 / 4
        long                                          2 / 2
        half                                          8 / 8 (cl_khr_fp16)
        float                                         4 / 4
        double                                        2 / 2 (cl_khr_fp64)
      Half-precision Floating-point support           (cl_khr_fp16)
        Denormals                                     Yes
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 Yes
        Round to infinity                             Yes
        IEEE754-2008 fused multiply-add               Yes
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Single-precision Floating-point support         (core)
        Denormals                                     Yes
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 Yes
        Round to infinity                             Yes
        IEEE754-2008 fused multiply-add               Yes
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Double-precision Floating-point support         (cl_khr_fp64)
        Denormals                                     Yes
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 Yes
        Round to infinity                             Yes
        IEEE754-2008 fused multiply-add               Yes
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Address bits                                    64, Little-Endian
      Global memory size                              2090405888 (1.947GiB)
      Error Correction support                        No
      Max memory allocation                           522601472 (498.4MiB)
      Unified memory for Host and Device              Yes
      Minimum alignment for any data type             128 bytes
      Alignment of base address                       1024 bits (128 bytes)
      Global Memory cache type                        Read/Write
      Global Memory cache size                        <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30>
      Global Memory cache line                        64 bytes
      Image support                                   Yes
        Max number of samplers per kernel             16
        Max size for 1D images from buffer            65536 pixels
        Max 1D or 2D image array size                 2048 images
        Max 2D image size                             65536x65536 pixels
        Max 3D image size                             65536x65536x65536 pixels
        Max number of read image args                 128
        Max number of write image args                8
      Local memory type                               Global
      Local memory size                               32768 (32KiB)
      Max constant buffer size                        65536 (64KiB)
      Max number of constant args                     8
      Max size of kernel argument                     1024
      Queue properties
        Out-of-order execution                        Yes
        Profiling                                     Yes
      Prefer user sync for interop                    No
      Profiling timer resolution                      1000ns
      Execution capabilities
        Run OpenCL kernels                            Yes
        Run native kernels                            No
      printf() buffer size                            1048576 (1024KiB)
      Built-in kernels
      Device Available                                Yes
      Compiler Available                              Yes
      Linker Available                                Yes
      Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory

      Device Name                                     Mali-T628
      Device Vendor                                   ARM
      Device Vendor ID                                0x6200010
      Device Version                                  OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82
      Driver Version                                  1.2
      Device OpenCL C Version                         OpenCL C 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82
      Device Type                                     GPU
      Device Profile                                  FULL_PROFILE
      Max compute units                               2
      Max clock frequency                             600MHz
      Device Partition                                (core)
        Max number of sub-devices                     0
        Supported partition types                     None
      Max work item dimensions                        3
      Max work item sizes                             256x256x256
      Max work group size                             256
      Preferred work group size multiple              4
      Preferred / native vector sizes
        char                                          16 / 16
        short                                         8 / 8
        int                                           4 / 4
        long                                          2 / 2
        half                                          8 / 8 (cl_khr_fp16)
        float                                         4 / 4
        double                                        2 / 2 (cl_khr_fp64)
      Half-precision Floating-point support           (cl_khr_fp16)
        Denormals                                     Yes
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 Yes
        Round to infinity                             Yes
        IEEE754-2008 fused multiply-add               Yes
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Single-precision Floating-point support         (core)
        Denormals                                     Yes
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 Yes
        Round to infinity                             Yes
        IEEE754-2008 fused multiply-add               Yes
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Double-precision Floating-point support         (cl_khr_fp64)
        Denormals                                     Yes
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 Yes
        Round to infinity                             Yes
        IEEE754-2008 fused multiply-add               Yes
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Address bits                                    64, Little-Endian
      Global memory size                              2090405888 (1.947GiB)
      Error Correction support                        No
      Max memory allocation                           522601472 (498.4MiB)
      Unified memory for Host and Device              Yes
      Minimum alignment for any data type             128 bytes
      Alignment of base address                       1024 bits (128 bytes)
      Global Memory cache type                        Read/Write
      Global Memory cache size                        <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30>
      Global Memory cache line                        64 bytes
      Image support                                   Yes
        Max number of samplers per kernel             16
        Max size for 1D images from buffer            65536 pixels
        Max 1D or 2D image array size                 2048 images
        Max 2D image size                             65536x65536 pixels
        Max 3D image size                             65536x65536x65536 pixels
        Max number of read image args                 128
        Max number of write image args                8
      Local memory type                               Global
      Local memory size                               32768 (32KiB)
      Max constant buffer size                        65536 (64KiB)
      Max number of constant args                     8
      Max size of kernel argument                     1024
      Queue properties
        Out-of-order execution                        Yes
        Profiling                                     Yes
      Prefer user sync for interop                    No
      Profiling timer resolution                      1000ns
      Execution capabilities
        Run OpenCL kernels                            Yes
        Run native kernels                            No
      printf() buffer size                            1048576 (1024KiB)
      Built-in kernels
      Device Available                                Yes
      Compiler Available                              Yes
      Linker Available                                Yes
      Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory

    NULL platform behavior
      clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  ARM Platform
      clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [ARM]
      clCreateContext(NULL, ...) [default]            Success [ARM]
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (2)
        Platform Name                                 ARM Platform
        Device Name                                   Mali-T628
        Device Name                                   Mali-T628
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (2)
        Platform Name                                 ARM Platform
        Device Name                                   Mali-T628
        Device Name                                   Mali-T628

    ICD loader properties
      ICD loader Name                                 OpenCL ICD Loader
      ICD loader Vendor                               OCL Icd free software
      ICD loader Version                              2.2.11
      ICD loader Profile                              OpenCL 2.1

    opened by nitescuc 15
  • Capture affine.store op

    Hello.

    I am trying to stencil the following code:

    affine.store %1, %arg2[%arg3, %arg4, %arg5, %arg6] : memref<?x?x?x?xi64>
    affine.yield %arg2 : memref<?x?x?x?xi64>
    

    by using the matchPattern function:

    matchPattern(yield, m_Op<AffineYieldOp>(m_Capture(&store, m_Op<AffineStoreOp>(m_Any(), m_Any()))))
    

    However, it seems that the m_Capture and m_Op functions used in existing examples, such as StencilGEMM, cannot capture an operation that has no return value, like affine.store here. Can I use the existing structure to match this pattern and capture the affine.store op?

    opened by IsolatedMy 1
  • Stenciling of MAX/ADD for RN50

    This patch fixes the pass "--x86-stencil-tpp-unary" so that all the reduce patterns in RN50 are stenciled with the correct TPP parameters and unary flags.

    opened by ZhangMZh 0
  • Batch parallelization and allocs to alloca changes

    This WIP patch converts scoped allocs to allocas using PromoteBuffersToStackPass as well as the PXA localization pass. As of now, the first pass does not seem to scope any allocs other than weights, and the PXA localization pass segfaults at runtime when threads > 1. The patch also parallelizes layers along the batch dimension, except for layers whose outer loop's induction variable is not the batch dimension.

    wip 
    opened by KavithaTipturMadhu 0
  • TPSS: parallelization directives

    This patch adds support for parallelization directives specified in a file whose path is given by the environment variable <PLAIDML_PARALLELIZATION_CONFIG_FILE>. It adds a rule parser that matches convolution shapes against the equalities/inequalities in the config file and applies the rules that follow.
    Note that the collapse directive also adds a parallelize directive by default, and it can only be applied to 2 loop levels that form a perfect loop nest. The validity of reordering loops to support the requested order is not verified.

    wip 
    opened by KavithaTipturMadhu 0
  • how to fix "cannot import name 'Iterable' from 'collections'" when running test code from main page

    Hey all,

    I've decided to try some ML projects, and since I have an AMD GPU (5700 XT) I decided to use PlaidML. On the main website there's test code for VGG-19 and I'm trying to run it right now, but I run into the error attached in the screenshot. I tried to simply change collections to collections.abc, but it looks like Python 3.10 already does that? I'm pretty stuck, any help would be appreciated. Thanks! [Screenshot from 2022-06-11 22-03-38]
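
For context, the abstract base classes such as Iterable were moved to collections.abc, and the old aliases on the collections module were removed in Python 3.10, which is why `from collections import Iterable` fails there. The sketch below shows one possible workaround, assuming the failing import lives in a dependency that cannot be edited directly: it re-exposes the moved names on `collections` before that dependency is imported. The helper name `patch_collections_aliases` and the list of names are hypothetical and are not part of PlaidML; running an older Python version or editing the import to use collections.abc are alternatives.

```python
import collections
import collections.abc

# Names that live in collections.abc; the aliases on `collections`
# were removed in Python 3.10. This list is illustrative, not exhaustive.
_MOVED = ("Iterable", "Mapping", "MutableMapping", "Sequence", "Callable")


def patch_collections_aliases(names=_MOVED):
    """Re-expose ABC names on `collections` when they are missing.

    Hypothetical helper: returns the list of names that were added.
    """
    added = []
    for name in names:
        if not hasattr(collections, name):
            setattr(collections, name, getattr(collections.abc, name))
            added.append(name)
    return added


# Call this before importing any package that still uses the old import path.
patch_collections_aliases()

from collections import Iterable  # resolves after the patch

print(isinstance([1, 2, 3], Iterable))
```

This is a stopgap: it changes a standard-library module for the whole process, so it is best kept at the very top of the entry script.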

    opened by KSTRTK 3
  • build errors

    I tried to build PlaidML myself using the configure script on two different machines, but both builds fail with the same errors. The command I used is ./configure --no_openvino --type=Debug

    The two machines are very similar. The first machine is:

    NAME="Ubuntu"
    VERSION="20.10 (Groovy Gorilla)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 20.10"
    VERSION_ID="20.10"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=groovy
    UBUNTU_CODENAME=groovy

    cmake: 3.20.3

    conda 4.10.1

    Python 3.8.8

    The other machine is:

    NAME="Ubuntu"
    VERSION="20.04.1 LTS (Focal Fossa)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 20.04.1 LTS"
    VERSION_ID="20.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=focal
    UBUNTU_CODENAME=focal

    CMakeOutput.log

    Python 3.8.5 CMakeError.log

    opened by xinchen9 1
Releases(0.7.0)
Owner
PlaidML
PlaidML makes deep learning work everywhere.
A library for creating Artificial Neural Networks, for use in Machine Learning and Deep Learning algorithms.

iNeural A library for creating Artificial Neural Networks, for use in Machine Learning and Deep Learning algorithms. What is a Neural Network? Work on

Fatih Küçükkarakurt 5 Apr 5, 2022
International Business Machines 9 Jul 21, 2022
header only, dependency-free deep learning framework in C++14

The project may be abandoned since the maintainer(s) are just looking to move on. In the case anyone is interested in continuing the project, let us k

tiny-dnn 5.6k Jul 28, 2022
Caffe: a fast open framework for deep learning.

Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berke

Berkeley Vision and Learning Center 32.8k Jul 31, 2022
TFCC is a C++ deep learning inference framework.

TFCC is a C++ deep learning inference framework.

Tencent 110 Jul 16, 2022
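KSAI Lite is a deep learning inference framework from Kingsoft, based on TensorFlow Lite

KSAI Lite English | Simplified Chinese KSAI Lite is a lightweight, flexible, high-performance, and easily extensible deep learning inference framework. It is built on TensorFlow Lite and targets multiple hardware platforms, including mobile, embedded, and server. KSAI Lite is currently used in Kingsoft Office's internal business and is gradually adding support for Kingsoft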

null 75 Apr 14, 2022
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

Mobile AI Compute Engine (or MACE for short) is a deep learning inference framework optimized for mobile heterogeneous computing on Android, iOS, Linux and Windows devices.

Xiaomi 4.7k Jul 28, 2022
CubbyDNN - Deep learning framework using C++17 in a single header file

CubbyDNN CubbyDNN is C++17 implementation of deep learning. It is suitable for deep learning on limited computational resource, embedded systems and I

Chris Ohk 31 May 30, 2022
Caffe2 is a lightweight, modular, and scalable deep learning framework.

Source code now lives in the PyTorch repository. Caffe2 Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the origin

Meta Archive 8.4k Aug 5, 2022
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

This is the Vowpal Wabbit fast online learning code. Why Vowpal Wabbit? Vowpal Wabbit is a machine learning system which pushes the frontier of machin

Vowpal Wabbit 8k Aug 4, 2022
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to m

The Apache Software Foundation 20k Aug 5, 2022
Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

CNTK Chat Windows build status Linux build status The Microsoft Cognitive Toolkit (https://cntk.ai) is a unified deep learning toolkit that describes

Microsoft 17.2k Aug 6, 2022
LibDEEP - Deep learning library. BSD-3-Clause

LibDEEP LibDEEP is a deep learning library developed in C language for the development of artificial intelligence-based techniques. Please visit our W

Joao Paulo Papa 18 Mar 15, 2022
Deep Learning API and Server in C++11 support for Caffe, Caffe2, PyTorch,TensorRT, Dlib, NCNN, Tensorflow, XGBoost and TSNE

Open Source Deep Learning Server & API DeepDetect (https://www.deepdetect.com/) is a machine learning API and server written in C++11. It makes state

JoliBrain 2.4k Aug 2, 2022
Forward - A library for high performance deep learning inference on NVIDIA GPUs

a library for high performance deep learning inference on NVIDIA GPUs.

Tencent 123 Mar 17, 2021
A library for high performance deep learning inference on NVIDIA GPUs.

Forward - A library for high performance deep learning inference on NVIDIA GPUs Forward - A library for high performance deep learning inference on NV

Tencent 502 Jul 31, 2022
Nimble: Physics Engine for Deep Learning

Nimble: Physics Engine for Deep Learning

Keenon Werling 271 Aug 2, 2022
Deploying Deep Learning Models in C++: BERT Language Model

This repository shows the code to deploy a serialized deep learning model running in a C++ backend.

null 42 Mar 24, 2022