Caffe: a fast open framework for deep learning.




Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berkeley Vision and Learning Center (BVLC) and community contributors.

Check out the project site for all the details and step-by-step examples.

Custom distributions



Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.

Happy brewing!

License and Citation

Caffe is released under the BSD 2-Clause license. The BAIR/BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}
  • Caffe OpenCL support

    DISCONTINUED, now available as official Caffe branch here:

    Technical Report

    Available on arXiv:

    opened by naibaf7 323
  • OpenCL Backend


    The proposed changes add OpenCL support to Caffe. All GPU functions can be executed on AMD GPUs with OpenCL 1.2 or 2.0 as well as NVIDIA GPUs with OpenCL 1.1.

    Build Instructions

    OpenCL Tests

    All GPU tests successfully complete using this OpenCL version of Caffe.

    Performance and Stability

    The main goal was to provide an OpenCL port to the Caffe community. As such it is not yet optimized for performance or stability.

    Help Wanted

    Let's make it better and faster together.

    enhancement compatibility OpenCL 
    opened by lunochod 233
  • Multi-GPU


    Uses CUDA peer-to-peer for communication, and parts of #1148. SGD is now synchronous instead of asynchronous, as @longjon showed bandwidth on one box is actually high enough. We haven’t really benchmarked yet, but it seems to work great. It also gets rid of the momentum coordination problem.

    The synchronization code needs to hook into the solver, so it is a bit more invasive than before, but still pretty isolated. I refactored solver.cpp to separate the regularization and gradient compute phases so that they can be invoked at different times by the parallel solver.

    One thing still missing is a way to compute the actual number of iterations. For now each solver runs as if it were by itself, so the run takes as long as it would without parallelism. I guess we could adapt the solver to run 1/N of the steps instead. The batch size should also be experimented with, as it is now effectively N times larger. On that note, would it be more convenient to measure progress in number of images instead of iterations, to be independent of batch size?
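
The batch-size arithmetic in the paragraph above can be sketched in a few lines. This is only an illustration of the bookkeeping, not code from the PR: the function name `equivalent_schedule` and the example numbers are made up, and it assumes each of the N synchronous solvers processes one full batch per step.

```python
def equivalent_schedule(max_iter, batch_size, num_gpus):
    """Rescale a single-GPU iteration budget for N synchronous solvers.

    Hypothetical helper: assumes every solver consumes `batch_size`
    images per step, so one synchronous step sees N batches.
    """
    # One synchronous step now covers N batches worth of images.
    effective_batch = batch_size * num_gpus
    # Total images the original single-GPU schedule would have seen.
    images_single = max_iter * batch_size
    # Steps needed to see the same number of images (ceiling division).
    iters_parallel = -(-max_iter // num_gpus)
    return effective_batch, images_single, iters_parallel


effective_batch, images, iters = equivalent_schedule(
    max_iter=10000, batch_size=64, num_gpus=4)
print(effective_batch, images, iters)
```

Counting progress in images (the second value) stays the same regardless of how many GPUs are used, which is the motivation for the question raised above.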

    To try it, run the samples in example/parallel/

    focus speed-up 
    opened by cypof 96
  • Unrolled recurrent layers (RNN, LSTM)

    (Replaces #1873)

    Based on #2032 (adds EmbedLayer -- not needed for, but often used with RNNs in practice, and is needed for my examples), which in turn is based on #1977.

    This adds an abstract class RecurrentLayer intended to support recurrent architectures (RNNs, LSTMs, etc.) using an internal network unrolled in time. RecurrentLayer implementations (here, just RNNLayer and LSTMLayer) specify the recurrent architecture by filling in a NetParameter with appropriate layers.

    RecurrentLayer requires 2 input (bottom) Blobs. The first -- the input data itself -- has shape T x N x ... and the second -- the "sequence continuation indicators" delta -- has shape T x N, each holding T timesteps of N independent "streams". delta_{t,n} should be a binary indicator (i.e., value in {0, 1}), where a value of 0 means that timestep t of stream n is the beginning of a new sequence, and a value of 1 means that timestep t of stream n is continuing the sequence from timestep t-1 of stream n. Under the hood, the previous timestep's hidden state is multiplied by these delta values. The fact that these indicators are specified on a per-timestep and per-stream basis allows for streams of arbitrary different lengths without any padding or truncation. At the beginning of the forward pass, the final hidden state from the previous forward pass (h_T) is copied into the initial hidden state for the new forward pass (h_0), allowing for exact inference across arbitrarily long sequences, even if T == 1. However, if any sequences cross batch boundaries, backpropagation through time is approximate -- it is truncated along the batch boundaries.

    Note that the T x N arrangement in memory, used for computational efficiency, is somewhat counterintuitive, as it requires one to "interleave" the data streams.
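
To make the interleaved T x N layout and the continuation indicators concrete, here is a small pure-Python sketch. It is not part of the PR: the helper `interleave_streams` is hypothetical, tokens are plain strings, and leaving unused timesteps as `None` with an indicator of 0 is an arbitrary choice made for the illustration.

```python
def interleave_streams(streams, T):
    """Pack N streams of back-to-back sequences into T x N arrays.

    streams[n] is a list of sequences for stream n. Returns (data, delta),
    both indexed as [t][n]. delta[t][n] is 0 where a new sequence starts
    and 1 where the sequence from timestep t-1 continues.
    """
    N = len(streams)
    data = [[None] * N for _ in range(T)]
    delta = [[0] * N for _ in range(T)]
    for n, sequences in enumerate(streams):
        # Flatten the stream into (token, continuation flag) pairs.
        flat = [(tok, 0 if i == 0 else 1)
                for seq in sequences
                for i, tok in enumerate(seq)]
        # Timestep t of stream n lands in row t, column n.
        for t, (tok, cont) in enumerate(flat[:T]):
            data[t][n] = tok
            delta[t][n] = cont
    return data, delta


streams = [
    [["a", "b", "c"], ["d", "e"]],   # stream 0: two sequences
    [["x", "y"], ["z", "w", "v"]],   # stream 1: two sequences
]
data, delta = interleave_streams(streams, T=5)
for row_data, row_delta in zip(data, delta):
    print(row_data, row_delta)
```

Each row holds one timestep for all streams, which is the interleaving described above. Multiplying the previous hidden state by a 0 in `delta` resets it at the start of a new sequence.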

    There is also an optional third input whose dimensions are simply N x ... (i.e. the first axis must have dimension N and the others can be anything) which is a "static" input to the LSTM. It's equivalent to (but more efficient than) copying the input across the T timesteps and concatenating it with the "dynamic" first input (I was using my TileLayer -- #2083 -- for this purpose at some point before adding the static input). It's used in my captioning experiments to input the image features as they don't change over time. For most problems there will be no such "static" input and you should simply ignore it and just specify the first two input blobs.

    I've added scripts to download COCO2014 (and splits), and prototxts for training a language model and LRCN captioning model on the data. From the Caffe root directory, you should be able to download and parse the data by doing:

    cd data/coco
    ./ # download train/val/test splits
    ./ # download official COCO tool
    cd tools
    python install # follow instructions to install tools and download COCO data if needed
    cd ../../.. # back to caffe root

    Then, you can train a language model using ./examples/coco_caption/, or train LRCN for captioning using ./examples/coco_caption/ (assuming you have downloaded models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel).

    Still on the TODO list: upload a pretrained model to the zoo; add a tool to preview generated image captions and compute retrieval & generation scores.

    JL ES 
    opened by jeffdonahue 95
  • Improved CMake scripts

    @shelhamer @jeffdonahue @baeuml @kloudkl @akosiorek @Yangqing @BlGene

    Hello all,

    I hope I have referenced everyone who participated in the cmake development (at least I didn't find others).

    Following the discussion here, as promised, I prepared slightly improved caffe cmake scripts. The improvements were developed on Ubuntu 14.04 and tested on Yosemite (with libstdc++). I believe Windows support is now only as difficult as compiling all the dependencies. However, I prefer to postpone testing on Windows until the current, very linux-ish build scripts and behaviour are adapted for cross-platform use and some dependencies are made optional.

    Description of changes and new features added

    Added OpenCV like formatted configuration log

    Added CaffeConfig.cmake generation for build/install cases. This allows you to connect Caffe to your application using CMake's find_package(Caffe). For more detailed description see below.

    BUILD_SHARED_LIB = ON (default) or OFF. Builds Caffe as a shared library. In CMake it is not good practice to build both shared and static libraries simultaneously, hence the switch.

    CPU_ONLY = OFF (default) or ON. Forces CUDA support to be excluded. Caffe will also compile in CPU_ONLY mode if the CUDA Toolkit is not installed or not found by cmake. Before building, please read the configuration log dumped by cmake to check for this case.

    USE_CUDNN = ON (default). If enabled and cuDNN is found, build with it; otherwise build without.

    CUDA_ARCH_NAME = Auto (default), All, Fermi, Kepler, Maxwell, or Manual. Specifies the target GPU architecture. Selecting a concrete value reduces CUDA code compilation time (for instance, compiling for both sm_20 and sm_30 takes twice as long as compiling for just one of them). With Auto, cmake attempts to detect the GPUs installed in your computer and compiles only for them. With Manual, new CUDA_ARCH_BIN/PTX cmake variables are created, which should be set to a space-separated list of architectures. Example: CUDA_ARCH_BIN="20 21(20) 50"

    BUILD_docs = ON (default)

    • If doxygen is installed and found, this enables the doc target. Use make docs to build the docs and make jekyll to run the web server. HTML docs are built in <source folder>/doxygen, and a symlink is then created in <source folder>/docs/doxygen. Functionality from scripts/ is now implemented in cmake, but is still required.
    • The source folder is used for generation because .Doxyfile contains relative paths and I prefer not to modify it now, although I think generating in the binary folder would be better.

    BUILD_python = ON (default). Builds the Python interface if all required dependencies are found; otherwise it is excluded from the build automatically.

    BUILD_matlab = OFF (default). Enables building the Matlab interfaces. Currently both Octave and Matlab are supported. For Octave, set Octave_compiler to mkoctfile if it is not found automatically. For Matlab, specify Matlab_DIR, or Matlab_mex and Matlab_mexext, if it is again not found automatically. If both are installed and found, set Matlab_build_mex_using=Matlab (default) or Octave to select which one to use. Note that the Matlab wrappers can only be built if BUILD_SHARED_LIB=ON. On macOS neither compiles.

    Proto-files. Protobuf files are NOT copied to <caffe_root>/include/caffe/proto anymore. Instead they are generated in <build_dir>/include/caffe/proto. I know one may accidentally include old headers, but this is the interest paid on technical debt that accumulated due to the incorrect design of the original cmake scripts. I also removed them from .gitignore.


    • cmake no longer configures cmake_test_defines.hpp and sample_data_list.txt into the source directory, no -DCMAKE_BUILD definition is added, and all *.in templates were removed. This is because the make runtest command is executed in the source directory, so embedding absolute paths into the test cpp files is not required. Consider configure_file() into the source folder an antipattern. However, one may restore such embedding by uncommenting a couple of lines in srcs/test/CMakeLists.txt.
    • All garbage targets (one per test file) were removed because they flood IDEs, while the compilation time reduction is debatable. I replaced them with the option BUILD_only_tests, which allows quickly including only selected tests. Example: cmake -DBUILD_only_tests="common,net,blob,im2col_kernel"

    Yosemite support. I was able to compile with CUDA support by following the Caffe instructions for libstdc++ and patching opencv as described here. Accelerate.framework support was added. The Matlab interface failed to compile.

    Temporary changes

    • make symlink creates the symlink [caffe_root]/build -> cmake_build_directory
    • All examples are now built without the .bin suffix, and a symlink with the .bin suffix is then created next to each one so that the tutorials keep working. Once naming is standardized, this should be removed.

    Including Caffe in your CMake project via find_package()

    git clone git@github.com:BVLC/caffe.git
    cd caffe && mkdir cmake_build && cd cmake_build
    cmake .. -DBUILD_SHARED_LIB=ON

    Verify that cmake found everything and in the proper locations. After that you can run make -j 12 right away, or better, do this:

    cmake . -DCMAKE_BUILD_TYPE=Debug     # switch to debug
    make -j 12 && make install           # installs by default to build_dir/install
    cmake . -DCMAKE_BUILD_TYPE=Release   # switch to release
    make -j 12 && make install           # doesn’t overwrite debug install
    make symlink

    After the operations complete, caffe tutorials should work from caffe root directory. Let’s now see how to connect caffe to a C++ application that uses Caffe API with cmake. Prepare the following script:

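    cmake_minimum_required(VERSION 2.8.8)
    find_package(Caffe)                         # locates CaffeConfig.cmake
    include_directories(${Caffe_INCLUDE_DIRS})  # Caffe and dependency headers
    add_definitions(${Caffe_DEFINITIONS})       # ex. -DCPU_ONLY
    add_executable(caffeinated_application main.cpp)
    target_link_libraries(caffeinated_application ${Caffe_LIBRARIES})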

    Run CMake to configure this application and generate build scripts or an IDE project. It will automatically find Caffe in its build directory and pick up all necessary dependencies (includes, libraries, definitions), and the application will compile without any additional actions. Caffe's own dependencies will also be included. If you have several Caffe builds, or for some reason CMake wasn't able to find Caffe automatically, you may specify Caffe_DIR=<path-to-caffe-build-dir> in CMake, and this guarantees that everything will work.

    Setting Caffe_DIR to the build directory means always using the build configuration (say, Release or Debug) that Caffe was last compiled for. If you set Caffe_DIR=<caffe-install-dir>/share/Caffe, where both configurations have been installed, the proper debug or release Caffe binaries will be selected depending on the configuration for which you compile your caffeinated_application.


    (Fixed typos in CUDA architectures - @Noiredd)

    compatibility focus ready for review 
    opened by Nerei 87
  • Provide a Caffe package in Debian


    Caffe packages are available for Debian/unstable.
    Caffe packages are failing to build for Ubuntu-devel and need to be patched.

    Last update: Dec.20 2016

    Draft guide

    Deploy Caffe with merely one command.

    Brief Guide for Debian/unstable users

    Only experienced Linux users are recommended to try Debian/unstable (Sid). To install caffe, first make sure you have something like the following in the file /etc/apt/sources.list (uncomment the second line if you want to re-compile caffe locally):

    deb sid main contrib non-free
    #deb-src sid main contrib non-free

    Then update the apt cache and install it. Note that you cannot install both the CPU version and the CUDA version at the same time.

    # apt update
    # apt install [ caffe-cpu | caffe-cuda ]
    # caffe

    It should work out of the box. I hope this work is helpful, since many people struggle with the Caffe compilation process.

    Here are some notes:

    • Please re-compile OpenBLAS locally with optimization flags for the sake of performance. This is highly recommended if you are writing a paper. The way to re-compile OpenBLAS from the Debian source is very similar to the next subsection.
    • If you are going to install caffe-cuda, it will automatically pull in the CUDA package and the nvidia driver packages. The installation process may fail if any part of the caffe dependency chain gets into trouble. That is to say, please take care if you have manually installed or significantly modified the nvidia driver, the CUDA toolkit, protobuf, or any other related software.
    • If you encounter any problem when installing caffe-cuda on a clean Debian system, please report the bug to me via Debian's bug tracking system.
    • If you encounter any problem when installing caffe-cpu, please report the bug to me via Debian's bug tracking system.
    • Both caffe-cpu and caffe-cuda contain a manpage (man caffe) and a bash completion script (caffe <TAB><TAB>, caffe train <TAB><TAB>). Neither of them has been merged into caffe master yet.
    • The python interface is Python3 version: python3-caffe-{cpu,cuda}. No plan to support python2.

    Compiling your custom caffe package on Debian/unstable

    There is no promise for the content in this subsection. If you just want to compile again from the source without any change, the following should work as expected. If you want to compile it with e.g. CUDNN support, you should at least be able to read and hack the file debian/rules under the source tree (It's a Makefile).

    First make sure you have a correct deb-src line in your apt source list file. Then we compile caffe with several simple commands.

    # apt update
    # apt install build-essential debhelper devscripts    # These are standard package building tools
    # apt build-dep [ caffe-cpu | caffe-cuda ]    # the most elegant way to pull caffe build dependencies
    # apt source [ caffe-cpu | caffe-cuda ]    # download the source tarball
    # cd caffe-XXXX    # now we enter into the source tree
    [ ... optional, make your custom changes at your own risk ... ]
    # debuild -B -j4    # build caffe with 4 parallel jobs (similar to make -j4)
    [ ... building ...]
    # debc    # optional, if you want to check the package contents
    # debi    # install the generated packages


    1. Where is caffe-cudnn?
      For legal reasons the cuDNN library cannot be redistributed. I'll be happy to make this package when cuDNN becomes redistributable. The workaround is to install cuDNN yourself, and to hack at least the debian/rules file if you really want caffe *.deb packages with cuDNN support.

    2. How do I report a bug via the Debian bug tracking system?
      See .

    3. I installed the CPU version. What should I do if I want to switch to the CUDA version?
      sudo apt install caffe-cuda; apt's dependency resolver is smart enough for this.

    4. Where are the examples, the models, and other documentation?
      sudo apt install caffe-doc; dpkg -L caffe-doc

    opened by cdluminate 83
  • Caffe Opencl - ViennaCL - Could not find kernel `fill_float`

    Please use the caffe-users list for usage, installation, or modeling questions, or other requests for help. Do not post such requests to Issues. Doing so interferes with the development of Caffe.

    Please read the guidelines for contributing before submitting this issue.

    Issue summary

    Upon running my network, I get the following error:

    ViennaCL: FATAL ERROR: Could not find kernel 'fill_float' from program ''
    Number of kernels in program: 0
    Kernel not found

    Steps to reproduce

    If you are having difficulty building Caffe or training a model, please ask the caffe-users mailing list. If you are reporting a build error that seems to be due to a bug in Caffe, please attach your build configuration (either Makefile.config or CMakeCache.txt) and the output of the make (or cmake) command.

    Your system configuration

    • Operating system: Ubuntu 16.04
    • Compiler: g++ 5.4
    • CUDA version (if applicable): 8
    • CUDNN version (if applicable): Latest one
    • BLAS: Titan Xp on OpenCL drivers
    • Python or MATLAB version (for pycaffe and matcaffe respectively):

    question interface OpenCL 
    opened by soulslicer 76
  • Any simple example?


    I started with Caffe and the mnist example ran well. However, I cannot understand how I am supposed to use this with my own data for a classification task. What should the data format be? Where should I specify the files? How do I see the results for a test set? None of this is mentioned in the documentation. Any pointers would be appreciated, thanks.

    opened by rmanor 76
  • Yet another batch normalization PR

    This PR squashes together #1965 and #3161 to make sure that proper credit is given. The final functionality is much more like #3161: we ultimately decided that the scale/shift could be implemented as a separate layer (and should hence get its own PR) and the data shuffling, if it gets merged, should also be done as a separate PR (I have not reviewed that code closely enough to say whether it is mergeable). This version includes the global stats computations, and fixes the issue where #3161 was using the biased variance estimate (took a little while to convince myself that this is indeed the correct estimator to use).
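
For readers unfamiliar with the distinction, the biased and unbiased variance estimates over a batch of m values differ only by a factor of m/(m-1). The sketch below is a generic illustration with made-up numbers. It does not reproduce the layer code from either PR, and `batch_variances` is a hypothetical helper.

```python
def batch_variances(values):
    """Return (biased, unbiased) variance estimates of a batch."""
    m = len(values)
    mean = sum(values) / m
    # Biased estimate: divide the sum of squared deviations by m.
    biased = sum((v - mean) ** 2 for v in values) / m
    # Unbiased estimate: divide by m - 1, i.e. rescale by m / (m - 1).
    unbiased = biased * m / (m - 1)
    return biased, unbiased


biased, unbiased = batch_variances([1.0, 2.0, 3.0, 4.0])
print(biased, unbiased)
```

The gap between the two shrinks as the batch grows, so the choice matters most for small batches.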

    It would be great if @ducha-aiki and @jeffdonahue could take a look at this.

    opened by cdoersch 74
  • Multi-GPU Data Parallelism (with Parallel Data Layers)

    This is my package of #2870 (and originally, #2114)

    Modification: Allow data layers (and also PythonLayer when used as a data layer) to be shared among the worker solvers' training nets, and also the test net, to be future-proof in case one wants to do multi-GPU testing. Data layers are locked during forward to ensure sequential forward passes. Now all worker solvers fetch data from one single data layer.

    This ensures that single-GPU training is consistent with multi-GPU training, and allows the tests in #2870 to pass. Otherwise, in #2870 (#2114), multiple data layers are created, one per worker solver, and these data layers are unaware of each other. This can be a serious issue if one uses deterministic data layers or turns off shuffling. In that case, since the data layers in each worker solver read the same data, one eventually gets the same gradient on each solver, so it is almost equivalent to multiplying the learning rate by the number of GPUs. This is definitely not the desired behavior of multi-GPU data parallelism, since one should train on different subsets of the dataset. Although a DataReader is provided in #2114, it only applies to leveldb and lmdb, and is hardly extensible to other data layers.

    DataReader is preserved in this PR and LMDB/LEVELDB DataLayer is not shared.
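
The idea of a single locked data layer feeding several worker solvers can be sketched with plain Python threads. This is an analogy only, not the C++ implementation: `SharedDataSource` and `run_workers` are hypothetical names, and the "dataset" is just a list of integers.

```python
import threading


class SharedDataSource:
    """Hypothetical stand-in for one data layer shared by all workers."""

    def __init__(self, items, batch_size):
        self._items = items
        self._batch_size = batch_size
        self._cursor = 0
        self._lock = threading.Lock()

    def next_batch(self):
        # The lock makes "read cursor, take batch, advance cursor" atomic,
        # so no two workers can ever receive the same batch.
        with self._lock:
            start = self._cursor
            size = len(self._items)
            self._cursor = (start + self._batch_size) % size
            return [self._items[(start + i) % size]
                    for i in range(self._batch_size)]


def run_workers(source, num_workers, steps):
    """Let each worker thread fetch `steps` batches from the shared source."""
    results = [[] for _ in range(num_workers)]

    def work(k):
        for _ in range(steps):
            results[k].append(source.next_batch())

    threads = [threading.Thread(target=work, args=(k,))
               for k in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


source = SharedDataSource(list(range(32)), batch_size=4)
results = run_workers(source, num_workers=4, steps=2)
seen = sorted(x for worker in results for batch in worker for x in batch)
print(seen == list(range(32)))
```

With one private data source per worker, every worker would instead read items 0 to 7, which corresponds to the duplicated-gradient problem described above.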


    • [x] Add ShareInParallel function to layer.hpp, data_layer.hpp and pythonlayer.hpp .
    • [x] Implement share layers during net construction, construct top blobs of shared layers.
    • [x] Add lock to forward in layer.hpp to lock layers.
    • [x] Share layers during workersolver construction.
    • [x] ~~Remove DataReader. Restore old behavior of DataLayer.~~ DataReader is kept.
    • [x] Test make runtest on multiple GPU machine.
    • [x] Test multi-gpu training on MNIST. (log:
    • [x] Test multi-gpu training on ILSVRC.
    • [x] Fix NVCC warning on boost/thread.hpp to get Travis CI pass.


    Multi-GPU training is numerically non-deterministic on data layers except for LMDB/LEVELDB DataLayer, see

    focus speed-up ready for review parallelism 
    opened by ronghanghu 67
  • ND convolution with im2col

    This PR extends convolution to N spatial axes, where Caffe's current convolution supports only 2D convolution (with 2 spatial axes: height and width). For 2D convolution, this implementation doesn't compare favorably with the existing one -- I haven't done much benchmarking, but I believe it's 25-75% slower on both CPU and GPU. So before this could be merged, I'd need to restore the existing implementation and use it as the default "engine" for 2D convolutions (but this more destructive version makes it easier to tell what I was thinking from looking at the diff). If anyone has any suggestions on improving the performance or thoughts on why it might be so much slower, I'd love to hear them.

    Edit: benchmarking this on alexnet, it's about 33% slower:

    @ master:

    I0305 21:07:25.042047 22060 caffe.cpp:271] Average Forward pass: 486.327 ms.
    I0305 21:07:25.042064 22060 caffe.cpp:273] Average Backward pass: 824.147 ms.
    I0305 21:07:25.042079 22060 caffe.cpp:275] Average Forward-Backward: 1310.68 ms.

    @ nd-convolution:

    I0305 21:02:03.827594 12909 caffe.cpp:271] Average Forward pass: 681.38 ms.
    I0305 21:02:03.827608 12909 caffe.cpp:273] Average Backward pass: 1068.98 ms.
    I0305 21:02:03.827623 12909 caffe.cpp:275] Average Forward-Backward: 1750.56 ms.
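
The relative slowdown can be recomputed from the two logs above. The short script below is an illustrative sketch; the regular expression and the helper name `parse_timings` are assumptions, not part of Caffe's tooling.

```python
import re

MASTER_LOG = """\
I0305 21:07:25.042047 22060 caffe.cpp:271] Average Forward pass: 486.327 ms.
I0305 21:07:25.042064 22060 caffe.cpp:273] Average Backward pass: 824.147 ms.
I0305 21:07:25.042079 22060 caffe.cpp:275] Average Forward-Backward: 1310.68 ms.
"""

ND_LOG = """\
I0305 21:02:03.827594 12909 caffe.cpp:271] Average Forward pass: 681.38 ms.
I0305 21:02:03.827608 12909 caffe.cpp:273] Average Backward pass: 1068.98 ms.
I0305 21:02:03.827623 12909 caffe.cpp:275] Average Forward-Backward: 1750.56 ms.
"""


def parse_timings(log_text):
    """Map each 'Average <name>: <x> ms' entry to its time in milliseconds."""
    pattern = re.compile(r"Average (.+?): ([\d.]+) ms")
    return {name: float(ms) for name, ms in pattern.findall(log_text)}


master = parse_timings(MASTER_LOG)
nd = parse_timings(ND_LOG)
# Fractional slowdown of the nd-convolution branch relative to master.
slowdown = {name: nd[name] / master[name] - 1.0 for name in master}
for name, value in slowdown.items():
    print(f"{name}: {value:.1%} slower")
```

The combined forward-backward figure comes out at roughly a third slower, in line with the estimate above, with the forward pass affected more than the backward pass.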
    focus ready for review ES 
    opened by jeffdonahue 67
  • rename CV_LOAD_IMAGE_* enums

    On RHEL 9, opencv-4.6.0-7.el9.x86_64 there is this representative build error.

    src/caffe/util/io.cpp:76:34: error: ‘CV_LOAD_IMAGE_COLOR’ was not declared in this scope
       76 |   int cv_read_flag = (is_color ? CV_LOAD_IMAGE_COLOR :

    The header containing this enum has moved to opencv4/opencv2/imgcodecs/legacy/constants_c.h and is no longer in the include path.

    from ./opencv4/opencv2/imgcodecs/legacy/constants_c.h

    CV_LOAD_IMAGE_GRAYSCALE =0,
    CV_LOAD_IMAGE_COLOR =1,

    to ./opencv2/imgcodecs.hpp

    IMREAD_GRAYSCALE = 0, //!< If set, always convert image to the single channel grayscale image (codec internal conversion).
    IMREAD_COLOR = 1, //!< If set, always convert image to the 3 channel BGR color image.

    Signed-off-by: Tom Rix [email protected]

    opened by trixirt 1
  • Failed inference with nyud-fcn32s-hha

    Important - read before submitting

    Please read the guidelines for contributing before submitting this issue!

    Please do not post installation, build, usage, or modeling questions, or other requests for help to Issues. Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.

    Issue summary

    Using the model nyud-fcn32s-hha-heavy.caffemodel to run inference on images from the NYU Depth Dataset V2 gives wrong results. The inference script is below.

    import numpy as np
    from PIL import Image
    import os 
    import sys
    current_path = os.path.dirname(__file__)
    project_path = os.path.dirname(os.path.dirname(__file__))
    import caffe
    import vis
    def zero_multi_padding(in_array, padding_size=0):
        in_channels, h, w = in_array.shape
        padding_array = np.zeros([in_channels, h + 2 * padding_size, w + 2 * padding_size],dtype=in_array.dtype)
        for i in range(in_channels):
            for xx in range(h):
                for yy in range(w):
                    padding_array[i, xx + padding_size, yy + padding_size] = in_array[i, xx, yy]
        return padding_array
    # the demo image is "2007_000129" from PASCAL VOC
    # load image, switch to BGR, subtract mean, and make dims C x H x W for Caffe
    # im ='/demo/image.jpg')
    # in_ = np.array(im, dtype=np.float32)
    # print(in_.shape)
    # in_ = in_[:,:,::-1]
    # in_ -= np.array((104.00698793,116.66876762,122.67891434))
    # in_ = in_.transpose((2,0,1))
    # in_pad = zero_multi_padding(in_, 99)
    # print(in_pad.shape)
    # print(in_.shape)
    im ='/home/azure002/my_worksapce/')
    # im ='../demo/image.jpg')
    new_image = im.resize((1024,1024), Image.Resampling.BICUBIC)'/test_init.png')
    in_ = np.array(im, dtype=np.float32)
    in_ = in_[:,:,::-1]
    in_ -= np.array((104.00698793,116.66876762,122.67891434))
    in_ = in_.transpose((2,0,1))
    # load net
    net = caffe.Net(current_path+'/deploy.prototxt', current_path+'/nyud-fcn32s-hha-heavy.caffemodel', caffe.TEST)
    # # shape for input (data blob is N x C x H x W), set data
    # net.blobs['None'].reshape(1, *in_.shape)
    # net.blobs['None'].data[...] = in_
    # net.blobs['data'].reshape(1, *in_pad.shape)
    # net.blobs['data'].data[...] = in_pad
    net.blobs['data'].reshape(1, *in_.shape)
    net.blobs['data'].data[...] = in_
    # run net and take argmax for prediction
    out = net.blobs['score'].data[0].argmax(axis=0)
    import matplotlib.pyplot as plt
    # visualize segmentation in PASCAL VOC colors
    voc_palette = vis.make_palette(40)
    out_im = Image.fromarray(vis.color_seg(out, voc_palette))'/output.png')
    masked_im = Image.fromarray(vis.vis_seg(im, out, voc_palette))'/visualization.jpg')

    Steps to reproduce

    run the python script

    Tried solutions

    System configuration

    • Operating system: Linux azure002-System-Product-Name 5.4.0-126-generic #142-Ubuntu SMP Fri Aug 26 12:12:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
    • Compiler:
    • CUDA version (if applicable):
    • CUDNN version (if applicable):
    • BLAS:
    • Python version (if using pycaffe): Python 3.8.10
    • MATLAB version (if using matcaffe):

    Issue checklist

    • [ ] read the guidelines and removed the first paragraph
    • [ ] written a short summary and detailed steps to reproduce
    • [ ] explained how solutions to related problems failed (tick if found none)
    • [ ] filled system configuration
    • [ ] attached relevant logs/config files (tick if not applicable)
    opened by wangxudong-cq 0
  • blob.hpp dimension check code problem

    Issue summary

    The dimension checks in the offset function in blob.hpp look wrong. I think line 157 should check CHECK_GE(c, 0) rather than CHECK_GE(channels(), 0). The same applies to lines 159 and 161, which cover the height and width checks.
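
The intended checks can be illustrated with a small Python analogue of an N x C x H x W offset computation. This is a sketch, not the blob.hpp code: `blob_offset` is a hypothetical function, and the inclusive upper bound (allowing a one-past-the-end index) is an assumption made for the illustration.

```python
def blob_offset(shape, n, c=0, h=0, w=0):
    """Flat offset of element (n, c, h, w) in a row-major N x C x H x W array."""
    num, channels, height, width = shape
    checks = (("n", n, num), ("c", c, channels),
              ("h", h, height), ("w", w, width))
    for name, index, extent in checks:
        # Check the index itself, not the extent: the lower-bound test
        # must be on c, h, w rather than channels(), height(), width().
        if not 0 <= index <= extent:
            raise IndexError(f"{name}={index} out of range [0, {extent}]")
    return ((n * channels + c) * height + h) * width + w


print(blob_offset((2, 3, 4, 5), 1, 2, 3, 4))
```

A lower-bound check on the extent is always true for a valid blob, so a negative index would slip through it unnoticed. That is the problem reported here.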

    Steps to reproduce

    Tried solutions

    System configuration

    • Operating system:
    • Compiler:
    • CUDA version (if applicable):
    • CUDNN version (if applicable):
    • BLAS:
    • Python version (if using pycaffe):
    • MATLAB version (if using matcaffe):

    Issue checklist

    • [ ] read the guidelines and removed the first paragraph
    • [ ] written a short summary and detailed steps to reproduce
    • [ ] explained how solutions to related problems failed (tick if found none)
    • [ ] filled system configuration
    • [ ] attached relevant logs/config files (tick if not applicable)
    opened by qiulinzhang 0
  • Glib 3.4.30 not found

    Important - read before submitting

    Issue summary

    I am not able to install glib 3.4.30 on GitHub Actions to run the build check. I tried using conda install -c conda-forge gcc=12.1.0

    Steps to reproduce

    Add conda install -c conda-forge gcc=12.1.0 to the yml file for GitHub Actions, then commit and push.

    Tried solutions

    System configuration

    • Operating system: Windows
    • Compiler:
    • CUDA version (if applicable):
    • CUDNN version (if applicable):
    • BLAS:
    • Python version (if using pycaffe): using 3.9
    • MATLAB version (if using matcaffe):

    Issue checklist

    • [x] read the guidelines and removed the first paragraph
    • [x] written a short summary and detailed steps to reproduce
    • [x] explained how solutions to related problems failed (tick if found none)
    • [x] filled system configuration
    • [x] attached relevant logs/config files (tick if not applicable)
    opened by karkir0003 1
  • 1.0(Apr 18, 2017)

    This release marks the convergence of development into a stable, reference release of the framework and a shift into maintenance mode. Let's review the progress culminating in our 1.0:

    • research: nearly 4,000 citations, usage by award papers at CVPR/ECCV/ICCV, and tutorials at ECCV'14 and CVPR'15
    • industry: adopted by Facebook, NVIDIA, Intel, Sony, Yahoo! Japan, Samsung, Adobe, A9, Siemens, Pinterest, the Embedded Vision Alliance, and more
    • community: 250+ contributors, 15k+ subscribers on github, and 7k+ members of the mailing list
    • development: 10k+ forks, >1 contribution/day on average, and dedicated branches for OpenCL and Windows
    • downloads: 10k+ downloads and updates a month, ~50k unique visitors to the home page every two weeks, and >100k unique downloads of the reference models
    • winner of the ACM MM open source award 2014 and presented as a talk at ICML MLOSS 2015

    Thanks for all of your efforts leading us to Caffe 1.0! Your part in development, community, feedback, and framework usage brought us here. As part of 1.0 we will be welcoming collaborators old and new to join as members of the Caffe core.

    Stay tuned for the next steps in DIY deep learning with Caffe. As development is never truly done, there's always 1.1!

    Now that 1.0 is done, the next generation of the framework—Caffe2—is ready to keep up the progress on DIY deep learning in research and industry. While Caffe 1.0 development will continue with 1.1, Caffe2 is the new framework line for future development led by Yangqing Jia. Although Caffe2 is a departure from the development line of Caffe 1.0, we are planning a migration path for models just as we have future-proofed Caffe models in the past.

    Happy brewing, The Caffe Crew


  • rc5 (Feb 21, 2017)

    This packages up 42 commits by 15 contributors to help home in on 1.0. Thanks all!

    As with all releases, run make clean && make superclean to clear out old materials before compiling the new release.

    • set soversion properly #5296
    • documentation: improved dockerfiles and usage notes #5153, links and fixes #5227
    • build: groom cmake build #4609, find veclib more reliably on mac #5236
    • pycaffe: give Net a layer dictionary #4347
    • matcaffe: destroy individual nets and solvers #4737


    • restore solvers for resuming multi-GPU training #5215
    • draw net helper #5010


  • rc4 (Jan 20, 2017)

    It's a new year and a new release candidate. This packages up 348 commits by 68 authors. Thanks all!

    This is intended to be the last release candidate before 1.0. We hope to catch any lurking issues, improve documentation, and polish the packaging for then.

    As with all releases, run make clean && make superclean to clear out old materials before compiling the new release. See all merged PRs since the last release.

    • RNNs + LSTMs #3948
    • layers
      • Parameter layer for learning any bottom #2047
      • Crop layer for aligning coordinate maps for FCNs #3570
      • Tied weights with transpose for InnerProduct layer #3612
      • Batch Norm docs, numerics, and robust proto def #4704 #5184
      • Sigmoid Cross Entropy Loss on GPU #4908 and with ignore #4986
    • pycaffe
      • solver callbacks #3020
      • net spec coordinate mapping and cropping for FCNs #3613
      • N-D blob interface #3703
      • python3 compatibility by six #3716
      • dictionary-style net spec #3747
      • Python layer can have phase #3995
    • Docker image #3518
    • expose all NetState options for all-in-one nets #3863
    • force backprop on or off by propagate_down #3942
    • cuDNN v5 #4159
    • multi-GPU parallelism through NCCL + multi-GPU python interface #4563
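
The "Sigmoid Cross Entropy Loss ... with ignore" item can be illustrated with a minimal pure-Python sketch. It is not Caffe's implementation: the function name, the `ignore_label` argument, and the choice to average over the non-ignored entries are assumptions made for illustration. It uses the standard numerically stable form of the loss.

```python
import math


def sigmoid_cross_entropy_loss(logits, targets, ignore_label=None):
    """Mean sigmoid cross-entropy over the entries that are not ignored.

    Hypothetical helper for illustration; normalizing by the number of
    kept entries is an assumption, not necessarily Caffe's default.
    """
    total = 0.0
    kept = 0
    for x, t in zip(logits, targets):
        if ignore_label is not None and t == ignore_label:
            continue  # ignored entries contribute no loss
        # Stable rewrite of -[t*log(s(x)) + (1-t)*log(1-s(x))]:
        # exp() only ever sees a non-positive argument, so it cannot overflow.
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
        kept += 1
    return total / max(kept, 1)
```

For example, `sigmoid_cross_entropy_loss([0.0, 5.0], [1, -1], ignore_label=-1)` only scores the first entry, so the second logit has no effect on the result.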


    Fixes:

    • Net upgrade tools catch mixed versions, handle input fields, and log outputs #3755
    • Exp layer for base e and shift != 0 #3937
    • Crop layer checks only the crop dimensions it should #3993
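
For reference, the Exp layer item concerns the function y = base^(shift + scale * x), where a base of -1 stands for e. The following is a minimal pure-Python sketch of that forward computation, not the layer's actual code; the function name is hypothetical, and the parameter names and defaults are assumed to mirror the layer's documented parameters.

```python
import math


def exp_layer_forward(xs, base=-1.0, scale=1.0, shift=0.0):
    """Compute base ** (shift + scale * x) for each x (hypothetical helper).

    base == -1 is treated as "use e", which is the case the fix concerns
    when shift != 0.
    """
    if base != -1.0 and base <= 0.0:
        raise ValueError("base must be positive, or -1 to mean e")
    log_base = 1.0 if base == -1.0 else math.log(base)
    # base ** (shift + scale * x) == exp(log(base) * (shift + scale * x))
    return [math.exp(log_base * (shift + scale * x)) for x in xs]
```

With the default base, a non-zero shift multiplies every output by e ** shift: `exp_layer_forward([0.0], shift=1.0)` gives a single value equal to e.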


    Dependencies:

    • cuDNN compatibility is now at v5 + v4; cuDNN v3 and earlier are not supported
    • NCCL is now required for multi-GPU operation

    As a reminder, the OpenCL and Windows branches continue to make progress with the community leadership of Fabian Tschopp and Guillaume Dumont, respectively.


  • rc3 (Jan 30, 2016)

    A lot has happened since the last release! This packages up ~800 commits by 119 authors. Thanks all!

    As with all releases, run make clean && make superclean to clear out old materials before compiling the new release.

    • layers
      • batch normalization #3229 #3299
      • scale + bias layers #3591
      • PReLU #1940 #2414, ELU #3388, and log #2090 non-linearities
      • tile layer #2083, reduction layer #2089
      • embed layer #2032
      • spatial pyramid pooling #2117
      • batch reindex layer #2966
      • filter layer #2054
    • solvers: Adam #2918, RMSProp #2867, AdaDelta #2782
      • accumulate gradients to decouple computational and learning batch size #1977
      • de-duplicate solver code #2518
      • make solver type a string and split classes #3166 -- you should update your solver definitions
    • MSRA #1946 and bilinear interpolation #2213 weight fillers
    • N-D blobs #1970 and convolution #2049 for higher dimensional data and filters
    • tools:
      • test caffe command line tool execution #1926
      • network summarization tool #3090
      • snapshot on signal / before quit #2253
      • report ignored layers when loading weights #3305
      • caffe command fine-tunes from multiple caffemodels #1456
    • pycaffe:
      • python net spec #2086 #2813 #2959
      • handle python exceptions #2462
      • python layer arguments #2871
      • python layer weights #2944
      • snapshot in pycaffe #3082
      • top + bottom names in pycaffe #2865
      • python3 compatibility improvements
    • matcaffe: totally new interface with examples and tests #2505
    • cuDNN: switch to v2 #2038, switch to v3 #3160, make v4 compatible #3439
    • separate IO dependencies for configurable build #2523
    • large model and solverstate serialization through hdf5 #2836
    • train by multi-GPU data parallelism #2903 #2921 #2924 #2931 #2998
    • dismantle layer headers so every layer has its own include #3315
    • workflow: adopt build versioning #3311 #3593, contributing guide #2837, and badges for build status and license #3133
    • SoftmaxWithLoss normalization options #3296
    • dilated convolution #3487
    • expose Solver Restore() to C++ and Python #2037
    • set mode once and only once in testing #2511
    • turn off backprop by skip_propagate_down #2095
    • flatten layer learns axis #2082
    • trivial slice and concat #3014
    • hdf5 data layer: loads integer data #2978, can shuffle #2118
    • cross platform adjustments #3300 #3320 #3321 #3362 #3361 #3378
    • speed-ups for GPU solvers #3519 and CPU im2col #3536
    • make and cmake build improvements
    • and more!
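
The note above about updating solver definitions (#3166) can be sketched as a small text rewrite. This is an illustrative helper, not a Caffe tool: the function name is hypothetical, and the old field name `solver_type`, the new field name `type`, and the enum-to-string mapping are assumptions that should be checked against the caffe.proto of the release in use.

```python
import re

# Assumed mapping from the old enum values to the new string names.
SOLVER_TYPE_NAMES = {
    "SGD": "SGD",
    "NESTEROV": "Nesterov",
    "ADAGRAD": "AdaGrad",
    "RMSPROP": "RMSProp",
    "ADADELTA": "AdaDelta",
    "ADAM": "Adam",
}

# Matches a line such as "solver_type: ADAM", keeping its indentation.
_OLD_FIELD = re.compile(r"^([ \t]*)solver_type[ \t]*:[ \t]*([A-Z]+)\b", re.MULTILINE)


def upgrade_solver_type(prototxt):
    """Rewrite 'solver_type: ENUM' lines as 'type: "Name"' (hypothetical helper)."""
    def replace(match):
        indent, old = match.group(1), match.group(2)
        if old not in SOLVER_TYPE_NAMES:
            return match.group(0)  # leave unknown values untouched
        return '{}type: "{}"'.format(indent, SOLVER_TYPE_NAMES[old])

    return _OLD_FIELD.sub(replace, prototxt)
```

A solver text containing `solver_type: ADAM` would come back with `type: "Adam"` in its place; every other line, including `solver_mode`, passes through unchanged.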


    Fixes:

    • #2866 fix weight sharing to (1) reduce memory usage and computation and (2) correct momentum and other solver computations
    • #2972 fix concat (broken in #1970)
    • #2964 #3162 fix MVN layer
    • #2321 fix contrastive loss layer to match Hadsell et al. 2006
    • fix deconv backward #3095 and conv reshape #3096 (broken in #2049)
    • #3393 fix in-place reshape and flatten
    • #3152 fix silence layer to not zero bottom on backward
    • #3574 disable cuDNN max pooling (incompatible with in-place)
    • make backward compatible with negative LR #3007
    • #3332 fix pycaffe forward_backward_all()
    • #1922 fix cross-channel LRN for large channel band
    • #1457 fix shape of C++ feature extraction demo output
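
To make the contrastive loss fix (#2321) concrete, here is a minimal pure-Python sketch of the two variants. It assumes the fix is the change of the dissimilar-pair term from max(margin - d^2, 0) to max(margin - d, 0)^2, the form in Hadsell et al. 2006, and that label 1 marks a similar pair. The function name and the `legacy` flag are hypothetical names for illustration.

```python
def contrastive_loss(distances, similar, margin=1.0, legacy=False):
    """Contrastive loss averaged over pairs, with the usual 1/2 factor.

    distances: Euclidean distance d between the two embeddings of each pair.
    similar:   truthy if the pair is similar, falsy if dissimilar.
    legacy:    if True, use the assumed pre-fix term max(margin - d^2, 0).
    """
    if not distances:
        raise ValueError("need at least one pair")
    total = 0.0
    for d, y in zip(distances, similar):
        if y:
            total += d * d  # similar pairs are pulled together
        elif legacy:
            total += max(margin - d * d, 0.0)
        else:
            total += max(margin - d, 0.0) ** 2  # form in Hadsell et al. 2006
    return total / (2.0 * len(distances))
```

The two variants agree for similar pairs and for dissimilar pairs already beyond the margin; they differ for dissimilar pairs inside it, which is where the corrected form changes the loss.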


    Dependencies:

    • hdf5 is required
    • cuDNN compatibility is now at v3 + v4; cuDNN v1 and v2 are not supported
    • IO dependencies (lmdb, leveldb, opencv) are now optional #2523


  • rc2 (Feb 20, 2015)

    This is the release candidate for Caffe 1.0, once more with feeling. See #1849 for details.

    With documentation, fixes, and feedback this could soon be 1.0!

  • rc (Sep 19, 2014)

    This is the release candidate for Caffe 1.0. See #1112 for details.

    • documentation
    • standard model format and model zoo for sharing models
    • cuDNN acceleration
  • v0.9999 (Aug 8, 2014)

    See #880 for details.

    Dependencies: lmdb and gflags are required. CPU-only Caffe without any GPU / CUDA dependencies is turned on by setting CPU_ONLY := 1 in your Makefile.config.

    Deprecations: the new caffe tool includes commands for model training and testing, querying devices, and timing models. The corresponding train_net.bin, finetune_net.bin, test_net.bin, device_query.bin, and net_speed_benchmark.bin are deprecated.

  • acm-mm-oss (May 24, 2014)

  • v0.999 (May 20, 2014)

    See #429 for details.

    Please upgrade your models! Caffe's proto definition was changed in #208 and #219 for extensibility. The upgrade_net_proto_binary.bin and upgrade_net_proto_text.bin tools are provided to convert current models. Caffe will attempt to automagically upgrade old models when loaded, but doesn't save the changes.

    Update your Makefile.config! Caffe has a new Makefile and Makefile.config that learned to auto-configure themselves a bit better. Look at the new Makefile.config.example and update your configuration accordingly.

    Dependencies: Caffe's matrix and vector computations can be done with ATLAS, OpenBLAS, or MKL. The hard dependency on MKL is no more!

    Deprecation: V0 model definitions. While Caffe will try to automagically upgrade old models when loaded, see tools/upgrade_net_proto* to make the upgrade permanent, since V0 support will be dropped.

  • v0.99 (Mar 20, 2014)

    See #231 for details.

    New Dependency: hdf5 is now required. Caffe learned how to load blobs and (multiple!) labels from hdf5.

    • sudo apt-get install libhdf5-serial-dev on Ubuntu.
    • brew install homebrew/science/hdf5 on OS X.

    Deprecation: padding layers. See 2848aa1f8da0272797ee51234293dfa87eda266a for an example of how to update your model schema and note that an automated tool is coming for this and other model schema updates #219.

  • rcnn-release (Mar 20, 2014)

  • v0.9 (Mar 19, 2014)

Berkeley Vision and Learning Center
Autonomous Perception Research
Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon developed library for building Deep Learning (DL) machine learning (ML) models

Amazon DSSTNE: Deep Scalable Sparse Tensor Network Engine DSSTNE (pronounced "Destiny") is an open source software library for training and deploying

Amazon Archives 4.4k Dec 30, 2022
yolov5 onnx caffe

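Environment setup: ubuntu:18.04 cuda:10.0 cudnn:7.6.5 caffe: 1.0 OpenCV:3.4.2 Anaconda3:5.2.0. The relevant installation packages have been uploaded to Baidu Cloud and can be downloaded from the following link: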

null 61 Dec 29, 2022
A library for creating Artificial Neural Networks, for use in Machine Learning and Deep Learning algorithms.

iNeural A library for creating Artificial Neural Networks, for use in Machine Learning and Deep Learning algorithms. What is a Neural Network? Work on

Fatih Küçükkarakurt 5 Apr 5, 2022
header only, dependency-free deep learning framework in C++14

The project may be abandoned since the maintainer(s) are just looking to move on. In the case anyone is interested in continuing the project, let us k

tiny-dnn 5.6k Dec 31, 2022
TFCC is a C++ deep learning inference framework.

TFCC is a C++ deep learning inference framework.

Tencent 113 Dec 23, 2022
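KSAI Lite is a deep learning inference framework from Kingsoft, based on TensorFlow Lite

KSAI Lite is a lightweight, flexible, high-performance, and easily extensible deep learning inference framework. It is built on TensorFlow Lite and targets multiple hardware platforms, including mobile, embedded, and server. KSAI Lite is currently used in Kingsoft Office's internal business and is gradually supporting Kingsoft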

null 80 Dec 27, 2022
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

Mobile AI Compute Engine (or MACE for short) is a deep learning inference framework optimized for mobile heterogeneous computing on Android, iOS, Linux and Windows devices.

Xiaomi 4.7k Jan 3, 2023
Plaidml - PlaidML is a framework for making deep learning work everywhere.

A platform for making deep learning work everywhere. Documentation | Installation Instructions | Building PlaidML | Contributing | Troubleshooting | R

PlaidML 4.5k Jan 7, 2023
CubbyDNN - Deep learning framework using C++17 in a single header file

CubbyDNN CubbyDNN is C++17 implementation of deep learning. It is suitable for deep learning on limited computational resource, embedded systems and I

Chris Ohk 30 Aug 16, 2022
Caffe2 is a lightweight, modular, and scalable deep learning framework.

Source code now lives in the PyTorch repository. Caffe2 Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the origin

Meta Archive 8.4k Jan 6, 2023
An Open-Source Analytical Placer for Large Scale Heterogeneous FPGAs using Deep-Learning Toolkit

DREAMPlaceFPGA An Open-Source Analytical Placer for Large Scale Heterogeneous FPGAs using Deep-Learning Toolkit. This work leverages the open-source A

Rachel Selina Rajarathnam 25 Dec 5, 2022
Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

The Microsoft Cognitive Toolkit is a unified deep learning toolkit that describes neural networks as a series of computational steps via a directed graph.

Microsoft 17.3k Jan 6, 2023
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

This is the Vowpal Wabbit fast online learning code. Why Vowpal Wabbit? Vowpal Wabbit is a machine learning system which pushes the frontier of machin

Vowpal Wabbit 8.1k Dec 30, 2022
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to m

The Apache Software Foundation 20.2k Dec 31, 2022
LibDEEP - Deep learning library. BSD-3-Clause

LibDEEP LibDEEP is a deep learning library developed in C language for the development of artificial intelligence-based techniques. Please visit our W

Joao Paulo Papa 22 Dec 8, 2022
Forward - A library for high performance deep learning inference on NVIDIA GPUs

a library for high performance deep learning inference on NVIDIA GPUs.

Tencent 123 Mar 17, 2021
A library for high performance deep learning inference on NVIDIA GPUs.

Forward - A library for high performance deep learning inference on NVIDIA GPUs

Tencent 509 Dec 17, 2022