Caffe: a fast open framework for deep learning.




Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berkeley Vision and Learning Center (BVLC) and community contributors.

Check out the project site for all the details and step-by-step examples.

Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.

Happy brewing!

License and Citation

Caffe is released under the BSD 2-Clause license. The BAIR/BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

  @article{jia2014caffe,
    Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
    Journal = {arXiv preprint arXiv:1408.5093},
    Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
    Year = {2014}
  }
  • Caffe OpenCL support

    DISCONTINUED, now available as an official Caffe branch.

    Technical Report

    Available on arXiv.

    opened by naibaf7 323
  • OpenCL Backend



    The proposed changes add OpenCL support to Caffe. All GPU functions can be executed on AMD GPUs with OpenCL 1.2 or 2.0, as well as NVIDIA GPUs with OpenCL 1.1.

    Build Instructions

    OpenCL Tests

    All GPU tests successfully complete using this OpenCL version of Caffe.

    Performance and Stability

    The main goal was to provide an OpenCL port to the Caffe community. As such it is not yet optimized for performance or stability.

    Help Wanted

    Let's make it better and faster together.

    enhancement compatibility OpenCL 
    opened by lunochod 232
  • Multi-GPU


    Uses CUDA peer-to-peer for communication, and parts of #1148. SGD is now synchronous instead of asynchronous, as @longjon showed bandwidth on one box is actually high enough. We haven’t really benchmarked yet, but it seems to work great. It also gets rid of the momentum coordination problem.

    The synchronization code needs to hook into the solver, so it is a bit more invasive than before, but still pretty isolated. I refactored solver.cpp to separate the regularization and gradient compute phases so that they can be invoked at different times by the parallel solver.

    One thing still missing is a way to compute the actual number of iterations. For now each solver runs as if it were by itself, so the run takes as long as without parallelism. I guess we could adapt the solvers to run 1/N of the steps instead. The batch size should also be experimented with, as it is now effectively N times larger. On that note, would it be more convenient to switch to the number of images to compute progress, instead of iterations, to be independent of batch size?
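    A toy sketch of the batch-size/iteration arithmetic above (illustrative only; the function and numbers are made up, not part of this PR):

```python
# Toy sketch of the effective batch size and iteration count for
# synchronous training on N GPUs, as discussed above.
# (Illustrative only; not Caffe API.)

def parallel_schedule(batch_size, total_iters, num_gpus):
    """Effective batch grows N-fold; to keep the number of images
    seen constant, each solver should run total_iters / N steps."""
    effective_batch = batch_size * num_gpus
    iters_per_solver = total_iters // num_gpus
    images_seen = effective_batch * iters_per_solver
    return effective_batch, iters_per_solver, images_seen

# Single GPU sees 256 * 10000 = 2,560,000 images; 4 GPUs with the
# 1/N adjustment see the same total:
print(parallel_schedule(batch_size=256, total_iters=10000, num_gpus=4))
```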

    To try it, run the samples in example/parallel/

    focus speed-up 
    opened by cypof 96
  • Unrolled recurrent layers (RNN, LSTM)


    (Replaces #1873)

    Based on #2032 (adds EmbedLayer -- not needed for, but often used with RNNs in practice, and is needed for my examples), which in turn is based on #1977.

    This adds an abstract class RecurrentLayer intended to support recurrent architectures (RNNs, LSTMs, etc.) using an internal network unrolled in time. RecurrentLayer implementations (here, just RNNLayer and LSTMLayer) specify the recurrent architecture by filling in a NetParameter with appropriate layers.

    RecurrentLayer requires 2 input (bottom) Blobs. The first -- the input data itself -- has shape T x N x ... and the second -- the "sequence continuation indicators" delta -- has shape T x N, each holding T timesteps of N independent "streams". delta_{t,n} should be a binary indicator (i.e., value in {0, 1}), where a value of 0 means that timestep t of stream n is the beginning of a new sequence, and a value of 1 means that timestep t of stream n is continuing the sequence from timestep t-1 of stream n. Under the hood, the previous timestep's hidden state is multiplied by these delta values. The fact that these indicators are specified on a per-timestep and per-stream basis allows for streams of arbitrary different lengths without any padding or truncation. At the beginning of the forward pass, the final hidden state from the previous forward pass (h_T) is copied into the initial hidden state for the new forward pass (h_0), allowing for exact inference across arbitrarily long sequences, even if T == 1. However, if any sequences cross batch boundaries, backpropagation through time is approximate -- it is truncated along the batch boundaries.

    Note that the T x N arrangement in memory, used for computational efficiency, is somewhat counterintuitive, as it requires one to "interleave" the data streams.
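    To make the T x N layout and the continuation indicators concrete, here is a toy numpy sketch (my own illustration, not Caffe code; the `+ 1.0` stands in for the real recurrence):

```python
import numpy as np

# Two independent streams (N=2) packed into T=4 timesteps.
# Stream 0 carries one sequence of length 4; stream 1 carries a
# length-2 sequence followed by the start of a new sequence.
T, N = 4, 2
delta = np.ones((T, N), dtype=np.float32)
delta[0, :] = 0        # timestep 0 starts a new sequence in every stream
delta[2, 1] = 0        # stream 1 begins a second sequence at timestep 2

# Under the hood the previous hidden state is scaled by delta, so a 0
# resets the recurrence at that timestep for that stream only:
h = np.zeros(N, dtype=np.float32)
for t in range(T):
    h = delta[t] * h + 1.0   # stand-in for the real recurrence
print(h)   # stream 0 accumulated 4 steps, stream 1 only 2
```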

    There is also an optional third input whose dimensions are simply N x ... (i.e. the first axis must have dimension N and the others can be anything) which is a "static" input to the LSTM. It's equivalent to (but more efficient than) copying the input across the T timesteps and concatenating it with the "dynamic" first input (I was using my TileLayer -- #2083 -- for this purpose at some point before adding the static input). It's used in my captioning experiments to input the image features as they don't change over time. For most problems there will be no such "static" input and you should simply ignore it and just specify the first two input blobs.

    I've added scripts to download COCO2014 (and splits), and prototxts for training a language model and LRCN captioning model on the data. From the Caffe root directory, you should be able to download and parse the data by doing:

    cd data/coco
    ./ # download train/val/test splits
    ./ # download official COCO tool
    cd tools
    python install # follow instructions to install tools and download COCO data if needed
    cd ../../.. # back to caffe root

    Then, you can train a language model using ./examples/coco_caption/, or train LRCN for captioning using ./examples/coco_caption/ (assuming you have downloaded models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel).

    Still on the TODO list: upload a pretrained model to the zoo; add a tool to preview generated image captions and compute retrieval & generation scores.

    JL ES 
    opened by jeffdonahue 95
  • Improved CMake scripts


    @shelhamer @jeffdonahue @baeuml @kloudkl @akosiorek @Yangqing @BlGene

    Hello all,

    I hope I've referenced everyone who participated in CMake development (at least, I didn't find anyone else).

    Following the discussion here, as promised, I've prepared slightly improved Caffe CMake scripts. The improvements were developed on Ubuntu 14.04 and tested on Yosemite (with libstdc++). I believe Windows support is now only as difficult as compiling all the dependencies. But I prefer to postpone testing on Windows until the current very Linux-ish build scripts and behaviour are adapted for cross-platform use and some dependencies are made optional.

    Description of changes and new features added

    Added OpenCV like formatted configuration log

    Added CaffeConfig.cmake generation for build/install cases. This allows you to connect Caffe to your application using CMake's find_package(Caffe). For more detailed description see below.

    BUILD_SHARED_LIB=ON (default) or OFF builds Caffe as a shared or static library. In CMake it is not good practice to build both shared and static simultaneously; that's why there is a switch.

    CPU_ONLY = OFF (default) or ON forces CUDA support to be excluded. Caffe will also compile in CPU_ONLY mode if the CUDA Toolkit is not installed or not found by CMake. Before building, please read the configuration log dumped by CMake to check for this case.

    USE_CUDNN = ON (default). If enabled and cuDNN is found, build with it; otherwise build without it.

    CUDA_ARCH_NAME=Auto (default), All, Fermi, Kepler, Maxwell, or Manual specifies the target GPU architecture. Selecting a concrete value reduces CUDA compilation time (for instance, compiling for both sm_20 and sm_30 takes twice as long as compiling for just one of them). With Auto, CMake attempts to detect the GPUs installed in your computer and compiles only for them. With Manual, new CUDA_ARCH_BIN/PTX CMake variables are created, which should be set to a space-separated list of architectures. Example: CUDA_ARCH_BIN="20 21(20) 50"

    BUILD_docs = ON (default)

    • If doxygen is installed and found, this enables the doc target. Use make docs to build and make jekyll to run a web server. HTML docs are built in <source folder>/doxygen, and a symlink is created at <source folder>/docs/doxygen. Functionality from scripts/ is now implemented in CMake, but the script is still required.
    • The source folder is used for generation because .Doxyfile contains relative paths and I prefer not to modify it now, though I think generating into the binary folder would be better.

    BUILD_python = ON (default) builds the Python interface if all required dependencies are found; otherwise it is excluded from the build automatically.

    BUILD_matlab = OFF (default) enables building the MATLAB interface. Currently both Octave and MATLAB are supported. For Octave, set Octave_compiler to mkoctfile if it is not found automatically. For MATLAB, specify Matlab_DIR, or Matlab_mex and Matlab_mexext, if again not found automatically. If both are installed and found, select which one to use with Matlab_build_mex_using=Matlab (default) or Octave. Note that the MATLAB wrappers can only be built if BUILD_SHARED_LIB=ON. On macOS neither compiles.

    Proto files: protobuf headers are NOT copied to <caffe_root>/include/caffe/proto anymore. Instead they are generated into <build_dir>/include/caffe/proto. I know some code may include the old headers, but this is the price of paying down the technical debt introduced by the incorrect original CMake scripts design. They have also been removed from .gitignore.


    • cmake_test_defines.hpp and sample_data_list.txt are NO longer configured by CMake into the source directory, no -DCMAKE_BUILD definition is added, and all *.in templates were removed. This is because the make runtest command is executed in the source directory, so embedding absolute paths into the test cpp-files is not required! Consider configure_file() into the source folder an antipattern. However, one may restore such embedding by uncommenting a couple of lines in srcs/test/CMakeLists.txt.
    • All the garbage targets (one per test file) were removed, because they flood IDEs while the compilation-time reduction they offer is debatable. I replaced them with the option BUILD_only_tests, which allows quickly including only selected tests. Example: cmake -DBUILD_only_tests="common,net,blob,im2col_kernel"

    Yosemite support: I was able to compile with CUDA support using the Caffe instructions with libstdc++ and by patching OpenCV as described. Accelerate.framework support was added. The MATLAB interface failed to compile.

    Temporary changes

    • make symlink creates the symlink [caffe_root]/build -> cmake_build_directory
    • All examples are now built without the .bin suffix, and a symlink with the .bin suffix is created alongside each one, so that the tutorials keep working. Once naming is standardized, this should be removed.

    Including Caffe in your CMake project via find_package()

    git clone [email protected]:BVLC/caffe.git. 
    cd caffe && mkdir cmake_build && cd cmake_build
    cmake .. -DBUILD_SHARED_LIB=ON

    Verify that CMake found everything, and in the proper locations. You can then run make -j 12 right away, or better, do this:

    cmake . -DCMAKE_BUILD_TYPE=Debug     # switch to debug
    make -j 12 && make install           # installs by default to build_dir/install
    cmake . -DCMAKE_BUILD_TYPE=Release   # switch to release
    make -j 12 && make install           # doesn’t overwrite debug install
    make symlink

    After the operations complete, caffe tutorials should work from caffe root directory. Let’s now see how to connect caffe to a C++ application that uses Caffe API with cmake. Prepare the following script:

    cmake_minimum_required(VERSION 2.8.8)
    find_package(Caffe REQUIRED)             # locates CaffeConfig.cmake
    include_directories(${Caffe_INCLUDE_DIRS})
    add_definitions(${Caffe_DEFINITIONS})    # ex. -DCPU_ONLY
    add_executable(caffeinated_application main.cpp)
    target_link_libraries(caffeinated_application ${Caffe_LIBRARIES})

    Run CMake to configure this application and generate build scripts or an IDE project. It will automatically find Caffe in its build directory and pick up all necessary dependencies (includes, libraries, definitions), and the application will compile without any additional actions. Caffe's dependencies will also have been included. If you have several Caffe builds, or if for some reason CMake wasn't able to find Caffe automatically, you may specify Caffe_DIR=<path-to-caffe-build-dir> in CMake, which guarantees that everything will work.

    Pointing Caffe_DIR at the build directory means you always get whichever build configuration (say, Release or Debug) Caffe was last compiled for. If you instead set Caffe_DIR=<caffe-install-dir>/share/Caffe, where both configurations have been installed, the proper debug or release caffe binaries will be selected depending on which configuration you compile your caffeinated_application for.


    (Fixed typos in CUDA architectures - @Noiredd)

    compatibility focus ready for review 
    opened by Nerei 87
  • Provide a Caffe package in Debian



    Caffe packages are available for Debian/unstable.
    Caffe packages are failing to build for Ubuntu-devel and need to be patched.

    Last update: Dec.20 2016

    Draft guide

    Deploy Caffe with merely one command.

    Brief Guide for Debian/unstable users

    Only experienced Linux users are advised to try Debian/unstable (Sid). To install caffe, first make sure you have something like the following in /etc/apt/sources.list (uncomment the second line if you want to re-compile caffe locally):

    deb sid main contrib non-free
    #deb-src sid main contrib non-free

    Then update the apt cache and install it. Note that you cannot install both the CPU version and the CUDA version.

    # apt update
    # apt install [ caffe-cpu | caffe-cuda ]
    # caffe

    It should work out of the box. I hope this work is helpful, since many people struggle with the Caffe compiling process.

    Here are some notes:

    • Please re-compile OpenBLAS locally with optimization flags for the sake of performance. This is highly recommended if you are writing a paper. The way to re-compile OpenBLAS from the Debian source is very similar to the next subsection.
    • If you are going to install caffe-cuda, it will automatically pull in the CUDA packages and the nvidia driver packages. The installation process may fail if any part of the caffe dependency chain gets into trouble. That is to say, please take care if you have manually installed or significantly modified the nvidia driver, CUDA toolkit, protobuf, or any other related stuff.
    • If you encounter any problem when installing caffe-cuda on a clean Debian system, please report the bug to me (via Debian's bug tracking system).
    • If you encounter any problem when installing caffe-cpu, please report the bug to me via Debian's bug tracking system.
    • Both caffe-cpu and caffe-cuda contain a manpage (man caffe) and a bash completion script (caffe <TAB><TAB>, caffe train <TAB><TAB>). Neither has been merged into caffe master yet.
    • The Python interface is Python 3 only: python3-caffe-{cpu,cuda}. There is no plan to support python2.

    Compiling your custom caffe package on Debian/unstable

    No promises for the content of this subsection. If you just want to compile again from source without any changes, the following should work as expected. If you want to compile with, e.g., CUDNN support, you should at least be able to read and hack the file debian/rules under the source tree (it's a Makefile).

    First make sure you have a correct deb-src line in your apt sources list. Then caffe can be compiled with a few simple commands.

    # apt update
    # apt install build-essential debhelper devscripts    # These are standard package building tools
    # apt build-dep [ caffe-cpu | caffe-cuda ]    # the most elegant way to pull caffe build dependencies
    # apt source [ caffe-cpu | caffe-cuda ]    # download the source tarball
    # cd caffe-XXXX    # now we enter into the source tree
    [ ... optional, make your custom changes at your own risk ... ]
    # debuild -B -j4    # build caffe with 4 parallel jobs (similar to make -j4)
    [ ... building ...]
    # debc    # optional, if you want to check the package contents
    # debi    # install the generated packages


    1. Where is caffe-cudnn?
      Due to legal reasons the cuDNN library cannot be redistributed. I'll be happy to make this package when cuDNN becomes redistributable. The workaround is to install cuDNN yourself, and hack at least the debian/rules file if you really want caffe *.deb packages with cuDNN support.

    2. How do I report a bug via the Debian bug tracking system?
      See the documentation of Debian's bug tracking system.

    3. I installed the CPU version. What should I do if I want to switch to the CUDA version?
      sudo apt install caffe-cuda; apt's dependency resolver is smart enough to handle this.

    4. Where are the examples, the models, and other documentation?
      sudo apt install caffe-doc; dpkg -L caffe-doc

    opened by cdluminate 83
  • Caffe Opencl - ViennaCL - Could not find kernel `fill_float`



    Issue summary

    Upon running my network, I get the following error:

    ViennaCL: FATAL ERROR: Could not find kernel 'fill_float' from program ''
    Number of kernels in program: 0
    Kernel not found

    Steps to reproduce


    Your system configuration

    • Operating system: Ubuntu 16.04
    • Compiler: g++ 5.4
    • CUDA version (if applicable): 8
    • CUDNN version (if applicable): latest
    • BLAS: Titan Xp on OpenCL drivers
    • Python or MATLAB version (for pycaffe and matcaffe respectively):

    question interface OpenCL 
    opened by soulslicer 76
  • Any simple example?



    I started with Caffe and the mnist example ran well. However, I cannot understand how I am supposed to use this for my own data in a classification task. What should the data format be? Where should I specify the files? How do I see the results for a test set? None of this is mentioned in the documentation. Any pointers will be appreciated, thanks.

    opened by rmanor 76
  • Yet another batch normalization PR


    This PR squashes together #1965 and #3161 to make sure that proper credit is given. The final functionality is much more like #3161: we ultimately decided that the scale/shift could be implemented as a separate layer (and should hence get its own PR) and the data shuffling, if it gets merged, should also be done as a separate PR (I have not reviewed that code closely enough to say whether it is mergeable). This version includes the global stats computations, and fixes the issue where #3161 was using the biased variance estimate (took a little while to convince myself that this is indeed the correct estimator to use).
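    On the estimator question: the biased (divide-by-m) variance is the one that makes the normalized batch have exactly unit variance, which is why it is the correct choice here. A small numpy illustration (not the PR's code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=1000)

mean = x.mean()
var_biased = x.var(ddof=0)      # divide by m (maximum-likelihood estimate)
var_unbiased = x.var(ddof=1)    # divide by m-1

# Normalizing with the biased estimate yields exactly unit batch variance:
x_hat = (x - mean) / np.sqrt(var_biased)
print(round(x_hat.var(ddof=0), 6))   # -> 1.0
```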

    It would be great if @ducha-aiki and @jeffdonahue could take a look at this.

    opened by cdoersch 74
  • Multi-GPU Data Parallelism (with Parallel Data Layers)


    This is my package of #2870 (and originally, #2114)

    Modification: allow data layers (and also PythonLayer when used as a data layer) to be shared among the worker solvers' training nets, and also test nets, for future-proofing if one wants to do multi-GPU testing. Data layers are locked during forward to ensure sequential forwarding. Now all worker solvers fetch data from one single data layer.

    This ensures that single-GPU training is consistent with multi-GPU training, and allows the tests in #2870 to pass. Otherwise, as in #2870 (#2114), multiple data layers are created for the worker solvers, and these data layers are unaware of each other. This can be a serious issue if one uses deterministic data layers or turns off shuffling. In that case, since the data layers in each worker solver read the same data, one eventually gets the same gradient on each solver, so it is almost equivalent to multiplying the learning rate by the number of GPUs. This is definitely not the desired behavior of multi-GPU data parallelism, since one should train on different subsets of the dataset. Although #2114 provides a DataReader, it only applies to leveldb and lmdb, and is hardly extensible to other data layers.

    DataReader is preserved in this PR and LMDB/LEVELDB DataLayer is not shared.
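    The failure mode described above can be sketched with a toy synchronous-SGD update (hypothetical illustration, not Caffe code): averaging N identical gradients is no better than a single-GPU step, while summing them would scale the effective learning rate by N.

```python
import numpy as np

def sgd_update(w, grads, lr):
    """Synchronous SGD: average the per-GPU gradients, take one step."""
    return w - lr * np.mean(grads, axis=0)

w = np.array([1.0, -2.0])
g = np.array([0.5, 0.1])   # gradient from one batch

# Unshared deterministic data layers: every worker reads the SAME batch,
# so all per-GPU gradients are identical. Averaging N identical gradients
# gives exactly the single-GPU step; the extra GPUs add nothing.
same = sgd_update(w, [g, g, g, g], lr=0.1)
single = sgd_update(w, [g], lr=0.1)
print(np.allclose(same, single))   # -> True
```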


    • [x] Add ShareInParallel function to layer.hpp, data_layer.hpp and pythonlayer.hpp .
    • [x] Implement share layers during net construction, construct top blobs of shared layers.
    • [x] Add lock to forward in layer.hpp to lock layers.
    • [x] Share layers during workersolver construction.
    • [x] ~~Remove DataReader. Restore old behavior of DataLayer.~~ DataReader is kept.
    • [x] Test make runtest on multiple GPU machine.
    • [x] Test multi-gpu training on MNIST.
    • [x] Test multi-gpu training on ILSVRC.
    • [x] Fix NVCC warning on boost/thread.hpp to get Travis CI pass.


    Multi-GPU training is numerically non-deterministic on data layers except for the LMDB/LEVELDB DataLayer.

    focus speed-up ready for review parallelism 
    opened by ronghanghu 67
  • ND convolution with im2col


    This PR extends convolution to N spatial axes, where Caffe's current convolution supports only 2D convolution (with 2 spatial axes: height and width). For 2D convolution, this implementation doesn't compare favorably with the existing one -- I haven't done much benchmarking, but I believe it's 25-75% slower on both CPU and GPU. So before this could be merged, I'd need to restore the existing implementation and use it as the default "engine" for 2D convolutions (but this more destructive version makes it easier to tell what I was thinking from looking at the diff). If anyone has any suggestions on improving the performance or thoughts on why it might be so much slower, I'd love to hear them.
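    For reference, the output-shape bookkeeping generalizes directly from 2 to N spatial axes; a minimal sketch (my own illustration, not this PR's implementation):

```python
def conv_out_shape(in_shape, kernel, pad=None, stride=None):
    """Output spatial shape for N-dimensional convolution:
    out[i] = (in[i] + 2*pad[i] - kernel[i]) // stride[i] + 1."""
    n = len(in_shape)
    pad = pad or [0] * n
    stride = stride or [1] * n
    return [(i + 2 * p - k) // s + 1
            for i, k, p, s in zip(in_shape, kernel, pad, stride)]

# 2-D sanity check (alexnet conv1): 227x227 input, 11x11 kernel, stride 4
print(conv_out_shape([227, 227], [11, 11], stride=[4, 4]))  # -> [55, 55]
# The same formula covers 3 spatial axes, e.g. video or volumetric data:
print(conv_out_shape([16, 112, 112], [3, 7, 7], pad=[1, 0, 0],
                     stride=[1, 2, 2]))
```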

    Edit: benchmarking this on alexnet, it's about 33% slower:

    @ master:

    I0305 21:07:25.042047 22060 caffe.cpp:271] Average Forward pass: 486.327 ms.
    I0305 21:07:25.042064 22060 caffe.cpp:273] Average Backward pass: 824.147 ms.
    I0305 21:07:25.042079 22060 caffe.cpp:275] Average Forward-Backward: 1310.68 ms.

    @ nd-convolution:

    I0305 21:02:03.827594 12909 caffe.cpp:271] Average Forward pass: 681.38 ms.
    I0305 21:02:03.827608 12909 caffe.cpp:273] Average Backward pass: 1068.98 ms.
    I0305 21:02:03.827623 12909 caffe.cpp:275] Average Forward-Backward: 1750.56 ms.
    focus ready for review ES 
    opened by jeffdonahue 67
  • Segmentation fault (core dumped) when creating imageset


    Issue summary

    Failed to create my own image set, Segmentation fault (core dumped).

    Steps to reproduce config:

    set -e
    echo "Creating train lmdb..."
    GLOG_logtostderr=1 $TOOLS/convert_imageset \
        --resize_height=$RESIZE_HEIGHT \
        --resize_width=$RESIZE_WIDTH \
        --shuffle \
        $TRAIN_DATA_ROOT \
        $DATA/train.txt \
        $DATA/cars_train_lmdb
    echo "Creating val lmdb..."
    GLOG_logtostderr=1 $TOOLS/convert_imageset \
        --resize_height=$RESIZE_HEIGHT \
        --resize_width=$RESIZE_WIDTH \
        --shuffle \
        $VAL_DATA_ROOT \
        $DATA/val.txt \
        $DATA/cars_val_lmdb

    The format of the train.txt and val.txt:

    63_scratch6.jpg 63


    (22:09 [email protected] imagenet) > ./ 
    Creating train lmdb...
    I0518 22:09:31.906777 2695928 convert_imageset.cpp:86] Shuffling data
    I0518 22:09:32.294107 2695928 convert_imageset.cpp:89] A total of 17493 images.
    I0518 22:09:32.348204 2695928 db_lmdb.cpp:35] Opened lmdb data/cars_train_lmdb
    Segmentation fault (core dumped)

    System configuration

    • Operating system: Ubuntu 20.04
    opened by SimonWang2014 0
  • import error: segment fault when import caffe


    Issue summary

    I ran make all && make pycaffe successfully and I can run the examples, but when I import caffe I get a segmentation fault without any error message. I used lldb to debug the core file, but found nothing there either. When I commented out "import_array1()" at the end of $CAFFE_ROOT/python/caffe/_caffe.cpp and rebuilt, I could import successfully. However, something is still wrong: running py-faster-rcnn gives the error "numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject". Does anybody know the reason?

    System configuration

    I use anaconda to manage the Python environment.

    • Operating system: Mac os11.4
    • Compiler:
    • CUDA version (if applicable):
    • CUDNN version (if applicable):
    • BLAS: openblas 1.0
    • Python version (if using pycaffe): python 3.7.13
    • MATLAB version (if using matcaffe):
    opened by zjjjz 0
  • Makefile


    Important - read before submitting

    Please read the guidelines for contributing before submitting this issue!

    Please do not post installation, build, usage, or modeling questions, or other requests for help to Issues. Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.

    Issue summary

    Steps to reproduce

    Tried solutions

    System configuration

    • Operating system:
    • Compiler:
    • CUDA version (if applicable):
    • CUDNN version (if applicable):
    • BLAS:
    • Python version (if using pycaffe):
    • MATLAB version (if using matcaffe):

    Issue checklist

    • [ ] read the guidelines and removed the first paragraph
    • [ ] written a short summary and detailed steps to reproduce
    • [ ] explained how solutions to related problems failed (tick if found none)
    • [ ] filled system configuration
    • [ ] attached relevant logs/config files (tick if not applicable)
    opened by tlzhaoboling 0
  • caffe time -model -weights -gpu=0


    Issue summary

    caffe time -model=xxx -weights=xxx -gpu=0 gives this log:

    I0312 15:29:30.427956 2367 caffe.cpp:406] Average time per layer:
    I0312 15:29:30.427961 2367 caffe.cpp:409] data forward: 0.0018944 ms.
    I0312 15:29:30.427969 2367 caffe.cpp:412] data backward: 0.0018848 ms.
    I0312 15:29:30.427975 2367 caffe.cpp:409] conv1 forward: 0.10807 ms.
    I0312 15:29:30.427982 2367 caffe.cpp:412] conv1 backward: 0.182646 ms.
    I0312 15:29:30.427989 2367 caffe.cpp:409] relu1 forward: 0.0140288 ms.
    I0312 15:29:30.427994 2367 caffe.cpp:412] relu1 backward: 0.0018432 ms.
    I0312 15:29:30.428000 2367 caffe.cpp:409] norm1 forward: 0.0628864 ms.
    I0312 15:29:30.428007 2367 caffe.cpp:412] norm1 backward: 0.105226 ms.
    I0312 15:29:30.428014 2367 caffe.cpp:409] pool1 forward: 0.0158592 ms.
    I0312 15:29:30.428020 2367 caffe.cpp:412] pool1 backward: 0.0018784 ms.
    I0312 15:29:30.428027 2367 caffe.cpp:409] conv2 forward: 0.291235 ms.
    I0312 15:29:30.428033 2367 caffe.cpp:412] conv2 backward: 0.515402 ms.
    I0312 15:29:30.428040 2367 caffe.cpp:409] relu2 forward: 0.0101152 ms.
    I0312 15:29:30.428048 2367 caffe.cpp:412] relu2 backward: 0.0018592 ms.
    I0312 15:29:30.428056 2367 caffe.cpp:409] norm2 forward: 0.137219 ms.
    I0312 15:29:30.428066 2367 caffe.cpp:412] norm2 backward: 0.256826 ms.
    I0312 15:29:30.428073 2367 caffe.cpp:409] pool2 forward: 0.0133536 ms.
    I0312 15:29:30.428084 2367 caffe.cpp:412] pool2 backward: 0.0024768 ms.
    I0312 15:29:30.428092 2367 caffe.cpp:409] conv3 forward: 0.14239 ms.
    I0312 15:29:30.428098 2367 caffe.cpp:412] conv3 backward: 0.3532 ms.
    I0312 15:29:30.428107 2367 caffe.cpp:409] relu3 forward: 0.008976 ms.
    I0312 15:29:30.428114 2367 caffe.cpp:412] relu3 backward: 0.0020128 ms.
    I0312 15:29:30.428123 2367 caffe.cpp:409] conv4 forward: 0.117597 ms.
    I0312 15:29:30.428130 2367 caffe.cpp:412] conv4 backward: 0.292886 ms.
    I0312 15:29:30.428138 2367 caffe.cpp:409] relu4 forward: 0.0090048 ms.
    I0312 15:29:30.428145 2367 caffe.cpp:412] relu4 backward: 0.001872 ms.
    I0312 15:29:30.428153 2367 caffe.cpp:409] conv5 forward: 0.109824 ms.
    I0312 15:29:30.428160 2367 caffe.cpp:412] conv5 backward: 0.368051 ms.
    I0312 15:29:30.428165 2367 caffe.cpp:409] relu5 forward: 0.0088512 ms.
    I0312 15:29:30.428174 2367 caffe.cpp:412] relu5 backward: 0.0018848 ms.
    I0312 15:29:30.428182 2367 caffe.cpp:409] pool5 forward: 0.0117792 ms.
    I0312 15:29:30.428189 2367 caffe.cpp:412] pool5 backward: 0.00256 ms.
    I0312 15:29:30.428197 2367 caffe.cpp:409] fc6 forward: 0.417875 ms.
    I0312 15:29:30.428205 2367 caffe.cpp:412] fc6 backward: 3.15267 ms.
    I0312 15:29:30.428212 2367 caffe.cpp:409] relu6 forward: 0.0122656 ms.
    I0312 15:29:30.428264 2367 caffe.cpp:412] relu6 backward: 0.0018912 ms.
    I0312 15:29:30.428273 2367 caffe.cpp:409] drop6 forward: 0.0127136 ms.
    I0312 15:29:30.428282 2367 caffe.cpp:412] drop6 backward: 0.001856 ms.
    I0312 15:29:30.428292 2367 caffe.cpp:409] fc7 forward: 0.1988 ms.
    I0312 15:29:30.428300 2367 caffe.cpp:412] fc7 backward: 2.72682 ms.
    I0312 15:29:30.428308 2367 caffe.cpp:409] relu7 forward: 0.0122848 ms.
    I0312 15:29:30.428316 2367 caffe.cpp:412] relu7 backward: 0.0019136 ms.
    I0312 15:29:30.428328 2367 caffe.cpp:409] drop7 forward: 0.0126016 ms.
    I0312 15:29:30.428339 2367 caffe.cpp:412] drop7 backward: 0.0018944 ms.
    I0312 15:29:30.428347 2367 caffe.cpp:409] fc8 forward: 0.109283 ms.
    I0312 15:29:30.428378 2367 caffe.cpp:412] fc8 backward: 2.68584 ms.
    I0312 15:29:30.428388 2367 caffe.cpp:409] prob forward: 0.0146496 ms.
    I0312 15:29:30.428395 2367 caffe.cpp:412] prob backward: 0.0018528 ms.
    I0312 15:29:30.428421 2367 caffe.cpp:417] Average Forward pass: 55.8925 ms.
    I0312 15:29:30.428429 2367 caffe.cpp:419] Average Backward pass: 65.4428 ms.
    I0312 15:29:30.430272 2367 caffe.cpp:421] Average Forward-Backward: 127.954 ms.
    I0312 15:29:30.430285 2367 caffe.cpp:423] Total Time: 1279.54 ms.
    I0312 15:29:30.430291 2367 caffe.cpp:424] *** Benchmark ends ***

    The sum of the per-layer forward times is not equal to the Average Forward pass (2.01 ms < 55.89 ms); please help me figure this out, thanks very much.
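    One way to check the discrepancy is to sum the per-layer forward timings straight out of the log; a quick hypothetical script (only three log lines shown for brevity):

```python
import re

log = """
I0312 15:29:30.428092 2367 caffe.cpp:409] conv3 forward: 0.14239 ms.
I0312 15:29:30.428107 2367 caffe.cpp:409] relu3 forward: 0.008976 ms.
I0312 15:29:30.428123 2367 caffe.cpp:409] conv4 forward: 0.117597 ms.
"""

# Sum every "<layer> forward: X ms." line from a `caffe time` log.
forward_ms = [float(m) for m in
              re.findall(r"forward:\s+([\d.]+)\s+ms", log)]
print(sum(forward_ms))
```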

    opened by everjcc 0
  • make runtest error: no CUDA-capable device is detected


    Issue summary


    I'm installing caffe 1.0 on WSL2 Ubuntu 20.04. I already managed to get make all and make test to run without error.

    However, when I run make runtest, I get a bunch of errors.

    (base) b***@DESKTOP-****:/mnt/c/Users/bx/caffe-1.0$ make runtest
    caffe: command line brew
    usage: caffe <command> <args>
      train           train or finetune a model
      test            score a model
      device_query    show GPU diagnostic information
      time            benchmark model execution time
      Flags from tools/caffe.cpp:
        -gpu (Optional; run in GPU mode on given device IDs separated by ','.Use
          '-gpu all' to run on all available GPUs. The effective training batch
          size is multiplied by the number of devices.) type: string default: ""
        -iterations (The number of iterations to run.) type: int32 default: 50
        -level (Optional; network level.) type: int32 default: 0
        -model (The model definition protocol buffer text file.) type: string
          default: ""
        -phase (Optional; network phase (TRAIN or TEST). Only used for 'time'.)
          type: string default: ""
        -sighup_effect (Optional; action to take when a SIGHUP signal is received:
          snapshot, stop or none.) type: string default: "snapshot"
        -sigint_effect (Optional; action to take when a SIGINT signal is received:
          snapshot, stop or none.) type: string default: "stop"
        -snapshot (Optional; the snapshot solver state to resume training.)
          type: string default: ""
        -solver (The solver definition protocol buffer text file.) type: string
          default: ""
        -stage (Optional; network stages (not to be confused with phase), separated
          by ','.) type: string default: ""
        -weights (Optional; the pretrained weights to initialize finetuning,
          separated by ','. Cannot be set simultaneously with snapshot.)
          type: string default: ""
    .build_release/test/test_all.testbin 0 --gtest_shuffle
    Cuda number of devices: 0
    Setting to use device 0
    Current device id: 0
    Current device name:
    Note: Randomizing tests' orders with a seed of 55461 .
    [==========] Running 2101 tests from 277 test cases.
    [----------] Global test environment set-up.
    [----------] 5 tests from EmbedLayerTest/1, where TypeParam = caffe::CPUDevice<double>
    [ RUN      ] EmbedLayerTest/1.TestForwardWithBias
    E0307 22:13:32.392771  7483 common.cpp:114] Cannot create Cublas handle. Cublas won't be available.
    E0307 22:13:32.432719  7483 common.cpp:121] Cannot create Curand generator. Curand won't be available.
    [       OK ] EmbedLayerTest/1.TestForwardWithBias (114 ms)
    [ RUN      ] EmbedLayerTest/1.TestGradient
    E0307 22:13:32.469251  7483 common.cpp:141] Curand not available. Skipping setting the curand seed.
    [       OK ] EmbedLayerTest/1.TestGradient (7 ms)
    [ RUN      ] EmbedLayerTest/1.TestForward
    [       OK ] EmbedLayerTest/1.TestForward (0 ms)
    [ RUN      ] EmbedLayerTest/1.TestSetUp
    [       OK ] EmbedLayerTest/1.TestSetUp (0 ms)
    [ RUN      ] EmbedLayerTest/1.TestGradientWithBias
    [       OK ] EmbedLayerTest/1.TestGradientWithBias (11 ms)
    [----------] 5 tests from EmbedLayerTest/1 (132 ms total)
    [----------] 8 tests from SliceLayerTest/2, where TypeParam = caffe::GPUDevice<float>
    [ RUN      ] SliceLayerTest/2.TestGradientTrivial
    F0307 22:13:32.488232  7483 syncedmem.hpp:22] Check failed: error == cudaSuccess (100 vs. 0)  no CUDA-capable device is detected
    *** Check failure stack trace: ***
        @     0x7fc281c001c3  google::LogMessage::Fail()
        @     0x7fc281c0525b  google::LogMessage::SendToLog()
        @     0x7fc281bffebf  google::LogMessage::Flush()
        @     0x7fc281c006ef  google::LogMessageFatal::~LogMessageFatal()
        @     0x7fc280783103  caffe::SyncedMemory::mutable_cpu_data()
        @     0x7fc280600779  caffe::Blob<>::Reshape()
        @     0x7fc280600bce  caffe::Blob<>::Reshape()
        @     0x7fc280600c80  caffe::Blob<>::Blob()
        @     0x55ab7cfc2a6a  caffe::SliceLayerTest<>::SliceLayerTest()
        @     0x55ab7cfc2e20  testing::internal::TestFactoryImpl<>::CreateTest()
        @     0x55ab7d0633c1  testing::internal::HandleExceptionsInMethodIfSupported<>()
        @     0x55ab7d05b106  testing::TestInfo::Run()
        @     0x55ab7d05b265  testing::TestCase::Run()
        @     0x55ab7d05b78c  testing::internal::UnitTestImpl::RunAllTests()
        @     0x55ab7d05b857  testing::UnitTest::Run()
        @     0x55ab7cb36217  main
        @     0x7fc2801060b3  __libc_start_main
        @     0x55ab7cb3dd9e  _start
    make: *** [Makefile:534: runtest] Aborted

    NO CUDA issue

    It cannot find CUDA! But I have CUDA and the driver installed, verified with nvcc --version and nvidia-smi:

    (base) b***@DESKTOP-****:/mnt/c/Users/bx/caffe-1.0$ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Thu_Feb_10_18:23:41_PST_2022
    Cuda compilation tools, release 11.6, V11.6.112
    Build cuda_11.6.r11.6/compiler.30978841_0
    (base) b***@DESKTOP-****:/mnt/c/Users/bx/caffe-1.0$ nvidia-smi
    Mon Mar  7 22:25:20 2022
    | NVIDIA-SMI 510.47.03    Driver Version: 511.79       CUDA Version: 11.6     |
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
    |  0%   51C    P8    11W / 120W |    431MiB /  3072MiB |      5%      Default |
    |                               |                      |                  N/A |
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |  No running processes found                                                 |
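
    Note that nvcc --version and nvidia-smi only show that the toolkit and driver are installed; whether the CUDA runtime itself can enumerate a device (which is what the failing check in syncedmem.hpp exercises) is a separate question, especially under WSL2. A minimal sketch of that check, assuming a Linux system where some libcudart.so.* may or may not be on the loader path (the helper name and the list of library names probed are hypothetical):

```python
import ctypes

def cuda_device_count():
    """Return the number of CUDA devices the runtime can see,
    or None if no CUDA runtime library can be loaded at all."""
    # Library names to probe are assumptions; adjust for your install.
    for name in ("libcudart.so", "libcudart.so.11.0", "libcudart.so.10.2"):
        try:
            cudart = ctypes.CDLL(name)
            break
        except OSError:
            continue
    else:
        return None  # runtime library not found on the loader path

    count = ctypes.c_int(0)
    status = cudart.cudaGetDeviceCount(ctypes.byref(count))
    # status 0 is cudaSuccess; anything else means enumeration failed.
    return count.value if status == 0 else 0

print("CUDA devices visible to the runtime:", cuda_device_count())
```

    If this reports None or 0 while nvidia-smi works, the problem is runtime/driver visibility (a common WSL2 symptom) rather than the Caffe build itself.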


    ## Refer to
    # Contributions simplifying and improving our build system are welcome!
    # cuDNN acceleration switch (uncomment to build with cuDNN).
    USE_CUDNN := 1
    # CPU-only switch (uncomment to build without GPU support).
    # CPU_ONLY := 1
    # uncomment to disable IO dependencies and corresponding data layers
    # USE_OPENCV := 0
    # USE_LEVELDB := 0
    # USE_LMDB := 0
    # uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
    #	You should not set this flag if you will be reading LMDBs with any
    #	possibility of simultaneous read and write
    # Uncomment if you're using OpenCV 3
    # To customize your choice of compiler, uncomment and set the following.
    # N.B. the default for Linux is g++ and the default for OSX is clang++
    # CUSTOM_CXX := g++
    # CUDA directory contains bin/ and lib/ directories that we need.
    CUDA_DIR := /usr/local/cuda
    # CUDA_DIR := /usr/local/cuda-11.6
    # On Ubuntu 14.04, if cuda tools are installed via
    # "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
    # CUDA_DIR := /usr
    # CUDA architecture setting: going with all of them.
    # For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
    # For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
    CUDA_ARCH := -gencode arch=compute_50,code=sm_50 \
    		#-gencode arch=compute_20,code=sm_20 \
    		#-gencode arch=compute_20,code=sm_21 \
    		#-gencode arch=compute_30,code=sm_30 \
    		#-gencode arch=compute_35,code=sm_35 \
    		#-gencode arch=compute_50,code=sm_50 \
    		-gencode arch=compute_52,code=sm_52 \
    		-gencode arch=compute_60,code=sm_60 \
    		-gencode arch=compute_61,code=sm_61 \
    		-gencode arch=compute_61,code=compute_61
    # BLAS choice:
    # atlas for ATLAS (default)
    # mkl for MKL
    # open for OpenBlas
    BLAS := atlas
    # Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
    # Leave commented to accept the defaults for your choice of BLAS
    # (which should work)!
    # BLAS_INCLUDE := /path/to/your/blas
    # BLAS_LIB := /path/to/your/blas
    # Homebrew puts openblas in a directory that is not on the standard search path
    # BLAS_INCLUDE := $(shell brew --prefix openblas)/include
    # BLAS_LIB := $(shell brew --prefix openblas)/lib
    # This is required only if you will compile the matlab interface.
    # MATLAB directory should contain the mex binary in /bin.
    # MATLAB_DIR := /usr/local
    # MATLAB_DIR := /Applications/
    # NOTE: this is required only if you will compile the python interface.
    # We need to be able to find Python.h and numpy/arrayobject.h.
    # PYTHON_INCLUDE := /usr/include/python2.7 \
    		# /usr/lib/python2.7/dist-packages/numpy/core/include
    # Anaconda Python distribution is quite popular. Include path:
    # Verify anaconda location, sometimes it's in root.
    # ANACONDA_HOME := $(HOME)/anaconda
    ANACONDA_HOME := /home/bear233/anaconda3
    # PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
    		# $(ANACONDA_HOME)/include/python3.9 \
    		# $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include
    # Uncomment to use Python 3 (default is Python 2)
     PYTHON_LIBRARIES := boost_python3 python3.8
     PYTHON_INCLUDE := /usr/include/python3.8 \
                     # /usr/lib/python3.8/dist-packages/numpy/core/include
    # We need to be able to find or .dylib.
    PYTHON_LIB := /usr/lib
    # Homebrew installs numpy in a non standard path (keg only)
    # PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
    # PYTHON_LIB += $(shell brew --prefix numpy)/lib
    # Uncomment to support layers written in Python (will link against Python libs)
    # Whatever else you find you need goes here.
    INCLUDE_DIRS := $(PYTHON_INCLUDE)/usr/local/incllude  /usr/include/hdf5/serial/
    # LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
    LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
    # If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
    # INCLUDE_DIRS += $(shell brew --prefix)/include
    # LIBRARY_DIRS += $(shell brew --prefix)/lib
    # NCCL acceleration switch (uncomment to build with NCCL)
    # (last tested version: v1.2.3-1+cuda8.0)
    # USE_NCCL := 1
    # Uncomment to use `pkg-config` to specify OpenCV library paths.
    # (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
    # USE_PKG_CONFIG := 1
    # N.B. both build and distribute dirs are cleared on `make clean`
    BUILD_DIR := build
    DISTRIBUTE_DIR := distribute
    # Uncomment for debugging. Does not work on OSX due to
    # DEBUG := 1
    # The ID of the GPU that 'make runtest' will use to run unit tests.
    TEST_GPUID := 0
    # enable pretty build (comment to see full commands)
    Q ?= @

    System configuration

    • Operating system: Linux(WSL2)
    • CUDA version (if applicable): 11.6
    • CUDNN version (if applicable): 8.3.2
    • Python version (if using pycaffe): 3.9.7

    Could someone please help me with this? I have already tried most of the solutions I found on the internet, with no luck. Many thanks!

    opened by bxiong97 1
  • 1.0(Apr 18, 2017)

    This release marks the convergence of development into a stable, reference release of the framework and a shift into maintenance mode. Let's review the progress culminating in our 1.0:

    • research: nearly 4,000 citations, usage by award papers at CVPR/ECCV/ICCV, and tutorials at ECCV'14 and CVPR'15
    • industry: adopted by Facebook, NVIDIA, Intel, Sony, Yahoo! Japan, Samsung, Adobe, A9, Siemens, Pinterest, the Embedded Vision Alliance, and more
    • community: 250+ contributors, 15k+ subscribers on github, and 7k+ members of the mailing list
    • development: 10k+ forks, >1 contribution/day on average, and dedicated branches for OpenCL and Windows
    • downloads: 10k+ downloads and updates a month, ~50k unique visitors to the home page every two weeks, and >100k unique downloads of the reference models
    • winner of the ACM MM open source award 2014 and presented as a talk at ICML MLOSS 2015

    Thanks for all of your efforts leading us to Caffe 1.0! Your part in development, community, feedback, and framework usage brought us here. As part of 1.0 we will be welcoming collaborators old and new to join as members of the Caffe core.

    Stay tuned for the next steps in DIY deep learning with Caffe. As development is never truly done, there's always 1.1!

    Now that 1.0 is done, the next generation of the framework—Caffe2—is ready to keep up the progress on DIY deep learning in research and industry. While Caffe 1.0 development will continue with 1.1, Caffe2 is the new framework line for future development led by Yangqing Jia. Although Caffe2 is a departure from the development line of Caffe 1.0, we are planning a migration path for models just as we have future-proofed Caffe models in the past.

    Happy brewing, The Caffe Crew


    Source code(tar.gz)
    Source code(zip)
  • rc5(Feb 21, 2017)

    This packages up 42 commits by 15 contributors to help home in on 1.0. Thanks all!

    With all releases one should do make clean && make superclean to clear out old materials before compiling the new release.

    • set soversion properly #5296
    • documentation: improved dockerfiles and usage notes #5153, links and fixes #5227
    • build: groom cmake build #4609, find veclib more reliably on mac #5236
    • pycaffe: give Net a layer dictionary #4347
    • matcaffe: destroy individual nets and solvers #4737


    • restore solvers for resuming multi-GPU training #5215
    • draw net helper #5010


    Source code(tar.gz)
    Source code(zip)
  • rc4(Jan 20, 2017)

    It's a new year and a new release candidate. This packages up 348 commits by 68 authors. Thanks all!

    This is intended to be the last release candidate before 1.0. We hope to catch any lurking issues, improve documentation, and polish the packaging for then.

    With all releases one should do make clean && make superclean to clear out old materials before compiling the new release. See all merged PRs since the last release.

    • RNNs + LSTMs #3948
    • layers
      • Parameter layer for learning any bottom #2047
      • Crop layer for aligning coordinate maps for FCNs #3570
      • Tied weights with transpose for InnerProduct layer #3612
      • Batch Norm docs, numerics, and robust proto def #4704 #5184
      • Sigmoid Cross Entropy Loss on GPU #4908 and with ignore #4986
    • pycaffe
      • solver callbacks #3020
      • net spec coordinate mapping and cropping for FCNs #3613
      • N-D blob interface #3703
      • python3 compatibility by six #3716
      • dictionary-style net spec #3747
      • Python layer can have phase #3995
    • Docker image #3518
    • expose all NetState options for all-in-one nets #3863
    • force backprop on or off by propagate_down #3942
    • cuDNN v5 #4159
    • multi-GPU parallelism through NCCL + multi-GPU python interface #4563


    • Net upgrade tools catch mixed versions, handle input fields, and log outputs #3755
    • Exp layer for base e and shift != 0 #3937
    • Crop layer checks only the crop dimensions it should #3993


    • cuDNN compatibility is now at v5 + v4 and cuDNN v3 and earlier are not supported
    • NCCL is now required for multi-GPU operation

    As a reminder the OpenCL and Windows branches continue to make progress with the community leadership of Fabian Tschopp and Guillaume Dumont resp.


    Source code(tar.gz)
    Source code(zip)
  • rc3(Jan 30, 2016)

    A lot has happened since the last release! This packages up ~800 commits by 119 authors. Thanks all!

    With all releases one should do make clean && make superclean to clear out old materials before compiling the new release.

    • layers
      • batch normalization #3229 #3299
      • scale + bias layers #3591
      • PReLU #1940 #2414, ELU #3388, and log #2090 non-linearities
      • tile layer #2083, reduction layer #2089
      • embed layer #2032
      • spatial pyramid pooling #2117
      • batch reindex layer #2966
      • filter layer #2054
    • solvers: Adam #2918, RMSProp #2867, AdaDelta #2782
      • accumulate gradients to decouple computational and learning batch size #1977
      • de-duplicate solver code #2518
      • make solver type a string and split classes #3166 -- you should update your solver definitions
    • MSRA #1946 and bilinear interpolation #2213 weight fillers
    • N-D blobs #1970 and convolution #2049 for higher dimensional data and filters
    • tools:
      • test caffe command line tool execution #1926
      • network summarization tool #3090
      • snapshot on signal / before quit #2253
      • report ignored layers when loading weights #3305
      • caffe command fine-tunes from multiple caffemodels #1456
    • pycaffe:
      • python net spec #2086 #2813 #2959
      • handle python exceptions #2462
      • python layer arguments #2871
      • python layer weights #2944
      • snapshot in pycaffe #3082
      • top + bottom names in pycaffe #2865
      • python3 compatibility improvements
    • matcaffe: totally new interface with examples and tests #2505
    • cuDNN: switch to v2 #2038, switch to v3 #3160, make v4 compatible #3439
    • separate IO dependencies for configurable build #2523
    • large model and solverstate serialization through hdf5 #2836
    • train by multi-GPU data parallelism #2903 #2921 #2924 #2931 #2998
    • dismantle layer headers so every layer has its own include #3315
    • workflow: adopt build versioning #3311 #3593, contributing guide #2837, and badges for build status and license #3133
    • SoftmaxWithLoss normalization options #3296
    • dilated convolution #3487
    • expose Solver Restore() to C++ and Python #2037
    • set mode once and only once in testing #2511
    • turn off backprop by skip_propagate_down #2095
    • flatten layer learns axis #2082
    • trivial slice and concat #3014
    • hdf5 data layer: loads integer data #2978, can shuffle #2118
    • cross platform adjustments #3300 #3320 #3321 #3362 #3361 #3378
    • speed-ups for GPU solvers #3519 and CPU im2col #3536
    • make and cmake build improvements
    • and more!
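
    Of the changes above, the solver-type split (#3166) is the one that requires edits to existing solver definitions: the old enum field becomes a string. A sketch of the before/after solver prototxt (field values shown for Adam as an example):

```
# before (deprecated enum field):
solver_type: ADAM

# after #3166 (string-typed field):
type: "Adam"
```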


    • #2866 fix weight sharing to (1) reduce memory usage and computation (2) correct momentum and other solver computations
    • #2972 fix concat (broken in #1970)
    • #2964 #3162 fix MVN layer
    • #2321 fix contrastive loss layer to match Hadsell et al. 2006
    • fix deconv backward #3095 and conv reshape #3096 (broken in #2049)
    • #3393 fix in-place reshape and flatten
    • #3152 fix silence layer to not zero bottom on backward
    • #3574 disable cuDNN max pooling (incompatible with in-place)
    • make backward compatible with negative LR #3007
    • #3332 fix pycaffe forward_backward_all()
    • #1922 fix cross-channel LRN for large channel band
    • #1457 fix shape of C++ feature extraction demo output


    • hdf5 is required
    • cuDNN compatibility is now at v3 + v4 and cuDNN v1 and v2 are not supported
    • IO dependencies (lmdb, leveldb, opencv) are now optional #2523


    Source code(tar.gz)
    Source code(zip)
  • rc2(Feb 20, 2015)

    This is the release candidate for Caffe 1.0 once more with feeling. See #1849 for details.

    With documentation, fixes, and feedback this could soon be 1.0!

    Source code(tar.gz)
    Source code(zip)
  • rc(Sep 19, 2014)

    This is the release candidate for Caffe 1.0. See #1112 for details.

    • documentation
    • standard model format and model zoo for sharing models
    • cuDNN acceleration
    Source code(tar.gz)
    Source code(zip)
  • v0.9999(Aug 8, 2014)

    See #880 for details.

    Dependencies: lmdb and gflags are required. CPU-only Caffe without any GPU / CUDA dependencies is turned on by setting CPU_ONLY := 1 in your Makefile.config.

    Deprecations: the new caffe tool includes commands for model training and testing, querying devices, and timing models. The corresponding train_net.bin, finetune_net.bin, test_net.bin, device_query.bin, and net_speed_benchmark.bin are deprecated.

    Source code(tar.gz)
    Source code(zip)
  • acm-mm-oss(May 24, 2014)

  • v0.999(May 20, 2014)

    See #429 for details.

    Please upgrade your models! Caffe's proto definition was changed in #208 and #219 for extensibility. The upgrade_net_proto_binary.bin and upgrade_net_proto_text.bin tools are provided to convert current models. Caffe will attempt to automagically upgrade old models when loaded, but doesn't save the changes.

    Update your Makefile.config! Caffe has a new Makefile and Makefile.config that learned to auto-configure themselves a bit better. Look at the new Makefile.config.example and update your configuration accordingly.

    Dependencies: Caffe's matrix and vector computations can be done with ATLAS, OpenBLAS, or MKL. The hard dependency on MKL is no more!

    Deprecation: V0 model definitions. While Caffe will try to automagically upgrade old models when loaded, see tools/upgrade_net_proto* to make the permanent upgrade since this will be dropped.

    Source code(tar.gz)
    Source code(zip)
  • v0.99(Mar 20, 2014)

    See #231 for details.

    New Dependency: hdf5 is now required. Caffe learned how to load blobs and (multiple!) labels from hdf5.

    • sudo apt-get install libhdf5-serial-dev for ubuntu.
    • brew install homebrew/science/hdf5 for osx.

    Deprecation: padding layers. See 2848aa1f8da0272797ee51234293dfa87eda266a for an example of how to update your model schema and note that an automated tool is coming for this and other model schema updates #219.

    Source code(tar.gz)
    Source code(zip)
  • rcnn-release(Mar 20, 2014)

  • v0.9(Mar 19, 2014)

Berkeley Vision and Learning Center
Autonomous Perception Research
Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon developed library for building Deep Learning (DL) machine learning (ML) models

Amazon DSSTNE: Deep Scalable Sparse Tensor Network Engine DSSTNE (pronounced "Destiny") is an open source software library for training and deploying

Amazon Archives 4.4k Jun 26, 2022
yolov5 onnx caffe

Environment: Ubuntu 18.04, CUDA 10.0, cuDNN 7.6.5, Caffe 1.0, OpenCV 3.4.2, Anaconda3 5.2.0. I have uploaded the relevant installation packages to Baidu Cloud; they can be downloaded from the following link:

null 49 Jun 22, 2022
A library for creating Artificial Neural Networks, for use in Machine Learning and Deep Learning algorithms.

iNeural A library for creating Artificial Neural Networks, for use in Machine Learning and Deep Learning algorithms. What is a Neural Network? Work on

Fatih Küçükkarakurt 5 Apr 5, 2022
header only, dependency-free deep learning framework in C++14

The project may be abandoned since the maintainer(s) are just looking to move on. In the case anyone is interested in continuing the project, let us k

tiny-dnn 5.5k Jun 22, 2022
TFCC is a C++ deep learning inference framework.

TFCC is a C++ deep learning inference framework.

Tencent 108 May 19, 2022
KSAI Lite is a deep learning inference framework of kingsoft, based on tensorflow lite

KSAI Lite English | Simplified Chinese. KSAI Lite is a lightweight, flexible, high-performance, and easily extensible deep learning inference framework built on TensorFlow Lite, targeting hardware platforms including mobile, embedded, and server. KSAI Lite is already used in Kingsoft Office's internal business and is gradually supporting Kingsoft

null 75 Apr 14, 2022
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

Mobile AI Compute Engine (or MACE for short) is a deep learning inference framework optimized for mobile heterogeneous computing on Android, iOS, Linux and Windows devices.

Xiaomi 4.6k Jun 26, 2022
Plaidml - PlaidML is a framework for making deep learning work everywhere.

A platform for making deep learning work everywhere. Documentation | Installation Instructions | Building PlaidML | Contributing | Troubleshooting | R

PlaidML 4.4k Jun 27, 2022
CubbyDNN - Deep learning framework using C++17 in a single header file

CubbyDNN CubbyDNN is C++17 implementation of deep learning. It is suitable for deep learning on limited computational resource, embedded systems and I

Chris Ohk 31 May 30, 2022
Caffe2 is a lightweight, modular, and scalable deep learning framework.

Source code now lives in the PyTorch repository. Caffe2 Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the origin

Meta Archive 8.4k Jun 22, 2022
Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

CNTK Chat Windows build status Linux build status The Microsoft Cognitive Toolkit is a unified deep learning toolkit that describes

Microsoft 17.2k Jun 24, 2022
An Open-Source Analytical Placer for Large Scale Heterogeneous FPGAs using Deep-Learning Toolkit

DREAMPlaceFPGA An Open-Source Analytical Placer for Large Scale Heterogeneous FPGAs using Deep-Learning Toolkit. This work leverages the open-source A

Rachel Selina Rajarathnam 14 May 30, 2022
Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

The Microsoft Cognitive Toolkit is a unified deep learning toolkit that describes neural networks as a series of computational steps via a directed graph.

Microsoft 17.2k Jun 26, 2022
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

This is the Vowpal Wabbit fast online learning code. Why Vowpal Wabbit? Vowpal Wabbit is a machine learning system which pushes the frontier of machin

Vowpal Wabbit 8k Jun 19, 2022
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to m

The Apache Software Foundation 20k Jun 23, 2022
LibDEEP - Deep learning library. BSD-3-Clause

LibDEEP LibDEEP is a deep learning library developed in C language for the development of artificial intelligence-based techniques. Please visit our W

Joao Paulo Papa 18 Mar 15, 2022
Forward - A library for high performance deep learning inference on NVIDIA GPUs

a library for high performance deep learning inference on NVIDIA GPUs.

Tencent 123 Mar 17, 2021
A library for high performance deep learning inference on NVIDIA GPUs.

Forward - A library for high performance deep learning inference on NVIDIA GPUs Forward - A library for high performance deep learning inference on NV

Tencent 502 May 31, 2022