oneAPI Data Analytics Library (oneDAL)

Overview

Intel® oneAPI Data Analytics Library


Intel® oneAPI Data Analytics Library (oneDAL) is a powerful machine learning library that helps speed up big data analysis. oneDAL solvers are also used in Intel Distribution for Python for scikit-learn optimization.

Intel® oneAPI Data Analytics Library is an extension of Intel® Data Analytics Acceleration Library (Intel® DAAL).

Build your high-performance data science application with oneDAL

oneDAL uses all capabilities of Intel® hardware, which allows you to get a significant performance boost for the classic machine learning algorithms.

We provide highly optimized algorithmic building blocks for all stages of data analytics: preprocessing, transformation, analysis, modeling, validation, and decision making.

oneDAL also provides Data Parallel C++ (DPC++) API extensions to the traditional C++ interfaces.

The volume of data is growing exponentially, as is the need for high-performance, scalable frameworks to analyze it and benefit from it. Besides superior performance on a single node, oneDAL also provides a distributed computation mode that shows excellent results for strong and weak scaling:

[Figures: oneDAL K-Means fit, strong scaling results; oneDAL K-Means fit, weak scaling results]

Technical details: FPType: float32; HW: Intel Xeon Processor E5-2698 v3 @2.3GHz, 2 sockets, 16 cores per socket; SW: Intel® DAAL (2019.3), MPI4Py (3.0.0), Intel® Distribution Of Python (IDP) 3.6.8; Details available in the article https://arxiv.org/abs/1909.11822
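As a rough illustration of what strong and weak scaling measure, the sketch below computes parallel efficiency for both cases. The function names and the timings are hypothetical and are not taken from the benchmark or the article; only the standard definitions are assumed.

```python
def strong_scaling_efficiency(t_one_node, t_n_nodes, n_nodes):
    """Fixed total problem size: the ideal time on n nodes is t_one_node / n."""
    speedup = t_one_node / t_n_nodes
    return speedup / n_nodes


def weak_scaling_efficiency(t_one_node, t_n_nodes):
    """Fixed problem size per node: the ideal time stays constant as nodes are added."""
    return t_one_node / t_n_nodes


# Hypothetical timings in seconds, not measurements from the article.
print(strong_scaling_efficiency(100.0, 14.0, 8))
print(weak_scaling_efficiency(100.0, 110.0))
```

An efficiency close to 1.0 means the run is close to ideal scaling in either mode.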

Refer to our examples and documentation for more information about our API.

Python API

oneDAL has a Python API that is provided as a standalone Python library called daal4py.

The example below shows how daal4py can be used to calculate K-Means clusters:

import numpy as np
import pandas as pd
import daal4py as d4p

data = pd.read_csv("local_kmeans_data.csv", dtype = np.float32)

init_alg = d4p.kmeans_init(nClusters = 10,
                           fptype = "float",
                           method = "randomDense")

centroids = init_alg.compute(data).centroids
alg = d4p.kmeans(nClusters = 10, maxIterations = 50, fptype = "float",
                 accuracyThreshold = 0, assignFlag = False)
result = alg.compute(data, centroids)

Scikit-learn patching

With a Python API provided by daal4py, you can create scikit-learn compatible estimators, transformers, or clusterers that are powered by oneDAL and are nearly as efficient as native programs.

Speedup of oneDAL-powered scikit-learn over the original scikit-learn, 28 cores, 1 thread/core
Technical details: FPType: float32; HW: Intel(R) Xeon(R) Platinum 8276L CPU @ 2.20GHz, 2 sockets, 28 cores per socket; SW: scikit-learn 0.22.2, Intel® DAAL (2019.5), Intel® Distribution Of Python (IDP) 3.7.4; Details available in the article https://medium.com/intel-analytics-software/accelerate-your-scikit-learn-applications-a06cacf44912

daal4py has an API that matches the scikit-learn API. This allows you to speed up your existing projects by changing one line of code.

from daal4py.sklearn.svm import SVC
from sklearn.datasets import load_digits

digits = load_digits()
X, y = digits.data, digits.target

svm = SVC(kernel='rbf', gamma='scale', C = 0.5).fit(X, y)
print(svm.score(X, y))

In addition, daal4py provides an option to replace some scikit-learn methods with oneDAL solvers, which makes it possible to get a performance gain without any code changes. This approach is the basis of the scikit-learn shipped in Intel Distribution for Python. You can patch the stock scikit-learn by using the following command-line flag:

python -m daal4py my_application.py

Patches can also be enabled programmatically:

from sklearn.svm import SVC
from sklearn.datasets import load_digits
from time import time

svm_sklearn = SVC(kernel="rbf", gamma="scale", C=0.5)

digits = load_digits()
X, y = digits.data, digits.target

start = time()
svm_sklearn = svm_sklearn.fit(X, y)
end = time()
print(end - start) # output: 0.141261...
print(svm_sklearn.score(X, y)) # output: 0.9905397885364496

from daal4py.sklearn import patch_sklearn
patch_sklearn() # <-- apply patch
from sklearn.svm import SVC

svm_d4p = SVC(kernel="rbf", gamma="scale", C=0.5)

start = time()
svm_d4p = svm_d4p.fit(X, y)
end = time()
print(end - start) # output: 0.032536...
print(svm_d4p.score(X, y)) # output: 0.9905397885364496

Distributed multi-node mode

Data scientists often require different tools for analysis of regular and big data. daal4py offers various processing models, which makes it easy to enable distributed multi-node mode.

import numpy as np
import pandas as pd
import daal4py as d4p

d4p.daalinit() # <-- Initialize SPMD mode
data = pd.read_csv("local_kmeans_data.csv", dtype = np.float32)

init_alg = d4p.kmeans_init(nClusters = 10,
                           fptype = "float",
                           method = "randomDense",
                           distributed = True) # <-- change model to distributed

centroids = init_alg.compute(data).centroids

alg = d4p.kmeans(nClusters = 10, maxIterations = 50, fptype = "float",
                 accuracyThreshold = 0, assignFlag = False,
                 distributed = True)  # <-- change model to distributed

result = alg.compute(data, centroids)

For more details, browse the daal4py documentation.
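To give an intuition for what a distributed K-Means iteration has to do, here is a minimal pure-Python sketch: each node computes partial sums over its own slice of the data, and the partial results are then merged into new centroids. This is an illustration under simplifying assumptions, not the daal4py or oneDAL implementation; `partial_sums` and `merge_and_update` are hypothetical helper names.

```python
def partial_sums(points, centroids):
    """Local step on one node: per-cluster coordinate sums and point counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        # Index of the nearest centroid by squared Euclidean distance.
        best = min(range(len(centroids)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[best] += 1
        for j in range(dim):
            sums[best][j] += p[j]
    return sums, counts


def merge_and_update(partials, centroids):
    """Global step: combine the partial results of all nodes into new centroids."""
    dim = len(centroids[0])
    new_centroids = []
    for c, old in enumerate(centroids):
        total = sum(counts[c] for _, counts in partials)
        if total == 0:
            new_centroids.append(list(old))  # leave an empty cluster unchanged
            continue
        new_centroids.append(
            [sum(sums[c][j] for sums, _ in partials) / total for j in range(dim)])
    return new_centroids


# Two simulated "nodes", each holding its own slice of the data.
node_a = [(0.0, 0.0), (0.0, 2.0)]
node_b = [(10.0, 10.0), (10.0, 12.0), (2.0, 0.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
partials = [partial_sums(chunk, centroids) for chunk in (node_a, node_b)]
print(merge_and_update(partials, centroids))
```

Only the small per-cluster sums and counts cross node boundaries, which is why this pattern scales to data that does not fit on one machine.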

oneDAL Apache Spark MLlib samples

oneDAL provides Scala and Java interfaces that match the Apache Spark MLlib API and use oneDAL solvers under the hood. This implementation allows you to get a 3-18x increase in performance compared to the default Apache Spark MLlib.

Technical details: FPType: double; HW: 7 x m5.2xlarge AWS instances; SW: Intel DAAL 2020 Gold, Apache Spark 2.4.4, emr-5.27.0; Spark config num executors 12, executor cores 8, executor memory 19GB, task cpus 8

Check the samples tab for more details.

Installation

You can install oneDAL:

Installation from Source

See Installation from Sources for details.

Examples

Besides the C++ and Python APIs, oneDAL also provides APIs for DPC++ and Java:

Documentation

Refer to GitHub Wiki to browse the full list of oneDAL and daal4py resources.

Support

Ask questions and engage in discussions with oneDAL developers, contributors, and other users through the following channels:

You may reach out to project maintainers privately at [email protected].

Security

To report a vulnerability, refer to Intel vulnerability reporting policy.

Contribute

Report issues and make feature requests using GitHub Issues.

We welcome community contributions, so check our contributing guidelines to learn more.

Feedback

Use GitHub Wiki to provide feedback about oneDAL.

Samples

Samples are examples of how oneDAL can be used in different applications:

Technical Preview Features

Technical preview features are introduced to gain early feedback from developers. A technical preview feature is subject to change in future releases. Using a technical preview feature in a production code base is therefore strongly discouraged.

In C++ APIs, technical preview features are located in daal::preview and oneapi::dal::preview namespaces. In Java APIs, technical preview features are located in packages that have the com.intel.daal.preview name prefix.

The preview features list:

  • Graph Analytics:
    • Undirected graph without edge and vertex weights (undirected_adjacency_vector_graph), where vertex indices can only be of type int32
    • Jaccard Similarity Coefficients for all pairs of vertices, a batch algorithm that processes the graph by blocks
    • Local and Global Triangle Counting
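For readers unfamiliar with the graph algorithms listed above, the following pure-Python sketch shows what Jaccard similarity and triangle counting compute on an undirected, unweighted graph. It is an illustration only and does not use the oneDAL preview API; all function names are hypothetical, and the neighborhood definition (a vertex is not its own neighbor) is an assumption.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected, unweighted graph stored as a dict of neighbor sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Shared neighbors divided by all neighbors; 0.0 when both sets are empty."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def local_triangles(adj):
    """Number of triangles each vertex takes part in."""
    return {u: sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
            for u, nbrs in adj.items()}


def global_triangles(adj):
    """Each triangle is counted once per corner, hence the division by 3."""
    return sum(local_triangles(adj).values()) // 3


adj = build_adjacency([(0, 1), (0, 2), (1, 2), (2, 3)])
print(jaccard(adj, 0, 1))
print(local_triangles(adj), global_triangles(adj))
```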

oneDAL and Intel® DAAL

Intel® oneAPI Data Analytics Library is an extension of Intel® Data Analytics Acceleration Library (Intel® DAAL).

This repository contains branches corresponding to both oneAPI and classical versions of the library. We encourage you to use oneDAL located under the master branch.

| Product | Latest release | Branch |
| -- | -- | -- |
| oneDAL | 2021.1 | master, rls/2021-gold-mnt |
| Intel® DAAL | 2020 Update 3 | rls/daal-2020-u3-rls |

License

Distributed under the Apache License 2.0. See LICENSE for more information.

Comments
  • FreeBSD port

    I'm trying to port daal into FreeBSD. The library itself compiles fine, but when I try to compile a sample, I'm getting a lot of unresolved references defined in "libdaal_mkl_*.a" libraries. Is it possible to obtain a source code of these libraries to compile them natively and finalize porting? Thanks.

    enhancement 
    opened by rayrapetyan 30
  • online covariance improvements

    dpc++ 
    opened by mchernov-intel 23
  • DPCPP KNN

    • [x] Extend ndview functionality with slicing and copying functionality, test it
    • [x] Develop search primitive and test it
    • [x] Develop voting primitive and test it
    • [x] Extend selection functionality, fix bugs and test it
    • [x] Compose changes into the full pipeline of the kNN algorithm
    • [x] Check performance and fix degradations if needed
    • [x] Extend kNN-GPU functionality with the full range of Minkowski metrics

    Depends on this PR

    dpc++ oneAPI 
    opened by KulikovNikita 22
  • Fix for threading init.

    This PR fixes the case where internal threading is used before getInstance()->setNumberOfThreads(num_threads) is called to change the number of threads in use.

    opened by KalyanovD 20
  • KNN algorithm: added runtime dispatching by K; added faster selection method for 1 <= K <= 32

    Description

    Main goal was to customize selection methods for different ranges of K in brute force K-Nearest-Neighbors (KNN) algorithm.

    Changes proposed in this pull request:

    • Run-time dispatching by K was added
    • Faster selection was added for K <= 32
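The idea of dispatching on K at run time can be sketched in plain Python as follows. This is not the code from the pull request: the two selection routines are stand-ins chosen for illustration (a heap-based selection for small K and a full sort otherwise), and only the threshold of 32 comes from the description above.

```python
import heapq

SMALL_K_LIMIT = 32  # threshold taken from the PR description


def select_small_k(distances, k):
    """Heap-based selection, cheap when k is small relative to len(distances)."""
    return heapq.nsmallest(k, range(len(distances)), key=distances.__getitem__)


def select_general(distances, k):
    """Fallback: sort all indices by distance and keep the first k."""
    return sorted(range(len(distances)), key=distances.__getitem__)[:k]


def k_nearest(distances, k):
    """Pick the selection routine at run time based on k."""
    if k < 1:
        raise ValueError("k must be >= 1")
    chosen = select_small_k if k <= SMALL_K_LIMIT else select_general
    return chosen(distances, k)


dists = [0.9, 0.1, 0.5, 0.3, 0.7]
print(k_nearest(dists, 2))
```

Both routines return indices of the nearest neighbors in order of increasing distance, so callers do not need to know which one ran.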

    dpc++ 
    opened by amgrigoriev 20
  • LightGBM/Xgboost c++ example?

    daal4py has an example of converting LightGBM and XGBoost models and using them for inference in Python.

    Is the same possible in C++, that is, training a model in Python and running inference in C++?

    opened by skaae 18
  • graph functionality introduction

    backport-2021 
    opened by bysheva 18
  • New graph structure

    Initial graph structure with an example of one graph service function. Note: the load functionality is not ready; the example uses a temporary working version. Output such as cout is there only to demonstrate the functionality and will be removed. Jaccard is not adapted yet.

    opened by orrrrtem 17
  • DAAL 2020 Spark kmeans crash

    Using DAAL 2020 in Spark K-Means causes a crash. The error log:

    A fatal error has been detected by the Java Runtime Environment:

    SIGSEGV (0xb) at pc=0x00007f30b7cb86cf, pid=21839, tid=0x00007f30c68f4700

    JRE version: OpenJDK Runtime Environment (8.0_161-b16) (build 1.8.0_161-internal-Cloud_Programming_framework_TD_V100R001C00B223-b16)
    Java VM: OpenJDK 64-Bit Server VM (25.161-b16 mixed mode linux-amd64 )
    Problematic frame:
    C [libJavaAPI.so+0x22396cf] daal::services::interface1::Status daal::JavaNumericTable<10010>::getTBlock(unsigned long, unsigned long, daal::data_management::ReadWriteMode, daal::data_management::interface1::BlockDescriptor&, char const*, char const*)+0x12f

    Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

    An error report file with more information is saved as:
    /data/sdj/nm-local-dir/usercache/root/appcache/application_1572491123048_0003/container_1572491123048_0003_01_000027/hs_err_pid21839.log
    [thread 139846865626880 also had an error]

    Environment info:

    • OS: CentOS Linux release 7.6.1810
    • DAAL: 2020
    • HADOOP: 2.7.6
    • Spark: 2.3.0
    • JDK: 1.8.0
    • HiBench: 7.0

    Test procedure:

    1. Install and config hadoop/spark/daal.
    2. Test kmeans using the HiBench shell with the following commands:
       source /opt/intel/daal/bin/daalvars.sh intel64
       SHAREDLIBS=${DAALROOT}/lib/${daal_ia}_lin/libJavaAPI.so,${DAALROOT}/../tbb/lib/${daal_ia}_lin/gcc4.4/libtbb.so.2,${DAALROOT}/../tbb/lib/${daal_ia}_lin/gcc4.4/libtbbmalloc.so.2
       echo "==============$LD_LIBRARY_PATH"
       run_spark_job --jars ${DAALROOT}/lib/daal.jar --files ${SHAREDLIBS} com.intel.daal.sparkbench.ml.DaalKMeans -k $K --numIterations $MAX_ITERATION $INPUT_HDFS/samples

    LOG file:

    hs_err_pid21839.log.txt

    opened by MacChen01 17
  • Refine usage of daal namespaces in examples

    Entities from daal::data_management are used in the C++ samples, but the namespace is not brought into scope explicitly. The service.h header does this, but the header is described as:

    Auxiliary functions used in C++ examples

    Copying the sample code into standalone code will fail for users who do not need the service.h header and its utilities and therefore do not include it.

    opened by PovelikinRostislav 17
  • Logistic Regression prediction CPU optimizations

    Perf for Airline data set:

    | NT | FPType | Before, ms | Now, ms | Speedup |
    | -- | -- | -- | -- | -- |
    | Homogen | double | 85.06 | 14.88 | 5.7 |
    | SOA | double | 89.50 | 19.31 | 4.6 |
    | Homogen | float | 110.17 | 9.85 | 11.2 |
    | SOA | float | 113.31 | 9.88 | 11.5 |

    The optimizations apply primarily to binary classification; there is only a minor gain for multiclass.
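The Speedup column in the table above is the ratio of the time before the change to the time after it. A short check of that arithmetic, using the timings from the table (the variable and function names are only for illustration):

```python
# (numeric table type, FPType, before in ms, now in ms) copied from the table above
rows = [
    ("Homogen", "double", 85.06, 14.88),
    ("SOA", "double", 89.50, 19.31),
    ("Homogen", "float", 110.17, 9.85),
    ("SOA", "float", 113.31, 9.88),
]


def speedup(before_ms, now_ms):
    """Speedup is the old time divided by the new time."""
    return before_ms / now_ms


for nt, fptype, before, now in rows:
    print(f"{nt:8s} {fptype:6s} {speedup(before, now):.1f}x")
```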

    backport-2021 
    opened by SmirnovEgorRu 16
  • Update dependency lxml to v4.9.1 [SECURITY]

    This PR contains the following updates:

    | Package | Change |
    | -- | -- |
    | lxml | ==4.6.5 -> ==4.9.1 |

    GitHub Vulnerability Alerts

    CVE-2022-2309

    NULL Pointer Dereference allows attackers to cause a denial of service (or application crash). This only applies when lxml is used together with libxml2 2.9.10 through 2.9.14. libxml2 2.9.9 and earlier are not affected. It allows triggering crashes through forged input data, given a vulnerable code sequence in the application. The vulnerability is caused by the iterwalk function (also used by the canonicalize function). Such code shouldn't be in wide-spread use, given that parsing + iterwalk would usually be replaced with the more efficient iterparse function. However, an XML converter that serialises to C14N would also be vulnerable, for example, and there are legitimate use cases for this code sequence. If untrusted input is received (also remotely) and processed via iterwalk function, a crash can be triggered.


    Release Notes

    lxml/lxml

    v4.9.1

    Bugs fixed

    • A crash was resolved when using iterwalk() (or canonicalize()) after parsing certain incorrect input. Note that iterwalk() can crash on valid input parsed with the same parser after failing to parse the incorrect input.

    v4.9.0

    Bugs fixed

    • GH#341: The mixin inheritance order in lxml.html was corrected. Patch by xmo-odoo.

    Other changes

    • Built with Cython 0.29.30 to adapt to changes in Python 3.11 and 3.12.

    • Wheels include zlib 1.2.12, libxml2 2.9.14 and libxslt 1.1.35 (libxml2 2.9.12+ and libxslt 1.1.34 on Windows).

    • GH#343: Windows-AArch64 build support in Visual Studio. Patch by Steve Dower.

    v4.8.0

    Features added

    • GH#337: Path-like objects are now supported throughout the API instead of just strings. Patch by Henning Janssen.

    • The ElementMaker now supports QName values as tags, which always override the default namespace of the factory.

    Bugs fixed

    • GH#338: In lxml.objectify, the XSI float annotation "nan" and "inf" were spelled in lower case, whereas XML Schema datatypes define them as "NaN" and "INF" respectively. Patch by Tobias Deiminger.

    Other changes

    • Built with Cython 0.29.28.

    v4.7.1

    Features added

    • Chunked Unicode string parsing via parser.feed() now encodes the input data to the native UTF-8 encoding directly, instead of going through Py_UNICODE / wchar_t encoding first, which previously required duplicate recoding in most cases.

    Bugs fixed

    • The standard namespace prefixes were mishandled during "C14N2" serialisation on Python 3. See https://mail.python.org/archives/list/[email protected]/thread/6ZFBHFOVHOS5GFDOAMPCT6HM5HZPWQ4Q/

    • lxml.objectify previously accepted non-XML numbers with underscores (like "1_000") as integers or float values in Python 3.6 and later. It now adheres to the number format of the XML spec again.

    • LP#1939031: Static wheels of lxml now contain the header files of zlib and libiconv (in addition to the already provided headers of libxml2/libxslt/libexslt).

    Other changes

    • Wheels include libxml2 2.9.12+ and libxslt 1.1.34 (also on Windows).

    v4.7.0

    • Release retracted due to missing files in lxml/includes/.

    infra 
    opened by renovate[bot] 0
  • Prioritize wget while downloading dependencies

    Description

    Moves wget to first place among the options for downloading dependencies. curl might fail while downloading because of a network library conflict in some environments.

    opened by Alexsandruss 0
  • Initialize ze_device_properties_t before calling zeDeviceGetProperties

    Description

    stype and pNext members of ze_device_properties_t are input parameters and should be initialized before calling zeDeviceGetProperties

    Signed-off-by: Mateusz Jablonski [email protected]

    opened by JablonskiMateusz 0
  • Bump lxml from 4.6.5 to 4.9.1 in /docs

    Bumps lxml from 4.6.5 to 4.9.1.

    Changelog

    Sourced from lxml's changelog.

    4.9.1 (2022-07-01)

    Bugs fixed

    • A crash was resolved when using iterwalk() (or canonicalize()) after parsing certain incorrect input. Note that iterwalk() can crash on valid input parsed with the same parser after failing to parse the incorrect input.

    4.9.0 (2022-06-01)

    Bugs fixed

    • GH#341: The mixin inheritance order in lxml.html was corrected. Patch by xmo-odoo.

    Other changes

    • Built with Cython 0.29.30 to adapt to changes in Python 3.11 and 3.12.

    • Wheels include zlib 1.2.12, libxml2 2.9.14 and libxslt 1.1.35 (libxml2 2.9.12+ and libxslt 1.1.34 on Windows).

    • GH#343: Windows-AArch64 build support in Visual Studio. Patch by Steve Dower.

    4.8.0 (2022-02-17)

    Features added

    • GH#337: Path-like objects are now supported throughout the API instead of just strings. Patch by Henning Janssen.

    • The ElementMaker now supports QName values as tags, which always override the default namespace of the factory.

    Bugs fixed

    • GH#338: In lxml.objectify, the XSI float annotation "nan" and "inf" were spelled in lower case, whereas XML Schema datatypes define them as "NaN" and "INF" respectively.

    ... (truncated)

    Commits
    • d01872c Prevent parse failure in new test from leaking into later test runs.
    • d65e632 Prepare release of lxml 4.9.1.
    • 86368e9 Fix a crash when incorrect parser input occurs together with usages of iterwa...
    • 50c2764 Delete unused Travis CI config and reference in docs (GH-345)
    • 8f0bf2d Try to speed up the musllinux AArch64 build by splitting the different CPytho...
    • b9f7074 Remove debug print from test.
    • b224e0f Try to install 'xz' in wheel builds, if available, since it's now needed to e...
    • 897ebfa Update macOS deployment target version from 10.14 to 10.15 since 10.14 starts...
    • 853c9e9 Prepare release of 4.9.0.
    • d3f77e6 Add a test for https://bugs.launchpad.net/lxml/+bug/1965070 leaving out the a...
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
Releases(2021.6.0)
  • 2021.6.0(Aug 18, 2022)

    The release Intel® oneAPI Data Analytics Library 2021.6 introduces the following changes:

    📚 Support Materials:

    Kaggle kernels for Intel® Extension for Scikit-learn:

    🛠️ Library Engineering

    • Reduced the size of oneDAL python run-time package by approximately 8%
    • Added Python 3.10 support for daal4py and Intel(R) Extension for Scikit-learn packages

    🚨 What's New

    • Improved performance of oneDAL algorithms:
      • Optimized data conversion for tables with column-major layout in host memory to tables with row-major layout in device memory
      • Optimized the computation of Minkowski distances in brute-force kNN on CPU
      • Optimized Covariance algorithm
      • Added DPC++ column-wise atomic reduction
    • Introduced new oneDAL functionality:
      • KMeans distributed random dense initialization
      • Distributed PcaCov
      • sendrecv_replace communicator method
    • Added new parameters to oneDAL algorithms:
      • Weights in Decision Forest for CPU
      • Cosine and Chebyshev distances for KNN on GPU
  • 2021.5(Jan 12, 2022)

    The release introduces the following changes:

    📚 Support Materials

    The following additional materials were created:

    🛠️ Library Engineering

    • Reduced the size of oneDAL library by approximately 15%.

    🚨 What's New

    • Introduced new oneDAL functionality:
      • Distributed algorithms for Covariance, DBSCAN, Decision Forest, Low Order Moments
      • oneAPI interfaces for Linear Regression, DBSCAN, KNN
    • Improved error handling for distributed algorithms in oneDAL in case of compute nodes failures
    • Improved performance for the following oneDAL algorithms:
      • Louvain algorithm
      • KNN and SVM algorithms on GPU
    • Introduced new functionality for Intel® Extension for Scikit-learn:
      • Scikit-learn 1.0 support
    • Fixed the following issues:
      • Stabilized the results of Linear Regression in oneDAL and Intel® Extension for Scikit-learn
      • Fixed an issue with RPATH on MacOS
  • 2021.4(Oct 14, 2021)

    The release introduces the following changes:

    📚 Support Materials

    The following additional materials were created:

    🛠️ Library Engineering

    • Introduced new functionality for Intel® Extension for Scikit-learn*:
      • Enabled patching for all Scikit-learn applications at once:
      • Added the support of Python 3.9 for both Intel® Extension for Scikit-learn and daal4py. The packages are available from PyPI and the Intel Channel on Anaconda Cloud.
    • Introduced new oneDAL functionality:
      • Added pkg-config support for Linux, macOS, Windows and for static/dynamic, thread/sequential configurations of oneDAL applications.
      • Reduced the size of oneDAL library by approximately 30%.

    🚨 What's New

    Introduced new oneDAL functionality:

    • General:
      • Basic statistics (Low order moments) algorithm in oneDAL interfaces
      • Result options for kNN Brute-force in oneDAL interfaces: using a single function call to return any combination of responses, indices, and distances
    • CPU:
      • Sigmoid kernel of SVM algorithm
      • Model converter from CatBoost to oneDAL representation
      • Louvain Community Detection algorithm technical preview
      • Connected Components algorithm technical preview
      • Search task and cosine distance for kNN Brute-force
    • GPU:
      • The full range support of Minkowski distances in kNN Brute-force
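As a reminder of what the Minkowski family of distances mentioned above covers, here is a small pure-Python sketch. It is independent of oneDAL, and the function names are chosen only for illustration. Manhattan and Euclidean distances are the p = 1 and p = 2 cases, and the Chebyshev distance is the limit as p grows.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p >= 1 between two equal-length vectors."""
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Limit of the Minkowski distance as p grows: the largest coordinate gap."""
    return max(abs(a - b) for a, b in zip(x, y))


x, y = (0.0, 0.0), (3.0, 4.0)
print(minkowski(x, y, 1))  # Manhattan distance
print(minkowski(x, y, 2))  # Euclidean distance
print(chebyshev(x, y))
```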

    Improved oneDAL performance for the following algorithms:

    • CPU:
      • Decision Forest training and prediction
      • Brute-force kNN
      • KMeans
      • NuSVMs and SVR training

    Introduced new functionality in Intel® Extension for Scikit-learn:

    • General:
      • Enabled the global patching of all Scikit-learn applications
      • Provided an integration with dpctl for heterogeneous computing (the support of dpctl.tensor.usm_ndarray for input and output)
      • Extended API with set_config and get_config methods. Added the support of target_offload and allow_fallback_to_host options for device offloading scenarios
      • Added the support of predict_proba in RandomForestClassifier estimator
    • CPU:
      • Added the support of Sigmoid kernel in SVM algorithms
    • GPU:
      • Added binary SVC support with Linear and RBF kernels

    Improved the performance of the following scikit-learn estimators via scikit-learn patching:

    • SVR algorithm training
    • NuSVC and NuSVR algorithms training
    • RandomForestRegression and RandomForestClassifier algorithms training and prediction
    • KMeans

    🐛 Bug Fixes

    • General:
      • Fixed an incorrectly raised exception during the patching of Random Forest algorithm when the number of trees was more than 7000.
    • CPU:
      • Fixed an accuracy issue in Random Forest algorithm caused by the exclusion of constant features.
      • Fixed an issue in NuSVC Multiclass.
      • Fixed an issue with KMeans convergence inconsistency.
      • Fixed incorrect work of train_test_split with specific subset sizes.
    • GPU:
      • Fixed incorrect bias calculation in SVM.

    ❗ Known Issues

    • GPU:
      • For most algorithms, performance degradations were observed when the 2021.4 version of Intel® oneAPI DPC++ Compiler was used.
      • Examples are failing when run with Visual Studio Solutions on hardware that does not support double precision floating-point operations.
  • 2021.3(Jul 2, 2021)

    The release introduces the following changes:

    📚 Support Materials

    The following additional materials were created:

    🛠️ Library Engineering

    • Introduced a new Python package, Intel® Extension for Scikit-learn*. The scikit-learn-intelex package contains scikit-learn patching functionality that was originally available in daal4py package. All future updates for the patches will be available only in Intel® Extension for Scikit-learn. We recommend using scikit-learn-intelex package instead of daal4py.
      • Download the extension using one of the following commands:
        • pip install scikit-learn-intelex
        • conda install scikit-learn-intelex -c conda-forge
      • Enable Scikit-learn patching:
        • from sklearnex import patch_sklearn
        • patch_sklearn()
    • Introduced optional dependencies on DPC++ runtime to daal4py. To enable DPC++ backend, install dpcpp_cpp_rt package. It reduces the default package size with all dependencies from 1.2GB to 400 MB.
    • Added the support of building oneDAL-based applications with /MD and /MDd options on Windows. The -d suffix is used in the names of oneDAL libraries that are built with debug run-time (/MDd).

    🚨 What's New

    Introduced new oneDAL and daal4py functionality:

    • CPU:
      • SVM Regression algorithm
      • NuSVM algorithm for both Classification and Regression tasks
      • Polynomial kernel support for all SVM algorithms (SVC, SVR, NuSVC, NuSVR)
      • Minkowski and Chebyshev distances for kNN Brute-force
      • The brute-force method and the voting mode support for kNN algorithm in oneDAL interfaces
      • Multiclass support for SVM algorithms in oneDAL interfaces
      • CSR-matrix support for SVM algorithms in oneDAL interfaces
      • Subgraph Isomorphism algorithm technical preview
      • Single Source Shortest Path (SSSP) algorithm technical preview

    Improved oneDAL and daal4py performance for the following algorithms:

    • CPU:
      • Support Vector Machines training and prediction
      • Linear, Ridge, ElasticNet, and LASSO regressions prediction
    • GPU:
      • Decision Forest training and prediction
      • Principal Components Analysis training

    Introduced the support of scikit-learn 1.0 version in Intel Extension for Scikit-learn.

    • The 2021.3 release of Intel Extension for Scikit-learn supports the latest scikit-learn releases: 0.22.X, 0.23.X, 0.24.X and 1.0.X.

    Introduced new functionality for Intel Extension for Scikit-learn:

    • General:
      • The support of patch_sklearn for all algorithms
    • CPU:
      • Acceleration of SVR estimator
      • Acceleration of NuSVC and NuSVR estimators
      • Polynomial kernel support in SVM algorithms

    Improved the performance of the following scikit-learn estimators via scikit-learn patching:

    • SVM algorithms training and prediction
    • Linear, Ridge, ElasticNet, and Lasso regressions prediction

    Fixed the following issues:

    • General:
      • Fixed binary incompatibility for the versions of numpy earlier than 1.19.4
      • Fixed an issue with a very large number of trees (> 7000) for Random Forest algorithm.
      • Fixed patch_sklearn to patch both fit and predict methods of Logistic Regression when the algorithm is given as a single parameter to patch_sklearn
    • CPU:
      • Improved numerical stability of training for Alternating Least Squares (ALS) and Linear and Ridge regressions with Normal Equations method
      • Reduced the memory consumption of SVM prediction
    • GPU:
      • Fixed an issue with kernel compilation on the platforms without hardware FP64 support

    ❗ Known Issues

    • Intel® Extension for Scikit-learn and daal4py packages installed from the PyPI repository can’t be found on Debian systems (including Google Colab). Mitigation: add the “site-packages” folder to the Python package search path before importing the packages:
    import sys 
    import os 
    import site 
    
    sys.path.append(os.path.join(os.path.dirname(site.getsitepackages()[0]), "site-packages")) 
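
The mitigation above can also be written so that it is safe to run more than once and reports whether it helped. This is a minimal sketch using only the standard library; the helper name `add_site_packages` is hypothetical.

```python
import importlib.util
import os
import site
import sys


def add_site_packages(path_list, packages_dir):
    """Append the sibling "site-packages" folder of packages_dir, only once.

    Returns the candidate folder so the caller can inspect it.
    """
    candidate = os.path.join(os.path.dirname(packages_dir), "site-packages")
    if candidate not in path_list:
        path_list.append(candidate)
    return candidate


if __name__ == "__main__":
    candidate = add_site_packages(sys.path, site.getsitepackages()[0])
    # find_spec returns None when the package still cannot be located.
    found = importlib.util.find_spec("daal4py") is not None
    print("added:", candidate, "| daal4py importable:", found)
```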
    
  • 2021.2(Mar 31, 2021)

    The release introduces the following changes:

    Library Engineering:

    • Enabled new PyPI distribution channel for daal4py:
      • The four latest Python versions (3.6, 3.7, 3.8, 3.9) are supported on Linux, Windows and macOS.
      • Support of both CPU and GPU is included in the package.
      • You can download daal4py using the following command: pip install daal4py
    • Introduced CMake support for oneDAL examples
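
After running the pip command above, the installation can be checked from Python. This is a minimal sketch using only the standard library (Python 3.8 or later); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package="daal4py"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("daal4py is not installed; try: pip install daal4py")
    else:
        print("daal4py", version)
```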

    Support Materials

    The following additional materials were created:

    What's New

    Introduced new oneDAL and daal4py functionality:

    • CPU:
      • Hist method for Decision Forest Classification and Regression, which outperforms the existing exact method
      • Bit-to-bit results reproducibility for: Linear and Ridge regressions, LASSO and ElasticNet, KMeans training and initialization, PCA, SVM, kNN Brute Force method, Decision Forest Classification and Regression
    • GPU:
      • Multi-node multi-GPU algorithms: KMeans (batch), Covariance (batch and online), Low order moments (batch and online) and PCA
      • Sparsity support for SVM algorithm

    Improved oneDAL and daal4py performance for the following algorithms:

    • CPU:
      • Decision Forest Classification and Regression training
      • Support Vector Machines training and prediction
      • Logistic Regression, Logistic Loss and Cross Entropy for non-homogeneous input types
    • GPU:
      • Decision Forest Classification and Regression training
      • All algorithms with GPU kernels (as a result of migration to Unified Shared Memory data management)
      • Reduced performance overhead for oneAPI C++ interfaces on CPU and oneAPI DPC++ interfaces on GPU

    Added technical preview features in Graph Analytics:

    • CPU:
      • Local and Global Triangle Counting
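
To make the two quantities concrete: the local count of a vertex is the number of triangles that contain it, and the global count is the number of triangles in the whole graph. The following pure-Python sketch illustrates the definitions on a small undirected graph. It is not the oneDAL implementation or API, and the function name `triangle_counts` is hypothetical.

```python
def triangle_counts(edges):
    """Return (local, global_count) for an undirected graph.

    edges: iterable of (u, v) pairs with comparable vertex ids;
    self-loops and duplicate edges are ignored.
    local[v] is the number of triangles that contain vertex v.
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    local = {v: 0 for v in neighbors}
    for u in neighbors:
        for v in neighbors[u]:
            if u < v:  # visit each undirected edge once
                for w in neighbors[u] & neighbors[v]:
                    if v < w:  # count each triangle once, as u < v < w
                        local[u] += 1
                        local[v] += 1
                        local[w] += 1

    # Every triangle was added to exactly three local counters.
    global_count = sum(local.values()) // 3
    return local, global_count


if __name__ == "__main__":
    print(triangle_counts([(0, 1), (1, 2), (2, 0), (2, 3)]))
```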

    Introduced new functionality for scikit-learn patching through daal4py:

    • CPU:
      • Patches for the four latest scikit-learn releases: 0.21.X, 0.22.X, 0.23.X and 0.24.X
      • Acceleration of roc_auc_score function
      • Bit-to-bit results reproducibility for: LinearRegression, Ridge, SVC, KMeans, PCA, Lasso, ElasticNet, tSNE, KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors, RandomForestClassifier, RandomForestRegressor

    Improved performance of the following scikit-learn estimators via scikit-learn patching:

    • CPU:
      • RandomForestClassifier and RandomForestRegressor scikit-learn estimators: training and prediction
      • Principal Component Analysis (PCA) scikit-learn estimator: training
      • Support Vector Classification (SVC) scikit-learn estimator: training and prediction
      • Support Vector Classification (SVC) scikit-learn estimator with the probability=True parameter: training and prediction

    Fixed the following issues:

    • Scikit-learn patching:

      • Improved accuracy of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
      • Fixed patching issues with pairwise_distances
      • Fixed the behavior of the patch_sklearn and unpatch_sklearn functions
      • Fixed unexpected behavior that made accelerated functionality unavailable through scikit-learn patching if the input was not of the float32 or float64 data type. Scikit-learn patching now works with all numpy data types.
      • Fixed a memory leak that appeared when DataFrame from pandas was used as an input type
      • Fixed performance issue for interoperability with Modin
    • daal4py:

      • Fixed the crash of SVM and kNN algorithms on Windows on GPU
    • oneDAL:

      • Improved accuracy of Decision Forest Classification and Regression on CPU
      • Improved accuracy of KMeans algorithm on GPU
      • Improved stability of Linear Regression and Logistic Regression algorithms on GPU

    Known Issues

    • The oneDAL vars.sh script does not support KornShell (ksh)
  • 2021.1(Dec 14, 2020)

    The release contains all functionality of Intel® DAAL. See Intel® DAAL release notes for more details.

    What's New

    Library Engineering:

    • Renamed the library from Intel® Data Analytics Acceleration Library to Intel® oneAPI Data Analytics Library and changed the package names to reflect this.
    • Deprecated 32-bit version of the library.
    • Introduced Intel GPU support for both OpenCL and Level Zero backends.
    • Introduced Unified Shared Memory (USM) support

    Introduced new Intel® oneDAL and daal4py functionality:

    • GPU:
      • Batch algorithms: K-means, Covariance, PCA, Logistic Regression, Linear Regression, Random Forest Classification and Regression, Gradient Boosting Classification and Regression, kNN, SVM, DBSCAN and Low-order moments
      • Online algorithms: Covariance, PCA, Linear Regression and Low-order moments
      • Added Data Management functionality to support DPC++ APIs: a new table type for representation of SYCL-based numeric tables (SyclNumericTable) and an optimized CSV data source

    Improved Intel® oneDAL and daal4py performance for the following algorithms:

    • CPU:
      • Logistic Regression training and prediction
      • k-Nearest Neighbors prediction with Brute Force method
      • Logistic Loss and Cross Entropy objective functions

    Added Technical Preview Features in Graph Analytics:

    • CPU:
      • Undirected graph without edge and vertex weights (undirected_adjacency_array_graph), where vertex indices can only be of type int32
      • Jaccard Similarity Coefficients for all pairs of vertices, a batch algorithm that processes the graph by blocks
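
For reference, the Jaccard similarity coefficient of two vertices is the size of the intersection of their neighbor sets divided by the size of the union. The pure-Python sketch below computes it for all pairs on a small graph. It is an illustration of the definition only, not the oneDAL API or its block-based processing, and the library's conventions (for example, whether a vertex counts as its own neighbor) may differ. The function name `jaccard_all_pairs` is hypothetical.

```python
from itertools import combinations


def jaccard_all_pairs(adjacency):
    """Jaccard coefficient |N(u) & N(v)| / |N(u) | N(v)| for every vertex pair.

    adjacency: dict mapping each vertex to the set of its neighbors.
    Pairs whose neighbor sets are both empty get a coefficient of 0.0.
    """
    result = {}
    for u, v in combinations(sorted(adjacency), 2):
        union = adjacency[u] | adjacency[v]
        common = adjacency[u] & adjacency[v]
        result[(u, v)] = len(common) / len(union) if union else 0.0
    return result


if __name__ == "__main__":
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    for pair, value in jaccard_all_pairs(graph).items():
        print(pair, round(value, 3))
```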

    Aligned the library with Intel® oneDAL Specification 1.0 for the following algorithms:

    • CPU/GPU:
      • K-means, PCA, kNN

    Introduced new functionality for scikit-learn patching through daal4py:

    • CPU:
      • Acceleration of NearestNeighbors and KNeighborsRegressor scikit-learn estimators with Brute Force and K-D tree methods
      • Acceleration of TSNE scikit-learn estimator
    • GPU:
      • Intel GPU support in scikit-learn for DBSCAN, K-means, Linear and Logistic Regression

    Improved performance of the following scikit-learn estimators via scikit-learn patching:

    • CPU:
      • LogisticRegression fit, predict and predict_proba methods
      • KNeighborsClassifier predict, predict_proba and kneighbors methods with “brute” method

    Known Issues

    • Intel® oneDAL DPC++ APIs do not work on GEN12 graphics with the OpenCL backend. Use the Level Zero backend in such cases.
    • train_test_split in daal4py patches for Scikit-learn can produce incorrect shuffling on Windows*
  • 2020u3(Nov 3, 2020)

    What's New in Intel® DAAL 2020 Update 3:

    Introduced new Intel® DAAL and daal4py functionality:

    • Brute Force method for k-Nearest Neighbors classification algorithm, which for datasets with more than 13 features demonstrates a better performance than the existing K-D tree method
    • k-Nearest Neighbors search for K-D tree and Brute Force methods with computation of distances to nearest neighbors and their indices

    Extended existing Intel® DAAL and daal4py functionality:

    • Voting methods for prediction in k-Nearest Neighbors classification and search: based on inverse-distance and uniform weighting
    • New parameters in Decision Forest classification and regression: minObservationsInSplitNode, minWeightFractionInLeafNode, minImpurityDecreaseInSplitNode, maxLeafNodes with best-first strategy and sample weights
    • Support of Support Vector Machine (SVM) decision function for Multi-class Classifier
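
The difference between the two voting methods can be shown with a small pure-Python sketch: with uniform weighting every neighbor has one vote, while with inverse-distance weighting a neighbor at distance d votes with weight 1/d. This is an illustration of the idea, not the Intel® DAAL implementation; the function name `knn_vote`, the handling of zero distances, and the tie-breaking rule are assumptions made for the example.

```python
from collections import defaultdict


def knn_vote(neighbors, weights="uniform"):
    """Pick a class label from the k nearest neighbors.

    neighbors: list of (distance, label) pairs for the k nearest points.
    weights: "uniform" (one vote each) or "distance" (vote weight 1 / distance).
    """
    if weights not in ("uniform", "distance"):
        raise ValueError("weights must be 'uniform' or 'distance'")

    if weights == "distance":
        # A zero distance would give an infinite weight, so in this sketch
        # exact matches, when present, are the only voters and vote equally.
        exact = [(d, label) for d, label in neighbors if d == 0]
        if exact:
            neighbors, weights = exact, "uniform"

    scores = defaultdict(float)
    for distance, label in neighbors:
        scores[label] += 1.0 if weights == "uniform" else 1.0 / distance

    # Ties are broken in favor of the smallest label (an arbitrary choice).
    return max(sorted(scores), key=scores.get)


if __name__ == "__main__":
    nearest = [(0.1, "a"), (2.0, "b"), (2.5, "b")]
    print(knn_vote(nearest, "uniform"), knn_vote(nearest, "distance"))
```

In the sample data the single very close neighbor loses under uniform voting but wins under inverse-distance voting, which is the practical reason to choose one mode over the other.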

    Improved Intel® DAAL and daal4py performance for the following algorithms:

    • SVM training and prediction
    • Decision Forest classification training
    • RBF and Linear kernel functions

    Introduced new daal4py functionality:

    • Conversion of trained XGBoost* and LightGBM* models into a daal4py Gradient Boosted Trees model for fast prediction
    • Support of Modin* DataFrame as an input

    Introduced new functionality for scikit-learn patching through daal4py:

    • Acceleration of KNeighborsClassifier scikit-learn estimator with Brute Force and K-D tree methods
    • Acceleration of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
    • Sparse input support for KMeans and Support Vector Classification (SVC) scikit-learn estimators
    • Prediction of probabilities for SVC scikit-learn estimator
    • Support of ‘normalize’ parameter for Lasso and ElasticNet scikit-learn estimators

    Improved performance of the following functionality for scikit-learn patching through daal4py:

    • train_test_split()
    • Support Vector Classification (SVC) fit and prediction
  • 2020(Sep 25, 2019)

  • 2019_u4(Jun 4, 2019)

    Revision: 33235

    Linux* (32-bit and 64-bit binary): l_daal_oss_p_2019.4.007.tgz
    macOS* (32-bit and 64-bit binary): m_daal_oss_p_2019.4.007.tgz

    Note: To get the sources, clone the repository with a Git client that has the Git LFS module enabled. We are working with GitHub support to make the "Source code (zip)" and "Source code (tar.gz)" archives work correctly.

    l_daal_oss_p_2019.4.007.tgz(588.23 MB)
    mklfpk_lnx_20180112_7.tgz(322.57 MB)
    mklfpk_mac_20180112_7.tgz(160.53 MB)
    mklfpk_win_20180112_7.zip(80.11 MB)
    m_daal_oss_p_2019.4.007.tgz(290.73 MB)