About daal4py
Introduction
daal4py is a low-level module within the Extension for Scikit-learn* package providing Python bindings
over the oneAPI Data Analytics Library. It has been deprecated in favor of the newer sklearnex module in the
same package, which offers a more idiomatic and higher-level interface for calling accelerated
routines from the oneAPI Data Analytics Library in Python.
Internally, daal4py is a Python wrapper over the now-deprecated “DAAL” interface
of the oneAPI Data Analytics Library, while sklearnex is a module built atop of the “oneAPI” interface, offering
DPC-based features such as GPU support.
There is a large degree of overlap in the functionalities offered between the two modules
daal4py and sklearnex - module sklearnex should be preferred whenever possible,
either by using it directly or through the patching mechanism - but daal4py
exposes some additional functionalities from the oneAPI Data Analytics Library that sklearnex doesn’t:
Fast serving of gradient boosted decision trees from other libraries such as XGBoost (model builders).
Previously daal4py was distributed as a separate package, but it is now an importable module
within the scikit-learn-intelex package - meaning, after installing scikit-learn-intelex,
it can be imported as follows:
import daal4py
For documentation about specific functions, see the daal4py API reference.
Using daal4py
Unlike sklearnex, daal4py, being a lower-level interface, does not follow scikit-learn
idioms - instead, the process for calling procedures from the daal4py interface is as follows:
Instantiate an ‘algorithm’ class by calling its constructor, without any data - for example:
qr_algo = daal4py.qr().Call the ‘compute’ method of that instantiated algorithm in order to obtain a ‘result’ object, passing it the data on which it will operate - for example:
qr_result = qr_algo.compute(X).Warning
Methods such as
computeshould only be called once on a daal4py object, unless the object is created in streaming mode (see rest of this doc page for more details). If subsequent calls to the same method are needed (e.g. if one wishes to re-fit the model on new data), the object providing this method should be re-created again - otherwise, crashes and spurious errors might happen.Access the relevant results in the ‘result’ object - for example:
R = qr_result.matrixR.
Full example calling the QR algorithm:
import daal4py
import numpy as np
rng = np.random.default_rng(seed=123)
X = rng.standard_normal(size=(100,5))
qr_algo = daal4py.qr()
qr_result = qr_algo.compute(X)
np.testing.assert_almost_equal(
np.abs( qr_result.matrixR ),
np.abs( np.linalg.qr(X).R ),
)
Note
QR factorization, unlike other linear algebra procedures, does not have a strictly unique solution - if the signs (+/-) of numbers are flipped for a particular column in both the Q and R matrices, they would still be valid and equivalent QR factorizations of the same original matrix ‘X’.
Procedures like Cholesky decomposition are typically constrained to have only positive signs in the main diagonal in order to make the results deterministic, but this is not always the case for QR in most software, hence the example above takes the absolute values when comparing results from different libraries.
Streaming mode
Many algorithms in daal4py accept an argument streaming=True, which allows executing the
computations in a ‘streaming’ or ‘online’ fashion, by supplying it different subsets of the data,
one at a time (batches), instead of passing the whole data upfront, while still arriving at the
same final result as if all the data had been passed at once.
Note
The sklearnex module also offers incremental versions of some algorithms - see the docs
on Non-Scikit-Learn Algorithms for more details.
This can be useful for executing algorithms on large datasets that don’t fit in memory but which can still be loaded in smaller chunks, or for machine learning models that are constantly being updated as new data is collected, for example.
In order to use streaming mode, the algorithm constructor needs to be passed argument streaming=True,
method .compute() needs to be called multiple times with different data (but note again: methods
such as compute() should not be called more than once when the algorithm is not in streaming
mode), and the ‘result’ object should be obtained by calling method .finalize() after all the data
has been passed.
Example:
import daal4py
import numpy as np
rng = np.random.default_rng(seed=123)
X_full = rng.standard_normal(size=(100,5))
batches = np.split(np.arange(100), 5)
qr_algo = daal4py.qr(streaming=True)
for batch in batches:
X_batch = X_full[batch]
qr_algo.compute(X_batch)
qr_result = qr_algo.finalize()
np.testing.assert_almost_equal(
np.abs( qr_result.matrixR ),
np.abs( np.linalg.qr(X).R ),
)
List of algorithms in daal4py supporting streaming mode:
Distributed mode
Many algorithms in daal4py accept an argument distributed=True, which allows
running computations in a distributed compute nodes using the MPI framework.
See the section Distributed mode (daal4py, CPU) for more details.
Documentation
See daal4py API Reference for the full documentation of functions and classes.