oneAPI and GPU support in Extension for Scikit-learn*
Extension for Scikit-learn* can execute computations on different devices (CPUs and GPUs, including integrated GPUs from laptops and desktops) through the SYCL framework in oneAPI.
The device used for computations can be easily controlled through the target offloading functionality (e.g. through sklearnex.config_context(target_offload="gpu"), which moves data to the GPU if it's not already there - see the rest of this page for more details), but for finer-grained control (e.g. operating on arrays that are already in a given device's memory), it can also interact with objects from the package dpctl, which offers a Python interface over SYCL concepts such as devices, queues, and USM (unified shared memory) arrays.
While not strictly required, the package dpctl is recommended for a better experience on GPUs - for example, it can provide GPU-allocated arrays that enable compute-follows-data execution models (i.e. so that target_offload wouldn't need to move the data from CPU to GPU).
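As a minimal sketch of the compute-follows-data model - assuming dpctl and a SYCL-capable GPU are available, and with purely illustrative array shapes:

import numpy as np
import dpctl.tensor as dpt
from sklearnex.linear_model import LinearRegression

# Arrays allocated on the GPU through dpctl.tensor; computation follows the data
X = dpt.asarray(np.random.rand(100, 4), device="gpu")
y = dpt.asarray(np.random.rand(100), device="gpu")

model = LinearRegression().fit(X, y)  # executes on the GPU, no host-to-device copy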
Important
Be aware that GPU usage requires non-Python dependencies on your system, such as the Intel(R) Compute Runtime (see below).
Prerequisites
For execution on GPUs, the DPC++ runtime and the Intel Compute Runtime (also referred to elsewhere as 'GPGPU drivers') are required.
DPC++ Runtime
The DPC++ compiler runtime can be installed from either PyPI or Conda:
Install from PyPI:
pip install dpcpp-cpp-rt
Install using Conda from Intel’s repository:
conda install -c https://software.repos.intel.com/python/conda/ dpcpp_cpp_rt
Install using Conda from the conda-forge channel:
conda install -c conda-forge dpcpp_cpp_rt
Intel Compute Runtime
On Windows, GPU drivers for iGPUs and dGPUs include the required Intel Compute Runtime. Drivers for Windows can be downloaded from this link.
For datacenters, see further instructions here.
On Linux, some distributions - namely Ubuntu Desktop 25.04 and higher, and Fedora Workstation 42 and higher - come with the compute runtime for iGPUs and dGPUs preinstalled, while others require installing them separately.
Debian systems require installing the package intel-opencl-icd (along with its dependencies such as intel-compute-runtime and intel-graphics-compiler), which is available from Debian's main repository:
sudo apt-get install intel-opencl-icd
Tip
For Debian Trixie (13), the Intel Compute Runtime is not available from the Stable repository, but can be installed by enabling the Sid (Unstable) repository.
For Arch Linux, and for other distributions in general, see the GPGPU article in the Arch wiki.
Be aware that datacenter-grade devices, such as ‘Flex’ and ‘Max’, require different drivers and runtimes. For CentOS and for datacenter-grade devices, see instructions here.
For more details, see the DPC++ requirements page.
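Once the runtimes are installed, a quick sanity check is possible with dpctl (if installed) by listing the SYCL devices visible to the runtime:

# Optional check, assuming dpctl is installed: the GPU should
# appear among the devices printed below
import dpctl

for device in dpctl.get_devices():
    print(device)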
Device offloading
Extension for Scikit-learn* offers two options for running an algorithm on a specified device:
Use global configurations of Extension for Scikit-learn*:
The target_offload argument (in config_context and in set_config / get_config) can be used to set the device primarily used to perform computations. Accepted data types are str and dpctl.SyclQueue. Strings must match device names recognized by the SYCL* device filter selector - for example, "gpu". If passing "auto", the device will be deduced from the location of the input data. Examples:

from sklearnex import config_context
from sklearnex.linear_model import LinearRegression

with config_context(target_offload="gpu"):
    model = LinearRegression().fit(X, y)
from sklearnex import set_config
from sklearnex.linear_model import LinearRegression

set_config(target_offload="gpu")
model = LinearRegression().fit(X, y)
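A dpctl.SyclQueue can be passed in the same way - a sketch, assuming dpctl is installed:

from dpctl import SyclQueue
from sklearnex import config_context
from sklearnex.linear_model import LinearRegression

queue = SyclQueue("gpu")  # queue bound to a GPU device

with config_context(target_offload=queue):
    model = LinearRegression().fit(X, y)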
If passing a string different than "auto", it must be a device name recognized by the SYCL* device filter selector.

The allow_fallback_to_host argument in those same configuration functions is a Boolean flag. If set to True, the computation is allowed to fall back to the host device when a particular estimator does not support the selected device. The default value is False.
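For example (a sketch; whether the fallback actually triggers depends on the estimator's device support):

from sklearnex import config_context
from sklearnex.linear_model import LinearRegression

# Falls back to the CPU if the estimator does not support the selected device
with config_context(target_offload="gpu", allow_fallback_to_host=True):
    model = LinearRegression().fit(X, y)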
These options can be set using the sklearnex.set_config() function or the sklearnex.config_context context manager. To obtain the current values of these options, call sklearnex.get_config().
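For instance:

from sklearnex import set_config, get_config

set_config(target_offload="gpu")
print(get_config()["target_offload"])  # prints: gpu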
Note
Functions set_config, get_config and config_context are always patched after the sklearnex.patch_sklearn() call.
Pass input data as dpctl.tensor.usm_ndarray to the algorithm.
The computation will run on the device where the input data is located, and the result will be returned as usm_ndarray to the same device.
Note
All the input data for an algorithm must reside on the same device.
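A sketch of this usage, assuming dpctl and a GPU are available:

import numpy as np
import dpctl.tensor as dpt
from sklearnex.cluster import DBSCAN

# GPU-allocated input; computation and results stay on that device
X = dpt.asarray(np.array([[1., 2.], [2., 2.], [2., 3.],
                          [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32),
                device="gpu")
clustering = DBSCAN(eps=3, min_samples=2).fit(X)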
Warning
The usm_ndarray can only be consumed by the base methods like fit, predict, and transform. Note that only the algorithms in Extension for Scikit-learn* support usm_ndarray. The algorithms from the stock version of scikit-learn do not support this feature.
Example
A full example of how to patch your code with Intel CPU/GPU optimizations:
import numpy as np

from sklearnex import patch_sklearn, config_context
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with config_context(target_offload="gpu:0"):
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
Note
Current offloading behavior requires that fitting and predictions (a.k.a. inference) of any model happen in the same context (or absence of context). For example, a model whose .fit() method was called in a GPU context with target_offload="gpu:0" will throw an error if a .predict() call is then made outside the same GPU context.
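The safe pattern is therefore to keep both calls under the same context - a sketch, with X and y assumed to be NumPy arrays:

from sklearnex import config_context
from sklearnex.linear_model import LinearRegression

with config_context(target_offload="gpu:0"):
    model = LinearRegression().fit(X, y)
    predictions = model.predict(X)  # same context as .fit(), so no error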