GPU support
Overview
Extension for Scikit-learn* can execute computations on different devices (CPUs and GPUs, including integrated GPUs from laptops and desktops) supported by the SYCL framework. To execute computations on GPUs, an additional package, scikit-learn-intelex-gpu, is required. This package is distributed through the same channels as the regular scikit-learn-intelex package and can be installed with either pip or conda:
pip install scikit-learn-intelex-gpu
conda install -c conda-forge scikit-learn-intelex-gpu
Note that scikit-learn-intelex-gpu does not bring additional modules - it is meant to be used through the same functions and classes from the sklearnex module that run on CPU by default.
After installing this package, the device used for computations can be controlled through the target_offload option in config contexts, which moves data to GPU if it’s not already there (see Configuration Contexts and Global Options and the rest of this page for more details).
For finer-grained control (e.g. operating on arrays that are already in a given device’s memory), the extension can also interact with on-device array API classes like dpnp.ndarray, and with SYCL-related objects from the dpctl package such as dpctl.SyclQueue.
Note
Not every operation of every estimator is supported on GPU - see the GPU support table for more information. To verify where computations are performed, see Verbose Mode.
Important
Be aware that GPU usage requires the Intel(R) Compute Runtime, which is a non-Python dependency (see below).
Software Requirements
In addition to the package scikit-learn-intelex-gpu and its transitive dependencies (such as the DPC++ runtime), the Intel Compute Runtime (also referred to elsewhere as ‘GPGPU drivers’) is required. Note that this is system-level software that is not installable through Python-specific package managers.
On Windows, GPU drivers for iGPUs and dGPUs include the required Intel Compute Runtime. Drivers for Windows can be downloaded from this link.
For datacenters, see further instructions here.
On Linux, some distributions - namely Ubuntu Desktop 25.04 and higher, and Fedora Workstation 42 and higher - come with the compute runtime for iGPUs and dGPUs preinstalled, while others require installing it separately.
Debian systems require installing package intel-opencl-icd (along with its dependencies such as intel-compute-runtime and intel-graphics-compiler), which is available from Debian’s main repository:
sudo apt-get install intel-opencl-icd
Tip
For Debian Trixie (13), the Intel Compute Runtime is not available from the Stable repository, but can be installed by enabling the Sid (Unstable) repository.
For Arch Linux, and for other distributions in general, see the GPGPU article in the Arch wiki.
Important
If using the Extension for Scikit-learn* in a conda environment, GPU support requires the OpenCL ICD package for conda to be installed in the conda environment, in addition to the system install of the same package:
conda install -c https://software.repos.intel.com/python/conda/ intel-gpu-ocl-icd-system
Be aware that datacenter-grade devices, such as ‘Flex’ and ‘Max’, require different drivers and runtimes. For CentOS and for datacenter-grade devices, see instructions here.
For more details, see the DPC++ requirements page.
Hint
If installing all the GPU dependencies on baremetal is not feasible, one might want to use Docker containers with these dependencies instead.
Verifying GPU setup
After installing all the necessary dependencies for GPU support, it is worth checking that the device is correctly recognized by the SYCL framework and, if multiple devices are available, which name is assigned to each one (e.g. "gpu:0" or "gpu:1").
If using the dpctl package, the list of available devices can be obtained as follows:
python -m dpctl --full-list
If all the required dependencies are installed and a GPU device is correctly identified, this command should show some output like the following:
Platform 0 ::
    Name            Intel(R) oneAPI Unified Runtime over Level-Zero
    Version         1.6
    Vendor          Intel(R) Corporation
    Backend         ext_oneapi_level_zero
    Num Devices     1
      # 0
        Name            Intel(R) Data Center GPU Max 1100
        Version         1.6.33416
        Filter string   level_zero:gpu:0
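The same information can also be queried from Python. The following is a minimal sketch, assuming dpctl is installed; list_sycl_devices is a hypothetical helper name (not part of dpctl), and it simply returns an empty list when dpctl is missing or no matching device is found.

```python
def list_sycl_devices(device_type="gpu"):
    """Return (filter_string, name) pairs for SYCL devices of the given type.

    Hypothetical helper: returns an empty list if dpctl is not installed.
    """
    try:
        import dpctl
    except ImportError:
        return []
    # dpctl.get_devices accepts the device type as a string such as "gpu" or "cpu"
    return [
        (device.filter_string, device.name)
        for device in dpctl.get_devices(device_type=device_type)
    ]


for filter_string, name in list_sycl_devices("gpu"):
    print(filter_string, "->", name)
```

An empty result on a machine that does have a GPU points to the same setup problems described in this section.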
Alternatively, if using oneAPI toolkits, the list of recognized devices can be obtained by executing the command sycl-ls:
sycl-ls
If a GPU device is correctly identified, it should show an output like the following:
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Data Center GPU Max 1100 12.60.7 [1.6.33416]
If either of these commands shows only opencl:cpu devices when a GPU is available in the machine, it means that the software dependencies for SYCL are not available in the environment, or the GPU is not set up correctly.
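For scripted checks (e.g. in CI), the sycl-ls output can be parsed with a few lines of Python. This is a rough sketch that assumes every device line starts with a bracketed backend:device_type entry, as in the sample output above; parse_sycl_ls and has_gpu are hypothetical helper names, and the CPU-only line in the usage example is made up for illustration.

```python
import re
import shutil
import subprocess

# Matches the leading "[backend:device_type" part of a sycl-ls device line
_ENTRY = re.compile(r"^\[(?P<backend>[^:\]]+):(?P<device_type>[^:\]]+)")


def parse_sycl_ls(text):
    """Return one (backend, device_type) pair per recognized device line."""
    pairs = []
    for line in text.splitlines():
        match = _ENTRY.match(line.strip())
        if match:
            pairs.append((match["backend"], match["device_type"]))
    return pairs


def has_gpu(text):
    """True if any listed device is reported as a GPU."""
    return any(device_type == "gpu" for _, device_type in parse_sycl_ls(text))


sample = (
    "[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over "
    "Level-Zero, Intel(R) Data Center GPU Max 1100 12.60.7 [1.6.33416]"
)
print(parse_sycl_ls(sample))

# Only call the real tool when it is available on PATH
if shutil.which("sycl-ls"):
    result = subprocess.run(["sycl-ls"], capture_output=True, text=True, check=False)
    print("GPU visible:", has_gpu(result.stdout))
```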
Running on GPU
Extension for Scikit-learn* offers different options for running an algorithm on a specified device (e.g. a GPU):
Target offload option
Just like scikit-learn, the Extension for Scikit-learn* can use configuration contexts and global options to modify how it interacts with different inputs - see Configuration Contexts and Global Options for details.
In particular, the Extension for Scikit-learn* provides a target_offload option that indicates where operations should be performed, moving the data to that device in the process if it’s not already there. It accepts either a SYCL device name such as "gpu", or a dpctl.SyclQueue object for an already-existing queue on a device.
Hint
If repeated operations are going to be performed on the same data (e.g. cross-validators, resamplers, missing data imputers, etc.), it’s recommended to use the array API option instead - see the next section for details.
Example:
from sklearnex import config_context
from sklearnex.linear_model import LinearRegression
from sklearn.datasets import make_regression
X, y = make_regression()
model = LinearRegression()
with config_context(target_offload="gpu"):
    model.fit(X, y)
    pred = model.predict(X)
Example using a dpctl.SyclQueue object instead of a device name:
import dpctl
from sklearnex import config_context
from sklearnex.linear_model import LinearRegression
from sklearn.datasets import make_regression
X, y = make_regression()
model = LinearRegression()
queue = dpctl.SyclQueue("gpu")
with config_context(target_offload=queue):
    model.fit(X, y)
    pred = model.predict(X)
Warning
When using target_offload, operations on a fitted model must be executed under a context or global option with the same device or queue where the model was fitted - meaning: a model fitted on GPU cannot make predictions on CPU, and vice-versa. Note that upon serialization and subsequent deserialization of models, data is moved to the CPU.
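The constraint in the warning above can be illustrated with a fit and a later predict that run in separate contexts naming the same device. This is a minimal sketch under several assumptions: detect_target and fit_then_predict are hypothetical helper names, "auto" is assumed to be the default value of target_offload, and a GPU reported by dpctl is assumed to be usable by scikit-learn-intelex-gpu.

```python
def detect_target():
    """Return "gpu" if dpctl reports a GPU device, otherwise "auto"."""
    try:
        import dpctl
    except ImportError:
        return "auto"
    return "gpu" if dpctl.get_devices(device_type="gpu") else "auto"


def fit_then_predict(target):
    # Local imports keep detect_target usable without these packages
    from sklearn.datasets import make_regression
    from sklearnex import config_context
    from sklearnex.linear_model import LinearRegression

    X, y = make_regression(n_samples=200, n_features=5, random_state=0)
    model = LinearRegression()
    with config_context(target_offload=target):
        model.fit(X, y)
    # A later, separate context must name the same device used for fitting
    with config_context(target_offload=target):
        return model.predict(X)


target = detect_target()
try:
    predictions = fit_then_predict(target)
    print(target, predictions.shape)
except ImportError:
    print("scikit-learn-intelex is not installed; skipping the run")
```

Storing the target in one variable and reusing it for every context is a simple way to avoid fitting on one device and predicting on another by accident.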
Hint
Serialization of model objects that used target offload will move data to CPU upon deserialization. See Model serialization (pickling) for details about serializing GPU models.
GPU arrays through array API
As another option, computations can also be performed on data that is already on a SYCL device, without moving it there, if it belongs to an array API-compatible class such as dpnp.ndarray or torch.Tensor (see also the PyTorch Intel GPU docs).
This is particularly useful when multiple operations are performed on the same data (e.g. cross validators, stacked ensembles, etc.), or when the data is meant to interact with other libraries besides the Extension for Scikit-learn*. Be aware that it requires enabling array API support in scikit-learn, which comes with additional dependencies.
See Array API support for details, instructions, and limitations. Example:
# Array API support from sklearn requires enabling it on SciPy too
import os
os.environ["SCIPY_ARRAY_API"] = "1"
import numpy as np
import torch
from sklearnex import config_context
from sklearnex.linear_model import LinearRegression
# Random data for a regression problem
rng = np.random.default_rng(seed=123)
X_np = rng.standard_normal(size=(100, 10), dtype=np.float32)
y_np = rng.standard_normal(size=100, dtype=np.float32)
# Torch offers an array-API-compliant class where data can be on GPU (referred to as 'xpu')
X = torch.tensor(X_np, device="xpu")
y = torch.tensor(y_np, device="xpu")
# Important to note again that array API must be enabled on scikit-learn
model = LinearRegression()
with config_context(array_api_dispatch=True):
    model.fit(X, y)
The same example with DPNP arrays instead of Torch tensors:
# Array API support from sklearn requires enabling it on SciPy too
import os
os.environ["SCIPY_ARRAY_API"] = "1"
import numpy as np
import dpnp
from sklearnex import config_context
from sklearnex.linear_model import LinearRegression
# Random data for a regression problem
rng = np.random.default_rng(seed=123)
X_np = rng.standard_normal(size=(100, 10), dtype=np.float32)
y_np = rng.standard_normal(size=100, dtype=np.float32)
# DPNP offers an array-API-compliant class where data can be on GPU
X = dpnp.array(X_np, device="gpu")
y = dpnp.array(y_np, device="gpu")
# Important to note again that array API must be enabled on scikit-learn
model = LinearRegression()
with config_context(array_api_dispatch=True):
    model.fit(X, y)
Hint
If serialization of a GPU model is desired, use Torch tensors instead of DPNP arrays. See Model serialization (pickling) for more information.
Note
Not all estimator classes in the Extension for Scikit-learn* support array API objects - see the list of estimators with array API support for details.
DPNP Arrays
As a special case, GPU arrays from dpnp can be used without enabling array API, even with estimators in the Extension for Scikit-learn* that do not currently support array API. Without array API enabled, however, the data is moved to host and back, so this is not the most efficient route computationally.
Example:
import numpy as np
import dpnp
from sklearnex import config_context
from sklearnex.linear_model import LinearRegression
rng = np.random.default_rng(seed=123)
X_np = rng.standard_normal(size=(100, 10), dtype=np.float32)
y_np = rng.standard_normal(size=100, dtype=np.float32)
X = dpnp.array(X_np, device="gpu")
y = dpnp.array(y_np, device="gpu")
model = LinearRegression()
model.fit(X, y)
If array API had been enabled, the snippet above would use the data as-is on the device where it resides. Without array API, the data is moved using the SYCL queue contained in those objects.
Note
All the input data for an algorithm must reside on the same device.
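A quick pre-flight check for this requirement can be written against the .device attribute that array API-compatible classes expose. The following is a rough sketch: device_label and same_device are hypothetical helpers, objects without a .device attribute are assumed to live in host memory, and comparing devices by their string form is a simplification. FakeArray is only a stand-in so the example runs without a GPU.

```python
def device_label(array):
    """Return a comparable label for the device holding `array`."""
    device = getattr(array, "device", None)
    # Assumption: objects without a `.device` attribute live in host memory
    return "cpu" if device is None else str(device)


def same_device(*arrays):
    """True if all inputs report the same device label."""
    return len({device_label(array) for array in arrays}) <= 1


class FakeArray:
    """Stand-in for a device array, used only for this illustration."""

    def __init__(self, device):
        self.device = device


X = FakeArray("xpu:0")
y_on_gpu = FakeArray("xpu:0")
y_on_host = [0.0, 1.0]  # a plain list has no `.device` attribute

print(same_device(X, y_on_gpu))
print(same_device(X, y_on_host))
```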