.. Copyright 2020 Intel Corporation
..
.. Licensed under the Apache License, Version 2.0 (the "License");
.. you may not use this file except in compliance with the License.
.. You may obtain a copy of the License at
..
..     http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.

.. _oneapi_gpu:

##############################################################
oneAPI and GPU support in |intelex|
##############################################################

|intelex| supports oneAPI concepts, which means that algorithms can be executed on
different devices: CPUs and GPUs. This is done via integration with the
`dpctl <https://github.com/IntelPython/dpctl>`_ package that implements
core oneAPI concepts like queues and devices.

Prerequisites
-------------

For execution on GPU, the DPC++ compiler runtime and driver are required. Refer to
`DPC++ system requirements <https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-dpcpp-system-requirements.html>`_ for details.

The DPC++ compiler runtime can be installed either from PyPI or Anaconda:

- Install from PyPI::

     pip install dpcpp-cpp-rt

- Install using Conda via the Intel repository::

     conda install dpcpp_cpp_rt -c https://software.repos.intel.com/python/conda/

Device offloading
-----------------

|intelex| offers two options for running an algorithm on a specific device with the help of dpctl:

- Pass input data as :code:`dpctl.tensor.usm_ndarray` to the algorithm
  (see the first sketch at the end of this section).
  The computation will run on the device where the input data is located, and the result
  will be returned as :code:`usm_ndarray` to the same device.

  .. note::
     All the input data for an algorithm must reside on the same device.

  .. warning::
     The :code:`usm_ndarray` can only be consumed by the base methods
     like :code:`fit`, :code:`predict`, and :code:`transform`.
     Note that only the algorithms in |intelex| support :code:`usm_ndarray`. The algorithms
     from the stock version of scikit-learn do not support this feature.

- Use the global configurations of |intelex| (see the second sketch at the end of this section):

  1. The :code:`target_offload` option can be used to set the device primarily used
     to perform computations. Accepted data types are :code:`str` and :code:`dpctl.SyclQueue`.
     If you pass a string to :code:`target_offload`, it should either be ``"auto"``,
     which means that the execution context is deduced from the location of the input data,
     or a string with a SYCL* filter selector. The default value is ``"auto"``.

  2. The :code:`allow_fallback_to_host` option is a Boolean flag. If set to :code:`True`,
     the computation is allowed to fall back to the host device when a particular
     estimator does not support the selected device. The default value is :code:`False`.

These options can be set using the :code:`sklearnex.set_config()` function or the
:code:`sklearnex.config_context` context manager. To obtain the current values of these
options, call :code:`sklearnex.get_config()`.

.. note::
   The functions :code:`set_config`, :code:`get_config`, and :code:`config_context`
   are always patched after the :code:`sklearnex.patch_sklearn()` call.

.. rubric:: Compatibility considerations

For compatibility reasons, algorithms in |intelex| may be offloaded to the device
using :code:`daal4py.oneapi.sycl_context`. However, it is recommended to use one of the
options described above for device offloading instead of using :code:`sycl_context`.
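
The first option might look like the following minimal sketch. It assumes a machine
with a SYCL-capable GPU and the :code:`dpctl` package installed; the ``"gpu"`` filter
selector and the choice of estimator are illustrative only:

.. code-block:: python

   import numpy as np
   import dpctl
   import dpctl.tensor as dpt

   from sklearnex import patch_sklearn
   patch_sklearn()

   from sklearn.cluster import DBSCAN

   X = np.array([[1., 2.], [2., 2.], [2., 3.],
                 [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)

   # Place the input data on a GPU device; the computation runs where the
   # data resides, and the result is returned on the same device.
   queue = dpctl.SyclQueue("gpu")  # illustrative filter selector
   X_device = dpt.asarray(X, usm_type="device", sycl_queue=queue)

   clustering = DBSCAN(eps=3, min_samples=2).fit(X_device)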
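
The second option, using the global configuration, might look like this sketch;
``"gpu:0"`` assumes that at least one GPU is visible to the SYCL runtime:

.. code-block:: python

   from sklearnex import patch_sklearn, set_config, get_config
   patch_sklearn()

   # Offload computations to the first GPU and allow falling back to the
   # host device for estimators that do not support the selected device.
   set_config(target_offload="gpu:0", allow_fallback_to_host=True)

   # Inspect the current values of the global options.
   print(get_config()["target_offload"])  # "gpu:0"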

Example
-------

An example on how to patch your code with Intel CPU/GPU optimizations:

.. code-block:: python

   import numpy as np

   from sklearnex import patch_sklearn, config_context
   patch_sklearn()

   from sklearn.cluster import DBSCAN

   X = np.array([[1., 2.], [2., 2.], [2., 3.],
                 [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
   with config_context(target_offload="gpu:0"):
       clustering = DBSCAN(eps=3, min_samples=2).fit(X)

.. note::
   The current offloading behavior requires a model to be fitted and used for
   inference in the same context (or with no context in both cases). For example,
   a model trained in a GPU context with :code:`target_offload="gpu:0"` throws an
   error if inference is performed outside that GPU context. A sketch of the
   correct pattern is shown below.
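
A minimal sketch of this constraint, using an illustrative nearest-neighbors
workflow; both fitting and inference enter the same GPU context, and ``"gpu:0"``
again assumes a GPU is visible to the SYCL runtime:

.. code-block:: python

   import numpy as np
   from sklearnex import patch_sklearn, config_context
   patch_sklearn()

   from sklearn.neighbors import KNeighborsClassifier

   X = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]], dtype=np.float32)
   y = np.array([0, 0, 1, 1])

   with config_context(target_offload="gpu:0"):
       model = KNeighborsClassifier(n_neighbors=2).fit(X, y)

   # Calling model.predict(X) here, outside the GPU context, would raise
   # an error; re-enter the same context for inference instead.
   with config_context(target_offload="gpu:0"):
       predictions = model.predict(X)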