SPMD (multi-GPU distributed mode)

Extension for Scikit-learn* offers Single Program, Multiple Data (SPMD) interfaces for distributed computations on multi-GPU setups when running on Linux* (see the distributed mode in daal4py for distributed algorithms on CPU).

Several GPU-supported algorithms also provide distributed, multi-GPU computing capabilities via integration with mpi4py. The prerequisites match those of GPU computing, plus an MPI backend of your choice (Intel MPI recommended, available via the impi_rt Python package) and the mpi4py Python package. If Extension for Scikit-learn* was installed from sources, ensure that the spmd_backend is built.

Important

SPMD mode requires the mpi4py package used at runtime to be compiled with the same MPI backend as the Extension for Scikit-learn*. The PyPI and Conda distributions of Extension for Scikit-learn* both use Intel’s MPI as the backend, and hence require an mpi4py that is also built with Intel’s MPI. It can easily be installed from Intel’s conda channel as follows:

conda install -c https://software.repos.intel.com/python/conda/ -c conda-forge --override-channels mpi4py

Warning

Packages from the Intel channel are meant to be compatible with dependencies from conda-forge, and might not work correctly in environments that have packages installed from the anaconda channel.

SPMD mode also requires the MPI runtime executable (mpiexec / mpirun) to come from the same library that was used to compile Extension for Scikit-learn*. Intel’s MPI runtime library is offered as a Python package, impi_rt, and is installed together with the mpi4py package when executing the command above; otherwise, it can be installed separately from different distribution channels:

  • Conda-Forge:

    conda install -c conda-forge impi_rt
    

Tip

impi_rt is also available from the Intel channel: https://software.repos.intel.com/python/conda.

  • PyPI (not recommended, might require setting additional environment variables):

    pip install impi_rt
    

Using other MPI backends (e.g. OpenMPI) requires building Extension for Scikit-learn* from source with that backend.

Note that Extension for Scikit-learn* supports GPU offloading to speed up MPI operations. Some MPI backends enable this automatically, but to use GPU offloading with Intel MPI, the environment variable I_MPI_OFFLOAD must be set to 1 (passing data on device without it may lead to a runtime error):

export I_MPI_OFFLOAD=1

SPMD-aware versions of estimators can be imported from the sklearnex.spmd module. Data should be distributed across multiple nodes as desired, and should be transferred to a dpctl or dpnp array before being passed to the estimator.
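For instance, a per-rank sketch of this data flow might look as follows (the data here is a synthetic placeholder seeded by the rank number; dpnp is used, but a dpctl tensor would work equivalently):

import numpy as np
import dpnp
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()

# Each rank holds only its own portion of the data
# (placeholder: synthetic data seeded by the rank number).
rng = np.random.default_rng(seed=rank)
X_local = rng.random((1000, 10))

# Transfer the local chunk to the GPU before passing it to an SPMD estimator.
X_local_device = dpnp.asarray(X_local, device="gpu")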

Note that SPMD estimators accept an additional argument queue in their .fit / .predict methods, which takes a dpctl.SyclQueue object. For example, while the signature for sklearn.linear_model.LinearRegression.predict would be

def predict(self, X): ...

the signature of the corresponding method, sklearnex.spmd.linear_model.LinearRegression.predict, is:

def predict(self, X, queue=None): ...
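As an illustration, a queue created with dpctl could be passed explicitly as follows (a minimal sketch; model is assumed to be an already-fitted SPMD estimator and X the local data of the calling rank):

import dpctl

# Create a queue targeting a GPU device and pass it to the estimator method.
queue = dpctl.SyclQueue("gpu")
predictions = model.predict(X, queue=queue)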

Examples of SPMD usage can be found in the GitHub repository for the Extension for Scikit-learn* under examples/sklearnex.

To run in SPMD mode, first create a Python file that uses SPMD estimators from sklearnex.spmd, such as linear_regression_spmd.py.
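A minimal sketch of what such a file might contain is shown below (synthetic per-rank data serves as a placeholder; a real workload would load each rank’s partition of the dataset instead):

# linear_regression_spmd.py
import numpy as np
import dpnp
from mpi4py import MPI

from sklearnex.spmd.linear_model import LinearRegression

rank = MPI.COMM_WORLD.Get_rank()

# Placeholder: each rank generates its own chunk of synthetic data.
rng = np.random.default_rng(seed=rank)
X = rng.random((1000, 10))
y = X @ rng.random(10)

# Transfer the local chunks to the GPU.
X_device = dpnp.asarray(X, device="gpu")
y_device = dpnp.asarray(y, device="gpu")

# Fit and predict collectively across all MPI ranks.
model = LinearRegression().fit(X_device, y_device)
predictions = model.predict(X_device)

print(f"rank {rank}: first predictions =", dpnp.asnumpy(predictions[:3]))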

Then, execute the file through MPI with multiple ranks, for example:

mpirun -n 4 python linear_regression_spmd.py

(and remember to set I_MPI_OFFLOAD=1 for Intel’s MPI before calling mpirun/mpiexec)

Note that additional mpirun arguments can be added as desired. SPMD-supported estimators are listed in the SPMD Support section.