About the Extension for Scikit-learn*

The Extension for Scikit-learn* is a free and open-source software accelerator built on top of the scikit-learn and oneDAL (oneAPI Data Analytics Library) libraries.

It mostly works by replacing selected calls to scikit-learn algorithms with calls to the oneAPI Data Analytics Library, which offers more optimized versions of the same routines (see Supported Algorithms). The oneAPI Data Analytics Library in turn achieves its optimizations by leveraging SIMD instructions, exploiting the cache structures of modern hardware, and using the oneMKL library for linear algebra operations in place of the OpenBLAS library that scikit-learn uses by default.

Unlike other libraries in the Python ecosystem, the classes and functions in the Extension for Scikit-learn* are not merely scikit-learn-compatible: they are built on top of scikit-learn itself. They inherit directly from scikit-learn's classes, define the same attributes that stock scikit-learn defines for each estimator, and reuse most of scikit-learn's estimator methods where appropriate.
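The inheritance pattern described above can be illustrated with a hypothetical `AcceleratedLinearRegression` class. This is a sketch under stated assumptions, not the Extension's actual code: NumPy's `lstsq` stands in for an optimized backend, and treating `sample_weight` as an "unsupported" case that falls back to stock scikit-learn is an arbitrary choice made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


class AcceleratedLinearRegression(LinearRegression):
    """Hypothetical sketch: subclass the stock estimator, replace fit() with a
    different backend, set the same fitted attributes, and reuse the inherited
    predict() and score() methods unchanged."""

    def fit(self, X, y, sample_weight=None):
        if sample_weight is not None:
            # Hypothetical unsupported case: defer to the stock implementation.
            return super().fit(X, y, sample_weight=sample_weight)

        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y, dtype=np.float64)

        # Center the data when an intercept is requested, as stock scikit-learn does.
        X_offset = X.mean(axis=0) if self.fit_intercept else np.zeros(X.shape[1])
        y_offset = y.mean(axis=0) if self.fit_intercept else 0.0

        # Stand-in for an optimized backend call (here simply NumPy least squares).
        coef, _, rank, singular = np.linalg.lstsq(X - X_offset, y - y_offset, rcond=None)

        # Define the same fitted attributes that stock LinearRegression defines.
        self.coef_ = coef.T
        self.intercept_ = y_offset - X_offset @ coef
        self.rank_ = rank
        self.singular_ = singular
        self.n_features_in_ = X.shape[1]
        return self


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0

model = AcceleratedLinearRegression().fit(X, y)
prediction = model.predict([[4.0]])  # inherited from LinearRegression
r2 = model.score(X, y)               # inherited as well
fallback = AcceleratedLinearRegression().fit(X, y, sample_weight=np.ones(4))
```

Because the subclass only overrides `fit`, every other method, as well as `get_params`/`set_params`, comes straight from scikit-learn, and `isinstance(model, LinearRegression)` holds.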

The Extension for Scikit-learn* is regularly tested for API compatibility and for correctness against scikit-learn's own test suite (see Scikit-learn's test suite for more information), and it can easily be swapped in for the stock scikit-learn library by patching it.
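A minimal sketch of patching, assuming the Extension is installed as the `sklearnex` Python package. `patch_sklearn()` is called before the scikit-learn estimators are imported so that those imports resolve to the accelerated versions; the toy data and the choice of `KMeans` are illustrative only.

```python
import numpy as np
from sklearnex import patch_sklearn

# Patch first, then import scikit-learn estimators as usual.
patch_sklearn()

from sklearn.cluster import KMeans

# Two well-separated groups of points (illustrative data).
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# Unchanged scikit-learn code; the patched KMeans is used where supported.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_
```

`unpatch_sklearn()` from the same package restores the stock estimators. Patching can also be applied without modifying code by running a script through `python -m sklearnex my_script.py`, where `my_script.py` is a placeholder name.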

The Extension for Scikit-learn* aims to be compatible with the last 3 minor releases of scikit-learn available at any given time, plus the 1.0 release as a special case. It ensures this compatibility by offering different code routes depending on the scikit-learn version encountered at runtime. For example, if a given attribute of a class is removed in version 1.x of scikit-learn, the Extension for Scikit-learn* will not set that attribute when running with scikit-learn >=1.x, but will still set it when running with scikit-learn <1.x, in order to guarantee full API compatibility.
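A minimal sketch of such a version-dependent code route, under stated assumptions: `parse_version`, `VersionAwareEstimator`, the attribute `legacy_attr_` and the cutoff version 1.4 are all hypothetical, and the version string is passed in explicitly rather than read from `sklearn.__version__` so that the sketch runs without scikit-learn installed.

```python
def parse_version(version: str) -> tuple:
    """Turn e.g. '1.5.2', '1.6.0rc1' or '1.7.dev0' into (1, 5, 2), (1, 6, 0), (1, 7)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


# Hypothetical: the scikit-learn version that removed `legacy_attr_`.
ATTR_REMOVED_IN = (1, 4)


class VersionAwareEstimator:
    """Hypothetical estimator whose fitted attributes track the scikit-learn version."""

    def __init__(self, sklearn_version: str):
        # Real code would read `sklearn.__version__` at runtime instead.
        self.sklearn_version = sklearn_version

    def fit(self, X):
        self.n_features_in_ = len(X[0])
        if parse_version(self.sklearn_version) < ATTR_REMOVED_IN:
            # Older scikit-learn still defines this attribute, so keep setting it.
            self.legacy_attr_ = self.n_features_in_
        return self


old = VersionAwareEstimator("1.3.2").fit([[1.0, 2.0]])
new = VersionAwareEstimator("1.5.0").fit([[1.0, 2.0]])
print(hasattr(old, "legacy_attr_"), hasattr(new, "legacy_attr_"))  # True False
```

The hand-written parser keeps the sketch dependency-free; a production implementation would more likely compare versions with a robust parser such as `packaging.version.Version`.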

Performance of the Extension for Scikit-learn* is regularly measured and compared against that of other libraries using public and synthetic datasets through sklbench, which is also free and fully open-source.

The Extension for Scikit-learn* was initially developed by Intel as the Intel Extension for Scikit-learn*. It and the oneAPI Data Analytics Library are now projects under the UXL Foundation umbrella, and both can be built from source to provide accelerated routines for other platforms such as ARM and RISC-V; see Building from Source for more information.