Parallelism Specifics

Extension for Scikit-learn* supports the n_jobs parameter of the original scikit-learn with the following differences:

n_jobs parameter is supported for all estimators patched by Extension for Scikit-learn*, while scikit-learn enables it for selected estimators only.
n_jobs estimator parameter sets the number of threads used by the underlying oneAPI Data Analytics Library.
Extension for Scikit-learn* doesn’t use joblib for parallelism in patched estimators and functions.
The only low-level parallelism library used by Extension for Scikit-learn* is oneTBB (through the oneAPI Data Analytics Library and oneMKL).
If n_jobs is not specified, Extension for Scikit-learn* uses all available threads, whereas scikit-learn is single-threaded by default. Note that the deprecated daal4py module uses a global configuration instead of per-object n_jobs arguments, and its default is also all available threads.

Extension for Scikit-learn* follows the same rules as scikit-learn for the calculation of the n_jobs parameter value.

When scikit-learn’s utilities with built-in parallelism are used (for example, sklearn.model_selection.GridSearchCV or sklearn.ensemble.VotingClassifier), Extension for Scikit-learn* tries to determine the optimal number of threads per job using hints provided by joblib / threadpoolctl. If n_jobs is not specified for the underlying estimator(s), Extension for Scikit-learn* sets it to the number of available threads (usually the number of logical CPUs divided by the n_jobs set for the higher-level parallelized entities).
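The "logical CPUs divided by n_jobs" rule of thumb can be sketched with a small helper. This is only an illustration: the function name threads_per_job and the use of integer division with a floor of one thread are assumptions, not the extension's actual implementation.

```python
import os


def threads_per_job(outer_n_jobs, n_logical_cpus=None):
    """Estimate threads available to each inner estimator (hypothetical helper).

    outer_n_jobs: positive n_jobs set on the higher-level utility,
        for example GridSearchCV(n_jobs=4).
    n_logical_cpus: number of logical CPUs; detected when omitted.
    """
    if outer_n_jobs < 1:
        raise ValueError("outer_n_jobs must be a positive integer")
    if n_logical_cpus is None:
        n_logical_cpus = os.cpu_count() or 1
    # Assumption: integer division, never dropping below one thread.
    return max(1, n_logical_cpus // outer_n_jobs)


print(threads_per_job(4, n_logical_cpus=16))   # 4
print(threads_per_job(32, n_logical_cpus=16))  # 1
```

Under these assumptions, a grid search running 4 jobs on a 16-CPU machine would leave roughly 4 threads for each underlying estimator.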

Note

Environment variables such as OMP_NUM_THREADS, MKL_NUM_THREADS, OPENBLAS_NUM_THREADS, and others used by low-level parallelism libraries do not affect Extension for Scikit-learn*, nor does the mkl-service package.

Note

n_jobs has no effect if computations are performed on GPU.

Note

threadpoolctl context has no effect on Extension for Scikit-learn* threading if n_jobs is specified and positive. If n_jobs is equal to 0 or not specified, the number of threads from threadpoolctl is propagated to Extension for Scikit-learn*. If n_jobs is negative, the number of threads is max(1, n_threadpoolctl + n_jobs + 1), where n_threadpoolctl is the number of threads set by threadpoolctl.
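The arithmetic in this note can be written out as a short function. It is a sketch of the stated rules only: effective_threads is a hypothetical name, n_threadpoolctl stands for the thread limit reported by threadpoolctl, and n_jobs equal to 0 is treated the same as an unspecified n_jobs, as the note describes.

```python
def effective_threads(n_jobs, n_threadpoolctl):
    """Resolve the thread count from n_jobs and the threadpoolctl limit.

    Hypothetical helper mirroring the rules in the note above; it is
    not part of the extension's API.
    """
    if n_jobs is None or n_jobs == 0:
        # Unspecified: the threadpoolctl limit is propagated as-is.
        return n_threadpoolctl
    if n_jobs < 0:
        # Negative: count back from the threadpoolctl limit, minimum 1.
        return max(1, n_threadpoolctl + n_jobs + 1)
    # Positive: the explicit value wins; threadpoolctl is ignored.
    return n_jobs


print(effective_threads(None, 8))  # 8
print(effective_threads(-1, 8))    # 8
print(effective_threads(-2, 8))    # 7
print(effective_threads(-20, 8))   # 1
print(effective_threads(4, 8))     # 4
```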

Note

Extension for Scikit-learn* threading doesn’t automatically avoid nested parallelism when used in conjunction with OpenMP and/or python threads.

Warning

If several instances of Extension for Scikit-learn* algorithms are run sequentially and the n_jobs parameter for the first run is significantly greater than for subsequent ones, it may result in performance degradation due to a known issue with oneTBB.

Warning

In general, accelerated computations offered by estimators from the Extension for Scikit-learn* do not release the Python GIL, thus they are not compatible with multi-threading backends that rely on Python threads.

Warning

Internally, the number of threads for calls to estimator methods from the Extension for Scikit-learn* is managed through global variables. Thus, if multiple calls to estimators with different n_jobs are performed in parallel through Python threads, race conditions might cause the calls to override one another’s configuration, potentially leading to process-wide crashes. If concurrent calls are to be performed, process-based parallelism should be used instead.
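One way to follow this advice is to run each call in its own process, so that every worker holds a private copy of the global thread settings. The sketch below uses only the standard library; fit_in_worker and run_isolated are hypothetical names, and the worker body is a placeholder where a real estimator would be constructed and fitted.

```python
from concurrent.futures import ProcessPoolExecutor


def fit_in_worker(n_jobs):
    """Placeholder worker, executed in a separate process.

    In real use, construct and fit an estimator with the given n_jobs
    here. Each process has its own global state, so different n_jobs
    values cannot overwrite one another.
    """
    return {"n_jobs": n_jobs}


def run_isolated(n_jobs_values):
    """Run one worker process per requested n_jobs value."""
    with ProcessPoolExecutor(max_workers=len(n_jobs_values)) as pool:
        return list(pool.map(fit_in_worker, n_jobs_values))
```

Call run_isolated([2, 4]) from under an if __name__ == "__main__": guard, which process-based executors need on platforms that start workers by spawning a fresh interpreter.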

Enabling DEBUG-level verbosity produces logs indicating when the number of threads used differs from the default (the number of logical threads on the machine).
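A minimal sketch of enabling this level through Python's standard logging module, assuming the extension emits its messages on a logger named sklearnex (the name used by its verbose mode); the exact log text is not shown here because it depends on the installed version.

```python
import logging

# Attach a default handler so that log records are printed to stderr.
logging.basicConfig()

# Assumption: the extension logs under the "sklearnex" logger name.
logging.getLogger("sklearnex").setLevel(logging.DEBUG)
```

Run this before fitting any estimator so that thread-count messages from later calls are captured.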