Parallelism Specifics
Extension for Scikit-learn* supports the n_jobs parameter of the original scikit-learn with the following differences:
- The n_jobs parameter is supported for all estimators patched by Extension for Scikit-learn*, while scikit-learn enables it for selected estimators only.
- The n_jobs estimator parameter sets the number of threads used by the underlying oneAPI Data Analytics Library.
- Extension for Scikit-learn* doesn’t use joblib for parallelism in patched estimators and functions.
- The only low-level parallelism library used by Extension for Scikit-learn* is oneTBB (through the oneAPI Data Analytics Library and oneMKL).
- If n_jobs is not specified, Extension for Scikit-learn* uses all available threads, whereas scikit-learn is single-threaded by default. Note that the deprecated daal4py module uses a global configuration instead of per-object n_jobs arguments, with the default also being all available threads.
Extension for Scikit-learn* follows the same rules as scikit-learn for the calculation of the n_jobs parameter value.
When scikit-learn’s utilities with built-in parallelism are used
(for example, sklearn.model_selection.GridSearchCV or sklearn.ensemble.VotingClassifier),
Extension for Scikit-learn* tries to determine the optimal number of threads per job using hints provided by joblib / threadpoolctl.
If n_jobs is not specified for underlying estimator(s), Extension for Scikit-learn* sets it to the number of available threads
(usually the number of logical CPUs divided by n_jobs set for higher-level parallelized entities).
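The division described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the library's actual code: the helper name threads_per_job is hypothetical, and the use of integer division with a floor of one thread is an assumption, since the text only says "usually".

```python
import os


def threads_per_job(outer_n_jobs, logical_cpus=None):
    """Estimate the threads left for each inner estimator.

    Hypothetical helper: divides the logical CPU count by the n_jobs of the
    higher-level parallel entity (for example, GridSearchCV).
    """
    if logical_cpus is None:
        # os.cpu_count() may return None on unusual platforms.
        logical_cpus = os.cpu_count() or 1
    if outer_n_jobs < 1:
        raise ValueError("outer_n_jobs must be a positive integer")
    # Assumption: round down, but never go below one thread per job.
    return max(1, logical_cpus // outer_n_jobs)


# 16 logical CPUs shared by 4 outer jobs leaves 4 threads per estimator.
print(threads_per_job(4, logical_cpus=16))
```

Under these assumptions, oversubscribing the outer level (more jobs than logical CPUs) still leaves each estimator with one thread.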
Note
Environment variables such as OMP_NUM_THREADS, MKL_NUM_THREADS, OPENBLAS_NUM_THREADS, and others used by
low-level parallelism libraries do not affect Extension for Scikit-learn*, nor does the
mkl-service package.
Note
n_jobs has no effect if computations are performed on GPU.
Note
threadpoolctl context has no effect on Extension for Scikit-learn* threading if n_jobs is specified and positive.
If n_jobs is equal to 0 or not specified, the thread limit from threadpoolctl is propagated to Extension for Scikit-learn*.
If n_jobs is negative, the number of threads used is max(1, n_threadpoolctl + n_jobs + 1), where n_threadpoolctl is the limit set by threadpoolctl.
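The three cases in this note can be written as a small function. This is a sketch of the stated rules only: the name effective_threads and its arguments are hypothetical, and threadpool_limit stands for the thread limit set by the active threadpoolctl context.

```python
def effective_threads(n_jobs, threadpool_limit):
    """Return the thread count implied by the rules in the note above.

    Hypothetical helper, not part of Extension for Scikit-learn*.
    """
    if n_jobs is None or n_jobs == 0:
        # Not specified (or 0): the threadpoolctl limit is used as is.
        return threadpool_limit
    if n_jobs > 0:
        # An explicit positive value ignores the threadpoolctl context.
        return n_jobs
    # Negative values count back from the threadpoolctl limit.
    return max(1, threadpool_limit + n_jobs + 1)


for n_jobs in (4, None, 0, -1, -2, -20):
    print(n_jobs, effective_threads(n_jobs, threadpool_limit=8))
```

With a limit of 8, n_jobs=-1 keeps all 8 threads, n_jobs=-2 leaves one thread free, and very negative values are clamped to a single thread.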
Note
Extension for Scikit-learn* threading doesn’t automatically avoid nested parallelism when used in conjunction with OpenMP and/or Python threads.
Warning
If several instances of Extension for Scikit-learn* algorithms are run sequentially and the n_jobs parameter for the first run
is significantly greater than for subsequent ones, it may result in performance degradation due to a known issue
with oneTBB.
Warning
In general, accelerated computations offered by estimators from the Extension for Scikit-learn* do not release the Python GIL, so they are not compatible with multi-threading backends that rely on Python threads.
Warning
Internally, the number of threads for calls to estimator methods from
the Extension for Scikit-learn* is managed through global variables. Thus, if multiple
calls to estimators with different n_jobs are performed in parallel
through Python threads, there might be threading races that override
one another’s configuration, potentially leading to process-wide crashes.
If concurrent calls are to be performed, process-based parallelism should
be used instead.
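One way to follow this advice is to give each call its own process with the standard library's concurrent.futures module. The sketch below shows the pattern only: fit_with_n_jobs is a hypothetical stand-in that returns its argument, where real code would construct and fit a patched estimator with that n_jobs value.

```python
from concurrent.futures import ProcessPoolExecutor


def fit_with_n_jobs(n_jobs):
    """Hypothetical stand-in for fitting a patched estimator.

    Real code would build the estimator with n_jobs=n_jobs and call fit();
    here the value is simply returned so the pattern stays self-contained.
    """
    return n_jobs


def run_isolated(n_jobs_values):
    """Run each configuration in its own worker process.

    Separate processes do not share global state, so one call's thread
    setting cannot overwrite another's.
    """
    with ProcessPoolExecutor(max_workers=len(n_jobs_values)) as pool:
        return list(pool.map(fit_with_n_jobs, n_jobs_values))
```

A script would typically call run_isolated([2, 4]) under an `if __name__ == "__main__":` guard, which process pools need on platforms that start workers by spawning a fresh interpreter.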
Setting the verbosity level to DEBUG produces logs
indicating when the number of threads used differs from the default
(the number of logical threads in the machine).
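A minimal sketch of raising the verbosity from Python with the standard logging module follows. The logger name "sklearnex" is an assumption not stated in this section; check the verbose mode documentation for the exact name and for the environment variable alternative.

```python
import logging

# Attach a default handler to the root logger so messages are printed.
logging.basicConfig()

# Assumption: the extension logs under the "sklearnex" logger name.
logging.getLogger("sklearnex").setLevel(logging.DEBUG)
```

This has to run before the estimator calls whose thread settings you want to inspect.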