Non-Scikit-Learn Algorithms

This section describes algorithms that are not present in the original scikit-learn. All of them are available for both CPU and GPU (including distributed mode).

BasicStatistics

class sklearnex.basic_statistics.BasicStatistics(result_options='all', *, n_jobs=None)[source]

Estimator for basic statistics. Computes basic statistics for the provided data.

Parameters:
  • result_options (string or list, default='all') – Statistics to calculate. Possible values are 'min', 'max', 'sum', 'mean', 'variance', 'variation', 'sum_squares', 'sum_squares_centered', 'standard_deviation', 'second_order_raw_moment', or a list containing any of these values. If set to 'all', all possible statistics are calculated.

  • n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.

min_

Minimum of each feature over all samples.

Type:

ndarray of shape (n_features,)

max_

Maximum of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_

Sum of each feature over all samples.

Type:

ndarray of shape (n_features,)

mean_

Mean of each feature over all samples.

Type:

ndarray of shape (n_features,)

variance_

Variance of each feature over all samples.

Type:

ndarray of shape (n_features,)

variation_

Variation of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_squares_

Sum of squares for each feature over all samples.

Type:

ndarray of shape (n_features,)

standard_deviation_

Standard deviation of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_squares_centered_

Centered sum of squares for each feature over all samples.

Type:

ndarray of shape (n_features,)

second_order_raw_moment_

Second order moment of each feature over all samples.

Type:

ndarray of shape (n_features,)

Note

An attribute exists only if the corresponding result option has been provided.

Note

Names of attributes without the trailing underscore are currently supported but deprecated since 2025.1 and will be removed in 2026.0.

Note

Some results can exhibit small variations due to floating point error accumulation and multithreading.
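For reference, the result options correspond to the following plain-NumPy computations. This is an illustrative sketch assuming oneDAL's conventional definitions (variance with the unbiased n-1 denominator, variation as the coefficient of variation std/mean, second order raw moment as the mean of squares); sklearnex computes these via oneDAL, so results may differ by floating-point rounding:

```python
import numpy as np

# NumPy equivalents of each result option (illustrative; assumes
# variance uses ddof=1 and variation = standard_deviation / mean).
X = np.array([[1.0, 2.0], [3.0, 4.0]])

stats = {
    "min": X.min(axis=0),
    "max": X.max(axis=0),
    "sum": X.sum(axis=0),
    "mean": X.mean(axis=0),
    "sum_squares": (X ** 2).sum(axis=0),
    "sum_squares_centered": ((X - X.mean(axis=0)) ** 2).sum(axis=0),
    "variance": X.var(axis=0, ddof=1),
    "standard_deviation": X.std(axis=0, ddof=1),
    "second_order_raw_moment": (X ** 2).mean(axis=0),
}
stats["variation"] = stats["standard_deviation"] / stats["mean"]
```

Passing result_options='all' to BasicStatistics yields attributes matching each of these keys (with a trailing underscore).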

Examples

>>> import numpy as np
>>> from sklearnex.basic_statistics import BasicStatistics
>>> bs = BasicStatistics(result_options=['sum', 'min', 'max'])
>>> X = np.array([[1, 2], [3, 4]])
>>> bs.fit(X)
>>> bs.sum_
array([4., 6.])
>>> bs.min_
array([1., 2.])
BasicStatistics.fit(X, y=None, *, sample_weight=None)[source]

Calculate statistics of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to compute statistics on, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

  • sample_weight (array-like of shape (n_samples,), default=None) – Weights for computing weighted statistics, where n_samples is the number of samples.

Returns:

self – Returns the instance itself.

Return type:

object
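The sample_weight semantics can be illustrated with plain NumPy. This is a sketch assuming the conventional weighted definitions (weighted sum = Σ wᵢxᵢ, weighted mean = Σ wᵢxᵢ / Σ wᵢ); the weights below are hypothetical:

```python
import numpy as np

# Illustrative NumPy equivalents for weighted statistics, assuming the
# conventional definitions.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.array([1.0, 3.0])  # hypothetical sample weights

weighted_sum = (w[:, None] * X).sum(axis=0)   # sum of w_i * x_i
weighted_mean = weighted_sum / w.sum()        # normalized by total weight
```

With unit weights these reduce to the unweighted sum_ and mean_ attributes.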

IncrementalBasicStatistics

class sklearnex.basic_statistics.IncrementalBasicStatistics(result_options='all', batch_size=None, *, n_jobs=None)[source]

Calculates basic statistics on the given data, allowing for computation when the data are split into batches. Use the partial_fit method to provide a single batch of data, or the fit method to provide the entire dataset.

Parameters:
  • result_options (string or list, default='all') – Statistics to compute; accepts the same values as BasicStatistics.

  • batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features.

  • n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.

min_

Minimum of each feature over all samples.

Type:

ndarray of shape (n_features,)

max_

Maximum of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_

Sum of each feature over all samples.

Type:

ndarray of shape (n_features,)

mean_

Mean of each feature over all samples.

Type:

ndarray of shape (n_features,)

variance_

Variance of each feature over all samples.

Type:

ndarray of shape (n_features,)

variation_

Variation of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_squares_

Sum of squares for each feature over all samples.

Type:

ndarray of shape (n_features,)

standard_deviation_

Standard deviation of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_squares_centered_

Centered sum of squares for each feature over all samples.

Type:

ndarray of shape (n_features,)

second_order_raw_moment_

Second order moment of each feature over all samples.

Type:

ndarray of shape (n_features,)

n_samples_seen_

The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.

Type:

int

batch_size_

Inferred batch size from batch_size.

Type:

int

n_features_in_

Number of features seen during fit or partial_fit.

Type:

int

Note

An attribute exists only if the corresponding result option has been provided.

Note

Names of attributes without the trailing underscore are currently supported but deprecated since 2025.1 and will be removed in 2026.0.

Examples

>>> import numpy as np
>>> from sklearnex.basic_statistics import IncrementalBasicStatistics
>>> incbs = IncrementalBasicStatistics(batch_size=1)
>>> X = np.array([[1, 2], [3, 4]])
>>> incbs.partial_fit(X[:1])
>>> incbs.partial_fit(X[1:])
>>> incbs.sum_
array([4., 6.])
>>> incbs.min_
array([1., 2.])
>>> incbs.fit(X)
>>> incbs.sum_
array([4., 6.])
>>> incbs.max_
array([3., 4.])
IncrementalBasicStatistics.fit(X, y=None, sample_weight=None)[source]

Calculate statistics of X using minibatches of size batch_size.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to compute statistics on, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

  • sample_weight (array-like of shape (n_samples,), default=None) – Weights for computing weighted statistics, where n_samples is the number of samples.

Returns:

self – Returns the instance itself.

Return type:

object

IncrementalBasicStatistics.partial_fit(X, sample_weight=None, check_input=True)[source]

Incremental fit with X. All of X is processed as a single batch.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to compute statistics on, where n_samples is the number of samples and n_features is the number of features.

  • sample_weight (array-like of shape (n_samples,), default=None) – Weights for computing weighted statistics, where n_samples is the number of samples.

  • check_input (bool, default=True) – Run check_array on X.

Returns:

self – Returns the instance itself.

Return type:

object
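Conceptually, each partial_fit call folds one batch into running accumulators. The sketch below shows that incremental pattern with plain NumPy for min, max, sum, and mean; it illustrates the idea only and is not sklearnex's oneDAL implementation:

```python
import numpy as np

def init_state(n_features):
    # Running accumulators for a stream of batches.
    return {
        "count": 0,
        "sum": np.zeros(n_features),
        "min": np.full(n_features, np.inf),
        "max": np.full(n_features, -np.inf),
    }

def partial_update(state, batch):
    # Fold one batch into the running accumulators.
    state["count"] += batch.shape[0]
    state["sum"] += batch.sum(axis=0)
    state["min"] = np.minimum(state["min"], batch.min(axis=0))
    state["max"] = np.maximum(state["max"], batch.max(axis=0))
    return state

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
state = init_state(X.shape[1])
for batch in np.array_split(X, 2):  # feed the data in two batches
    partial_update(state, batch)

mean = state["sum"] / state["count"]
# The streamed results match the full-data computation.
assert np.allclose(mean, X.mean(axis=0))
assert np.allclose(state["min"], X.min(axis=0))
```

The same pattern is why n_samples_seen_ increments across partial_fit calls but resets on fit.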

IncrementalEmpiricalCovariance

class sklearnex.covariance.IncrementalEmpiricalCovariance(*, store_precision=False, assume_centered=False, batch_size=None, copy=True, n_jobs=None)[source]

Maximum likelihood covariance estimator that allows for estimation when the data are split into batches. Use the partial_fit method to provide a single batch of data, or the fit method to provide the entire dataset.

Parameters:
  • store_precision (bool, default=False) – Specifies if the estimated precision is stored.

  • assume_centered (bool, default=False) – If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly zero. If False (default), data are centered before computation.

  • batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features, to provide a balance between approximation accuracy and memory consumption.

  • copy (bool, default=True) – If False, X will be overwritten. copy=False can be used to save memory but is unsafe for general use.

  • n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.

location_

Estimated location, i.e. the estimated mean.

Type:

ndarray of shape (n_features,)

covariance_

Estimated covariance matrix.

Type:

ndarray of shape (n_features, n_features)

n_samples_seen_

The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.

Type:

int

batch_size_

Inferred batch size from batch_size.

Type:

int

n_features_in_

Number of features seen during fit or partial_fit.

Type:

int

Examples

>>> import numpy as np
>>> from sklearnex.covariance import IncrementalEmpiricalCovariance
>>> inccov = IncrementalEmpiricalCovariance(batch_size=1)
>>> X = np.array([[1, 2], [3, 4]])
>>> inccov.partial_fit(X[:1])
>>> inccov.partial_fit(X[1:])
>>> inccov.covariance_
array([[1., 1.],
       [1., 1.]])
>>> inccov.location_
array([2., 3.])
>>> inccov.fit(X)
>>> inccov.covariance_
array([[1., 1.],
       [1., 1.]])
>>> inccov.location_
array([2., 3.])
IncrementalEmpiricalCovariance.fit(X, y=None)[source]

Fit the model with X, using minibatches of size batch_size.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

self – Returns the instance itself.

Return type:

object

IncrementalEmpiricalCovariance.partial_fit(X, y=None, check_input=True)[source]

Incremental fit with X. All of X is processed as a single batch.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

  • check_input (bool, default=True) – Run check_array on X.

Returns:

self – Returns the instance itself.

Return type:

object
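The maximum-likelihood covariance can be accumulated from per-batch sufficient statistics (sample count, Σxᵢ, ΣxᵢxᵢT) and finalized at the end. A plain-NumPy sketch of that incremental pattern (illustrative only, not the oneDAL implementation):

```python
import numpy as np

# Accumulate sufficient statistics across batches, then finalize the
# maximum-likelihood (biased, divide-by-n) covariance estimate.
n_features = 2
count = 0
sum_x = np.zeros(n_features)
sum_xxT = np.zeros((n_features, n_features))

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 8.0]])
for batch in np.array_split(X, 2):
    count += batch.shape[0]
    sum_x += batch.sum(axis=0)
    sum_xxT += batch.T @ batch

location = sum_x / count
covariance = sum_xxT / count - np.outer(location, location)

# Matches the biased full-data estimator.
assert np.allclose(covariance, np.cov(X.T, bias=True))
```

Note the divide-by-n (biased) normalization, which is what a maximum likelihood covariance estimator uses; np.cov defaults to n-1 unless bias=True.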

IncrementalLinearRegression

class sklearnex.linear_model.IncrementalLinearRegression(*, fit_intercept=True, copy_X=True, n_jobs=None, batch_size=None)[source]

Trains a linear regression model, allowing for computation when the data are split into batches. Use the partial_fit method to provide a single batch of data, or the fit method to provide the entire dataset.

Parameters:
  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

  • copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.

  • n_jobs (int, default=None) – The number of jobs to use for the computation.

  • batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features.

coef_

Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.

Type:

array of shape (n_features, ) or (n_targets, n_features)

intercept_

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

Type:

float or array of shape (n_targets,)

n_samples_seen_

The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls. To obtain regression coefficients, it must be at least n_features_in_ if fit_intercept is False, and at least n_features_in_ + 1 if fit_intercept is True.

Type:

int

batch_size_

Inferred batch size from batch_size.

Type:

int

n_features_in_

Number of features seen during fit or partial_fit.

Type:

int

Examples

>>> import numpy as np
>>> from sklearnex.linear_model import IncrementalLinearRegression
>>> inclr = IncrementalLinearRegression(batch_size=2)
>>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 10]])
>>> y = np.array([1.5, 3.5, 5.5, 8.5])
>>> inclr.partial_fit(X[:2], y[:2])
>>> inclr.partial_fit(X[2:], y[2:])
>>> inclr.coef_
array([0.5, 0.5])
>>> inclr.intercept_
0.0
>>> inclr.fit(X, y)
>>> inclr.coef_
array([0.5, 0.5])
>>> inclr.intercept_
0.0
IncrementalLinearRegression.fit(X, y)[source]

Fit the model with X and y, using minibatches of size batch_size.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features. n_samples must be at least n_features if fit_intercept is False, and at least n_features + 1 if fit_intercept is True.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.

Returns:

self – Returns the instance itself.

Return type:

object

IncrementalLinearRegression.partial_fit(X, y, check_input=True)[source]

Incremental fit of the linear model with X and y. All of X and y are processed as a single batch.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.

Returns:

self – Returns the instance itself.

Return type:

object

IncrementalLinearRegression.predict(X, y=None)[source]

Predict using the linear model.

Parameters:
  • X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

C – Returns predicted values.

Return type:

array, shape (n_samples, n_targets)
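What fit and predict compute can be sketched as ordinary least squares on a design matrix augmented with an intercept column, using the same data as the Examples above. This is a NumPy illustration of the mathematical problem being solved, not sklearnex's actual solver:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 10.0]])
y = np.array([1.5, 3.5, 5.5, 8.5])

# Augment with a column of ones to model the intercept, then solve the
# least-squares problem.
A = np.hstack([np.ones((X.shape[0], 1)), X])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = params[0], params[1:]

# Predict new samples with the fitted parameters.
X_new = np.array([[2.0, 3.0]])
y_pred = X_new @ coef + intercept
```

For this data the fit is exact, recovering coef = [0.5, 0.5] and a zero intercept, matching the Examples above.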