Non-Scikit-Learn Algorithms

This page describes algorithms that are not part of the original scikit-learn. All of them are available on both CPU and GPU, including in distributed mode.

class sklearnex.basic_statistics.BasicStatistics(result_options='all', *, n_jobs=None)[source]

Bases: ExtensionEstimator, BaseEstimator

Estimator for basic statistics. Computes basic statistics for the provided data.

Parameters:
  • result_options (string or list, default='all') – Statistics to calculate. Possible values are 'min', 'max', 'sum', 'mean', 'variance', 'variation', 'sum_squares', 'sum_squares_centered', 'standard_deviation', 'second_order_raw_moment' or a list containing any of these values. If set to 'all', all possible statistics are calculated.

  • n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.

min_

Minimum of each feature over all samples.

Type:

ndarray of shape (n_features,)

max_

Maximum of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_

Sum of each feature over all samples.

Type:

ndarray of shape (n_features,)

mean_

Mean of each feature over all samples.

Type:

ndarray of shape (n_features,)

variance_

Variance of each feature over all samples. Bessel’s correction is used.

Type:

ndarray of shape (n_features,)

variation_

Variation of each feature over all samples. Bessel’s correction is used.

Type:

ndarray of shape (n_features,)

sum_squares_

Sum of squares for each feature over all samples.

Type:

ndarray of shape (n_features,)

standard_deviation_

Unbiased standard deviation of each feature over all samples. Bessel’s correction is used.

Type:

ndarray of shape (n_features,)

sum_squares_centered_

Centered sum of squares for each feature over all samples.

Type:

ndarray of shape (n_features,)

second_order_raw_moment_

Second order moment of each feature over all samples.

Type:

ndarray of shape (n_features,)
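The attributes listed for this class can be illustrated for a single feature column with the standard library alone. This is a minimal sketch, not the library's implementation: the helper name column_statistics is hypothetical, and the formulas for 'variation' (standard deviation divided by the mean) and 'second_order_raw_moment' (mean of the squared values) are assumptions based on the usual definitions of these quantities.

```python
import math


def column_statistics(values):
    """Compute the ten per-feature statistics for one column of data."""
    n = len(values)
    total = sum(values)
    mean = total / n
    sum_squares = sum(v * v for v in values)
    sum_squares_centered = sum((v - mean) ** 2 for v in values)
    variance = sum_squares_centered / (n - 1)  # Bessel's correction
    standard_deviation = math.sqrt(variance)
    return {
        "min": min(values),
        "max": max(values),
        "sum": total,
        "mean": mean,
        "variance": variance,
        # Assumption: variation is the coefficient of variation.
        "variation": standard_deviation / mean,
        "sum_squares": sum_squares,
        "sum_squares_centered": sum_squares_centered,
        "standard_deviation": standard_deviation,
        # Assumption: second-order raw moment is the mean of squares.
        "second_order_raw_moment": sum_squares / n,
    }


# First feature of the two-sample data [[1, 2], [3, 4]].
stats = column_statistics([1.0, 3.0])
print(stats["sum"], stats["variance"], stats["second_order_raw_moment"])
```

With only two samples the divisor n - 1 equals 1, so the variance here coincides with the centered sum of squares.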

Note

Attribute exists only if corresponding result option has been provided.

Note

Attribute names without the trailing underscore are still supported, but they were deprecated in 2025.1 and will be removed in 2026.0.

Note

Some results can exhibit small variations due to floating point error accumulation and multithreading.

Examples

>>> import numpy as np
>>> from sklearnex.basic_statistics import BasicStatistics
>>> bs = BasicStatistics(result_options=['sum', 'min', 'max'])
>>> X = np.array([[1, 2], [3, 4]])
>>> bs.fit(X)
>>> bs.sum_
np.array([4., 6.])
>>> bs.min_
np.array([1., 2.])

fit(X, y=None, sample_weight=None)[source]

Calculate statistics of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to compute statistics on, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

  • sample_weight (array-like of shape (n_samples,), default=None) – Weights used to compute weighted statistics, where n_samples is the number of samples.

Returns:

self – Returns the instance itself.

Return type:

object

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BasicStatistics

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
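The <component>__<parameter> naming can be sketched with a toy class. ToyEstimator is hypothetical and only mimics the idea of splitting a key at the first double underscore and forwarding the remainder to the nested object. It is not a scikit-learn class and skips all validation.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator; not a scikit-learn class."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        for key, value in params.items():
            # Split "component__parameter" at the first double underscore.
            name, delimiter, sub_key = key.partition("__")
            if delimiter:
                # Forward the remainder to the nested component.
                self.params[name].set_params(**{sub_key: value})
            else:
                self.params[name] = value
        return self


stats = ToyEstimator(result_options="all")
pipeline = ToyEstimator(stats=stats)
pipeline.set_params(stats__result_options=["min", "max"])
print(stats.params["result_options"])
```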

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

class sklearnex.basic_statistics.IncrementalBasicStatistics(result_options='all', batch_size=None, *, n_jobs=None)[source]

Bases: ExtensionEstimator, BaseEstimator

Calculates basic statistics on the given data and supports computation when the data are split into batches. The user can call the partial_fit method to provide a single batch of data, or the fit method to provide the entire dataset.
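Batch-wise computation works because several statistics can be rebuilt from a few running totals. The sketch below illustrates that idea for one feature. RunningStatistics and its finalize method are hypothetical names, and the library's internal accumulators and numerics may differ.

```python
import math


class RunningStatistics:
    """Hypothetical accumulator for one feature; illustrative only."""

    def __init__(self):
        self.n_samples_seen = 0
        self.total = 0.0
        self.sum_squares = 0.0
        self.minimum = math.inf
        self.maximum = -math.inf

    def partial_fit(self, batch):
        # Each quantity is updated from the current batch alone.
        self.n_samples_seen += len(batch)
        self.total += sum(batch)
        self.sum_squares += sum(v * v for v in batch)
        self.minimum = min(self.minimum, min(batch))
        self.maximum = max(self.maximum, max(batch))
        return self

    def finalize(self):
        n = self.n_samples_seen
        mean = self.total / n
        # Centered sum of squares recovered from the raw accumulators.
        centered = self.sum_squares - n * mean * mean
        return {"mean": mean, "variance": centered / (n - 1)}


data = [1.0, 3.0, 5.0, 7.0]
running = RunningStatistics().partial_fit(data[:2]).partial_fit(data[2:])
whole = RunningStatistics().partial_fit(data)
print(running.finalize(), whole.finalize())
```

For this small integer-valued data the two results agree exactly. With general floating-point input, batch-wise and single-pass results can differ in the last digits.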

Parameters:
  • result_options (string or list, default='all') – List of statistics to compute.

  • batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features.

  • n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.
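The batch_size default described in the parameter list can be sketched as follows. batch_slices is a hypothetical helper that only shows how a dataset of n_samples rows might be cut into consecutive batches. How the library treats a short final batch is not specified here, and this sketch simply keeps it.

```python
def batch_slices(n_samples, n_features, batch_size=None):
    """Yield slices that cover n_samples rows in consecutive batches."""
    if batch_size is None:
        batch_size = 5 * n_features  # default inferred from the data
    for start in range(0, n_samples, batch_size):
        yield slice(start, min(start + batch_size, n_samples))


slices = list(batch_slices(n_samples=23, n_features=2))
print([(s.start, s.stop) for s in slices])
```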

min_

Minimum of each feature over all samples.

Type:

ndarray of shape (n_features,)

max_

Maximum of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_

Sum of each feature over all samples.

Type:

ndarray of shape (n_features,)

mean_

Mean of each feature over all samples.

Type:

ndarray of shape (n_features,)

variance_

Variance of each feature over all samples.

Type:

ndarray of shape (n_features,)

variation_

Variation of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_squares_

Sum of squares for each feature over all samples.

Type:

ndarray of shape (n_features,)

standard_deviation_

Standard deviation of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_squares_centered_

Centered sum of squares for each feature over all samples.

Type:

ndarray of shape (n_features,)

second_order_raw_moment_

Second order moment of each feature over all samples.

Type:

ndarray of shape (n_features,)

n_samples_seen_

The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.

Type:

int

batch_size_

Inferred batch size from batch_size.

Type:

int

n_features_in_

Number of features seen during fit or partial_fit.

Type:

int

Note

Attribute exists only if corresponding result option has been provided.

Note

Attribute names without the trailing underscore are still supported, but they were deprecated in 2025.1 and will be removed in 2026.0.

Note

Serializing an instance of this class forces any pending calculations to be finalized when the inputs are in a SYCL queue or when using GPUs. Because the internal method finalize_fit cannot be dispatched without an explicitly provided queue, and the dispatching policy cannot be serialized, the computation is finalized during the serialization call and the policy is not saved in the serialized data.

Examples

>>> import numpy as np
>>> from sklearnex.basic_statistics import IncrementalBasicStatistics
>>> incbs = IncrementalBasicStatistics(batch_size=1)
>>> X = np.array([[1, 2], [3, 4]])
>>> incbs.partial_fit(X[:1])
>>> incbs.partial_fit(X[1:])
>>> incbs.sum_
np.array([4., 6.])
>>> incbs.min_
np.array([1., 2.])
>>> incbs.fit(X)
>>> incbs.sum_
np.array([4., 6.])
>>> incbs.max_
np.array([3., 4.])

fit(X, y=None, sample_weight=None)[source]

Calculate statistics of X using minibatches of size batch_size.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to compute statistics on, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

  • sample_weight (array-like of shape (n_samples,), default=None) – Weights used to compute weighted statistics, where n_samples is the number of samples.

Returns:

self – Returns the instance itself.

Return type:

IncrementalBasicStatistics

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

partial_fit(X, sample_weight=None, check_input=True)[source]

Incremental fit with X. All of X is processed as a single batch.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to compute statistics on, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

  • sample_weight (array-like of shape (n_samples,), default=None) – Weights used to compute weighted statistics, where n_samples is the number of samples.

  • check_input (bool, default=True) – Run check_array on X.

Returns:

self – Returns the instance itself.

Return type:

IncrementalBasicStatistics

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → IncrementalBasicStatistics

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_partial_fit_request(*, check_input: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → IncrementalBasicStatistics

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to partial_fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • check_input (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for check_input parameter in partial_fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in partial_fit.

Returns:

self – The updated object.

Return type:

object

class sklearnex.covariance.IncrementalEmpiricalCovariance(*, store_precision=False, assume_centered=False, batch_size=None, copy=True, n_jobs=None)[source]

Bases: ExtensionEstimator, BaseEstimator

Maximum likelihood covariance estimator that allows for the estimation when the data are split into batches. The user can use the partial_fit method to provide a single batch of data or use the fit method to provide the entire dataset.
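One way to see how a maximum likelihood covariance can be estimated from batches is to accumulate the sample count, the per-feature sums, and the cross-products, then combine them at the end. The sketch below assumes assume_centered=False. RunningCovariance is a hypothetical name and this is not the library's implementation. It uses the same two samples as this class's Examples.

```python
class RunningCovariance:
    """Hypothetical accumulator; not the library's implementation."""

    def __init__(self, n_features):
        self.n_samples_seen = 0
        self.sums = [0.0] * n_features
        # cross[i][j] accumulates the sum of x_i * x_j over all rows.
        self.cross = [[0.0] * n_features for _ in range(n_features)]

    def partial_fit(self, batch):
        for row in batch:
            self.n_samples_seen += 1
            for i, xi in enumerate(row):
                self.sums[i] += xi
                for j, xj in enumerate(row):
                    self.cross[i][j] += xi * xj
        return self

    def finalize(self):
        n = self.n_samples_seen
        location = [s / n for s in self.sums]
        size = len(location)
        # Maximum likelihood estimate: divide by n, not n - 1.
        covariance = [
            [self.cross[i][j] / n - location[i] * location[j]
             for j in range(size)]
            for i in range(size)
        ]
        return location, covariance


X = [[1.0, 2.0], [3.0, 4.0]]
acc = RunningCovariance(n_features=2).partial_fit(X[:1]).partial_fit(X[1:])
location, covariance = acc.finalize()
print(location, covariance)
```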

Parameters:
  • store_precision (bool, default=False) – Specifies if the estimated precision is stored.

  • assume_centered (bool, default=False) – If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly zero. If False (default), data are centered before computation.

  • batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features, to provide a balance between approximation accuracy and memory consumption.

  • copy (bool, default=True) – If False, X will be overwritten. copy=False can be used to save memory but is unsafe for general use.

  • n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.

location_

Estimated location, i.e. the estimated mean.

Type:

ndarray of shape (n_features,)

covariance_

Estimated covariance matrix.

Type:

ndarray of shape (n_features, n_features)

n_samples_seen_

The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.

Type:

int

batch_size_

Inferred batch size from batch_size.

Type:

int

n_features_in_

Number of features seen during fit or partial_fit.

Type:

int

Note

Serializing an instance of this class forces any pending calculations to be finalized when the inputs are in a SYCL queue or when using GPUs. Because the internal method finalize_fit cannot be dispatched without an explicitly provided queue, and the dispatching policy cannot be serialized, the computation is finalized during the serialization call and the policy is not saved in the serialized data.

Examples

>>> import numpy as np
>>> from sklearnex.covariance import IncrementalEmpiricalCovariance
>>> inccov = IncrementalEmpiricalCovariance(batch_size=1)
>>> X = np.array([[1, 2], [3, 4]])
>>> inccov.partial_fit(X[:1])
>>> inccov.partial_fit(X[1:])
>>> inccov.covariance_
np.array([[1., 1.],[1., 1.]])
>>> inccov.location_
np.array([2., 3.])
>>> inccov.fit(X)
>>> inccov.covariance_
np.array([[1., 1.],[1., 1.]])
>>> inccov.location_
np.array([2., 3.])

error_norm(comp_cov, norm='frobenius', scaling=True, squared=True)

Compute the Mean Squared Error between two covariance estimators.

Parameters:
  • comp_cov (array-like of shape (n_features, n_features)) – The covariance to compare with.

  • norm ({"frobenius", "spectral"}, default="frobenius") – The type of norm used to compute the error. Available error types:

    - ‘frobenius’ (default): sqrt(tr(A^t.A))

    - ‘spectral’: sqrt(max(eigenvalues(A^t.A)))

    where A is the error (comp_cov - self.covariance_).

  • scaling (bool, default=True) – If True (default), the squared error norm is divided by n_features. If False, the squared error norm is not rescaled.

  • squared (bool, default=True) – Whether to compute the squared error norm or the error norm. If True (default), the squared error norm is returned. If False, the error norm is returned.

Returns:

result – The Mean Squared Error (in the sense of the Frobenius norm) between self and comp_cov covariance estimators.

Return type:

float
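For the default 'frobenius' norm, the roles of scaling and squared can be sketched in plain Python. frobenius_error_norm is a hypothetical helper operating on lists of rows. It relies on the fact that tr(A^t.A) equals the sum of the squared entries of A, and it does not cover the 'spectral' option.

```python
import math


def frobenius_error_norm(covariance, comp_cov, scaling=True, squared=True):
    """Frobenius-norm error between two square matrices (lists of rows)."""
    n_features = len(covariance)
    # tr(A^T A) for A = comp_cov - covariance is the sum of squared entries.
    squared_norm = sum(
        (comp_cov[i][j] - covariance[i][j]) ** 2
        for i in range(n_features)
        for j in range(n_features)
    )
    if scaling:
        squared_norm /= n_features
    return squared_norm if squared else math.sqrt(squared_norm)


estimated = [[1.0, 1.0], [1.0, 1.0]]
reference = [[2.0, 1.0], [1.0, 3.0]]
scaled_squared = frobenius_error_norm(estimated, reference)
plain_norm = frobenius_error_norm(
    estimated, reference, scaling=False, squared=False
)
print(scaled_squared, plain_norm)
```

Here the error matrix has entries 1, 0, 0 and 2, so the squared norm is 5 before scaling and 2.5 after dividing by the two features.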

fit(X, y=None)[source]

Fit the model with X, using minibatches of size batch_size.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

self – Returns the instance itself.

Return type:

IncrementalEmpiricalCovariance

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

get_precision()

Getter for the precision matrix.

Returns:

precision_ – The precision matrix associated to the current covariance object.

Return type:

array-like of shape (n_features, n_features)

mahalanobis(X)[source]

Compute the squared Mahalanobis distances of given observations.

Parameters:

X (array-like of shape (n_samples, n_features)) – The observations whose Mahalanobis distances are computed. Observations are assumed to be drawn from the same distribution as the data used in fit.

Returns:

dist – Squared Mahalanobis distances of the observations.

Return type:

ndarray of shape (n_samples,)
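The squared Mahalanobis distance of an observation x is (x - location)^T P (x - location), where P is the precision matrix (the inverse of the covariance). The following sketch covers the two-feature case only. squared_mahalanobis_2d is a hypothetical helper, and the location and covariance values are made up for illustration.

```python
def squared_mahalanobis_2d(x, location, covariance):
    """Squared Mahalanobis distance of a 2-feature observation."""
    (a, b), (c, d) = covariance
    det = a * d - b * c
    # Precision matrix: inverse of the 2x2 covariance matrix.
    precision = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - location[0], x[1] - location[1]]
    return sum(
        diff[i] * precision[i][j] * diff[j]
        for i in range(2)
        for j in range(2)
    )


location = [2.0, 3.0]
covariance = [[4.0, 0.0], [0.0, 1.0]]
distance = squared_mahalanobis_2d([4.0, 5.0], location, covariance)
print(distance)
```

With a diagonal covariance the result reduces to the sum of squared deviations, each divided by the variance of its feature: 2²/4 + 2²/1 = 5.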

partial_fit(X, y=None, check_input=True)[source]

Incremental fit with X. All of X is processed as a single batch.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

  • check_input (bool, default=True) – Run check_array on X.

Returns:

self – Returns the instance itself.

Return type:

IncrementalEmpiricalCovariance

score(X_test, y=None)[source]

Compute the log-likelihood of X_test under the estimated Gaussian model.

The Gaussian model is defined by its mean and covariance matrix which are represented respectively by self.location_ and self.covariance_.

Parameters:
  • X_test (array-like of shape (n_samples, n_features)) – Test data for which the likelihood is computed, where n_samples is the number of samples and n_features is the number of features. X_test is assumed to be drawn from the same distribution as the data used in fit (including centering).

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

res – The log-likelihood of X_test with self.location_ and self.covariance_ as estimators of the Gaussian model mean and covariance matrix respectively.

Return type:

float

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_partial_fit_request(*, check_input: bool | None | str = '$UNCHANGED$') → IncrementalEmpiricalCovariance

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to partial_fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

check_input (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for check_input parameter in partial_fit.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, X_test: bool | None | str = '$UNCHANGED$') → IncrementalEmpiricalCovariance

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

X_test (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_test parameter in score.

Returns:

self – The updated object.

Return type:

object

class sklearnex.linear_model.IncrementalLinearRegression(*, fit_intercept=True, copy_X=True, n_jobs=None, batch_size=None)[source]

Bases: ExtensionEstimator, MultiOutputMixin, RegressorMixin, BaseEstimator

Trains a linear regression model and supports computation when the data are split into batches. The user can call the partial_fit method to provide a single batch of data, or the fit method to provide the entire dataset.
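A common way to fit least squares from batches is to accumulate X^T X and X^T y and solve the normal equations once at the end. The sketch below illustrates that technique with an explicit column of ones for the intercept. RunningLeastSquares and solve are hypothetical helpers, and the library may use a different, more numerically robust formulation.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


class RunningLeastSquares:
    """Hypothetical accumulator of X^T X and X^T y with an intercept column."""

    def __init__(self, n_features):
        size = n_features + 1  # extra column of ones for the intercept
        self.xtx = [[0.0] * size for _ in range(size)]
        self.xty = [0.0] * size

    def partial_fit(self, X, y):
        for row, target in zip(X, y):
            extended = list(row) + [1.0]
            for i, xi in enumerate(extended):
                self.xty[i] += xi * target
                for j, xj in enumerate(extended):
                    self.xtx[i][j] += xi * xj
        return self

    def finalize(self):
        solution = solve(self.xtx, self.xty)
        return solution[:-1], solution[-1]  # (coefficients, intercept)


X = [[1, 2], [3, 4], [5, 6], [7, 10]]
y = [1.5, 3.5, 5.5, 8.5]
model = RunningLeastSquares(n_features=2)
model.partial_fit(X[:2], y[:2]).partial_fit(X[2:], y[2:])
coef, intercept = model.finalize()
print(coef, intercept)
```

For this data the targets satisfy y = 0.5 * x1 + 0.5 * x2 exactly, so the printed coefficients should be close to 0.5 and the intercept close to 0, up to floating-point rounding.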

Parameters:
  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

  • copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.

  • n_jobs (int, default=None) – The number of jobs to use for the computation.

  • batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features.

coef_

Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.

Type:

array of shape (n_features, ) or (n_targets, n_features)

intercept_

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

Type:

float or array of shape (n_targets,)

n_samples_seen_

The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls. To obtain regression coefficients, it must be at least n_features_in_ if fit_intercept is False and at least n_features_in_ + 1 if fit_intercept is True.

Type:

int

batch_size_

Inferred batch size from batch_size.

Type:

int

n_features_in_

Number of features seen during fit or partial_fit.

Type:

int

Note

Serializing an instance of this class forces any pending calculations to be finalized when the inputs are in a SYCL queue or when using GPUs. Because the internal method finalize_fit cannot be dispatched without an explicitly provided queue, and the dispatching policy cannot be serialized, the computation is finalized during the serialization call and the policy is not saved in the serialized data.

Examples

>>> import numpy as np
>>> from sklearnex.linear_model import IncrementalLinearRegression
>>> inclr = IncrementalLinearRegression(batch_size=2)
>>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 10]])
>>> y = np.array([1.5, 3.5, 5.5, 8.5])
>>> inclr.partial_fit(X[:2], y[:2])
>>> inclr.partial_fit(X[2:], y[2:])
>>> inclr.coef_
np.array([0.5, 0.5])
>>> inclr.intercept_
np.array(0.)
>>> inclr.fit(X, y)
>>> inclr.coef_
np.array([0.5, 0.5])
>>> inclr.intercept_
np.array(0.)

fit(X, y)[source]

Fit the model with X and y, using minibatches of size batch_size.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features. n_samples must be at least n_features if fit_intercept is False and at least n_features + 1 if fit_intercept is True.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.

Returns:

self – Returns the instance itself.

Return type:

IncrementalLinearRegression

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

partial_fit(X, y, check_input=True)[source]

Incrementally fit the linear model with X and y. All of X and y are processed as a single batch.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.

Returns:

self – Returns the instance itself.

Return type:

IncrementalLinearRegression

predict(X, y=None)[source]

Predict using the linear model.

Parameters:

X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.

Returns:

C – Returns predicted values.

Return type:

array, shape (n_samples,)

score(X, y, sample_weight=None)[source]

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred)** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
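The definition of \(R^2\) can be checked by hand for a single target. r_squared is a hypothetical local helper written for this illustration. It ignores sample weights and multi-output averaging.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - u / v, for a single target."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares; v: total sum of squares.
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [1.5, 3.5, 5.5, 8.5]
mean_prediction = [sum(y_true) / len(y_true)] * len(y_true)
perfect = r_squared(y_true, y_true)
constant = r_squared(y_true, mean_prediction)
print(perfect, constant)
```

A perfect prediction gives 1.0, and always predicting the mean of y gives 0.0, matching the two reference points in the description.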

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – \(R^2\) of self.predict(X) w.r.t. y.

Return type:

float

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_partial_fit_request(*, check_input: bool | None | str = '$UNCHANGED$') → IncrementalLinearRegression

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to partial_fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

check_input (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for check_input parameter in partial_fit.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → IncrementalLinearRegression

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class sklearnex.linear_model.IncrementalRidge(fit_intercept=True, alpha=1.0, copy_X=True, n_jobs=None, batch_size=None)[source]

Bases: ExtensionEstimator, MultiOutputMixin, RegressorMixin, BaseEstimator

Incremental estimator for Ridge Regression. Allows training Ridge Regression when the data is split into batches.

Parameters:
  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

  • alpha (float, default=1.0) – Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization.

  • copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.

  • n_jobs (int, default=None) – The number of jobs to use for the computation.

  • batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features, to provide a balance between approximation accuracy and memory consumption.

coef_

Estimated coefficients for the ridge regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.

Type:

array of shape (n_features, ) or (n_targets, n_features)

intercept_

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

Type:

float or array of shape (n_targets,)

n_features_in_

Number of features seen during fit.

Type:

int

n_samples_seen_

The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls. To obtain regression coefficients, it must be at least n_features_in_ if fit_intercept is False and at least n_features_in_ + 1 if fit_intercept is True.

Type:

int

batch_size_

Inferred batch size from batch_size.

Type:

int

Note

Serializing instances of this class will trigger a forced finalization of calculations when the inputs are in a SYCL queue or when using GPUs. Because the internal method finalize_fit cannot be dispatched without a directly provided queue, and the dispatching policy cannot be serialized, the computation is finalized during the serialization call and the policy is not saved in the serialized data.

fit(X, y)[source]

Fit the model with X and y, using minibatches of size batch_size.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features. n_samples must be at least n_features if fit_intercept is False and at least n_features + 1 if fit_intercept is True.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.

Returns:

self – Returns the instance itself.

Return type:

IncrementalRidge
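As a rough illustration of how fit might walk over the data in minibatches, here is a minimal pure-Python sketch. The helpers infer_batch_size and batch_slices are hypothetical names introduced for this example, not part of sklearnex; only the 5 * n_features default comes from the parameter description above, and the library may treat a short final batch differently.

```python
def infer_batch_size(n_features, batch_size=None):
    # Documented default: 5 * n_features when batch_size is None.
    return 5 * n_features if batch_size is None else batch_size


def batch_slices(n_samples, batch_size):
    # Yield consecutive slices covering [0, n_samples); the last may be shorter.
    for start in range(0, n_samples, batch_size):
        yield slice(start, min(start + batch_size, n_samples))


n_samples, n_features = 23, 2
size = infer_batch_size(n_features)
slices = list(batch_slices(n_samples, size))

for s in slices:
    print(s.start, s.stop)
```

Each slice would then be handed to a partial_fit-style update in turn.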

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

partial_fit(X, y, check_input=True)[source]

Incrementally fits the linear model with X and y. All of X and y are processed as a single batch.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.

Returns:

self – Returns the instance itself.

Return type:

IncrementalRidge
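To see why ridge regression lends itself to batch-wise fitting, the following NumPy sketch accumulates the sufficient statistics X^T X and X^T y one batch at a time and solves the regularized normal equations at the end. This is an illustration under simplifying assumptions (no intercept, dense data, a fixed batch size of 20) and is not a description of the sklearnex implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=60)
alpha = 1.0

# Running sufficient statistics, updated once per batch.
xtx = np.zeros((3, 3))
xty = np.zeros(3)
for start in range(0, 60, 20):
    Xb, yb = X[start:start + 20], y[start:start + 20]
    xtx += Xb.T @ Xb
    xty += Xb.T @ yb

# Ridge solution without intercept: (X^T X + alpha * I)^{-1} X^T y
coef_incremental = np.linalg.solve(xtx + alpha * np.eye(3), xty)

# Reference solution computed from the full data in one pass.
coef_full = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)

print(np.allclose(coef_incremental, coef_full))
```

Because the statistics are plain sums over samples, the result does not depend on how the rows are grouped into batches.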

predict(X, y=None)[source]

Predict using the linear model.

Parameters:

X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.

Returns:

C – Returns predicted values.

Return type:

array, shape (n_samples,)

score(X, y, sample_weight=None)[source]

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
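The definition can be checked directly with NumPy. The helper r2 below is a hypothetical name for this sketch; it handles a single target and no sample weights.

```python
import numpy as np


def r2(y_true, y_pred):
    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1.0 - u / v


y_true = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r2(y_true, y_true)                      # exact predictions
constant = r2(y_true, np.full(4, y_true.mean()))  # always predicts the mean
worse = r2(y_true, y_true[::-1])                  # reversed predictions

print(perfect, constant, worse)
```

The three cases correspond to the best possible score, the constant-model baseline of 0.0, and a model that does worse than the baseline and therefore scores below zero.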

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – \(R^2\) of self.predict(X) w.r.t. y.

Return type:

float

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to stay consistent with the default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
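For several targets, 'uniform_average' means that \(R^2\) is computed per output column and the results are averaged with equal weight. A small NumPy sketch with made-up numbers, ignoring sample weights:

```python
import numpy as np

y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y_pred = np.array([[1.0, 12.0], [2.0, 20.0], [3.0, 28.0]])

# Residual and total sums of squares, one value per output column.
u = ((y_true - y_pred) ** 2).sum(axis=0)
v = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)

per_output = 1.0 - u / v
uniform_average = per_output.mean()

print(per_output, uniform_average)
```

Here the first output is predicted exactly and the second is slightly off, so the averaged score lies between the two per-output values.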

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
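The <component>__<parameter> form can be seen in a short sketch. It uses scikit-learn's Pipeline, StandardScaler and Ridge as stand-ins (an assumption made for this example); the step names "scale" and "ridge" are arbitrary labels chosen here.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("ridge", Ridge(alpha=1.0))])

# Update parameters of two nested components in a single call.
returned = pipe.set_params(ridge__alpha=0.5, scale__with_mean=False)

# deep=True also lists the parameters of the contained estimators.
params = pipe.get_params(deep=True)

print(params["ridge__alpha"], params["scale__with_mean"])
```

set_params returns the estimator it was called on, so calls can be chained.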

set_partial_fit_request(*, check_input: bool | None | str = '$UNCHANGED$') → IncrementalRidge

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to partial_fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

check_input (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for check_input parameter in partial_fit.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → IncrementalRidge

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object