Non-Scikit-Learn Algorithms

This section describes algorithms that are not present in the original scikit-learn. All of them are available for both CPU and GPU (including distributed mode).

BasicStatistics

class sklearnex.basic_statistics.BasicStatistics(result_options='all', *, n_jobs=None)[source]

Estimator for basic statistics. Computes basic statistics for the provided data.

Parameters:
  • result_options (string or list, default='all') – Statistics to calculate. Possible values are 'min', 'max', 'sum', 'mean', 'variance', 'variation', 'sum_squares', 'sum_squares_centered', 'standard_deviation', 'second_order_raw_moment', or a list containing any of these values. If set to 'all', all possible statistics are calculated.

  • n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.

min_

Minimum of each feature over all samples.

Type:

ndarray of shape (n_features,)

max_

Maximum of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_

Sum of each feature over all samples.

Type:

ndarray of shape (n_features,)

mean_

Mean of each feature over all samples.

Type:

ndarray of shape (n_features,)

variance_

Variance of each feature over all samples.

Type:

ndarray of shape (n_features,)

variation_

Variation of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_squares_

Sum of squares for each feature over all samples.

Type:

ndarray of shape (n_features,)

standard_deviation_

Standard deviation of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_squares_centered_

Centered sum of squares for each feature over all samples.

Type:

ndarray of shape (n_features,)

second_order_raw_moment_

Second order moment of each feature over all samples.

Type:

ndarray of shape (n_features,)

Note

An attribute exists only if the corresponding result option has been provided.

Note

Names of attributes without the trailing underscore are currently supported but deprecated since 2025.1 and will be removed in 2026.0.

Note

Some results can exhibit small variations due to floating point error accumulation and multithreading.
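For reference, the result options correspond to the following plain-NumPy computations. This is an illustrative sketch assuming oneDAL's conventional definitions (variance with the unbiased n-1 denominator, variation as the coefficient of variation std/mean, second order raw moment as the mean of squares); sklearnex computes these via oneDAL, so results may differ by floating-point rounding:

```python
import numpy as np

# NumPy equivalents of each result option (illustrative; assumes
# variance uses ddof=1 and variation = standard_deviation / mean).
X = np.array([[1.0, 2.0], [3.0, 4.0]])

stats = {
    "min": X.min(axis=0),
    "max": X.max(axis=0),
    "sum": X.sum(axis=0),
    "mean": X.mean(axis=0),
    "sum_squares": (X ** 2).sum(axis=0),
    "sum_squares_centered": ((X - X.mean(axis=0)) ** 2).sum(axis=0),
    "variance": X.var(axis=0, ddof=1),
    "standard_deviation": X.std(axis=0, ddof=1),
    "second_order_raw_moment": (X ** 2).mean(axis=0),
}
stats["variation"] = stats["standard_deviation"] / stats["mean"]
```

Passing result_options='all' to BasicStatistics yields attributes matching each of these keys (with a trailing underscore).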

Examples

>>> import numpy as np
>>> from sklearnex.basic_statistics import BasicStatistics
>>> bs = BasicStatistics(result_options=['sum', 'min', 'max'])
>>> X = np.array([[1, 2], [3, 4]])
>>> bs.fit(X)
>>> bs.sum_
array([4., 6.])
>>> bs.min_
array([1., 2.])
BasicStatistics.fit(X, y=None, *, sample_weight=None)[source]

Calculate statistics of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to compute statistics on, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

  • sample_weight (array-like of shape (n_samples,), default=None) – Weights for computing weighted statistics, where n_samples is the number of samples.

Returns:

self – Returns the instance itself.

Return type:

object
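The sample_weight semantics can be illustrated with plain NumPy. This is a sketch assuming the conventional weighted definitions (weighted sum = Σ wᵢxᵢ, weighted mean = Σ wᵢxᵢ / Σ wᵢ); the weights below are hypothetical:

```python
import numpy as np

# Illustrative NumPy equivalents for weighted statistics, assuming the
# conventional definitions.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.array([1.0, 3.0])  # hypothetical sample weights

weighted_sum = (w[:, None] * X).sum(axis=0)   # sum of w_i * x_i
weighted_mean = weighted_sum / w.sum()        # normalized by total weight
```

With unit weights these reduce to the unweighted sum_ and mean_ attributes.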

IncrementalBasicStatistics

class sklearnex.basic_statistics.IncrementalBasicStatistics(result_options='all', batch_size=None, *, n_jobs=None)[source]

Calculates basic statistics on the given data, allowing for computation when the data are split into batches. Use the partial_fit method to provide a single batch of data, or the fit method to provide the entire dataset.

Parameters:
  • result_options (string or list, default='all') – Statistics to compute; accepts the same values as BasicStatistics.

  • batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features.

  • n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.

min_

Minimum of each feature over all samples.

Type:

ndarray of shape (n_features,)

max_

Maximum of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_

Sum of each feature over all samples.

Type:

ndarray of shape (n_features,)

mean_

Mean of each feature over all samples.

Type:

ndarray of shape (n_features,)

variance_

Variance of each feature over all samples.

Type:

ndarray of shape (n_features,)

variation_

Variation of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_squares_

Sum of squares for each feature over all samples.

Type:

ndarray of shape (n_features,)

standard_deviation_

Standard deviation of each feature over all samples.

Type:

ndarray of shape (n_features,)

sum_squares_centered_

Centered sum of squares for each feature over all samples.

Type:

ndarray of shape (n_features,)

second_order_raw_moment_

Second order moment of each feature over all samples.

Type:

ndarray of shape (n_features,)

n_samples_seen_

The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.

Type:

int

batch_size_

Inferred batch size from batch_size.

Type:

int

n_features_in_

Number of features seen during fit or partial_fit.

Type:

int

Note

An attribute exists only if the corresponding result option has been provided.

Note

Names of attributes without the trailing underscore are currently supported but deprecated since 2025.1 and will be removed in 2026.0.

Examples

>>> import numpy as np
>>> from sklearnex.basic_statistics import IncrementalBasicStatistics
>>> incbs = IncrementalBasicStatistics(batch_size=1)
>>> X = np.array([[1, 2], [3, 4]])
>>> incbs.partial_fit(X[:1])
>>> incbs.partial_fit(X[1:])
>>> incbs.sum_
array([4., 6.])
>>> incbs.min_
array([1., 2.])
>>> incbs.fit(X)
>>> incbs.sum_
array([4., 6.])
>>> incbs.max_
array([3., 4.])
IncrementalBasicStatistics.fit(X, y=None, sample_weight=None)[source]

Calculate statistics of X using minibatches of size batch_size.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to compute statistics on, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

  • sample_weight (array-like of shape (n_samples,), default=None) – Weights for computing weighted statistics, where n_samples is the number of samples.

Returns:

self – Returns the instance itself.

Return type:

object

IncrementalBasicStatistics.partial_fit(X, sample_weight=None, check_input=True)[source]

Incremental fit with X. All of X is processed as a single batch.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data to compute statistics on, where n_samples is the number of samples and n_features is the number of features.

  • sample_weight (array-like of shape (n_samples,), default=None) – Weights for computing weighted statistics, where n_samples is the number of samples.

  • check_input (bool, default=True) – Run check_array on X.

Returns:

self – Returns the instance itself.

Return type:

object
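Conceptually, each partial_fit call folds one batch into running accumulators. The sketch below shows that incremental pattern with plain NumPy for min, max, sum, and mean; it illustrates the idea only and is not sklearnex's oneDAL implementation:

```python
import numpy as np

def init_state(n_features):
    # Running accumulators for a stream of batches.
    return {
        "count": 0,
        "sum": np.zeros(n_features),
        "min": np.full(n_features, np.inf),
        "max": np.full(n_features, -np.inf),
    }

def partial_update(state, batch):
    # Fold one batch into the running accumulators.
    state["count"] += batch.shape[0]
    state["sum"] += batch.sum(axis=0)
    state["min"] = np.minimum(state["min"], batch.min(axis=0))
    state["max"] = np.maximum(state["max"], batch.max(axis=0))
    return state

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
state = init_state(X.shape[1])
for batch in np.array_split(X, 2):  # feed the data in two batches
    partial_update(state, batch)

mean = state["sum"] / state["count"]
# The streamed results match the full-data computation.
assert np.allclose(mean, X.mean(axis=0))
assert np.allclose(state["min"], X.min(axis=0))
```

The same pattern is why n_samples_seen_ increments across partial_fit calls but resets on fit.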

IncrementalEmpiricalCovariance

class sklearnex.covariance.IncrementalEmpiricalCovariance(*, store_precision=False, assume_centered=False, batch_size=None, copy=True, n_jobs=None)[source]

Maximum likelihood covariance estimator that allows for estimation when the data are split into batches. Use the partial_fit method to provide a single batch of data, or the fit method to provide the entire dataset.

Parameters:
  • store_precision (bool, default=False) – Specifies if the estimated precision is stored.

  • assume_centered (bool, default=False) – If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly zero. If False (default), data are centered before computation.

  • batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features, to provide a balance between approximation accuracy and memory consumption.

  • copy (bool, default=True) – If False, X will be overwritten. copy=False can be used to save memory but is unsafe for general use.

  • n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.

location_

Estimated location, i.e. the estimated mean.

Type:

ndarray of shape (n_features,)

covariance_

Estimated covariance matrix.

Type:

ndarray of shape (n_features, n_features)

n_samples_seen_

The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.

Type:

int

batch_size_

Inferred batch size from batch_size.

Type:

int

n_features_in_

Number of features seen during fit or partial_fit.

Type:

int

Examples

>>> import numpy as np
>>> from sklearnex.covariance import IncrementalEmpiricalCovariance
>>> inccov = IncrementalEmpiricalCovariance(batch_size=1)
>>> X = np.array([[1, 2], [3, 4]])
>>> inccov.partial_fit(X[:1])
>>> inccov.partial_fit(X[1:])
>>> inccov.covariance_
array([[1., 1.],
       [1., 1.]])
>>> inccov.location_
array([2., 3.])
>>> inccov.fit(X)
>>> inccov.covariance_
array([[1., 1.],
       [1., 1.]])
>>> inccov.location_
array([2., 3.])
IncrementalEmpiricalCovariance.fit(X, y=None)[source]

Fit the model with X, using minibatches of size batch_size.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

self – Returns the instance itself.

Return type:

object

IncrementalEmpiricalCovariance.partial_fit(X, y=None, check_input=True)[source]

Incremental fit with X. All of X is processed as a single batch.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present for API consistency by convention.

  • check_input (bool, default=True) – Run check_array on X.

Returns:

self – Returns the instance itself.

Return type:

object
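The maximum-likelihood covariance can be accumulated from per-batch sufficient statistics (sample count, Σxᵢ, ΣxᵢxᵢT) and finalized at the end. A plain-NumPy sketch of that incremental pattern (illustrative only, not the oneDAL implementation):

```python
import numpy as np

# Accumulate sufficient statistics across batches, then finalize the
# maximum-likelihood (biased, divide-by-n) covariance estimate.
n_features = 2
count = 0
sum_x = np.zeros(n_features)
sum_xxT = np.zeros((n_features, n_features))

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 8.0]])
for batch in np.array_split(X, 2):
    count += batch.shape[0]
    sum_x += batch.sum(axis=0)
    sum_xxT += batch.T @ batch

location = sum_x / count
covariance = sum_xxT / count - np.outer(location, location)

# Matches the biased full-data estimator.
assert np.allclose(covariance, np.cov(X.T, bias=True))
```

Note the divide-by-n (biased) normalization, which is what a maximum likelihood covariance estimator uses; np.cov defaults to n-1 unless bias=True.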

IncrementalLinearRegression

class sklearnex.linear_model.IncrementalLinearRegression(*, fit_intercept=True, copy_X=True, n_jobs=None, batch_size=None)[source]

Trains a linear regression model, allowing for computation when the data are split into batches. Use the partial_fit method to provide a single batch of data, or the fit method to provide the entire dataset.

Parameters:
  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

  • copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.

  • n_jobs (int, default=None) – The number of jobs to use for the computation.

  • batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features.

coef_

Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.

Type:

array of shape (n_features, ) or (n_targets, n_features)

intercept_

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

Type:

float or array of shape (n_targets,)

n_samples_seen_

The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls. To obtain regression coefficients, it must be at least n_features_in_ if fit_intercept is False, and at least n_features_in_ + 1 if fit_intercept is True.

Type:

int

batch_size_

Inferred batch size from batch_size.

Type:

int

n_features_in_

Number of features seen during fit or partial_fit.

Type:

int

Examples

>>> import numpy as np
>>> from sklearnex.linear_model import IncrementalLinearRegression
>>> inclr = IncrementalLinearRegression(batch_size=2)
>>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 10]])
>>> y = np.array([1.5, 3.5, 5.5, 8.5])
>>> inclr.partial_fit(X[:2], y[:2])
>>> inclr.partial_fit(X[2:], y[2:])
>>> inclr.coef_
array([0.5, 0.5])
>>> inclr.intercept_
0.0
>>> inclr.fit(X, y)
>>> inclr.coef_
array([0.5, 0.5])
>>> inclr.intercept_
0.0
IncrementalLinearRegression.fit(X, y)[source]

Fit the model with X and y, using minibatches of size batch_size.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features. n_samples must be at least n_features if fit_intercept is False, and at least n_features + 1 if fit_intercept is True.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.

Returns:

self – Returns the instance itself.

Return type:

object

IncrementalLinearRegression.partial_fit(X, y, check_input=True)[source]

Incremental fit of the linear model with X and y. All of X and y are processed as a single batch.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.

Returns:

self – Returns the instance itself.

Return type:

object

IncrementalLinearRegression.predict(X, y=None)[source]

Predict using the linear model.

Parameters:
  • X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

C – Returns predicted values.

Return type:

array, shape (n_samples, n_targets)
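What fit and predict compute can be sketched as ordinary least squares on a design matrix augmented with an intercept column, using the same data as the Examples above. This is a NumPy illustration of the mathematical problem being solved, not sklearnex's actual solver:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 10.0]])
y = np.array([1.5, 3.5, 5.5, 8.5])

# Augment with a column of ones to model the intercept, then solve the
# least-squares problem.
A = np.hstack([np.ones((X.shape[0], 1)), X])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = params[0], params[1:]

# Predict new samples with the fitted parameters.
X_new = np.array([[2.0, 3.0]])
y_pred = X_new @ coef + intercept
```

For this data the fit is exact, recovering coef = [0.5, 0.5] and a zero intercept, matching the Examples above.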