Non-Scikit-Learn Algorithms
This section describes algorithms that are not available in the original scikit-learn. All of these algorithms are available for both CPU and GPU (including distributed mode).
- class sklearnex.basic_statistics.BasicStatistics(result_options='all', *, n_jobs=None)
Bases: ExtensionEstimator, BaseEstimator
Estimator for basic statistics. Computes basic statistics for the provided data.
- Parameters:
result_options (string or list, default='all') – The statistics to calculate. Possible values are 'min', 'max', 'sum', 'mean', 'variance', 'variation', 'sum_squares', 'sum_squares_centered', 'standard_deviation', 'second_order_raw_moment', or a list containing any of these values. If set to 'all', all possible statistics are calculated.
n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.
- min_
Minimum of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- max_
Maximum of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- sum_
Sum of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- mean_
Mean of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- variance_
Variance of each feature over all samples. Bessel’s correction is used.
- Type:
ndarray of shape (n_features,)
- variation_
Coefficient of variation (standard deviation divided by the mean) of each feature over all samples. Bessel’s correction is used.
- Type:
ndarray of shape (n_features,)
- sum_squares_
Sum of squares for each feature over all samples.
- Type:
ndarray of shape (n_features,)
- standard_deviation_
Standard deviation of each feature over all samples, computed as the square root of the variance. Bessel’s correction is used.
- Type:
ndarray of shape (n_features,)
- sum_squares_centered_
Centered sum of squares for each feature over all samples.
- Type:
ndarray of shape (n_features,)
- second_order_raw_moment_
Second-order raw moment (mean of the squared values) of each feature over all samples.
- Type:
ndarray of shape (n_features,)
Note
Each attribute exists only if the corresponding result option was requested.
Note
Attribute names without the trailing underscore are still supported, but they are deprecated as of 2025.1 and will be removed in 2026.0.
Note
Some results can vary slightly because of floating-point error accumulation and multithreading.
Examples
>>> import numpy as np
>>> from sklearnex.basic_statistics import BasicStatistics
>>> bs = BasicStatistics(result_options=['sum', 'min', 'max'])
>>> X = np.array([[1, 2], [3, 4]])
>>> _ = bs.fit(X)
>>> bs.sum_
array([4., 6.])
>>> bs.min_
array([1., 2.])
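The result options correspond to familiar per-feature formulas. The pure-Python sketch below spells them out as a reference for checking results on small inputs; it is not how sklearnex computes them. It assumes that 'variation' means the coefficient of variation (standard deviation divided by the mean) and that variance and standard deviation use Bessel's correction, as the attribute descriptions state. basic_statistics is a hypothetical helper.

```python
import math


def basic_statistics(X):
    """Reference (non-optimized) per-feature statistics for a list of rows.

    Assumes at least two rows (Bessel's correction divides by n - 1) and
    non-zero means (the assumed coefficient of variation divides by the mean).
    """
    n = len(X)
    names = ("min", "max", "sum", "mean", "variance", "variation",
             "sum_squares", "sum_squares_centered",
             "standard_deviation", "second_order_raw_moment")
    stats = {name: [] for name in names}
    for col in zip(*X):  # transpose: one tuple of values per feature
        total = sum(col)
        mean = total / n
        squares = sum(x * x for x in col)
        centered = sum((x - mean) ** 2 for x in col)
        variance = centered / (n - 1)  # Bessel's correction
        std = math.sqrt(variance)
        stats["min"].append(min(col))
        stats["max"].append(max(col))
        stats["sum"].append(total)
        stats["mean"].append(mean)
        stats["variance"].append(variance)
        stats["standard_deviation"].append(std)
        stats["variation"].append(std / mean)  # assumed: coefficient of variation
        stats["sum_squares"].append(squares)
        stats["sum_squares_centered"].append(centered)
        stats["second_order_raw_moment"].append(squares / n)  # mean of squares
    return stats


stats = basic_statistics([[1, 2], [3, 4]])
print(stats["sum"], stats["variance"], stats["second_order_raw_moment"])
# [4, 6] [2.0, 2.0] [5.0, 10.0]
```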
- fit(X, y=None, sample_weight=None)
Calculate statistics of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data for which to compute the statistics, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Not used, present for API consistency by convention.
sample_weight (array-like of shape (n_samples,), default=None) – Weights used to compute weighted statistics, where n_samples is the number of samples.
- Returns:
self – Returns the instance itself.
- Return type:
object
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BasicStatistics
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- class sklearnex.basic_statistics.IncrementalBasicStatistics(result_options='all', batch_size=None, *, n_jobs=None)
Bases: ExtensionEstimator, BaseEstimator
Calculates basic statistics on the given data and supports computation when the data are split into batches. Use the partial_fit method to provide a single batch of data, or the fit method to provide the entire dataset.
- Parameters:
result_options (string or list, default='all') – List of statistics to compute.
batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features.
n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.
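The documented batch_size default can be illustrated by sketching how fit could slice the rows into minibatches, each of which would then be handed to partial_fit. iter_batches is a hypothetical helper, and treating the final, shorter batch this way is an assumption rather than a description of the sklearnex internals.

```python
def iter_batches(n_samples, n_features, batch_size=None):
    """Yield (start, stop) row ranges for minibatch processing.

    Hypothetical helper: when batch_size is None it falls back to the
    documented default of 5 * n_features.
    """
    if batch_size is None:
        batch_size = 5 * n_features
    for start in range(0, n_samples, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size, n_samples)


print(list(iter_batches(n_samples=12, n_features=2)))
# [(0, 10), (10, 12)]
```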
- min_
Minimum of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- max_
Maximum of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- sum_
Sum of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- mean_
Mean of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- variance_
Variance of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- variation_
Coefficient of variation (standard deviation divided by the mean) of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- sum_squares_
Sum of squares for each feature over all samples.
- Type:
ndarray of shape (n_features,)
- standard_deviation_
Standard deviation of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- sum_squares_centered_
Centered sum of squares for each feature over all samples.
- Type:
ndarray of shape (n_features,)
- second_order_raw_moment_
Second-order raw moment (mean of the squared values) of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- n_samples_seen_
The number of samples processed by the estimator. It is reset on new calls to fit, but increments across partial_fit calls.
- Type:
int
Note
Each attribute exists only if the corresponding result option was requested.
Note
Attribute names without the trailing underscore are still supported, but they are deprecated as of 2025.1 and will be removed in 2026.0.
Note
Serializing an instance of this class forces finalization of the calculations when the inputs are in a SYCL queue or when using GPUs. Because the internal method finalize_fit cannot be dispatched without an explicitly provided queue, and the dispatching policy cannot be serialized, the computation is finalized during the serialization call and the policy is not saved in the serialized data.
Examples
>>> import numpy as np
>>> from sklearnex.basic_statistics import IncrementalBasicStatistics
>>> incbs = IncrementalBasicStatistics(batch_size=1)
>>> X = np.array([[1, 2], [3, 4]])
>>> _ = incbs.partial_fit(X[:1])
>>> _ = incbs.partial_fit(X[1:])
>>> incbs.sum_
array([4., 6.])
>>> incbs.min_
array([1., 2.])
>>> _ = incbs.fit(X)
>>> incbs.sum_
array([4., 6.])
>>> incbs.max_
array([3., 4.])
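partial_fit can be called repeatedly because per-batch results can be merged exactly. The sketch below shows one well-known way to do this for the mean and Bessel-corrected variance of a single feature, the pairwise update of Chan et al.; it illustrates the idea and is not necessarily the scheme oneDAL uses internally. RunningStats is a hypothetical name, and its n field plays the role of n_samples_seen_.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running count, mean and centered sum of squares (m2) for one feature."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def partial_fit(self, batch):
        """Merge one batch of values using the pairwise update of Chan et al."""
        nb = len(batch)
        if nb == 0:
            return self
        mean_b = sum(batch) / nb
        m2_b = sum((x - mean_b) ** 2 for x in batch)
        n = self.n + nb
        delta = mean_b - self.mean
        self.mean += delta * nb / n
        self.m2 += m2_b + delta * delta * self.n * nb / n  # uses the old count
        self.n = n
        return self

    @property
    def variance(self):
        # Bessel's correction; requires n >= 2.
        return self.m2 / (self.n - 1)


# Feature 0 of the example above, fed one row at a time.
feature0 = RunningStats()
feature0.partial_fit([1]).partial_fit([3])
print(feature0.n, feature0.mean, feature0.variance)  # 2 2.0 2.0
```

Merging batch summaries this way gives the same result, up to rounding, as computing the statistics over all rows at once, which is what makes the partial_fit and fit results in the example agree.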
- fit(X, y=None, sample_weight=None)
Calculate statistics of X using minibatches of size batch_size.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data for which to compute the statistics, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Not used, present for API consistency by convention.
sample_weight (array-like of shape (n_samples,), default=None) – Weights used to compute weighted statistics, where n_samples is the number of samples.
- Returns:
self – Returns the instance itself.
- Return type:
object
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- partial_fit(X, sample_weight=None, check_input=True)
Incremental fit with X. All of X is processed as a single batch.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data for which to compute the statistics, where n_samples is the number of samples and n_features is the number of features.
sample_weight (array-like of shape (n_samples,), default=None) – Weights used to compute weighted statistics, where n_samples is the number of samples.
check_input (bool, default=True) – Run check_array on X.
- Returns:
self – Returns the instance itself.
- Return type:
object
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → IncrementalBasicStatistics
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_partial_fit_request(*, check_input: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → IncrementalBasicStatistics
Request metadata passed to the partial_fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
check_input (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the check_input parameter in partial_fit.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in partial_fit.
- Returns:
self – The updated object.
- Return type:
object
- class sklearnex.covariance.IncrementalEmpiricalCovariance(*, store_precision=False, assume_centered=False, batch_size=None, copy=True, n_jobs=None)
Bases: ExtensionEstimator, BaseEstimator
Maximum likelihood covariance estimator that supports estimation when the data are split into batches. Use the partial_fit method to provide a single batch of data, or the fit method to provide the entire dataset.
- Parameters:
store_precision (bool, default=False) – Specifies if the estimated precision is stored.
assume_centered (bool, default=False) – If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly, zero. If False (default), data are centered before computation.
batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features, to provide a balance between approximation accuracy and memory consumption.
copy (bool, default=True) – If False, X will be overwritten. copy=False can be used to save memory but is unsafe for general use.
n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.
- location_
Estimated location, i.e. the estimated mean.
- Type:
ndarray of shape (n_features,)
- covariance_
Estimated covariance matrix.
- Type:
ndarray of shape (n_features, n_features)
- n_samples_seen_
The number of samples processed by the estimator. It is reset on new calls to fit, but increments across partial_fit calls.
- Type:
int
Note
Serializing an instance of this class forces finalization of the calculations when the inputs are in a SYCL queue or when using GPUs. Because the internal method finalize_fit cannot be dispatched without an explicitly provided queue, and the dispatching policy cannot be serialized, the computation is finalized during the serialization call and the policy is not saved in the serialized data.
Examples
>>> import numpy as np
>>> from sklearnex.covariance import IncrementalEmpiricalCovariance
>>> inccov = IncrementalEmpiricalCovariance(batch_size=1)
>>> X = np.array([[1, 2], [3, 4]])
>>> _ = inccov.partial_fit(X[:1])
>>> _ = inccov.partial_fit(X[1:])
>>> inccov.covariance_
array([[1., 1.],
       [1., 1.]])
>>> inccov.location_
array([2., 3.])
>>> _ = inccov.fit(X)
>>> inccov.covariance_
array([[1., 1.],
       [1., 1.]])
>>> inccov.location_
array([2., 3.])
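The example's result can be reproduced by keeping only a few running quantities between batches: the sample count, the per-feature sums and the cross-product sums X^T X. The sketch below finalizes them into the maximum likelihood (divide-by-n) covariance. It is an illustration under that assumption, not the oneDAL algorithm, and covariance_from_batches is a hypothetical helper. The assume_centered branch follows scikit-learn's convention of treating the mean as zero.

```python
def covariance_from_batches(batches, assume_centered=False):
    """Maximum likelihood (divide-by-n) covariance from a sequence of batches.

    Only n, the per-feature sums and the cross-product sums X^T X are kept
    between batches, so each batch can be discarded once it is processed.
    """
    n, sums, cross = 0, None, None
    for batch in batches:
        for row in batch:
            if sums is None:  # allocate accumulators on the first row
                p = len(row)
                sums = [0.0] * p
                cross = [[0.0] * p for _ in range(p)]
            n += 1
            for i, xi in enumerate(row):
                sums[i] += xi
                for j, xj in enumerate(row):
                    cross[i][j] += xi * xj
    if n == 0:
        raise ValueError("no samples were provided")
    p = len(sums)
    # With assume_centered=True the mean is taken to be zero.
    location = [0.0] * p if assume_centered else [s / n for s in sums]
    covariance = [
        [cross[i][j] / n - location[i] * location[j] for j in range(p)]
        for i in range(p)
    ]
    return location, covariance


location, covariance = covariance_from_batches([[[1, 2]], [[3, 4]]])
print(location, covariance)  # [2.0, 3.0] [[1.0, 1.0], [1.0, 1.0]]
```

The sum-of-products form is simple but can lose precision when the mean is large relative to the spread; a production implementation would likely use a more stable update.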
- error_norm(comp_cov, norm='frobenius', scaling=True, squared=True)
Compute the Mean Squared Error between two covariance estimators.
- Parameters:
comp_cov (array-like of shape (n_features, n_features)) – The covariance to compare with.
norm ({"frobenius", "spectral"}, default="frobenius") – The type of norm used to compute the error. Available error types: 'frobenius' (default): sqrt(tr(A^t.A)); 'spectral': sqrt(max(eigenvalues(A^t.A))), where A is the error (comp_cov - self.covariance_).
scaling (bool, default=True) – If True (default), the squared error norm is divided by n_features. If False, the squared error norm is not rescaled.
squared (bool, default=True) – Whether to compute the squared error norm or the error norm. If True (default), the squared error norm is returned. If False, the error norm is returned.
- Returns:
result – The Mean Squared Error (in the sense of the Frobenius norm) between self and comp_cov covariance estimators.
- Return type:
float
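The interaction of norm='frobenius', scaling and squared can be checked with a small pure-Python sketch of the Frobenius branch. The spectral branch is omitted because it needs an eigenvalue solver. frobenius_error_norm is a hypothetical helper that mirrors the parameter description above, not the library's implementation.

```python
import math


def frobenius_error_norm(comp_cov, covariance, scaling=True, squared=True):
    """Frobenius branch of error_norm with A = comp_cov - covariance.

    tr(A^T A) equals the sum of the squared entries of A.
    """
    p = len(covariance)
    squared_norm = sum(
        (comp_cov[i][j] - covariance[i][j]) ** 2
        for i in range(p)
        for j in range(p)
    )
    if scaling:
        squared_norm /= p  # divide by n_features
    return squared_norm if squared else math.sqrt(squared_norm)


print(frobenius_error_norm([[2, 1], [1, 2]], [[1, 1], [1, 1]]))  # 1.0
```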
- fit(X, y=None)
Fit the model with X, using minibatches of size batch_size.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
self – Returns the instance itself.
- Return type:
object
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- get_precision()
Getter for the precision matrix.
- Returns:
precision_ – The precision matrix associated with the current covariance object.
- Return type:
array-like of shape (n_features, n_features)
- mahalanobis(X)
Compute the squared Mahalanobis distances of given observations.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The observations whose squared Mahalanobis distances are computed. The observations are assumed to be drawn from the same distribution as the data used in fit.
- Returns:
dist – Squared Mahalanobis distances of the observations.
- Return type:
ndarray of shape (n_samples,)
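For reference, the squared Mahalanobis distance of an observation x is (x - location)^T P (x - location), where P is the precision matrix (the value get_precision() returns) and location is location_. The pure-Python sketch below evaluates that formula for a given precision matrix; squared_mahalanobis is a hypothetical helper and does not compute the precision itself.

```python
def squared_mahalanobis(X, location, precision):
    """Return (x - location)^T precision (x - location) for each row x of X."""
    distances = []
    for row in X:
        d = [x - m for x, m in zip(row, location)]  # centered observation
        p = len(d)
        distances.append(
            sum(d[i] * precision[i][j] * d[j] for i in range(p) for j in range(p))
        )
    return distances


print(squared_mahalanobis([[2, 4]], location=[0, 0], precision=[[0.5, 0], [0, 0.5]]))
# [10.0]
```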
- partial_fit(X, y=None, check_input=True)
Incremental fit with X. All of X is processed as a single batch.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Not used, present for API consistency by convention.
check_input (bool, default=True) – Run check_array on X.
- Returns:
self – Returns the instance itself.
- Return type:
object
- score(X_test, y=None)
Compute the log-likelihood of X_test under the estimated Gaussian model.
The Gaussian model is defined by its mean and covariance matrix, which are represented by self.location_ and self.covariance_, respectively.
- Parameters:
X_test (array-like of shape (n_samples, n_features)) – Test data for which to compute the likelihood, where n_samples is the number of samples and n_features is the number of features. X_test is assumed to be drawn from the same distribution as the data used in fit (including centering).
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
res – The log-likelihood of X_test with self.location_ and self.covariance_ as estimators of the Gaussian model mean and covariance matrix, respectively.
- Return type:
float
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_partial_fit_request(*, check_input: bool | None | str = '$UNCHANGED$') → IncrementalEmpiricalCovariance
Request metadata passed to the partial_fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, X_test: bool | None | str = '$UNCHANGED$') → IncrementalEmpiricalCovariance
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class sklearnex.linear_model.IncrementalLinearRegression(*, fit_intercept=True, copy_X=True, n_jobs=None, batch_size=None)
Bases: ExtensionEstimator, MultiOutputMixin, RegressorMixin, BaseEstimator
Trains a linear regression model and supports computation when the data are split into batches. Use the partial_fit method to provide a single batch of data, or the fit method to provide the entire dataset.
- Parameters:
fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.
n_jobs (int, default=None) – The number of jobs to use for the computation.
batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features.
- coef_
Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.
- Type:
array of shape (n_features, ) or (n_targets, n_features)
- intercept_
Independent term in the linear model. Set to 0.0 if fit_intercept = False.
- Type:
float or array of shape (n_targets,)
- n_samples_seen_
The number of samples processed by the estimator. It is reset on new calls to fit, but increments across partial_fit calls. To obtain regression coefficients, it must be at least n_features_in_ if fit_intercept is False, and at least n_features_in_ + 1 if fit_intercept is True.
- Type:
int
Note
Serializing an instance of this class forces finalization of the calculations when the inputs are in a SYCL queue or when using GPUs. Because the internal method finalize_fit cannot be dispatched without an explicitly provided queue, and the dispatching policy cannot be serialized, the computation is finalized during the serialization call and the policy is not saved in the serialized data.
Examples
>>> import numpy as np
>>> from sklearnex.linear_model import IncrementalLinearRegression
>>> inclr = IncrementalLinearRegression(batch_size=2)
>>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 10]])
>>> y = np.array([1.5, 3.5, 5.5, 8.5])
>>> _ = inclr.partial_fit(X[:2], y[:2])
>>> _ = inclr.partial_fit(X[2:], y[2:])
>>> inclr.coef_
array([0.5, 0.5])
>>> inclr.intercept_
array(0.)
>>> _ = inclr.fit(X, y)
>>> inclr.coef_
array([0.5, 0.5])
>>> inclr.intercept_
array(0.)
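One way to see why partial_fit works for least squares, and why at least n_features samples (n_features + 1 with an intercept) are needed, is to accumulate the normal-equation terms X^T X and X^T y batch by batch and solve only when coefficients are requested. The plain-Python sketch below does this for a single target, handling the intercept by appending a column of ones. It illustrates the idea under those assumptions and is not the oneDAL algorithm; NormalEquations and solve are hypothetical names.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal matrix is singular: too few samples?")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class NormalEquations:
    """Accumulates X^T X and X^T y across partial_fit calls (single target)."""

    def __init__(self, n_features, fit_intercept=True):
        self.fit_intercept = fit_intercept
        size = n_features + (1 if fit_intercept else 0)
        self.xtx = [[0.0] * size for _ in range(size)]
        self.xty = [0.0] * size
        self.n_samples_seen_ = 0

    def partial_fit(self, X, y):
        for row, target in zip(X, y):
            # Append a constant 1 so the last coefficient acts as the intercept.
            z = list(row) + ([1.0] if self.fit_intercept else [])
            for i, zi in enumerate(z):
                self.xty[i] += zi * target
                for j, zj in enumerate(z):
                    self.xtx[i][j] += zi * zj
            self.n_samples_seen_ += 1
        return self

    def coefficients(self):
        w = solve(self.xtx, self.xty)
        return (w[:-1], w[-1]) if self.fit_intercept else (w, 0.0)


X = [[1, 2], [3, 4], [5, 6], [7, 10]]
y = [1.5, 3.5, 5.5, 8.5]
model = NormalEquations(n_features=2)
model.partial_fit(X[:2], y[:2]).partial_fit(X[2:], y[2:])
coef, intercept = model.coefficients()
print([round(c, 6) for c in coef], abs(intercept) < 1e-9)  # [0.5, 0.5] True
```

After only the first batch (two samples, two features plus an intercept) the accumulated matrix is singular, which matches the documented requirement on n_samples_seen_.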
- fit(X, y)
Fit the model with X and y, using minibatches of size batch_size.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features. n_samples must be at least n_features if fit_intercept is False, and at least n_features + 1 if fit_intercept is True.
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.
- Returns:
self – Returns the instance itself.
- Return type:
object
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- partial_fit(X, y, check_input=True)
Incrementally fit the linear model with X and y. All of X and y are processed as a single batch.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.
- Returns:
self – Returns the instance itself.
- Return type:
object
- predict(X, y=None)
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
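The \(R^2\) definition above can be evaluated directly for a single output. The sketch below is a minimal pure-Python version with optional sample weights, assuming, as scikit-learn's r2_score does, that the mean of y_true in \(v\) is the weighted mean. It assumes y_true is not constant (otherwise \(v\) is zero), and r2_sketch is a hypothetical name.

```python
def r2_sketch(y_true, y_pred, sample_weight=None):
    """Single-output R^2 = 1 - u / v, with optional sample weights.

    Assumes y_true is not constant, otherwise v is zero.
    """
    w = [1.0] * len(y_true) if sample_weight is None else list(sample_weight)
    mean = sum(wi * yt for wi, yt in zip(w, y_true)) / sum(w)
    u = sum(wi * (yt - yp) ** 2 for wi, yt, yp in zip(w, y_true, y_pred))  # residual
    v = sum(wi * (yt - mean) ** 2 for wi, yt in zip(w, y_true))  # total
    return 1.0 - u / v


print(r2_sketch([1, 2, 3], [1, 2, 3]),   # perfect prediction
      r2_sketch([1, 2, 3], [2, 2, 2]),   # constant prediction of the mean
      r2_sketch([1, 2, 3], [3, 2, 1]))   # worse than the constant model
# 1.0 0.0 -3.0
```

For multioutput targets, 'uniform_average' corresponds to computing this score per output and taking the plain mean.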
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_partial_fit_request(*, check_input: bool | None | str = '$UNCHANGED$') → IncrementalLinearRegression
Request metadata passed to the partial_fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → IncrementalLinearRegression
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class sklearnex.linear_model.IncrementalRidge(fit_intercept=True, alpha=1.0, copy_X=True, n_jobs=None, batch_size=None)
Bases: ExtensionEstimator, MultiOutputMixin, RegressorMixin, BaseEstimator
Incremental estimator for Ridge Regression. Supports training a Ridge Regression model when the data are split into batches.
- Parameters:
fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
alpha (float, default=1.0) – Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization.
copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.
n_jobs (int, default=None) – The number of jobs to use for the computation.
batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features, to provide a balance between approximation accuracy and memory consumption.
- coef_
Estimated coefficients for the ridge regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.
- Type:
array of shape (n_features, ) or (n_targets, n_features)
- intercept_
Independent term in the linear model. Set to 0.0 if fit_intercept = False.
- Type:
float or array of shape (n_targets,)
- n_samples_seen_
The number of samples processed by the estimator. It is reset on new calls to fit, but increments across partial_fit calls. To obtain regression coefficients, it must be at least n_features_in_ if fit_intercept is False, and at least n_features_in_ + 1 if fit_intercept is True.
- Type:
int
Note
Serializing an instance of this class forces finalization of the calculations when the inputs are in a SYCL queue or when using GPUs. Because the internal method finalize_fit cannot be dispatched without an explicitly provided queue, and the dispatching policy cannot be serialized, the computation is finalized during the serialization call and the policy is not saved in the serialized data.
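For ridge regression the quantities accumulated across batches can be the same as for ordinary least squares, with alpha entering only when the accumulated system is solved. The sketch below keeps to a single feature with fit_intercept=False so that the solve reduces to a division. It assumes the common ridge objective sum((y - w * x) ** 2) + alpha * w ** 2 (as in scikit-learn's Ridge) and is not taken from the sklearnex implementation. IncrementalRidge1D is a hypothetical name.

```python
class IncrementalRidge1D:
    """Single-feature ridge regression without an intercept, fitted in batches.

    Assumes the objective sum((y - w * x) ** 2) + alpha * w ** 2, whose
    minimizer is w = sum(x * y) / (sum(x * x) + alpha).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sxx = 0.0  # running sum of x * x (the 1x1 version of X^T X)
        self.sxy = 0.0  # running sum of x * y (the 1x1 version of X^T y)
        self.n_samples_seen_ = 0

    def partial_fit(self, x, y):
        """x is a flat list of feature values, y the matching targets."""
        for xi, yi in zip(x, y):
            self.sxx += xi * xi
            self.sxy += xi * yi
            self.n_samples_seen_ += 1
        return self

    @property
    def coef_(self):
        # alpha is added once, at solve time, not once per batch.
        return self.sxy / (self.sxx + self.alpha)


batched = IncrementalRidge1D(alpha=1.0).partial_fit([1], [1]).partial_fit([2], [2])
whole = IncrementalRidge1D(alpha=1.0).partial_fit([1, 2], [1, 2])
print(batched.coef_, whole.coef_)  # 0.8333333333333334 0.8333333333333334
```

Adding alpha at solve time rather than per batch is what keeps the batched and single-pass results identical; with alpha=0 the sketch reduces to ordinary least squares.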
- fit(X, y)[source]
Fit the model with X and y, using minibatches of size
batch_size
.- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features. It is necessary for n_samples to be not less than n_features if fit_intercept is False and not less than n_features + 1 if fit_intercept is True
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.
- Returns:
self – Returns the instance itself.
- Return type:
object
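Minibatches are enough for ridge regression because the normal equations only need the sums \(X^T X\) and \(X^T y\), and those sums can be accumulated one batch at a time. The function below is a NumPy sketch built on that assumption. It is not the sklearnex implementation, and ridge_fit_minibatch is a hypothetical name. It handles the intercept by appending a column of ones and leaving that coefficient unpenalized. When batch_size is None, it falls back to the documented default of 5 * n_features.

```python
import numpy as np


def ridge_fit_minibatch(X, y, alpha=1.0, fit_intercept=True, batch_size=None):
    """Sketch: ridge regression from per-batch sums of X^T X and X^T y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    if batch_size is None:
        batch_size = 5 * n_features  # documented default for batch_size
    d = n_features + int(fit_intercept)
    xtx = np.zeros((d, d))
    xty = np.zeros((d,) + y.shape[1:])  # (d,) for one target, (d, n_targets) otherwise
    for start in range(0, n_samples, batch_size):
        Xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        if fit_intercept:
            # Append a column of ones so the intercept is solved alongside coef
            Xb = np.hstack([Xb, np.ones((Xb.shape[0], 1))])
        xtx += Xb.T @ Xb
        xty += Xb.T @ yb
    penalty = alpha * np.eye(d)
    if fit_intercept:
        penalty[-1, -1] = 0.0  # the intercept is not regularized
    w = np.linalg.solve(xtx + penalty, xty)
    if fit_intercept:
        # coef: (n_features,) or (n_targets, n_features); intercept: scalar or (n_targets,)
        return w[:-1].T, w[-1]
    return w.T, 0.0
```

The accumulated sums in this sketch are exact. As a result, batch_size here changes only how many rows are processed per step, not the solution (up to floating-point rounding).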
- get_metadata_routing()
Get metadata routing of this object.
See the User Guide for how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- partial_fit(X, y, check_input=True)[source]
Incrementally fit the linear model with X and y. All of X and y are processed as a single batch.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.
- Returns:
self – Returns the instance itself.
- Return type:
object
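partial_fit treats all of X and y as one batch and keeps accumulating across calls. Feeding the data in chunks should therefore give the same result as a single call on the full data. The class below is a hypothetical, single-target NumPy sketch of that idea. IncrementalRidgeSketch and its finalize method are illustrative names, not sklearnex API. The sketch also increments n_samples_seen_ on every call.

```python
import numpy as np


class IncrementalRidgeSketch:
    """Hypothetical single-target incremental ridge built on running sums."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.n_samples_seen_ = 0
        self._xtx = None  # running sum of Xa^T Xa
        self._xty = None  # running sum of Xa^T y

    def partial_fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Append a column of ones so the intercept is estimated with the coefficients
        Xa = np.hstack([X, np.ones((X.shape[0], 1))])
        if self._xtx is None:
            d = Xa.shape[1]
            self._xtx = np.zeros((d, d))
            self._xty = np.zeros(d)
        # All of X and y is folded into the running sums as a single batch
        self._xtx += Xa.T @ Xa
        self._xty += Xa.T @ y
        self.n_samples_seen_ += X.shape[0]
        return self

    def finalize(self):
        penalty = self.alpha * np.eye(self._xtx.shape[0])
        penalty[-1, -1] = 0.0  # the intercept is not regularized
        w = np.linalg.solve(self._xtx + penalty, self._xty)
        self.coef_, self.intercept_ = w[:-1], w[-1]
        return self
```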
- predict(X, y=None)[source]
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
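A linear model predicts with X @ coef_.T + intercept_. The helper below is a small NumPy sketch, and linear_predict is a hypothetical name. It shows that this one expression handles both documented coef_ shapes: (n_features,) for a single target and (n_targets, n_features) for several.

```python
import numpy as np


def linear_predict(X, coef, intercept):
    """Sketch: evaluate a fitted linear model from coef_ and intercept_."""
    X = np.asarray(X, dtype=float)
    coef = np.asarray(coef, dtype=float)
    # coef is (n_features,) or (n_targets, n_features); .T is a no-op for the 1D case,
    # so the result is (n_samples,) or (n_samples, n_targets)
    return X @ coef.T + intercept
```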
- score(X, y, sample_weight=None)[source]
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative because the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) with respect to y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
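Combining the \(R^2\) definition with multioutput='uniform_average' means computing the score for each output column and then taking the plain mean. The NumPy sketch below follows that reading. r2_uniform_average is a hypothetical helper. Unlike r2_score(), it does not special-case constant targets (where \(v = 0\)).

```python
import numpy as np


def r2_uniform_average(y_true, y_pred):
    """Sketch: R^2 = 1 - u/v per output, averaged uniformly over outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum(axis=0)              # residual sum of squares per output
    v = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)  # total sum of squares per output
    return float(np.mean(1.0 - u / v))                     # 'uniform_average': plain mean
```

Predicting the mean of y everywhere gives 0.0, which matches the constant-model remark above. Reversing the targets [1, 2, 3] gives -3.0, an example of a model doing worse than the constant baseline.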
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
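Under the <component>__<parameter> convention, each key is split at its first double underscore. The prefix selects the component, and the remainder is forwarded to it, where it may itself be nested. The helper below is a plain-Python sketch of that grouping step, not scikit-learn's own code. split_nested_params and the step name "ridge" are hypothetical.

```python
def split_nested_params(params):
    """Sketch: group '<component>__<parameter>' keys by component."""
    own, nested = {}, {}
    for key, value in params.items():
        # partition splits at the first '__' only, so deeper keys stay intact
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested
```

For a Pipeline step named "ridge", the key ridge__alpha therefore corresponds to calling set_params(ridge__alpha=0.5) on the pipeline, which sets alpha on that step.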
- set_partial_fit_request(*, check_input: bool | None | str = '$UNCHANGED$') → IncrementalRidge
Request metadata passed to the partial_fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the User Guide for how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
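The four request values work like a small decision table for what a meta-estimator forwards to partial_fit, and set_score_request applies the same table to score. The function below is a simplified, hypothetical model of that table for a single metadata parameter. route_metadata is not a scikit-learn function, and real routing in scikit-learn involves more machinery than shown here.

```python
def route_metadata(request, name, provided):
    """Hypothetical model of one request value.

    request:  True, False, None, or an alias string
    name:     the parameter name the method expects (e.g. "check_input")
    provided: dict of metadata the user passed to the meta-estimator
    Returns the keyword arguments that would be forwarded to the method.
    """
    if request is True:
        # Requested: forwarded if present, silently skipped if absent
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded
        return {}
    if request is None:
        # Not requested, and supplying it anyway is an error
        if name in provided:
            raise ValueError(f"{name} was passed but is not requested")
        return {}
    if isinstance(request, str):
        # Alias: the user passes it under the alias; the method receives its own name
        return {name: provided[request]} if request in provided else {}
    raise TypeError("request must be True, False, None or a str alias")
```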
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → IncrementalRidge
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the User Guide for how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.