Non-Scikit-Learn Algorithms
Algorithms that are not present in the original scikit-learn are described here. All algorithms are available for both CPU and GPU, including distributed mode.
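For example, computation can be directed to a GPU through the config_context manager; a minimal sketch (this assumes the dpctl package and a SYCL-capable GPU device are available):
>>> import numpy as np
>>> from sklearnex import config_context
>>> from sklearnex.basic_statistics import BasicStatistics
>>> X = np.random.rand(1000, 10)
>>> with config_context(target_offload="gpu:0"):
...     bs = BasicStatistics(result_options='mean').fit(X)
>>> bs.mean_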
BasicStatistics
- class sklearnex.basic_statistics.BasicStatistics(result_options='all', *, n_jobs=None)[source]
Estimator for basic statistics. Computes basic statistics for the provided data.
- Parameters:
result_options (string or list, default='all') – Used to set statistics to calculate. Possible values are 'min', 'max', 'sum', 'mean', 'variance', 'variation', 'sum_squares', 'sum_squares_centered', 'standard_deviation', 'second_order_raw_moment', or a list containing any of these values. If set to 'all', then all possible statistics will be calculated.
n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.
- min_
Minimum of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- max_
Maximum of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- sum_
Sum of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- mean_
Mean of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- variance_
Variance of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- variation_
Variation of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- sum_squares_
Sum of squares for each feature over all samples.
- Type:
ndarray of shape (n_features,)
- standard_deviation_
Standard deviation of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- sum_squares_centered_
Centered sum of squares for each feature over all samples.
- Type:
ndarray of shape (n_features,)
- second_order_raw_moment_
Second order moment of each feature over all samples.
- Type:
ndarray of shape (n_features,)
Note
An attribute exists only if the corresponding result option has been provided.
Note
Attribute names without the trailing underscore are currently supported, but were deprecated in 2025.1 and will be removed in 2026.0.
Note
Some results can exhibit small variations due to floating point error accumulation and multithreading.
Examples
>>> import numpy as np
>>> from sklearnex.basic_statistics import BasicStatistics
>>> bs = BasicStatistics(result_options=['sum', 'min', 'max'])
>>> X = np.array([[1, 2], [3, 4]])
>>> bs.fit(X)
>>> bs.sum_
np.array([4., 6.])
>>> bs.min_
np.array([1., 2.])
- BasicStatistics.fit(X, y=None, *, sample_weight=None)[source]
Calculate statistics of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data to compute statistics for, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Not used, present for API consistency by convention.
sample_weight (array-like of shape (n_samples,), default=None) – Weights for computing weighted statistics, where n_samples is the number of samples.
- Returns:
self – Returns the instance itself.
- Return type:
BasicStatistics
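The sample_weight parameter makes fit compute weighted statistics; a minimal sketch (the weight values below are arbitrary, and each sample's contribution is assumed to scale with its weight):
>>> import numpy as np
>>> from sklearnex.basic_statistics import BasicStatistics
>>> X = np.array([[1, 2], [3, 4]])
>>> weights = np.array([1., 3.])
>>> bs = BasicStatistics(result_options=['sum', 'mean'])
>>> bs.fit(X, sample_weight=weights)
>>> bs.mean_  # weighted mean of each feature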
- class sklearnex.basic_statistics.IncrementalBasicStatistics(result_options='all', batch_size=None, *, n_jobs=None)[source]
Calculates basic statistics on the given data, allowing computation when the data are split into batches. The user can use the partial_fit method to provide a single batch of data or use the fit method to provide the entire dataset.
- Parameters:
result_options (string or list, default='all') – List of statistics to compute.
batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features.
n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.
- min_
Minimum of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- max_
Maximum of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- sum_
Sum of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- mean_
Mean of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- variance_
Variance of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- variation_
Variation of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- sum_squares_
Sum of squares for each feature over all samples.
- Type:
ndarray of shape (n_features,)
- standard_deviation_
Standard deviation of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- sum_squares_centered_
Centered sum of squares for each feature over all samples.
- Type:
ndarray of shape (n_features,)
- second_order_raw_moment_
Second order moment of each feature over all samples.
- Type:
ndarray of shape (n_features,)
- n_samples_seen_
The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.
- Type:
int
Note
An attribute exists only if the corresponding result option has been provided.
Note
Attribute names without the trailing underscore are currently supported, but were deprecated in 2025.1 and will be removed in 2026.0.
Examples
>>> import numpy as np
>>> from sklearnex.basic_statistics import IncrementalBasicStatistics
>>> incbs = IncrementalBasicStatistics(batch_size=1)
>>> X = np.array([[1, 2], [3, 4]])
>>> incbs.partial_fit(X[:1])
>>> incbs.partial_fit(X[1:])
>>> incbs.sum_
np.array([4., 6.])
>>> incbs.min_
np.array([1., 2.])
>>> incbs.fit(X)
>>> incbs.sum_
np.array([4., 6.])
>>> incbs.max_
np.array([3., 4.])
- IncrementalBasicStatistics.fit(X, y=None, sample_weight=None)[source]
Calculate statistics of X using minibatches of size batch_size.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data to compute statistics for, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Not used, present for API consistency by convention.
sample_weight (array-like of shape (n_samples,), default=None) – Weights for computing weighted statistics, where n_samples is the number of samples.
- Returns:
self – Returns the instance itself.
- Return type:
IncrementalBasicStatistics
- IncrementalBasicStatistics.partial_fit(X, sample_weight=None, check_input=True)[source]
Incremental fit with X. All of X is processed as a single batch.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data to compute statistics for, where n_samples is the number of samples and n_features is the number of features.
sample_weight (array-like of shape (n_samples,), default=None) – Weights for computing weighted statistics, where n_samples is the number of samples.
check_input (bool, default=True) – Run check_array on X.
- Returns:
self – Returns the instance itself.
- Return type:
IncrementalBasicStatistics
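A typical streaming pattern feeds batches to partial_fit in a loop; a minimal sketch with arbitrary data:
>>> import numpy as np
>>> from sklearnex.basic_statistics import IncrementalBasicStatistics
>>> incbs = IncrementalBasicStatistics(result_options=['mean', 'variance'])
>>> X = np.random.rand(100, 4)
>>> for batch in np.array_split(X, 10):  # process the data in 10 batches
...     incbs.partial_fit(batch)
>>> incbs.n_samples_seen_
100
>>> incbs.mean_  # matches X.mean(axis=0) up to floating point error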
IncrementalEmpiricalCovariance
- class sklearnex.covariance.IncrementalEmpiricalCovariance(*, store_precision=False, assume_centered=False, batch_size=None, copy=True, n_jobs=None)[source]
Maximum likelihood covariance estimator that allows for the estimation when the data are split into batches. The user can use the partial_fit method to provide a single batch of data or use the fit method to provide the entire dataset.
- Parameters:
store_precision (bool, default=False) – Specifies if the estimated precision is stored.
assume_centered (bool, default=False) – If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly, zero. If False (default), data are centered before computation.
batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features, to provide a balance between approximation accuracy and memory consumption.
copy (bool, default=True) – If False, X will be overwritten. copy=False can be used to save memory but is unsafe for general use.
n_jobs (int, default=None) – The number of jobs to use in parallel for the computation. None means using all physical cores unless in a joblib.parallel_backend context. -1 means using all logical cores. See Glossary for more details.
- location_
Estimated location, i.e. the estimated mean.
- Type:
ndarray of shape (n_features,)
- covariance_
Estimated covariance matrix.
- Type:
ndarray of shape (n_features, n_features)
- n_samples_seen_
The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.
- Type:
int
Examples
>>> import numpy as np
>>> from sklearnex.covariance import IncrementalEmpiricalCovariance
>>> inccov = IncrementalEmpiricalCovariance(batch_size=1)
>>> X = np.array([[1, 2], [3, 4]])
>>> inccov.partial_fit(X[:1])
>>> inccov.partial_fit(X[1:])
>>> inccov.covariance_
np.array([[1., 1.],
          [1., 1.]])
>>> inccov.location_
np.array([2., 3.])
>>> inccov.fit(X)
>>> inccov.covariance_
np.array([[1., 1.],
          [1., 1.]])
>>> inccov.location_
np.array([2., 3.])
- IncrementalEmpiricalCovariance.fit(X, y=None)[source]
Fit the model with X, using minibatches of size batch_size.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
self – Returns the instance itself.
- Return type:
IncrementalEmpiricalCovariance
- IncrementalEmpiricalCovariance.partial_fit(X, y=None, check_input=True)[source]
Incremental fit with X. All of X is processed as a single batch.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Not used, present for API consistency by convention.
check_input (bool, default=True) – Run check_array on X.
- Returns:
self – Returns the instance itself.
- Return type:
IncrementalEmpiricalCovariance
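A sketch of estimating the covariance batch-wise and checking it against NumPy (arbitrary data; the maximum likelihood estimate normalizes by n_samples, which corresponds to np.cov with bias=True):
>>> import numpy as np
>>> from sklearnex.covariance import IncrementalEmpiricalCovariance
>>> X = np.random.rand(50, 3)
>>> inccov = IncrementalEmpiricalCovariance()
>>> for batch in np.array_split(X, 5):  # stream the data in 5 batches
...     inccov.partial_fit(batch)
>>> np.allclose(inccov.covariance_, np.cov(X.T, bias=True))
True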
IncrementalLinearRegression
- class sklearnex.linear_model.IncrementalLinearRegression(*, fit_intercept=True, copy_X=True, n_jobs=None, batch_size=None)[source]
Trains a linear regression model, allowing computation if the data are split into batches. The user can use the partial_fit method to provide a single batch of data or use the fit method to provide the entire dataset.
- Parameters:
fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.
n_jobs (int, default=None) – The number of jobs to use for the computation.
batch_size (int, default=None) – The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features.
- coef_
Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.
- Type:
array of shape (n_features, ) or (n_targets, n_features)
- intercept_
Independent term in the linear model. Set to 0.0 if fit_intercept = False.
- Type:
float or array of shape (n_targets,)
- n_samples_seen_
The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls. It should be no less than n_features_in_ if fit_intercept is False, and no less than n_features_in_ + 1 if fit_intercept is True, in order to obtain regression coefficients.
- Type:
int
Examples
>>> import numpy as np
>>> from sklearnex.linear_model import IncrementalLinearRegression
>>> inclr = IncrementalLinearRegression(batch_size=2)
>>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 10]])
>>> y = np.array([1.5, 3.5, 5.5, 8.5])
>>> inclr.partial_fit(X[:2], y[:2])
>>> inclr.partial_fit(X[2:], y[2:])
>>> inclr.coef_
np.array([0.5, 0.5])
>>> inclr.intercept_
np.array(0.)
>>> inclr.fit(X, y)
>>> inclr.coef_
np.array([0.5, 0.5])
>>> inclr.intercept_
np.array(0.)
- IncrementalLinearRegression.fit(X, y)[source]
Fit the model with X and y, using minibatches of size batch_size.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features. It is necessary for n_samples to be no less than n_features if fit_intercept is False, and no less than n_features + 1 if fit_intercept is True.
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.
- Returns:
self – Returns the instance itself.
- Return type:
IncrementalLinearRegression
- IncrementalLinearRegression.partial_fit(X, y, check_input=True)[source]
Incremental fit linear model with X and y. All of X and y is processed as a single batch.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values, where n_samples is the number of samples and n_targets is the number of targets.
- Returns:
self – Returns the instance itself.
- Return type:
IncrementalLinearRegression
- IncrementalLinearRegression.predict(X, y=None)[source]
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples, n_targets)
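An end-to-end sketch that trains in batches and then predicts (data reused from the example above; values are illustrative):
>>> import numpy as np
>>> from sklearnex.linear_model import IncrementalLinearRegression
>>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 10]])
>>> y = np.array([1.5, 3.5, 5.5, 8.5])
>>> inclr = IncrementalLinearRegression()
>>> for X_batch, y_batch in zip(np.array_split(X, 2), np.array_split(y, 2)):
...     inclr.partial_fit(X_batch, y_batch)
>>> inclr.predict(np.array([[9, 12]]))  # 0.5 * 9 + 0.5 * 12 = 10.5
np.array([10.5])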