daal4py API Reference

This is the full documentation page for daal4py functions and classes. Note that for the most part, these are simple wrappers over equivalent functions and methods from the oneAPI Data Analytics Library. See also the documentation of DAAL interfaces for more details.

See About daal4py for an example of how to use daal4py algorithms.

Thread control

Documentation for functions that control the global thread settings in daal4py:

daal4py.daalinit(nthreads: int = -1) → None

Set number of threads for daal4py

This modifies the number of threads configured for daal4py. It is a global setting: it applies to all subsequent calls to daal4py functions and methods in the Python process.

By default, if not otherwise configured, it will use the full number of threads available in the system.

Parameters:: nthreads (int) – [default: -1] Number of threads to use for further computations in daal4py. If this number is less than or equal to zero, the settings are not changed.
Return type:: None

daal4py.num_threads() → int

Gets number of threads configured for daal4py.

Note

The number of threads for daal4py is a global setting, which can be changed through daalinit.

Return type:: int
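As a minimal sketch of how the two functions above fit together, the helper below sets the global thread count and reads it back. The helper name limit_daal4py_threads is hypothetical and not part of daal4py; only daalinit and num_threads come from the library. The import sits inside the function so that defining the helper does not require daal4py to be installed.

```python
def limit_daal4py_threads(requested: int) -> int:
    """Set the global daal4py thread count and return the value now in effect.

    Hypothetical helper for illustration; only daalinit and num_threads
    come from daal4py.
    """
    # Imported lazily so that defining this helper does not require daal4py.
    import daal4py as d4p

    # Per the daalinit description, a value <= 0 leaves the setting unchanged.
    d4p.daalinit(nthreads=requested)

    # The setting is global, so this is what later daal4py calls will use.
    return d4p.num_threads()
```

Calling limit_daal4py_threads(4) early in a script would be one way to cap parallelism before any algorithm is run.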

daal4py.enable_thread_pinning(enabled: bool = True) → None

Enable or disable thread pinning

This function enables or disables binding of the threads that are used to parallelize algorithms of the library to physical processing units for possible performance improvement. Improper use of the method can result in degradation of the application performance depending on the system (machine) topology, application, and operating system. By default, pinning is disabled.

Note

This is a global setting for daal4py.

Parameters:: enabled (bool) – [default: True] Whether to enable thread pinning or not.
Return type:: None

MPI helpers

Documentation for helper functions that can be used in distributed mode, particularly when using MPI without mpi4py. See Distributed mode (daal4py, CPU) for examples.

daal4py.daalfini() → None

Finalize MPI environment

When using distributed mode without mpi4py, this function must be called after the distributed computation calls and before accessing the result object from the algorithm that was executed in distributed mode. It has no effect when the Python process is not run through MPI (used for distributed mode).

This is a wrapper over MPI_Finalize. It does not need to be called if mpi4py was imported before, as mpi4py calls this function upon process exit.

Note that mpi4py, if imported, calls this function automatically, but only upon process exit, so this still needs to be called before accessing the result objects in the process/rank that will use them.

Return type:: None

daal4py.num_procs() → int

Get number of MPI processes (in distributed mode)

If the Python process is not run through MPI, this function will always return 1.

This is a wrapper over MPI_Comm_size. Equivalent to mpi4py.MPI.Comm.Get_size, but does not require mpi4py to be installed.

Return type:: int

daal4py.my_procid() → int

Get MPI process rank

If the Python process is not being run through MPI (used for distributed mode), this will always return zero.

This is a wrapper over MPI_Comm_rank. Equivalent to mpi4py.MPI.Comm.Get_rank, but does not require mpi4py to be installed.

Return type:: int
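The sketch below shows one way num_procs and my_procid might be used to decide which rows of a dataset each rank works on. Both helper names (rows_for_rank, describe_local_share) are hypothetical, and the contiguous block split is just one possible partitioning scheme, not something daal4py prescribes. Whether any initialization call is needed before these functions in an MPI launch is not covered here.

```python
def rows_for_rank(n_rows: int, n_procs: int, rank: int) -> range:
    """Return the contiguous block of row indices assigned to `rank`.

    Rows are split as evenly as possible; the first `n_rows % n_procs`
    ranks receive one extra row each.
    """
    base, extra = divmod(n_rows, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def describe_local_share(n_rows: int) -> str:
    """Describe which rows the current process would handle."""
    # Imported lazily so that defining this helper does not require daal4py.
    import daal4py as d4p

    n_procs = d4p.num_procs()  # 1 when not launched through MPI
    rank = d4p.my_procid()     # 0 when not launched through MPI
    rows = rows_for_rank(n_rows, n_procs, rank)
    return f"rank {rank} of {n_procs} handles rows [{rows.start}, {rows.stop})"
```

When the script is run as an ordinary Python process, the documented fallbacks (one process, rank zero) mean the single process is assigned every row.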

Model builders (GBT and LogReg serving)

Documentation for model builders, which allow computing fast predictions from GBT (gradient-boosted decision tree) models produced by other libraries. See article Serving GBT models from other libraries for examples.

daal4py.mb.convert_model(model) → GBTDAALModel | LogisticDAALModel

Convert GBT or LogReg models to Daal4Py

This function can be used to convert machine learning models / estimators created through other libraries to daal4py classes which offer accelerated prediction methods.

It supports gradient-boosted decision tree ensembles (GBT) from the libraries xgboost, lightgbm, catboost, and treelite; and logistic regression (binary and multinomial) models from scikit-learn.

See the documentation of the classes daal4py.mb.GBTDAALModel and daal4py.mb.LogisticDAALModel for more details.

As an alternative to this function, models of a specific type (GBT or LogReg) can also be instantiated by calling those classes directly - for example, logistic regression models can be instantiated directly from fitted coefficients and intercepts, which makes it possible to work with models from libraries beyond scikit-learn.

Parameters:: model (fitted model object) – A fitted model object (either GBT or LogReg) from the supported libraries.
Returns:: obj – A daal4py model object of the corresponding class for the model type, which offers faster prediction methods.
Return type:: GBTDAALModel or LogisticDAALModel
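A minimal sketch of the conversion workflow, assuming xgboost is installed and using its scikit-learn-compatible XGBClassifier. The function name accelerate_xgboost_classifier and the hyperparameter values are illustrative choices, not recommendations. The imports sit inside the function so that defining it does not require either package.

```python
def accelerate_xgboost_classifier(X, y):
    """Fit a small XGBoost classifier and predict through daal4py.

    Hypothetical helper for illustration; X is an array of shape
    [num_samples, num_features] and y holds the class labels.
    """
    # Imported lazily so that defining this helper does not require the packages.
    import daal4py as d4p
    import xgboost as xgb

    # Any fitted GBT model from a supported library could stand in here.
    xgb_model = xgb.XGBClassifier(n_estimators=20, max_depth=3)
    xgb_model.fit(X, y)

    # convert_model returns a GBTDAALModel for gradient-boosted tree inputs.
    fast_model = d4p.mb.convert_model(xgb_model)

    # Most probable class and per-class probabilities, computed by daal4py.
    return fast_model.predict(X), fast_model.predict_proba(X)
```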

class daal4py.mb.GBTDAALModel(model)

Gradient Boosted Decision Tree Model

Model class offering accelerated predictions for gradient-boosted decision tree models from other libraries.

Objects of this class are meant to be initialized from GBT model objects created through other libraries. The result is an object of a different class that can calculate predictions faster than the library that created the original model.

Can be created from model objects that meet all of the following criteria:

Were produced from one of the following libraries: xgboost, lightgbm, catboost, or treelite (with some limitations). It can work with either the base booster classes of those libraries or with their scikit-learn-compatible classes.
Do not use categorical features.
Are for regression or classification (e.g. no ranking). In the case of XGBoost objective binary:logitraw, it will create a classification model out of it, and in the case of objective reg:logistic, will create a regression model.
Are not multi-output models. Note that multi-class classification is supported.
Are not multi-class random forests (multi-class gradient boosters are supported).

Note that while models from packages such as scikit-learn are not supported directly, they can still be converted to this class by first converting them to TreeLite and then converting to GBTDAALModel from that TreeLite model. In that case, note that models corresponding to random forest binary classifiers will be treated as regressors that predict probabilities.

Parameters:: model (booster object from another library) – The fitted GBT model from which this object will be created. See rest of the documentation for supported input types.

is_classifier_

Whether this is a classification model.

Type:: bool

is_regressor_

Whether this is a regression model.

Type:: bool

supports_shap_

Whether the model supports SHAP calculations.

Type:: bool

predict(X, pred_contribs: bool = False, pred_interactions: bool = False) → ndarray

Compute model predictions on new data

Computes the predicted values of the response variable for new data given the features / covariates for each row.

In the case of classification models, this outputs the most probable class (see predict_proba() for probability predictions). In the case of regression models, it outputs values in the link scale (what XGBoost calls ‘margin’ and LightGBM calls ‘raw’).

Parameters:

X – The features / covariates. Should be an array of shape [num_samples, num_features].
pred_contribs (bool) – Whether to predict feature contributions. Result should have shape [num_samples, num_features+1], with the last column corresponding to the intercept. See xgboost.Booster.predict for more details about this type of computation.
pred_interactions (bool) – Whether to predict feature interactions. Result should have shape [num_samples, num_features+1, num_features+1], with the last position across the last two dimensions corresponding to the intercept. See xgboost.Booster.predict for more details about this type of computation.

Return type:

np.ndarray

predict_proba(X) → ndarray

Predict class probabilities

Computes the predicted probabilities of belonging to each class for each row in the input data given the features / covariates. Output shape is [num_samples, num_classes].

Parameters:: X – The features / covariates. Should be an array of shape [num_samples, num_features].
Return type:: np.ndarray

class daal4py.mb.LogisticDAALModel(coefs, intercepts, dtype=numpy.float64)

Logistic Regression Predictor

Creates a logistic regression or multinomial logistic regression model object which can calculate fast predictions of different types (classes, probabilities, logarithms of probabilities), from fitted coefficients and intercepts obtained elsewhere (such as from sklearn.linear_model.LogisticRegression), making the predictions either in double (np.float64) or single (np.float32) precision.
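To make the role of the coefficients and intercepts concrete, the sketch below pairs a plain-Python reference of the multinomial (softmax) probability computation with a thin wrapper around the constructor shown above. Both function names are hypothetical. The reference assumes one coefficient row per class, and it illustrates the standard textbook formula rather than how daal4py computes predictions internally.

```python
import math


def softmax_probabilities(row, coefs, intercepts):
    """Reference multinomial-logistic probabilities for one input row.

    `coefs` holds one list of weights per class and `intercepts` one
    value per class (an assumed layout for this illustration).
    """
    scores = [
        sum(w * x for w, x in zip(class_coefs, row)) + b
        for class_coefs, b in zip(coefs, intercepts)
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_fast_logreg(coefs, intercepts):
    """Create a LogisticDAALModel from coefficients fitted elsewhere."""
    # Imported lazily so that defining this helper does not require the packages.
    import numpy as np
    import daal4py as d4p

    return d4p.mb.LogisticDAALModel(
        np.asarray(coefs), np.asarray(intercepts), dtype=np.float64
    )
```

With a fitted scikit-learn estimator, the natural inputs would be its coef_ and intercept_ attributes.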

Classification

Note

All classification algorithms produce a result object of the same class, containing predicted probabilities, logarithm of the predicted probabilities, and most probable class.

Results class

class daal4py.classifier_prediction_result

Properties:

logProbabilities

Numpy array

Type:: type

prediction

Numpy array

Type:: type

probabilities

Numpy array

Type:: type
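The algorithm classes in this section accept a resultsToEvaluate string that selects which outputs are computed, with several flags joined by a bar. The sketch below builds such a string and validates the flag names against the three documented values. The helper name join_result_flags is hypothetical, and the mapping from each flag to a result property is an assumption based on the matching names, not something stated on this page.

```python
# Assumed correspondence between flags and result properties (by name only).
RESULT_FLAG_TO_PROPERTY = {
    "computeClassLabels": "prediction",
    "computeClassProbabilities": "probabilities",
    "computeClassLogProbabilities": "logProbabilities",
}


def join_result_flags(*flags: str) -> str:
    """Join result flags with '|' after checking each name is a known flag."""
    unknown = [flag for flag in flags if flag not in RESULT_FLAG_TO_PROPERTY]
    if unknown:
        raise ValueError(f"unknown result flags: {unknown}")
    return "|".join(flags)
```

For example, join_result_flags("computeClassLabels", "computeClassProbabilities") produces the same string as the example given in the parameter descriptions, and calling it with no flags yields the empty string, which is the documented default.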

Decision Forest Classification

Parameters and semantics are described in oneAPI Data Analytics Library Classification Decision Forest.

class daal4py.decision_forest_classification_training

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for Decision forest, double or float
method (str) – [optional, default: “defaultDense”] Decision forest computation method
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
nTrees (size_t) – [optional, default: -1] Number of trees in the forest. Default is 10
observationsPerTreeFraction (double) – [optional, default: get_nan64()] Fraction of observations used for training one tree, 0 to 1. Default is 1 (sampling with replacement)
featuresPerNode (size_t) – [optional, default: -1] Number of features tried as possible splits per node. If 0 then sqrt(p) for classification, p/3 for regression, where p is the total number of features.
maxTreeDepth (size_t) – [optional, default: -1] Maximal tree depth. Default is 0 (unlimited)
minObservationsInLeafNode (size_t) – [optional, default: -1] Minimal number of observations in a leaf node. Default is 1 for classification, 5 for regression.
engine (engines_batchbase__iface__) – [optional, default: None] Engine for the random numbers generator used by the algorithms
impurityThreshold (double) – [optional, default: get_nan64()] Threshold value used as stopping criteria: if the impurity value in the node is smaller than the threshold then the node is not split anymore.
varImportance (str) – [optional, default: “”] Variable importance computation mode
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
memorySavingMode (bool) – [optional, default: False] If true then use memory saving (but slower) mode
bootstrap (bool) – [optional, default: False] If true then training set for a tree is a bootstrap of the whole training set
minObservationsInSplitNode (size_t) – [optional, default: -1] Minimal number of observations in a split node. Default 2
minWeightFractionInLeafNode (double) – [optional, default: get_nan64()] The minimum weighted fraction of the sum total of weights (of all the input observations) required to be at a leaf node, 0.0 to 0.5. Default is 0.0
minImpurityDecreaseInSplitNode (double) – [optional, default: get_nan64()] A node will be split if this split induces a decrease of the impurity greater than or equal to the value, non-negative. Default is 0.0
maxLeafNodes (size_t) – [optional, default: -1] Maximum number of leaf nodes. Default is 0 (unlimited)
maxBins (size_t) – [optional, default: -1] Used with ‘hist’ split finding method only. Maximal number of discrete bins to bucket continuous features. Default is 256. Increasing the number results in higher computation costs
minBinSize (size_t) – [optional, default: -1] Used with ‘hist’ split finding method only. Minimal number of observations in a bin. Default is 5
splitter (str) – [optional, default: “”] Sets node splitting method. Default is best
binningStrategy (str) – [optional, default: “”] Used with ‘hist’ split finding method only. Selects the strategy to group data points into bins. Allowed values are ‘quantiles’ (default), ‘averages’

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Training data set
labels (data_or_file) – Labels of the training data set
weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

decision_forest_classification_training_result

class daal4py.decision_forest_classification_training_result

Properties:

model

decision_forest_classification_model

Type:: type

outOfBagError

Numpy array

Type:: type

outOfBagErrorAccuracy

Numpy array

Type:: type

outOfBagErrorDecisionFunction

Numpy array

Type:: type

outOfBagErrorPerObservation

Numpy array

Type:: type

variableImportance

Numpy array

Type:: type

class daal4py.decision_forest_classification_prediction

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the decision_forest algorithm, double or float
method (str) – [optional, default: “defaultDense”] decision_forest computation method
votingMethod (str) – [optional, default: “”]
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data set
model (decision_forest_classification_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result
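A minimal sketch of the train-then-predict flow for decision forest classification, using only parameters listed above. The function name forest_fit_predict and the hyperparameter values are illustrative. Passing the labels as a single column of shape (num_samples, 1) follows common daal4py usage and is an assumption here, as is the choice of float64 inputs.

```python
def forest_fit_predict(X_train, y_train, X_test, n_classes):
    """Train a decision forest and return (labels, probabilities) for X_test."""
    # Imported lazily so that defining this helper does not require the packages.
    import numpy as np
    import daal4py as d4p

    features = np.asarray(X_train, dtype=np.float64)
    # Labels are passed as one column: shape (num_samples, 1).
    labels = np.asarray(y_train, dtype=np.float64).reshape(-1, 1)

    train_algo = d4p.decision_forest_classification_training(
        nClasses=n_classes,
        nTrees=50,
        maxTreeDepth=8,
        bootstrap=True,
    )
    train_result = train_algo.compute(features, labels)

    predict_algo = d4p.decision_forest_classification_prediction(
        nClasses=n_classes,
        resultsToEvaluate="computeClassLabels|computeClassProbabilities",
    )
    predict_result = predict_algo.compute(
        np.asarray(X_test, dtype=np.float64), train_result.model
    )

    # Properties of classifier_prediction_result.
    return predict_result.prediction, predict_result.probabilities
```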

class daal4py.decision_forest_classification_model

Properties:

NFeatures

size_t

Type:: type

NumberOfClasses

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type

NumberOfTrees

size_t

Type:: type

Decision Tree Classification

Parameters and semantics are described in oneAPI Data Analytics Library Classification Decision Tree.

Examples:

Single-Process Decision Tree Classification

class daal4py.decision_tree_classification_training

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for Decision tree model-based training, double or float
method (str) – [optional, default: “defaultDense”] Decision tree training method
pruning (str) – [optional, default: “”] Pruning method for Decision tree
maxTreeDepth (size_t) – [optional, default: -1] Maximum tree depth. 0 means unlimited depth.
minObservationsInLeafNodes (size_t) – [optional, default: -1] Minimum number of observations in the leaf node. Can be any positive number.
nBins (size_t) – [optional, default: -1] The number of bins used to compute probabilities of the observations belonging to the class. The only supported value for current version of the library is 1.
splitCriterion (str) – [optional, default: “”] Split criterion for Decision tree classification
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, labels, dataForPruning, labelsForPruning, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Training data set
labels (data_or_file) – Labels of the training data set
dataForPruning (data_or_file) – Pruning data set
labelsForPruning (data_or_file) – Labels of the pruning data set
weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

decision_tree_classification_training_result

class daal4py.decision_tree_classification_training_result

Properties:

model

decision_tree_classification_model

Type:: type

class daal4py.decision_tree_classification_prediction

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for Decision tree model-based prediction
method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode
pruning (str) – [optional, default: “”] Pruning method for Decision tree
maxTreeDepth (size_t) – [optional, default: -1] Maximum tree depth. 0 means unlimited depth.
minObservationsInLeafNodes (size_t) – [optional, default: -1] Minimum number of observations in the leaf node. Can be any positive number.
nBins (size_t) – [optional, default: -1] The number of bins used to compute probabilities of the observations belonging to the class. The only supported value for current version of the library is 1.
splitCriterion (str) – [optional, default: “”] Split criterion for Decision tree classification
nClasses (size_t) – [optional, default: -1] Number of classes
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data set
model (decision_tree_classification_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result
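Decision tree training takes a separate pruning set in addition to the training set. The sketch below holds out every fifth row for pruning and passes both sets to compute. The helper names and the hold-out rule are hypothetical choices for illustration; any disjoint split could be used instead. Labels are reshaped to a single column, which is an assumption based on common daal4py usage.

```python
def split_for_pruning(n_rows: int, prune_every: int = 5):
    """Return (train_idx, prune_idx); every `prune_every`-th row is held out."""
    prune_idx = [i for i in range(n_rows) if i % prune_every == prune_every - 1]
    train_idx = [i for i in range(n_rows) if i % prune_every != prune_every - 1]
    return train_idx, prune_idx


def tree_fit_predict(X, y, X_test, n_classes):
    """Train a pruned decision tree and return predicted labels for X_test."""
    # Imported lazily so that defining this helper does not require the packages.
    import numpy as np
    import daal4py as d4p

    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64).reshape(-1, 1)  # one label column
    train_idx, prune_idx = split_for_pruning(len(X))

    train_algo = d4p.decision_tree_classification_training(nClasses=n_classes)
    # Argument order: data, labels, dataForPruning, labelsForPruning.
    train_result = train_algo.compute(
        X[train_idx], y[train_idx], X[prune_idx], y[prune_idx]
    )

    predict_algo = d4p.decision_tree_classification_prediction()
    predict_result = predict_algo.compute(
        np.asarray(X_test, dtype=np.float64), train_result.model
    )
    return predict_result.prediction
```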

class daal4py.decision_tree_classification_model

Properties:

NFeatures

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type

Gradient Boosted Classification

Parameters and semantics are described in oneAPI Data Analytics Library Classification Gradient Boosted Tree.

Examples:

Single-Process Gradient Boosted Classification

class daal4py.gbt_classification_training

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for Gradient Boosted Trees, double or float
method (str) – [optional, default: “defaultDense”] Gradient Boosted Trees computation method
loss (str) – [optional, default: “”] Loss function type
varImportance (str) – [optional, default: “”] 64 bit integer flag VariableImportanceModes that indicates the variable importance computation modes
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
splitMethod (str) – [optional, default: “”] Split finding method. Default is exact
maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the gradient boosted trees training algorithm. Default is 50
maxTreeDepth (size_t) – [optional, default: -1] Maximal tree depth, 0 for unlimited. Default is 6
shrinkage (double) – [optional, default: get_nan64()] Learning rate of the boosting procedure. Scales the contribution of each tree by a factor (0, 1]. Default is 0.3
minSplitLoss (double) – [optional, default: get_nan64()] Loss regularization parameter. Min loss reduction required to make a further partition on a leaf node of the tree. Range: [0, inf). Default is 0
lambda (double) – [optional, default: get_nan64()] L2 regularization parameter on weights. Range: [0, inf). Default is 1
observationsPerTreeFraction (double) – [optional, default: get_nan64()] Fraction of observations used for training one tree, sampling without replacement. Range: (0, 1]. Default is 1 (no sampling, entire dataset is used)
featuresPerNode (size_t) – [optional, default: -1] Number of features tried as possible splits per node. Range : [0, p] where p is the total number of features. Default is 0 (use all features)
minObservationsInLeafNode (size_t) – [optional, default: -1] Minimal number of observations in a leaf node. Default is 5.
memorySavingMode (bool) – [optional, default: False] If true then use memory saving (but slower) mode. Default is false
engine (engines_batchbase__iface__) – [optional, default: None] Engine for the random numbers generator used by the algorithms
maxBins (size_t) – [optional, default: -1] Used with ‘inexact’ split finding method only. Maximal number of discrete bins to bucket continuous features. Default is 256. Increasing the number results in higher computation costs
minBinSize (size_t) – [optional, default: -1] Used with ‘inexact’ split finding method only. Minimal number of observations in a bin. Default is 5
internalOptions (int) – [optional, default: -1] Internal options

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Training data set
labels (data_or_file) – Labels of the training data set
weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

gbt_classification_training_result

class daal4py.gbt_classification_training_result

Properties:

model

gbt_classification_model

Type:: type

variableImportanceByCover

Numpy array

Type:: type

variableImportanceByGain

Numpy array

Type:: type

variableImportanceByTotalCover

Numpy array

Type:: type

variableImportanceByTotalGain

Numpy array

Type:: type

variableImportanceByWeight

Numpy array

Type:: type

class daal4py.gbt_classification_prediction

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the gbt algorithm, double or float
method (str) – [optional, default: “defaultDense”] gradient boosted trees computation method
nIterations (size_t) – [optional, default: -1] Number of iterations of the trained model to be used for prediction
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data set
model (gbt_classification_modelptr) – Trained gradient boosted trees model

Return type:

gbt_classification_prediction_result
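A minimal sketch of gradient boosted classification with a few of the parameters listed above. The function name gbt_fit_predict and the hyperparameter values are illustrative. The properties of gbt_classification_prediction_result are not listed on this page, so reading prediction and probabilities from it is an assumption based on the other classifier result classes.

```python
def gbt_fit_predict(X_train, y_train, X_test, n_classes):
    """Train gradient boosted trees and return (labels, probabilities)."""
    # Imported lazily so that defining this helper does not require the packages.
    import numpy as np
    import daal4py as d4p

    features = np.asarray(X_train, dtype=np.float64)
    labels = np.asarray(y_train, dtype=np.float64).reshape(-1, 1)  # one column

    train_algo = d4p.gbt_classification_training(
        nClasses=n_classes,
        maxIterations=40,  # number of boosting iterations
        maxTreeDepth=4,
        shrinkage=0.1,     # learning rate, in (0, 1]
    )
    train_result = train_algo.compute(features, labels)

    predict_algo = d4p.gbt_classification_prediction(
        nClasses=n_classes,
        resultsToEvaluate="computeClassLabels|computeClassProbabilities",
    )
    predict_result = predict_algo.compute(
        np.asarray(X_test, dtype=np.float64), train_result.model
    )

    # Assumed property names; see the note above.
    return predict_result.prediction, predict_result.probabilities
```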

class daal4py.gbt_classification_model

Properties:

NFeatures

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type

NumberOfTrees

size_t

Type:: type

PredictionBias

double

Type:: type

k-Nearest Neighbors (kNN)

Parameters and semantics are described in oneAPI Data Analytics Library k-Nearest Neighbors (kNN).

Examples:

Single-Process kNN

class daal4py.kdtree_knn_classification_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for KD-tree based kNN model-based training, double or float
method (str) – [optional, default: “defaultDense”] KD-tree based kNN training method
k (size_t) – [optional, default: -1] Number of neighbors
dataUseInModel (str) – [optional, default: “”] Option to enable or disable use of the input dataset in the kNN model
engine (engines_batchbase__iface__) – [optional, default: None] Engine for randomly choosing elements from the training dataset
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
voteWeights (str) – [optional, default: “”] Weight function used in prediction
nClasses (size_t) – [optional, default: -1] Number of classes
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Training data set
labels (data_or_file) – [optional, default: None] Labels of the training data set
weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

kdtree_knn_classification_training_result

class daal4py.kdtree_knn_classification_training_result

Properties:

model

kdtree_knn_classification_model

Type:: type

class daal4py.kdtree_knn_classification_prediction

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for KD-tree based kNN model-based prediction
method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode
k (size_t) – [optional, default: -1] Number of neighbors
dataUseInModel (str) – [optional, default: “”] Option to enable or disable use of the input dataset in the kNN model
engine (engines_batchbase__iface__) – [optional, default: None] Engine for randomly choosing elements from the training dataset
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
voteWeights (str) – [optional, default: “”] Weight function used in prediction
nClasses (size_t) – [optional, default: -1] Number of classes
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data set
model (kdtree_knn_classification_modelptr) – Input model trained by the classification algorithm

Return type:

kdtree_knn_classification_prediction_result
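A minimal sketch of KD-tree based kNN classification using the k and nClasses parameters listed above. The function name knn_fit_predict is hypothetical. The properties of kdtree_knn_classification_prediction_result are not listed on this page, so reading prediction from it is an assumption, as is the single-column label layout.

```python
def knn_fit_predict(X_train, y_train, X_test, n_classes, k=3):
    """Train a KD-tree kNN classifier and return predicted labels for X_test."""
    # Imported lazily so that defining this helper does not require the packages.
    import numpy as np
    import daal4py as d4p

    features = np.asarray(X_train, dtype=np.float64)
    labels = np.asarray(y_train, dtype=np.float64).reshape(-1, 1)  # one column

    train_algo = d4p.kdtree_knn_classification_training(nClasses=n_classes, k=k)
    train_result = train_algo.compute(features, labels)

    # The same k and nClasses are passed again at prediction time.
    predict_algo = d4p.kdtree_knn_classification_prediction(nClasses=n_classes, k=k)
    predict_result = predict_algo.compute(
        np.asarray(X_test, dtype=np.float64), train_result.model
    )

    # Assumed property name; see the note above.
    return predict_result.prediction
```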

class daal4py.kdtree_knn_classification_model

Properties:

NFeatures

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type

Brute-force k-Nearest Neighbors (kNN)

Parameters and semantics are described in oneAPI Data Analytics Library k-Nearest Neighbors (kNN).

class daal4py.bf_knn_classification_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for BF kNN model-based training, double or float
method (str) – [optional, default: “defaultDense”] BF kNN training method
k (size_t) – [optional, default: -1] Number of neighbors
dataUseInModel (str) – [optional, default: “”] Option to enable or disable use of the input dataset in the kNN model
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
voteWeights (str) – [optional, default: “”] Weight function used in prediction
engine (engines_batchbase__iface__) – [optional, default: None] Engine for randomly choosing elements from the training dataset
nClasses (size_t) – [optional, default: -1] Number of classes
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Training data set
labels (data_or_file) – [optional, default: None] Labels of the training data set
weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

bf_knn_classification_training_result

class daal4py.bf_knn_classification_training_result

Properties:

model

bf_knn_classification_model

Type:: type

class daal4py.bf_knn_classification_prediction

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for BF kNN model-based prediction
method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode
k (size_t) – [optional, default: -1] Number of neighbors
dataUseInModel (str) – [optional, default: “”] Option to enable or disable use of the input dataset in the kNN model
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
voteWeights (str) – [optional, default: “”] Weight function used in prediction
engine (engines_batchbase__iface__) – [optional, default: None] Engine for randomly choosing elements from the training dataset
nClasses (size_t) – [optional, default: -1] Number of classes
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data set
model (bf_knn_classification_modelptr) – Input model trained by the classification algorithm

Return type:

bf_knn_classification_prediction_result

class daal4py.bf_knn_classification_model

Properties:

NFeatures

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type
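A minimal sketch of how the brute-force kNN training and prediction classes above fit together, assuming daal4py is installed. The names `as_column` and `bf_knn_fit_predict`, the choice `k=3`, and reading predicted labels from a `prediction` attribute of the result are assumptions made for illustration and are not taken from this page.

```python
import numpy as np


def as_column(labels):
    """Return labels as a float64 column of shape (n, 1)."""
    return np.asarray(labels, dtype=np.float64).reshape(-1, 1)


def bf_knn_fit_predict(train_x, train_y, test_x, n_classes, k=3):
    """Train a brute-force kNN model and predict labels for test_x."""
    # Imported lazily so that as_column stays usable without daal4py.
    import daal4py as d4p

    train_algo = d4p.bf_knn_classification_training(nClasses=n_classes, k=k)
    train_result = train_algo.compute(train_x, as_column(train_y))

    predict_algo = d4p.bf_knn_classification_prediction(nClasses=n_classes, k=k)
    predict_result = predict_algo.compute(test_x, train_result.model)
    # Assumption: predicted labels are exposed as `prediction`.
    return predict_result.prediction
```

The trained model is passed from the `model` property of the training result straight into the prediction `compute` call, which is the pattern the other classifier pairs on this page follow as well.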

AdaBoost Classification

Parameters and semantics are described in oneAPI Data Analytics Library Classification AdaBoost.

Examples:

Single-Process AdaBoost Classification

class daal4py.adaboost_training

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the AdaBoost, double or float
method (str) – [optional, default: “defaultDense”] AdaBoost computation method
weakLearnerTraining (classifier_training_batch__iface__) – [optional, default: None] The algorithm for weak learner model training
weakLearnerPrediction (classifier_prediction_batch__iface__) – [optional, default: None] The algorithm for prediction based on a weak learner model
accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the AdaBoost training algorithm
maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the AdaBoost training algorithm
learningRate (double) – [optional, default: get_nan64()] Multiplier for each classifier to shrink its contribution
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Training data set
labels (data_or_file) – Labels of the training data set
weights (data_or_file) – [optional, default: None] Weights of the observations in the training data set

Return type:

adaboost_training_result

class daal4py.adaboost_training_result

Properties:

model

adaboost_model

Type:: type

weakLearnersErrors

Numpy array

Type:: type

class daal4py.adaboost_prediction

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the AdaBoost, double or float
method (str) – [optional, default: “defaultDense”] AdaBoost computation method
weakLearnerTraining (classifier_training_batch__iface__) – [optional, default: None] The algorithm for weak learner model training
weakLearnerPrediction (classifier_prediction_batch__iface__) – [optional, default: None] The algorithm for prediction based on a weak learner model
accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the AdaBoost training algorithm
maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the AdaBoost training algorithm
learningRate (double) – [optional, default: get_nan64()] Multiplier for each classifier to shrink its contribution
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data set
model (adaboost_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.adaboost_model

Properties:

Alpha

Numpy array

Type:: type

NFeatures

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type

NumberOfWeakLearners

size_t

Type:: type

WeakLearnerModel(idx)

Type:: classifier_model (or derived)

BrownBoost Classification

Parameters and semantics are described in oneAPI Data Analytics Library Classification BrownBoost.

Examples:

Single-Process BrownBoost Classification

class daal4py.brownboost_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for BrownBoost, double or float
method (str) – [optional, default: “defaultDense”] BrownBoost computation method
weakLearnerTraining (classifier_training_batch__iface__) – [optional, default: None] The algorithm for weak learner model training
weakLearnerPrediction (classifier_prediction_batch__iface__) – [optional, default: None] The algorithm for prediction based on a weak learner model
accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the BrownBoost training algorithm
maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the BrownBoost training algorithm
newtonRaphsonAccuracyThreshold (double) – [optional, default: get_nan64()] Accuracy threshold for Newton-Raphson iterations in the BrownBoost training algorithm
newtonRaphsonMaxIterations (size_t) – [optional, default: -1] Maximal number of Newton-Raphson iterations in the BrownBoost training algorithm
degenerateCasesThreshold (double) – [optional, default: get_nan64()] Threshold needed to avoid degenerate cases in the BrownBoost training algorithm
nClasses (size_t) – [optional, default: -1] Number of classes
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Training data set
labels (data_or_file) – Labels of the training data set
weights (data_or_file) – [optional, default: None] Weights of the observations in the training data set

Return type:

brownboost_training_result

class daal4py.brownboost_training_result

Properties:

model

brownboost_model

Type:: type

class daal4py.brownboost_prediction

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the BrownBoost algorithm, double or float
method (str) – [optional, default: “defaultDense”] BrownBoost computation method
weakLearnerTraining (classifier_training_batch__iface__) – [optional, default: None] The algorithm for weak learner model training
weakLearnerPrediction (classifier_prediction_batch__iface__) – [optional, default: None] The algorithm for prediction based on a weak learner model
accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the BrownBoost training algorithm
maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the BrownBoost training algorithm
newtonRaphsonAccuracyThreshold (double) – [optional, default: get_nan64()] Accuracy threshold for Newton-Raphson iterations in the BrownBoost training algorithm
newtonRaphsonMaxIterations (size_t) – [optional, default: -1] Maximal number of Newton-Raphson iterations in the BrownBoost training algorithm
degenerateCasesThreshold (double) – [optional, default: get_nan64()] Threshold needed to avoid degenerate cases in the BrownBoost training algorithm
nClasses (size_t) – [optional, default: -1] Number of classes
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data set
model (brownboost_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.brownboost_model

Properties:

Alpha

Numpy array

Type:: type

NFeatures

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type

NumberOfWeakLearners

size_t

Type:: type

WeakLearnerModel(idx)

Type:: classifier_model (or derived)

LogitBoost Classification

Parameters and semantics are described in oneAPI Data Analytics Library Classification LogitBoost.

Examples:

Single-Process LogitBoost Classification

class daal4py.logitboost_training

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for LogitBoost, double or float
method (str) – [optional, default: “friedman”] LogitBoost computation method
weakLearnerTraining (regression_training_batch__iface__) – [optional, default: None] The algorithm for weak learner model training
weakLearnerPrediction (regression_prediction_batch__iface__) – [optional, default: None] The algorithm for prediction based on a weak learner model
accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the LogitBoost training algorithm
maxIterations (size_t) – [optional, default: -1] Maximal number of terms in additive regression
weightsDegenerateCasesThreshold (double) – [optional, default: get_nan64()] Threshold to avoid degenerate cases when calculating weights W
responsesDegenerateCasesThreshold (double) – [optional, default: get_nan64()] Threshold to avoid degenerate cases when calculating responses Z
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Training data set
labels (data_or_file) – Labels of the training data set
weights (data_or_file) – [optional, default: None] Weights of the observations in the training data set

Return type:

logitboost_training_result

class daal4py.logitboost_training_result

Properties:

model

logitboost_model

Type:: type

class daal4py.logitboost_prediction

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the LogitBoost algorithm, double or float
method (str) – [optional, default: “defaultDense”] LogitBoost computation method
weakLearnerTraining (regression_training_batch__iface__) – [optional, default: None] The algorithm for weak learner model training
weakLearnerPrediction (regression_prediction_batch__iface__) – [optional, default: None] The algorithm for prediction based on a weak learner model
accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the LogitBoost training algorithm
maxIterations (size_t) – [optional, default: -1] Maximal number of terms in additive regression
weightsDegenerateCasesThreshold (double) – [optional, default: get_nan64()] Threshold to avoid degenerate cases when calculating weights W
responsesDegenerateCasesThreshold (double) – [optional, default: get_nan64()] Threshold to avoid degenerate cases when calculating responses Z
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data set
model (logitboost_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.logitboost_model

Properties:

Iterations

size_t

Type:: type

NFeatures

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type

NumberOfWeakLearners

size_t

Type:: type

WeakLearnerModel(idx)

Type:: regression_model (or derived)
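As a rough illustration of the LogitBoost classes above, the following sketch trains a model, predicts, and scores the predictions. The helper names `accuracy` and `logitboost_fit_predict`, the values `maxIterations=100` and `accuracyThreshold=0.01`, and the `prediction` attribute of the result are assumptions chosen for the example.

```python
import numpy as np


def accuracy(predicted, expected):
    """Fraction of rows where the predicted label equals the expected label."""
    predicted = np.asarray(predicted).ravel()
    expected = np.asarray(expected).ravel()
    return float(np.mean(predicted == expected))


def logitboost_fit_predict(train_x, train_y, test_x, n_classes):
    """Train LogitBoost on (train_x, train_y) and predict labels for test_x."""
    # Imported lazily so that accuracy() works without daal4py installed.
    import daal4py as d4p

    train_algo = d4p.logitboost_training(
        nClasses=n_classes,
        maxIterations=100,       # illustrative value, not a recommendation
        accuracyThreshold=0.01,  # illustrative value, not a recommendation
    )
    train_result = train_algo.compute(train_x, train_y)

    predict_algo = d4p.logitboost_prediction(nClasses=n_classes)
    predict_result = predict_algo.compute(test_x, train_result.model)
    # Assumption: predicted labels are exposed as `prediction`.
    return predict_result.prediction
```

A typical call would be `accuracy(logitboost_fit_predict(X_train, y_train, X_test, 5), y_test)`, with labels stored as a column of shape (n, 1).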

Stump Weak Learner Classification

Parameters and semantics are described in oneAPI Data Analytics Library Classification Weak Learner Stump.

Examples:

Single-Process Stump Weak Learner Classification

class daal4py.stump_classification_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the decision stump training method, double or float
method (str) – [optional, default: “defaultDense”] Decision stump training method
splitCriterion (str) – [optional, default: “”] Split criterion for stump classification
varImportance (str) – [optional, default: “”] Variable importance computation mode
nClasses (size_t) – [optional, default: -1] Number of classes
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Training data set
labels (data_or_file) – Labels of the training data set
weights (data_or_file) – [optional, default: None] Weights of the observations in the training data set

Return type:

stump_classification_training_result

class daal4py.stump_classification_training_result

Properties:

model

stump_classification_model

Type:: type

variableImportance

Numpy array

Type:: type

class daal4py.stump_classification_prediction

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the decision stump prediction algorithm, double or float
method (str) – [optional, default: “defaultDense”] Decision stump model-based prediction method
nClasses (size_t) – [optional, default: -1] Number of classes
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data set
model (stump_classification_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.stump_classification_model

Properties:

LeftValue

double

Type:: type

NFeatures

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type

RightValue

double

Type:: type

SplitFeature

size_t

Type:: type

SplitValue

double

Type:: type

Multinomial Naive Bayes

Parameters and semantics are described in oneAPI Data Analytics Library Naive Bayes.

Examples:

class daal4py.multinomial_naive_bayes_training

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for multinomial naive Bayes training, double or float
method (str) – [optional, default: “defaultDense”] Computation method
priorClassEstimates (array) – [optional, default: None] Prior class estimates
alpha (array) – [optional, default: None] Imagined occurrences of each word
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
distributed (bool) – [optional, default: False] Enable distributed computation (SPMD)
streaming (bool) – [optional, default: False] Enable streaming

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Training data set
labels (data_or_file) – Labels of the training data set
weights (data_or_file) – [optional, default: None] Weights of the observations in the training data set

Return type:

multinomial_naive_bayes_training_result

class daal4py.multinomial_naive_bayes_training_result

Properties:

model

multinomial_naive_bayes_model

Type:: type

class daal4py.multinomial_naive_bayes_prediction

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for prediction based on the multinomial naive Bayes model, double or float
method (str) – [optional, default: “defaultDense”] Multinomial naive Bayes prediction method
priorClassEstimates (array) – [optional, default: None] Prior class estimates
alpha (array) – [optional, default: None] Imagined occurrences of each word
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data set
model (multinomial_naive_bayes_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.multinomial_naive_bayes_model

Properties:

AuxTable

Numpy array

Type:: type

LogP

Numpy array

Type:: type

LogTheta

Numpy array

Type:: type

NFeatures

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type
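The streaming flag of multinomial_naive_bayes_training lets the training data arrive in blocks. A hedged sketch of that pattern follows: `iter_chunks` and `train_streaming_nb` are hypothetical helper names, the chunk size is arbitrary, and the use of `finalize()` to combine the partial results reflects the usual daal4py streaming pattern rather than anything stated on this page.

```python
import numpy as np


def iter_chunks(data, labels, chunk_size):
    """Yield consecutive (data, labels) row blocks of at most chunk_size rows."""
    for start in range(0, data.shape[0], chunk_size):
        stop = start + chunk_size
        yield data[start:stop], labels[start:stop]


def train_streaming_nb(data, labels, n_classes, chunk_size=1000):
    """Feed the training set to naive Bayes block by block."""
    # Imported lazily so that iter_chunks works without daal4py installed.
    import daal4py as d4p

    algo = d4p.multinomial_naive_bayes_training(nClasses=n_classes, streaming=True)
    for chunk_x, chunk_y in iter_chunks(data, labels, chunk_size):
        algo.compute(chunk_x, chunk_y)
    # Assumption: finalize() combines the partial results fed so far.
    return algo.finalize()
```

The returned object would then provide the `model` property to pass to multinomial_naive_bayes_prediction.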

Support Vector Machine (SVM)

Parameters and semantics are described in oneAPI Data Analytics Library SVM.

Note: Labels passed through the labels parameter must be encoded as -1 and 1.

Examples:

Single-Process SVM

class daal4py.svm_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the SVM training algorithm, double or float
method (str) – [optional, default: “boser”] SVM training method
C (double) – [optional, default: get_nan64()] Upper bound in constraints of the quadratic optimization problem
accuracyThreshold (double) – [optional, default: get_nan64()] Training accuracy
tau (double) – [optional, default: get_nan64()] Tau parameter of the working set selection scheme
maxIterations (size_t) – [optional, default: -1] Maximal number of iterations for the algorithm
cacheSize (size_t) – [optional, default: -1] Size of cache in bytes to store values of the kernel matrix. A non-zero value enables use of a cache optimization technique
doShrinking (bool) – [optional, default: False] Flag that enables use of the shrinking optimization technique
shrinkingStep (size_t) – [optional, default: -1] Number of iterations between the steps of shrinking optimization technique
kernel (kernel_function_kerneliface__iface__) – [optional, default: None] Kernel function
nClasses (size_t) – [optional, default: -1] Number of classes
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Training data set
labels (data_or_file) – Labels of the training data set
weights (data_or_file) – [optional, default: None] Weights of the observations in the training data set

Return type:

svm_training_result

class daal4py.svm_training_result

Properties:

model

svm_model

Type:: type

class daal4py.svm_prediction

Parameters:

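fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the SVM prediction algorithm, double or float
method (str) – [optional, default: “defaultDense”] SVM prediction method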
C (double) – [optional, default: get_nan64()] Upper bound in constraints of the quadratic optimization problem
accuracyThreshold (double) – [optional, default: get_nan64()] Training accuracy
tau (double) – [optional, default: get_nan64()] Tau parameter of the working set selection scheme
maxIterations (size_t) – [optional, default: -1] Maximal number of iterations for the algorithm
cacheSize (size_t) – [optional, default: -1] Size of cache in bytes to store values of the kernel matrix. A non-zero value enables use of a cache optimization technique
doShrinking (bool) – [optional, default: False] Flag that enables use of the shrinking optimization technique
shrinkingStep (size_t) – [optional, default: -1] Number of iterations between the steps of shrinking optimization technique
kernel (kernel_function_kerneliface__iface__) – [optional, default: None] Kernel function
nClasses (size_t) – [optional, default: -1] Number of classes
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data set
model (svm_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.svm_model

Properties:

Bias

double

Type:: type

ClassificationCoefficients

Numpy array

Type:: type

NFeatures

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type

SupportIndices

Numpy array

Type:: type

SupportVectors

Numpy array

Type:: type

Logistic Regression

Parameters and semantics are described in oneAPI Data Analytics Library Logistic Regression.

Examples:

class daal4py.logistic_regression_training

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for logistic regression, double or float
method (str) – [optional, default: “defaultDense”] Logistic regression computation method
interceptFlag (bool) – [optional, default: False] Whether the intercept needs to be computed
penaltyL1 (float) – [optional, default: get_nan32()] L1 regularization coefficient. Default is 0 (not applied)
penaltyL2 (float) – [optional, default: get_nan32()] L2 regularization coefficient. Default is 0 (not applied)
optimizationSolver (optimization_solver_iterative_solver_batch__iface__) – [optional, default: None] Optimization solver. Default is the SGD momentum solver
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Training data set
labels (data_or_file) – Labels of the training data set
weights (data_or_file) – [optional, default: None] Weights of the observations in the training data set

Return type:

logistic_regression_training_result

class daal4py.logistic_regression_training_result

Properties:

model

logistic_regression_model

Type:: type

class daal4py.logistic_regression_prediction

Parameters:

nClasses (size_t) – Number of classes
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the logistic regression algorithm, double or float
method (str) – [optional, default: “defaultDense”] Logistic regression computation method
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data set
model (logistic_regression_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.logistic_regression_model

Properties:

Beta

Numpy array

Type:: type

InterceptFlag

bool

Type:: type

NFeatures

size_t

Type:: type

NumberOfBetas

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type
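The resultsToEvaluate argument takes one or more result names joined by separator bars. A small sketch of building that string safely and passing it to logistic_regression_prediction follows. `join_results` and `predict_with_probabilities` are hypothetical helpers, and the `prediction` and `probabilities` attributes of the result are assumptions not listed in this section.

```python
ALLOWED_RESULTS = (
    "computeClassLabels",
    "computeClassProbabilities",
    "computeClassLogProbabilities",
)


def join_results(*names):
    """Join result names with '|' after checking each against the documented set."""
    unknown = [name for name in names if name not in ALLOWED_RESULTS]
    if unknown:
        raise ValueError(f"unsupported result names: {unknown}")
    return "|".join(names)


def predict_with_probabilities(data, model, n_classes):
    """Request both class labels and class probabilities from a trained model."""
    # Imported lazily so that join_results works without daal4py installed.
    import daal4py as d4p

    algo = d4p.logistic_regression_prediction(
        nClasses=n_classes,
        resultsToEvaluate=join_results("computeClassLabels", "computeClassProbabilities"),
    )
    result = algo.compute(data, model)
    # Assumption: the result exposes `prediction` and `probabilities`.
    return result.prediction, result.probabilities
```

Validating the names up front turns a misspelled flag into an immediate Python error instead of a failure inside the library call.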

Regression

Decision Forest Regression

Parameters and semantics are described in oneAPI Data Analytics Library Regression Decision Forest.

Examples:

class daal4py.decision_forest_regression_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for decision forest model-based training, double or float
method (str) – [optional, default: “defaultDense”] Decision forest training method
nTrees (size_t) – [optional, default: -1] Number of trees in the forest. Default is 10
observationsPerTreeFraction (double) – [optional, default: get_nan64()] Fraction of observations used for training one tree, 0 to 1. Default is 1 (sampling with replacement)
featuresPerNode (size_t) – [optional, default: -1] Number of features tried as possible splits per node. If 0 then sqrt(p) for classification, p/3 for regression, where p is the total number of features.
maxTreeDepth (size_t) – [optional, default: -1] Maximal tree depth. Default is 0 (unlimited)
minObservationsInLeafNode (size_t) – [optional, default: -1] Minimal number of observations in a leaf node. Default is 1 for classification, 5 for regression.
engine (engines_batchbase__iface__) – [optional, default: None] Engine for the random numbers generator used by the algorithms
impurityThreshold (double) – [optional, default: get_nan64()] Threshold value used as stopping criteria: if the impurity value in the node is smaller than the threshold then the node is not split anymore.
varImportance (str) – [optional, default: “”] Variable importance computation mode
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
memorySavingMode (bool) – [optional, default: False] If true then use memory saving (but slower) mode
bootstrap (bool) – [optional, default: False] If true then training set for a tree is a bootstrap of the whole training set
minObservationsInSplitNode (size_t) – [optional, default: -1] Minimal number of observations in a split node. Default is 2
minWeightFractionInLeafNode (double) – [optional, default: get_nan64()] The minimum weighted fraction of the sum total of weights (of all the input observations) required to be at a leaf node, 0.0 to 0.5. Default is 0.0
minImpurityDecreaseInSplitNode (double) – [optional, default: get_nan64()] A node will be split if this split induces a decrease of the impurity greater than or equal to the value, non-negative. Default is 0.0
maxLeafNodes (size_t) – [optional, default: -1] Maximum number of leaf nodes. Default is 0 (unlimited)
maxBins (size_t) – [optional, default: -1] Used with ‘hist’ split finding method only. Maximal number of discrete bins to bucket continuous features. Default is 256. Increasing the number results in higher computation costs
minBinSize (size_t) – [optional, default: -1] Used with ‘hist’ split finding method only. Minimal number of observations in a bin. Default is 5
splitter (str) – [optional, default: “”] Sets node splitting method. Default is best
binningStrategy (str) – [optional, default: “”] Used with ‘hist’ split finding method only. Selects the strategy to group data points into bins. Allowed values are ‘quantiles’ (default), ‘averages’
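
The featuresPerNode rule above (a value of 0 selects sqrt(p) for classification and p/3 for regression) can be sketched in plain Python. This is only an illustration: the helper name effective_features_per_node is hypothetical, and rounding down with a floor of 1 is an assumption, since the description does not say how the library rounds.

```python
import math


def effective_features_per_node(features_per_node, p, task):
    """Illustrative only: number of features tried per split.

    features_per_node -- requested value; 0 means "use the rule of thumb"
    p                 -- total number of features
    task              -- "classification" or "regression"
    """
    if features_per_node > 0:
        # An explicit positive request is used as given.
        return features_per_node
    if task == "classification":
        chosen = math.isqrt(p)   # floor(sqrt(p)); rounding is an assumption
    elif task == "regression":
        chosen = p // 3          # floor(p / 3); rounding is an assumption
    else:
        raise ValueError(f"unknown task: {task!r}")
    # Assumption: never try fewer than one feature.
    return max(1, chosen)


print(effective_features_per_node(0, 16, "classification"))  # 4
print(effective_features_per_node(0, 9, "regression"))       # 3
```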

compute(data, dependentVariable, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
dependentVariable (data_or_file) – Values of the dependent variable for the input data
weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

decision_forest_regression_training_result

class daal4py.decision_forest_regression_training_result

Properties:

model

decision_forest_regression_model

Type:: type

outOfBagError

Numpy array

Type:: type

outOfBagErrorPerObservation

Numpy array

Type:: type

outOfBagErrorPrediction

Numpy array

Type:: type

outOfBagErrorR2

Numpy array

Type:: type

variableImportance

Numpy array

Type:: type

class daal4py.decision_forest_regression_prediction

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for decision forest model-based prediction
method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
model (decision_forest_regression_modelptr) – Trained decision forest model

Return type:

decision_forest_regression_prediction_result

class daal4py.decision_forest_regression_prediction_result

Properties:

prediction

Numpy array

Type:: type

class daal4py.decision_forest_regression_model

Properties:

NumberOfFeatures

size_t

Type:: type

NumberOfTrees

size_t

Type:: type

Decision Tree Regression

Parameters and semantics are described in oneAPI Data Analytics Library Regression Decision Tree.

Examples:

Single-Process Decision Tree Regression

class daal4py.decision_tree_regression_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for Decision tree model-based training, double or float
method (str) – [optional, default: “defaultDense”] Decision tree training method
pruning (str) – [optional, default: “”] Pruning method for Decision tree
maxTreeDepth (size_t) – [optional, default: -1] Maximum tree depth. 0 means unlimited depth.
minObservationsInLeafNodes (size_t) – [optional, default: -1] Minimum number of observations in the leaf node. Can be any positive number.

compute(data, dependentVariables, dataForPruning, dependentVariablesForPruning, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
dependentVariables (data_or_file) – Values of the dependent variable for the input data
dataForPruning (data_or_file) – Pruning data set
dependentVariablesForPruning (data_or_file) – Labels of the pruning data set
weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

decision_tree_regression_training_result

class daal4py.decision_tree_regression_training_result

Properties:

model

decision_tree_regression_model

Type:: type

class daal4py.decision_tree_regression_prediction

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for Decision tree model-based prediction
method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode
pruning (str) – [optional, default: “”] Pruning method for Decision tree
maxTreeDepth (size_t) – [optional, default: -1] Maximum tree depth. 0 means unlimited depth.
minObservationsInLeafNodes (size_t) – [optional, default: -1] Minimum number of observations in the leaf node. Can be any positive number.

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
model (decision_tree_regression_modelptr) – Trained decision tree model

Return type:

decision_tree_regression_prediction_result

class daal4py.decision_tree_regression_prediction_result

Properties:

prediction

Numpy array

Type:: type

class daal4py.decision_tree_regression_model

Properties:

NumberOfFeatures

size_t

Type:: type

Gradient Boosted Regression

Parameters and semantics are described in oneAPI Data Analytics Library Regression Gradient Boosted Tree.

Examples:

Single-Process Gradient Boosted Regression

class daal4py.gbt_regression_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for model-based training, double or float
method (str) – [optional, default: “defaultDense”] gradient boosted trees training method
loss (str) – [optional, default: “”] Loss function type
varImportance (str) – [optional, default: “”] 64 bit integer flag VariableImportanceModes that indicates the variable importance computation modes
splitMethod (str) – [optional, default: “”] Split finding method. Default is exact
maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the gradient boosted trees training algorithm. Default is 50
maxTreeDepth (size_t) – [optional, default: -1] Maximal tree depth, 0 for unlimited. Default is 6
shrinkage (double) – [optional, default: get_nan64()] Learning rate of the boosting procedure. Scales the contribution of each tree by a factor in the range (0, 1]. Default is 0.3
minSplitLoss (double) – [optional, default: get_nan64()] Loss regularization parameter. Min loss reduction required to make a further partition on a leaf node of the tree. Range: [0, inf). Default is 0
lambda (double) – [optional, default: get_nan64()] L2 regularization parameter on weights. Range: [0, inf). Default is 1
observationsPerTreeFraction (double) – [optional, default: get_nan64()] Fraction of observations used to train one tree, sampling without replacement. Range: (0, 1]. Default is 1 (no sampling, entire dataset is used)
featuresPerNode (size_t) – [optional, default: -1] Number of features tried as possible splits per node. Range: [0, p] where p is the total number of features. Default is 0 (use all features)
minObservationsInLeafNode (size_t) – [optional, default: -1] Minimal number of observations in a leaf node. Default is 5.
memorySavingMode (bool) – [optional, default: False] If true then use memory saving (but slower) mode. Default is false
engine (engines_batchbase__iface__) – [optional, default: None] Engine for the random numbers generator used by the algorithms
maxBins (size_t) – [optional, default: -1] Used with ‘inexact’ split finding method only. Maximal number of discrete bins to bucket continuous features. Default is 256. Increasing the number results in higher computation costs
minBinSize (size_t) – [optional, default: -1] Used with ‘inexact’ split finding method only. Minimal number of observations in a bin. Default is 5
internalOptions (int) – [optional, default: -1] Internal options
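
The ranges documented above can be checked before a training object is constructed. The sketch below is a hypothetical pre-flight helper, not part of daal4py; it only encodes the ranges and defaults stated in the parameter list. The values are kept in a dict because lambda is a reserved word in Python and cannot be used as an ordinary variable name.

```python
def check_gbt_regression_params(params, n_features):
    """Return a list of range violations (empty when everything is in range).

    Hypothetical helper: ranges and fallback defaults are copied from the
    parameter descriptions, nothing here calls into the library.
    """
    problems = []
    if not 0.0 < params.get("shrinkage", 0.3) <= 1.0:
        problems.append("shrinkage must be in (0, 1]")
    if params.get("minSplitLoss", 0.0) < 0.0:
        problems.append("minSplitLoss must be in [0, inf)")
    if params.get("lambda", 1.0) < 0.0:
        problems.append("lambda must be in [0, inf)")
    if not 0.0 < params.get("observationsPerTreeFraction", 1.0) <= 1.0:
        problems.append("observationsPerTreeFraction must be in (0, 1]")
    if not 0 <= params.get("featuresPerNode", 0) <= n_features:
        problems.append("featuresPerNode must be in [0, p]")
    return problems


print(check_gbt_regression_params({"shrinkage": 0.3, "lambda": 1.0}, 10))  # []
print(check_gbt_regression_params({"shrinkage": 0.0}, 10))
```

The second call reports the shrinkage violation, because 0 lies outside the half-open interval (0, 1].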

compute(data, dependentVariable)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
dependentVariable (data_or_file) – Values of the dependent variable for the input data

Return type:

gbt_regression_training_result

class daal4py.gbt_regression_training_result

Properties:

model

gbt_regression_model

Type:: type

variableImportanceByCover

Numpy array

Type:: type

variableImportanceByGain

Numpy array

Type:: type

variableImportanceByTotalCover

Numpy array

Type:: type

variableImportanceByTotalGain

Numpy array

Type:: type

variableImportanceByWeight

Numpy array

Type:: type

class daal4py.gbt_regression_prediction

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for model-based prediction
method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode
nIterations (size_t) – [optional, default: -1] Number of iterations of the trained model to be used for prediction
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
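
The resultsToCompute description above says several result names can be combined with separator bars. A minimal sketch of building such a string follows. The helper join_result_flags is hypothetical, and the flag names are simply the ones quoted in the description; whether a given class accepts them is something to confirm in the oneDAL documentation.

```python
def join_result_flags(*flags):
    """Join result names with '|' as described for resultsToCompute.

    Hypothetical helper: drops duplicates (keeping first-seen order) and
    rejects empty names or names that already contain the separator.
    """
    cleaned = []
    for flag in flags:
        if not flag or "|" in flag:
            raise ValueError(f"invalid flag name: {flag!r}")
        if flag not in cleaned:
            cleaned.append(flag)
    return "|".join(cleaned)


combined = join_result_flags("computeClassLabels", "computeClassProbabilities")
print(combined)  # computeClassLabels|computeClassProbabilities
```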

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
model (gbt_regression_modelptr) – Trained gradient boosted trees model

Return type:

gbt_regression_prediction_result

class daal4py.gbt_regression_prediction_result

Properties:

prediction

Numpy array

Type:: type

class daal4py.gbt_regression_model

Properties:

NumberOfFeatures

size_t

Type:: type

NumberOfTrees

size_t

Type:: type

PredictionBias

double

Type:: type

Linear Regression

Parameters and semantics are described in oneAPI Data Analytics Library Linear Regression.

Examples:

class daal4py.linear_regression_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for linear regression model-based training, double or float
method (str) – [optional, default: “normEqDense”] Linear regression training method
interceptFlag (bool) – [optional, default: False] Flag that indicates whether the intercept needs to be computed
distributed (bool) – [optional, default: False] enable distributed computation (SPMD)
streaming (bool) – [optional, default: False] enable streaming

compute(data, dependentVariables)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
dependentVariables (data_or_file) – Values of the dependent variable for the input data

Return type:

linear_regression_training_result

class daal4py.linear_regression_training_result

Properties:

model

linear_regression_model

Type:: type

class daal4py.linear_regression_prediction

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for linear regression model-based prediction
method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
model (linear_regression_modelptr) – Trained linear regression model

Return type:

linear_regression_prediction_result

class daal4py.linear_regression_prediction_result

Properties:

prediction

Numpy array

Type:: type

class daal4py.linear_regression_model

Properties:

Beta

Numpy array

Type:: type

InterceptFlag

bool

Type:: type

NumberOfBetas

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type

NumberOfResponses

size_t

Type:: type
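
As a rough illustration of how the model properties relate, the sketch below applies a coefficient table to one observation. It assumes Beta has one row per response and NumberOfFeatures + 1 columns, with the intercept in the first column; that layout is an assumption not stated on this page, and predict_from_beta is a hypothetical helper rather than a daal4py function.

```python
def predict_from_beta(beta, x):
    """Apply a coefficient table to a single observation.

    beta -- list of rows, one per response; assumed layout is
            [intercept, coef_1, ..., coef_p]
    x    -- list of p feature values
    """
    predictions = []
    for row in beta:
        if len(row) != len(x) + 1:
            raise ValueError("each Beta row needs len(x) + 1 entries")
        intercept, coefficients = row[0], row[1:]
        predictions.append(
            intercept + sum(c * v for c, v in zip(coefficients, x))
        )
    return predictions


beta = [[1.0, 2.0, -1.0]]                   # one response, two features
print(predict_from_beta(beta, [3.0, 4.0]))  # [3.0]
```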

LASSO Regression

Parameters and semantics are described in oneAPI Data Analytics Library Least Absolute Shrinkage and Selection Operator.

Examples:

Single-Process LASSO Regression

class daal4py.lasso_regression_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for lasso regression model-based training, double or float
method (str) – [optional, default: “defaultDense”] LASSO regression training method
lassoParameters (array) – [optional, default: None] Numeric table that contains values of lasso parameters
optimizationSolver (optimization_solver_iterative_solver_batch__iface__) – [optional, default: None] Default is coordinate descent solver
dataUseInComputation (str) – [optional, default: “”] Flag that indicates whether the algorithm is allowed to overwrite the input data
optResultToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the optional results to compute
interceptFlag (bool) – [optional, default: False] Flag that indicates whether the intercept needs to be computed

compute(data, dependentVariables, weights, gramMatrix)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
dependentVariables (data_or_file) – Values of the dependent variable for the input data
weights (data_or_file) – [optional, default: None] NumericTable of size 1 x n with weights of samples. Applies to all methods
gramMatrix (data_or_file) – [optional, default: None] NumericTable of size p x p with the Gram matrix of the input data. Applies to all methods

Return type:

lasso_regression_training_result

class daal4py.lasso_regression_training_result

Properties:

gramMatrixId

Numpy array

Type:: type

model

lasso_regression_model

Type:: type

class daal4py.lasso_regression_prediction

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for lasso regression model-based prediction
method (str) – [optional, default: “defaultDense”]

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
model (lasso_regression_modelptr) – Trained lasso regression model

Return type:

lasso_regression_prediction_result

class daal4py.lasso_regression_prediction_result

Properties:

prediction

Numpy array

Type:: type

class daal4py.lasso_regression_model

Properties:

Beta

Numpy array

Type:: type

InterceptFlag

bool

Type:: type

NumberOfBetas

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type

NumberOfResponses

size_t

Type:: type

Ridge Regression

Parameters and semantics are described in oneAPI Data Analytics Library Ridge Regression.

Examples:

class daal4py.ridge_regression_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for ridge regression model-based training, double or float
method (str) – [optional, default: “normEqDense”] Ridge regression training method
ridgeParameters (array) – [optional, default: None] Numeric table that contains values of ridge parameters
interceptFlag (bool) – [optional, default: False] Flag that indicates whether the intercept needs to be computed
distributed (bool) – [optional, default: False] enable distributed computation (SPMD)
streaming (bool) – [optional, default: False] enable streaming

compute(data, dependentVariables)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
dependentVariables (data_or_file) – Values of the dependent variable for the input data

Return type:

ridge_regression_training_result

class daal4py.ridge_regression_training_result

Properties:

model

ridge_regression_model

Type:: type

class daal4py.ridge_regression_prediction

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for ridge regression model-based prediction
method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
model (ridge_regression_modelptr) – Trained ridge regression model

Return type:

ridge_regression_prediction_result

class daal4py.ridge_regression_prediction_result

Properties:

prediction

Numpy array

Type:: type

class daal4py.ridge_regression_model

Properties:

Beta

Numpy array

Type:: type

InterceptFlag

bool

Type:: type

NumberOfBetas

size_t

Type:: type

NumberOfFeatures

size_t

Type:: type

NumberOfResponses

size_t

Type:: type

Stump Regression

Parameters and semantics are described in oneAPI Data Analytics Library Regression Stump.

Examples:

Single-Process Stump Regression

class daal4py.stump_regression_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the decision stump training method, double or float
method (str) – [optional, default: “defaultDense”] Decision stump training method
varImportance (str) – [optional, default: “”] Variable importance mode. Variable importance computation is not supported in the current version of the library

compute(data, dependentVariables, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
dependentVariables (data_or_file) – Values of the dependent variable for the input data
weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set. Some values are skipped for backward compatibility.

Return type:

stump_regression_training_result

class daal4py.stump_regression_training_result

Properties:

model

stump_regression_model

Type:: type

variableImportance

Numpy array

Type:: type

class daal4py.stump_regression_prediction

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the decision stump prediction algorithm, double or float
method (str) – [optional, default: “defaultDense”] Decision stump model-based prediction method
varImportance (str) – [optional, default: “”] Variable importance mode. Variable importance computation is not supported in the current version of the library

compute(data, model)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
model (stump_regression_modelptr) – Trained regression model

Return type:

stump_regression_prediction_result

class daal4py.stump_regression_prediction_result

Properties:

prediction

Numpy array

Type:: type

class daal4py.stump_regression_model

Properties:

LeftValue

double

Type:: type

NumberOfFeatures

size_t

Type:: type

RightValue

double

Type:: type

SplitFeature

size_t

Type:: type

SplitValue

double

Type:: type
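
The four stump model properties describe a single split. The sketch below shows how such a model could be evaluated in plain Python. The StumpModel container and stump_predict function are hypothetical, and sending values strictly below SplitValue to the left is an assumption; the library may treat the boundary differently.

```python
from dataclasses import dataclass


@dataclass
class StumpModel:
    """Stand-in for the SplitFeature, SplitValue, LeftValue, RightValue properties."""
    split_feature: int
    split_value: float
    left_value: float
    right_value: float


def stump_predict(model, x):
    """Predict one observation with a one-split tree.

    Assumption: values strictly below the split go left, all others go right.
    """
    if x[model.split_feature] < model.split_value:
        return model.left_value
    return model.right_value


stump = StumpModel(split_feature=1, split_value=0.5,
                   left_value=-1.0, right_value=2.0)
print(stump_predict(stump, [0.0, 0.2]))  # -1.0
print(stump_predict(stump, [0.0, 0.9]))  # 2.0
```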

Clustering

K-Means Clustering

Parameters and semantics are described in oneAPI Data Analytics Library K-Means Clustering.

Examples:

K-Means Initialization

Parameters and semantics are described in oneAPI Data Analytics Library K-Means Initialization.

class daal4py.kmeans_init

Parameters:

nClusters (size_t) – Number of clusters
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of initial clusters for K-Means algorithm, double or float
method (str) – [optional, default: “defaultDense”] Method of computing initial clusters for the algorithm
nTrials (size_t) – [optional, default: -1] Kmeans++ only. The number of trials to generate all clusters but the first initial cluster.
oversamplingFactor (double) – [optional, default: get_nan64()] Kmeans|| only. A fraction of nClusters being chosen in each of nRounds of kmeans||. L = nClusters * oversamplingFactor points are sampled in a round.
nRounds (size_t) – [optional, default: -1] Kmeans|| only. Number of rounds for k-means||. (oversamplingFactor*nRounds) > 1 is a requirement.
engine (engines_batchbase__iface__) – [optional, default: None] Engine to be used for generating random numbers for the initialization
distributed (bool) – [optional, default: False] enable distributed computation (SPMD)
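
The two kmeans|| parameters above interact: L = nClusters * oversamplingFactor points are sampled per round, and oversamplingFactor * nRounds must exceed 1. A small sketch of that arithmetic follows. The helper parallel_init_plan is hypothetical and does not call the library; the total is just L multiplied by nRounds.

```python
def parallel_init_plan(n_clusters, oversampling_factor, n_rounds):
    """Return (points sampled per round, points sampled over all rounds).

    Raises ValueError when the documented requirement
    oversamplingFactor * nRounds > 1 is not met.
    """
    if oversampling_factor * n_rounds <= 1:
        raise ValueError("oversamplingFactor * nRounds must be greater than 1")
    points_per_round = n_clusters * oversampling_factor   # L in the description
    return points_per_round, points_per_round * n_rounds


per_round, total = parallel_init_plan(n_clusters=10,
                                      oversampling_factor=0.5,
                                      n_rounds=5)
print(per_round, total)  # 5.0 25.0
```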

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: kmeans_init_result

class daal4py.kmeans_init_result

Properties:

centroids

Numpy array

Type:: type

K-Means

Parameters and semantics are described in oneAPI Data Analytics Library K-Means Computation.

class daal4py.kmeans

Parameters:

nClusters (size_t) – Number of clusters
maxIterations (size_t) – Number of iterations
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of K-Means, double or float
method (str) – [optional, default: “lloydDense”] Computation method of the algorithm
accuracyThreshold (double) – [optional, default: get_nan64()] Threshold for the termination of the algorithm
gamma (double) – [optional, default: get_nan64()] Weight used in distance computation for categorical features
distanceType (str) – [optional, default: “”] Distance used in the algorithm
resultsToEvaluate (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
assignFlag (bool) – [optional, default: False] Do data points assignment
distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

compute(data, inputCentroids)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
inputCentroids (data_or_file) – Initial centroids for the algorithm

Return type:

kmeans_result

class daal4py.kmeans_result

Properties:

assignments

Numpy array

Type:: type

centroids

Numpy array

Type:: type

nIterations

Numpy array

Type:: type

objectiveFunction

Numpy array

Type:: type
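
To make the inputs and result fields of K-Means concrete, here is a pure-Python sketch of Lloyd-style iterations. It is not the library's implementation: the function lloyd_kmeans is hypothetical, the objective is assumed to be the sum of squared Euclidean distances, and stopping when the objective changes by no more than accuracy_threshold is an assumption about how accuracyThreshold is used.

```python
def lloyd_kmeans(data, input_centroids, max_iterations, accuracy_threshold=0.0):
    """Toy Lloyd iterations returning fields named like kmeans_result."""
    centroids = [list(c) for c in input_centroids]
    assignments = [0] * len(data)
    objective = 0.0
    previous = None
    n_iterations = 0
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest current centroid.
        objective = 0.0
        for i, point in enumerate(data):
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids
            ]
            best = min(range(len(centroids)), key=distances.__getitem__)
            assignments[i] = best
            objective += distances[best]
        n_iterations += 1
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(data, assignments) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
        # Assumed stopping rule: objective changed by at most the threshold.
        if previous is not None and abs(previous - objective) <= accuracy_threshold:
            break
        previous = objective
    return {
        "assignments": assignments,
        "centroids": centroids,
        "nIterations": n_iterations,
        "objectiveFunction": objective,
    }


result = lloyd_kmeans([[0.0], [1.0], [10.0], [11.0]], [[0.0], [10.0]], 10)
print(result["assignments"])  # [0, 0, 1, 1]
print(result["centroids"])    # [[0.5], [10.5]]
```

Note that the reported objective is measured in the assignment step, so it refers to the centroids as they were before the final update.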

DBSCAN

Parameters and semantics are described in oneAPI Data Analytics Library Density-Based Spatial Clustering of Applications with Noise.

Examples:

Single-Process DBSCAN

class daal4py.dbscan

Parameters:

epsilon (double) – Radius of neighborhood
minObservations (size_t) – Minimal total weight of observations in neighborhood of core observation
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of DBSCAN, double or float
method (str) – [optional, default: “defaultDense”] Computation method of the algorithm
memorySavingMode (bool) – [optional, default: False] If true then use memory saving (but slower) mode
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
blockIndex (size_t) – [optional, default: -1] Unique identifier of block initially passed for computation on the local node
nBlocks (size_t) – [optional, default: -1] Number of blocks initially passed for computation on all nodes
leftBlocks (size_t) – [optional, default: -1] Number of blocks that will process observations with value of selected split feature less than selected split value
rightBlocks (size_t) – [optional, default: -1] Number of blocks that will process observations with value of selected split feature greater than selected split value
distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

compute(data, weights)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
weights (data_or_file) – [optional, default: None] Input weights of observations

Return type:

dbscan_result

class daal4py.dbscan_result

Properties:

assignments

Numpy array

Type:: type

coreIndices

Numpy array

Type:: type

coreObservations

Numpy array

Type:: type

nClusters

Numpy array

Type:: type

Gaussian Mixtures

Parameters and semantics are described in oneAPI Data Analytics Library Expectation-Maximization.

Initialization for the Gaussian Mixture Model

Parameters and semantics are described in oneAPI Data Analytics Library Expectation-Maximization Initialization.

Examples:

Single-Process Expectation-Maximization

class daal4py.em_gmm_init

Parameters:

nComponents (size_t) – Number of components in the Gaussian mixture model
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of initial values for the EM for GMM algorithm, double or float
method (str) – [optional, default: “defaultDense”]
nTrials (size_t) – [optional, default: -1] Number of trials of short EM runs
nIterations (size_t) – [optional, default: -1] Number of iterations in every short EM run
accuracyThreshold (double) – [optional, default: get_nan64()] Threshold for the termination of the algorithm
covarianceStorage (str) – [optional, default: “”] Type of covariance in the Gaussian mixture model.
engine (engines_batchbase__iface__) – [optional, default: None] Engine to be used for randomly generating data points to start the initialization of short EM

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: em_gmm_init_result

class daal4py.em_gmm_init_result

Properties:

covariances

Numpy array

Type:: type

means

Numpy array

Type:: type

weights

Numpy array

Type:: type

EM algorithm for the Gaussian Mixture Model

Parameters and semantics are described in oneAPI Data Analytics Library Expectation-Maximization for the Gaussian Mixture Model.

Examples:

Single-Process Expectation-Maximization

class daal4py.em_gmm

Parameters:

nComponents (size_t) – Number of components in the Gaussian mixture model
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the EM for GMM algorithm, double or float
method (str) – [optional, default: “defaultDense”] EM for GMM computation method
maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the algorithm.
accuracyThreshold (double) – [optional, default: get_nan64()] Threshold for the termination of the algorithm.
regularizationFactor (double) – [optional, default: get_nan64()] Factor for covariance regularization in case of ill-conditioned data
covarianceStorage (str) – [optional, default: “”] Type of covariance in the Gaussian mixture model.

compute(data, inputWeights, inputMeans, inputCovariances)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
inputWeights (data_or_file) – Input weights
inputMeans (data_or_file) – Input means
inputCovariances (list_numerictableptr) – Collection of input covariances

Return type:

em_gmm_result

class daal4py.em_gmm_result

Properties:

covariances

Numpy array

Type:: type

goalFunction

Numpy array

Type:: type

means

Numpy array

Type:: type

nIterations

Numpy array

Type:: type

weights

Numpy array

Type:: type

Dimensionality reduction

Principal Component Analysis (PCA)

Parameters and semantics are described in oneAPI Data Analytics Library PCA.

Examples:

class daal4py.pca

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for PCA, double or float
method (str) – [optional, default: “correlationDense”] PCA computation method
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
nComponents (size_t) – [optional, default: -1] number of components for reduced implementation (applicable for batch mode only)
isDeterministic (bool) – [optional, default: False] sign flip if required
doScale (bool) – [optional, default: False] scaling if required
isCorrelation (bool) – [optional, default: False] correlation is provided
normalization (normalization_zscore_batchimpl__iface__) – [optional, default: None] Pointer to batch covariance
distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

compute(data, correlation)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
correlation (data_or_file) – [optional, default: None] Input correlation table

Return type:

pca_result

class daal4py.pca_result

Properties:

dataForTransform

Numpy array

Type:: type

eigenvalues

Numpy array

Type:: type

eigenvectors

Numpy array

Type:: type

means

Numpy array

Type:: type

variances

Numpy array

Type:: type

Principal Component Analysis (PCA) Transform

Parameters and semantics are described in oneAPI Data Analytics Library PCA Transform.

Examples:

Single-Process PCA Transform

class daal4py.pca_transform

Parameters:

fptype (str) – [optional, default: “double”]
method (str) – [optional, default: “defaultDense”]
nComponents (size_t) – [optional, default: -1]

compute(data, eigenvectors, dataForTransform)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
eigenvectors (data_or_file) – Transformation matrix of eigenvectors
dataForTransform (dict_numerictableptr) – Data for transform

Return type:

pca_transform_result

class daal4py.pca_transform_result

Properties:

transformedData

Numpy array

Type:: type

Outlier detection

Multivariate Outlier Detection

Parameters and semantics are described in oneAPI Data Analytics Library Multivariate Outlier Detection.

Examples:

Single-Process Multivariate Outlier Detection

class daal4py.multivariate_outlier_detection

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the multivariate outlier detection, double or float
method (str) – [optional, default: “defaultDense”] Multivariate outlier detection computation method

compute(data, location, scatter, threshold)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
location (data_or_file) – [optional, default: None] Vector of mean estimates of size 1 x p
scatter (data_or_file) – [optional, default: None] Measure of spread, the variance-covariance matrix of size p x p
threshold (data_or_file) – [optional, default: None] Limit that defines the outlier region, the array of size 1 x 1 containing a non-negative number

Return type:

multivariate_outlier_detection_result

class daal4py.multivariate_outlier_detection_result

Properties:

weights

Numpy array

Type:: type

Univariate Outlier Detection

Parameters and semantics are described in oneAPI Data Analytics Library Univariate Outlier Detection.

Examples:

Single-Process Univariate Outlier Detection

class daal4py.univariate_outlier_detection

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the univariate outlier detection algorithm, double or float
method (str) – [optional, default: “defaultDense”] univariate outlier detection computation method

compute(data, location, scatter, threshold)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table
location (data_or_file) – [optional, default: None] Vector of mean estimates of size 1 x p
scatter (data_or_file) – [optional, default: None] Measure of spread, the array of standard deviations of size 1 x p
threshold (data_or_file) – [optional, default: None] Limit that defines the outlier region, the array of non-negative numbers of size 1 x p

Return type:

univariate_outlier_detection_result

class daal4py.univariate_outlier_detection_result

Properties:

weights

Numpy array

Type:: type
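The decision rule behind the weights table can be sketched in plain Python. This is an illustrative reference computation, not the daal4py implementation: it assumes that a value is flagged when its distance from the location, divided by the scatter, exceeds the threshold for that feature, and that a weight of 0 marks an outlier while 1 marks a regular value. The helper name univariate_outlier_weights is hypothetical.

```python
def univariate_outlier_weights(data, location, scatter, threshold):
    """Return an n x p table holding 1.0 for regular values and 0.0 for outliers.

    Assumed rule: a value x in feature j is an outlier when
    |x - location[j]| / scatter[j] > threshold[j].
    """
    weights = []
    for row in data:
        weights.append([
            0.0 if abs(x - m) / s > t else 1.0
            for x, m, s, t in zip(row, location, scatter, threshold)
        ])
    return weights


data = [[1.0, 10.0], [2.0, 11.0], [3.0, 50.0]]
location = [2.0, 10.0]   # per-feature mean estimates (1 x p)
scatter = [1.0, 2.0]     # per-feature standard deviations (1 x p)
threshold = [3.0, 3.0]   # per-feature limits (1 x p)

weights = univariate_outlier_weights(data, location, scatter, threshold)
print(weights)
```

Only the value 50.0 in the second feature lies more than three standard deviations from its location, so it is the only entry with weight 0.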

Multivariate Bacon Outlier Detection

Parameters and semantics are described in oneAPI Data Analytics Library Multivariate Bacon Outlier Detection.

Examples:

Single-Process Bacon Outlier Detection

class daal4py.bacon_outlier_detection

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the BACON outlier detection, double or float
method (str) – [optional, default: “defaultDense”] BACON outlier detection computation method
initMethod (str) – [optional, default: “”] Initialization method
alpha (double) – [optional, default: get_nan64()] One-tailed probability that defines the (1 - alpha) quantile of the chi^2 distribution with p degrees of freedom. Recommended value: alpha / n, where n is the number of observations.
toleranceToConverge (double) – [optional, default: get_nan64()] Stopping criterion: the algorithm terminates if the size of the basic subset changes by less than the threshold

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: bacon_outlier_detection_result

class daal4py.bacon_outlier_detection_result

Properties:

weights

Numpy array

Type:: type

Optimization Solvers

Objective Functions

Mean Squared Error Algorithm (MSE)

Parameters and semantics are described in oneAPI Data Analytics Library MSE.

class daal4py.optimization_solver_mse

Parameters:

numberOfTerms (size_t) – The number of terms in the function
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Mean squared error objective function, double or float
method (str) – [optional, default: “defaultDense”] The Mean squared error objective function computation method
interceptFlag (bool) – [optional, default: False] Whether the intercept needs to be computed. Default is true
penaltyL1 (array) – [optional, default: None] L1 regularization coefficients. Default is 0 (not applied)
penaltyL2 (array) – [optional, default: None] L2 regularization coefficients. Default is 0 (not applied)
batchIndices (array) – [optional, default: None] Numeric table of size 1 x m, where m is the batch size, that holds the batch of indices used to compute the function results, e.g., the value of the sum of the functions. If no indices are provided, all terms are used in the computations.
featureId (size_t) – [optional, default: -1] The feature index to compute part of gradient/hessian/proximal projection
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, dependentVariables, argument, weights, gramMatrix)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Numeric table of size n x p with data
dependentVariables (data_or_file) – Numeric table of size n x 1 with dependent variables
argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function
weights (data_or_file) – Numeric table of size 1 x n with sample weights. Applies to all methods
gramMatrix (data_or_file) – Numeric table of size p x p with last iteration number. Applies to all methods

Return type:

optimization_solver_objective_function_result

setup(data, dependentVariables, argument, weights, gramMatrix)

Setup (partial) input data for using algorithm object in other algorithms.

Parameters:

data (data_or_file) – Numeric table of size n x p with data
dependentVariables (data_or_file) – Numeric table of size n x 1 with dependent variables
argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function
weights (data_or_file) – Numeric table of size 1 x n with sample weights. Applies to all methods
gramMatrix (data_or_file) – Numeric table of size p x p with last iteration number. Applies to all methods

Return type:

None

daal4py.optimization_solver_mse_result: alias of optimization_solver_objective_function_result
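To make the roles of data, dependentVariables and argument concrete, here is a plain-Python sketch of a mean squared error objective and its gradient. It is a reference computation under stated assumptions, not the daal4py code path: the 1/(2n) scaling follows the usual sum-of-functions formulation, and the intercept, weights and penalties are left out. The helper name mse_value_and_gradient is hypothetical.

```python
def mse_value_and_gradient(data, dependent, argument):
    """Evaluate F(theta) = 1/(2n) * sum_i (x_i . theta - y_i)^2 and its gradient.

    data      : n rows of p feature values
    dependent : n target values
    argument  : p coefficients (theta); no intercept term in this sketch
    """
    n = len(data)
    p = len(argument)
    residuals = [
        sum(x_j * t_j for x_j, t_j in zip(row, argument)) - y
        for row, y in zip(data, dependent)
    ]
    value = sum(r * r for r in residuals) / (2 * n)
    gradient = [
        sum(r * row[j] for r, row in zip(residuals, data)) / n
        for j in range(p)
    ]
    return value, gradient


data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dependent = [1.0, 2.0, 3.0]

# At the exact solution the residuals vanish, so value and gradient are zero.
value, gradient = mse_value_and_gradient(data, dependent, [1.0, 2.0])

# At the origin the residuals are -1, -2, -3.
value_at_zero, gradient_at_zero = mse_value_and_gradient(data, dependent, [0.0, 0.0])
print(value, gradient, value_at_zero, gradient_at_zero)
```

Each of the n squared residuals corresponds to one term of the objective, which is the quantity that numberOfTerms counts and that batchIndices selects from.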

Logistic Loss

Parameters and semantics are described in oneAPI Data Analytics Library Logistic Loss.

Examples:

In SGD

class daal4py.optimization_solver_logistic_loss

Parameters:

numberOfTerms (size_t) – The number of terms in the function
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Logistic loss objective function, double or float
method (str) – [optional, default: “defaultDense”] The Logistic loss objective function computation method
interceptFlag (bool) – [optional, default: False] Whether the intercept needs to be computed. Default is true
penaltyL1 (float) – [optional, default: get_nan32()] L1 regularization coefficient. Default is 0 (not applied)
penaltyL2 (float) – [optional, default: get_nan32()] L2 regularization coefficient. Default is 0 (not applied)
batchIndices (array) – [optional, default: None] Numeric table of size 1 x m, where m is the batch size, that holds the batch of indices used to compute the function results, e.g., the value of the sum of the functions. If no indices are provided, all terms are used in the computations.
featureId (size_t) – [optional, default: -1] The feature index to compute part of gradient/hessian/proximal projection
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, dependentVariables, argument)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Numeric table of size n x p with data
dependentVariables (data_or_file) – Numeric table of size n x 1 with dependent variables
argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function

Return type:

optimization_solver_objective_function_result

setup(data, dependentVariables, argument)

Setup (partial) input data for using algorithm object in other algorithms.

Parameters:

data (data_or_file) – Numeric table of size n x p with data
dependentVariables (data_or_file) – Numeric table of size n x 1 with dependent variables
argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function

Return type:

None

daal4py.optimization_solver_logistic_loss_result: alias of optimization_solver_objective_function_result

Cross-entropy Loss

Parameters and semantics are described in oneAPI Data Analytics Library Cross Entropy Loss.

Examples:

In LBFGS

class daal4py.optimization_solver_cross_entropy_loss

Parameters:

nClasses (size_t) – Number of classes (different values of dependent variable)
numberOfTerms (size_t) – The number of terms in the function
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Cross-entropy loss objective function, double or float
method (str) – [optional, default: “defaultDense”] The Cross-entropy loss objective function computation method
interceptFlag (bool) – [optional, default: False] Whether the intercept needs to be computed. Default is true
penaltyL1 (float) – [optional, default: get_nan32()] L1 regularization coefficient. Default is 0 (not applied)
penaltyL2 (float) – [optional, default: get_nan32()] L2 regularization coefficient. Default is 0 (not applied)
batchIndices (array) – [optional, default: None] Numeric table of size 1 x m, where m is the batch size, that holds the batch of indices used to compute the function results, e.g., the value of the sum of the functions. If no indices are provided, all terms are used in the computations.
featureId (size_t) – [optional, default: -1] The feature index to compute part of gradient/hessian/proximal projection
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(data, dependentVariables, argument)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Numeric table of size n x p with data
dependentVariables (data_or_file) – Numeric table of size n x 1 with dependent variables
argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function

Return type:

optimization_solver_objective_function_result

setup(data, dependentVariables, argument)

Setup (partial) input data for using algorithm object in other algorithms.

Parameters:

data (data_or_file) – Numeric table of size n x p with data
dependentVariables (data_or_file) – Numeric table of size n x 1 with dependent variables
argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function

Return type:

None

daal4py.optimization_solver_cross_entropy_loss_result: alias of optimization_solver_objective_function_result

Sum of Functions

daal4py.optimization_solver_sum_of_functions_result: alias of optimization_solver_objective_function_result

Iterative Solvers

Stochastic Gradient Descent Algorithm

Parameters and semantics are described in oneAPI Data Analytics Library SGD.

class daal4py.optimization_solver_sgd

Parameters:

function (optimization_solver_sum_of_functions_batch__iface__) – Objective function represented as sum of functions
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Stochastic gradient descent algorithm, double or float
method (str) – [optional, default: “defaultDense”] Stochastic gradient descent computation method
batchIndices (array) – [optional, default: None] Numeric table that represents 32 bit integer indices of terms in the objective function. If no indices are provided, the implementation will generate random indices.
learningRateSequence (array) – [optional, default: None] Numeric table that contains values of the learning rate sequence
engine (engines_batchbase__iface__) – [optional, default: None] Engine for random generation of 32 bit integer indices of terms in the objective function.
nIterations (size_t) – [optional, default: -1] Maximal number of iterations of the algorithm
accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the algorithm. The algorithm terminates when this accuracy is achieved
optionalResultRequired (bool) – [optional, default: False] Indicates whether optional result is required
batchSize (size_t) – [optional, default: -1] Number of batch indices to compute the stochastic gradient. If batchSize is equal to the number of terms in objective function then no random sampling is performed, and all terms are used to calculate the gradient. This parameter is ignored if batchIndices is provided.
conservativeSequence (array) – [optional, default: None] Numeric table of values of the conservative coefficient sequence
innerNIterations (size_t) – [optional, default: -1]
momentum (double) – [optional, default: get_nan64()] Momentum value

compute(inputArgument)

Do the actual computation on provided input data.

Parameters:: inputArgument (data_or_file) – Initial value to start optimization
Return type:: optimization_solver_sgd_result

class daal4py.optimization_solver_sgd_result

Properties:

minimum

Numpy array

Type:: type

nIterations

Numpy array

Type:: type
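The interplay of nIterations, accuracyThreshold and batchSize can be illustrated with a small mini-batch gradient descent loop in plain Python. This is a simplified sketch, not the oneDAL algorithm: it hard-codes a least-squares objective, uses a constant learning rate instead of a learning rate sequence, and stops when the gradient norm falls below the threshold, which may differ from the library's exact stopping test. All names in the block are hypothetical.

```python
import random


def sgd_minimize(data, dependent, start, learning_rate, n_iterations,
                 batch_size, accuracy_threshold, seed=0):
    """Minimize a least-squares objective with mini-batch gradient descent.

    Returns the final argument and the number of iterations performed,
    mirroring the 'minimum' and 'nIterations' result properties.
    """
    rng = random.Random(seed)
    n, p = len(data), len(start)
    theta = list(start)
    performed = 0
    for _ in range(n_iterations):
        # With batch_size == n every term is used, so no real sampling occurs.
        batch = rng.sample(range(n), batch_size)
        grad = [0.0] * p
        for i in batch:
            residual = sum(x * t for x, t in zip(data[i], theta)) - dependent[i]
            for j in range(p):
                grad[j] += residual * data[i][j] / batch_size
        performed += 1
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
        # Simplified stopping rule: gradient norm below the threshold.
        if sum(g * g for g in grad) ** 0.5 < accuracy_threshold:
            break
    return theta, performed


data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dependent = [1.0, 2.0, 3.0]

minimum, n_performed = sgd_minimize(
    data, dependent, start=[0.0, 0.0], learning_rate=0.5,
    n_iterations=500, batch_size=3, accuracy_threshold=1e-8,
)
print(minimum, n_performed)
```

Because the batch covers all three terms, the run is a deterministic full-gradient descent that stops well before the iteration limit once the threshold is met.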

Limited-Memory Broyden-Fletcher-Goldfarb-Shanno Algorithm

Parameters and semantics are described in oneAPI Data Analytics Library LBFGS.

Examples:

Using MSE

class daal4py.optimization_solver_lbfgs

Parameters:

function (optimization_solver_sum_of_functions_batch__iface__) – Objective function represented as sum of functions
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the LBFGS algorithm, double or float
method (str) – [optional, default: “defaultDense”] LBFGS computation method
m (size_t) – [optional, default: -1] Memory parameter of LBFGS. The maximum number of correction pairs that define the approximation of inverse Hessian matrix.
L (size_t) – [optional, default: -1] The number of iterations between the curvature estimates calculations
engine (engines_batchbase__iface__) – [optional, default: None] Engine for random choosing terms from objective function.
batchIndices (array) – [optional, default: None]
correctionPairBatchSize (size_t) – [optional, default: -1] Number of observations to compute the sub-sampled Hessian for correction pairs computation
correctionPairBatchIndices (array) – [optional, default: None]
stepLengthSequence (array) – [optional, default: None]
nIterations (size_t) – [optional, default: -1] Maximal number of iterations of the algorithm
accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the algorithm. The algorithm terminates when this accuracy is achieved
optionalResultRequired (bool) – [optional, default: False] Indicates whether optional result is required
batchSize (size_t) – [optional, default: -1] Number of batch indices to compute the stochastic gradient. If batchSize is equal to the number of terms in objective function then no random sampling is performed, and all terms are used to calculate the gradient. This parameter is ignored if batchIndices is provided.

compute(inputArgument)

Do the actual computation on provided input data.

Parameters:: inputArgument (data_or_file) – Initial value to start optimization
Return type:: optimization_solver_lbfgs_result

class daal4py.optimization_solver_lbfgs_result

Properties:

minimum

Numpy array

Type:: type

nIterations

Numpy array

Type:: type

Adaptive Subgradient Method

Parameters and semantics are described in oneAPI Data Analytics Library AdaGrad.

Examples:

Using MSE

class daal4py.optimization_solver_adagrad

Parameters:

function (optimization_solver_sum_of_functions_batch__iface__) – Objective function represented as sum of functions
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Adaptive gradient descent algorithm, double or float
method (str) – [optional, default: “defaultDense”] Adaptive gradient descent computation method
batchIndices (array) – [optional, default: None] Numeric table that represents 32 bit integer indices of terms in the objective function. If no indices are provided, the implementation will generate random indices.
learningRate (array) – [optional, default: None] Numeric table that contains the value of the learning rate
degenerateCasesThreshold (double) – [optional, default: get_nan64()] Value needed to avoid degenerate cases in square root computing.
engine (engines_batchbase__iface__) – [optional, default: None] Engine for random generation of 32 bit integer indices of terms in the objective function.
nIterations (size_t) – [optional, default: -1] Maximal number of iterations of the algorithm
accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the algorithm. The algorithm terminates when this accuracy is achieved
optionalResultRequired (bool) – [optional, default: False] Indicates whether optional result is required
batchSize (size_t) – [optional, default: -1] Number of batch indices to compute the stochastic gradient. If batchSize is equal to the number of terms in objective function then no random sampling is performed, and all terms are used to calculate the gradient. This parameter is ignored if batchIndices is provided.

compute(inputArgument)

Do the actual computation on provided input data.

Parameters:: inputArgument (data_or_file) – Initial value to start optimization
Return type:: optimization_solver_adagrad_result

class daal4py.optimization_solver_adagrad_result

Properties:

minimum

Numpy array

Type:: type

nIterations

Numpy array

Type:: type

Stochastic Average Gradient Descent

Parameters and semantics are described in oneAPI Data Analytics Library Stochastic Average Gradient Descent SAGA.

Examples:

Single-Process saga-logistic_loss

class daal4py.optimization_solver_saga

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Stochastic average gradient descent algorithm, double or float
method (str) – [optional, default: “defaultDense”] Stochastic average gradient descent computation method
batchIndices (array) – [optional, default: None] Numeric table that represents 32 bit integer indices of terms in the objective function. If no indices are provided, the implementation will generate random indices.
learningRateSequence (array) – [optional, default: None] Numeric table that contains the values of the learning rate sequence
engine (engines_batchbase__iface__) – [optional, default: None] Engine for random generation of 32 bit integer indices of terms in the objective function.
function (optimization_solver_sum_of_functions_batch__iface__) – [optional, default: None] Objective function represented as sum of functions
nIterations (size_t) – [optional, default: -1] Maximal number of iterations of the algorithm
accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the algorithm. The algorithm terminates when this accuracy is achieved
optionalResultRequired (bool) – [optional, default: False] Indicates whether optional result is required
batchSize (size_t) – [optional, default: -1] Number of batch indices to compute the stochastic gradient. If batchSize is equal to the number of terms in objective function then no random sampling is performed, and all terms are used to calculate the gradient. This parameter is ignored if batchIndices is provided.

compute(inputArgument, gradientsTable)

Do the actual computation on provided input data.

Parameters:

inputArgument (data_or_file) – Initial value to start optimization
gradientsTable (data_or_file) – Numeric table of size p x 1 with the values of G, where each value is an accumulated sum of squares of corresponding gradient’s coordinate values.

Return type:

optimization_solver_saga_result

class daal4py.optimization_solver_saga_result

Properties:

gradientsTable

Numpy array

Type:: type

minimum

Numpy array

Type:: type

nIterations

Numpy array

Type:: type

Coordinate Descent

Parameters and semantics are described in oneAPI Data Analytics Library Coordinate Descent Algorithm.

class daal4py.optimization_solver_coordinate_descent

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Coordinate descent algorithm, double or float
method (str) – [optional, default: “defaultDense”] Coordinate descent computation method
seed (size_t) – [optional, default: -1] Seed for random generation of 32-bit integer indices of terms in the objective function. Deprecated: use engine instead.
engine (engines_batchbase__iface__) – [optional, default: None] Engine for random generation of 32 bit integer indices of terms in the objective function.
selection (str) – [optional, default: “”]
positive (bool) – [optional, default: False]
skipTheFirstComponents (bool) – [optional, default: False]
function (optimization_solver_sum_of_functions_batch__iface__) – [optional, default: None] Objective function represented as sum of functions
nIterations (size_t) – [optional, default: -1] Maximal number of iterations of the algorithm
accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the algorithm. The algorithm terminates when this accuracy is achieved
optionalResultRequired (bool) – [optional, default: False] Indicates whether optional result is required
batchSize (size_t) – [optional, default: -1] Number of batch indices to compute the stochastic gradient. If batchSize is equal to the number of terms in objective function then no random sampling is performed, and all terms are used to calculate the gradient. This parameter is ignored if batchIndices is provided.

compute(inputArgument)

Do the actual computation on provided input data.

Parameters:: inputArgument (data_or_file) – Initial value to start optimization
Return type:: optimization_solver_coordinate_descent_result

class daal4py.optimization_solver_coordinate_descent_result

Properties:

minimum

Numpy array

Type:: type

nIterations

Numpy array

Type:: type

Precomputed Function

Parameters and semantics are described in oneAPI Data Analytics Library Objective Function with Precomputed Characteristics.

class daal4py.optimization_solver_precomputed

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the objective function with precomputed characteristics, double or float
method (str) – [optional, default: “defaultDense”] The objective function with precomputed characteristics method
numberOfTerms (size_t) – [optional, default: -1] The number of terms in the function
batchIndices (array) – [optional, default: None] Numeric table of size 1 x m, where m is the batch size, that holds the batch of indices used to compute the function results, e.g., the value of the sum of the functions. If no indices are provided, all terms are used in the computations.
featureId (size_t) – [optional, default: -1] The feature index to compute part of gradient/hessian/proximal projection
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).

compute(argument)

Do the actual computation on provided input data.

Parameters:: argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function
Return type:: optimization_solver_objective_function_result

daal4py.optimization_solver_precomputed_result: alias of optimization_solver_objective_function_result

Recommender systems

Association Rules

Parameters and semantics are described in oneAPI Data Analytics Library Association Rules.

Examples:

Single-Process Association Rules

class daal4py.association_rules

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the association rules algorithm, double or float
method (str) – [optional, default: “apriori”] Association rules algorithm computation method
minSupport (double) – [optional, default: get_nan64()] Minimum support, 0.0 <= minSupport < 1.0
minConfidence (double) – [optional, default: get_nan64()] Minimum confidence, 0.0 <= minConfidence < 1.0
nUniqueItems (size_t) – [optional, default: -1] Number of unique items
nTransactions (size_t) – [optional, default: -1] Number of transactions
discoverRules (bool) – [optional, default: False] Flag. If true, association rules are built from large itemsets
itemsetsOrder (str) – [optional, default: “”] Format of the resulting itemsets
rulesOrder (str) – [optional, default: “”] Format of the resulting association rules
minItemsetSize (size_t) – [optional, default: -1] Minimum number of items in a large itemset
maxItemsetSize (size_t) – [optional, default: -1] Maximum number of items in a large itemset. Set to zero to not limit the upper boundary for the size of large itemsets

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: association_rules_result

class daal4py.association_rules_result

Properties:

antecedentItemsets

Numpy array

Type:: type

confidence

Numpy array

Type:: type

consequentItemsets

Numpy array

Type:: type

largeItemsets

Numpy array

Type:: type

largeItemsetsSupport

Numpy array

Type:: type
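The quantities that minSupport and minConfidence are compared against can be computed by hand on a tiny data set. The following plain-Python sketch uses the standard definitions (support as the fraction of transactions that contain an itemset, confidence as the support of the whole rule divided by the support of its antecedent). It does not reproduce the input layout or the output format of daal4py.association_rules, and the helper names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    wanted = set(itemset)
    hits = sum(1 for transaction in transactions if wanted <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)


# Four transactions over the items 0, 1 and 2.
transactions = [[0, 1, 2], [0, 1], [0, 2], [1, 2]]

support_01 = support(transactions, [0, 1])              # in 2 of 4 transactions
confidence_0_to_1 = confidence(transactions, [0], [1])  # 0.5 / 0.75
print(support_01, confidence_0_to_1)
```

With minSupport = 0.4 the itemset {0, 1} would count as large, and the rule 0 => 1 would pass any minConfidence up to about 0.66.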

Implicit Alternating Least Squares (implicit ALS)

Parameters and semantics are described in oneAPI Data Analytics Library Implicit Alternating Least Squares.

Examples:

Single-Process implicit ALS

class daal4py.implicit_als_training

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for implicit ALS model training, double or float
method (str) – [optional, default: “defaultDense”] Implicit ALS training method
nFactors (size_t) – [optional, default: -1] Number of factors
maxIterations (size_t) – [optional, default: -1] Maximum number of iterations of the implicit ALS training algorithm
alpha (double) – [optional, default: get_nan64()] Confidence parameter of the implicit ALS training algorithm
lambda (double) – [optional, default: get_nan64()] Regularization parameter
preferenceThreshold (double) – [optional, default: get_nan64()] Threshold used to define preference values

compute(data, inputModel)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table that contains ratings
inputModel (implicit_als_modelptr) – Initial model that contains initialized factors

Return type:

implicit_als_training_result

class daal4py.implicit_als_training_result

Properties:

model

implicit_als_model

Type:: type

class daal4py.implicit_als_model

Properties:

ItemsFactors

Numpy array

Type:: type

UsersFactors

Numpy array

Type:: type

class daal4py.implicit_als_prediction_ratings

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for implicit ALS model-based prediction, double or float
method (str) – [optional, default: “defaultDense”] Implicit ALS prediction method
nFactors (size_t) – [optional, default: -1] Number of factors
maxIterations (size_t) – [optional, default: -1] Maximum number of iterations of the implicit ALS training algorithm
alpha (double) – [optional, default: get_nan64()] Confidence parameter of the implicit ALS training algorithm
lambda (double) – [optional, default: get_nan64()] Regularization parameter
preferenceThreshold (double) – [optional, default: get_nan64()] Threshold used to define preference values

compute(model)

Do the actual computation on provided input data.

Parameters:: model (implicit_als_modelptr) – Input model trained by the ALS algorithm
Return type:: implicit_als_prediction_ratings_result

class daal4py.implicit_als_prediction_ratings_result

Properties:

prediction

Numpy array

Type:: type

Covariance, correlation, and distances

Cosine Distance Matrix

Parameters and semantics are described in oneAPI Data Analytics Library Cosine Distance.

Examples:

Single-Process Cosine Distance

class daal4py.cosine_distance

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the cosine distance, double or float
method (str) – [optional, default: “defaultDense”] Cosine distance computation method

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: cosine_distance_result

class daal4py.cosine_distance_result

Properties:

cosineDistance

Numpy array

Type:: type
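For checking small results by hand, the cosine distance between two rows can be taken as one minus the cosine of the angle between them. The sketch below computes the full n x n matrix in plain Python. It is a reference computation under that assumption and says nothing about how the cosineDistance result is stored internally. The helper name is hypothetical.

```python
import math


def cosine_distance_matrix(data):
    """Return the n x n matrix d[i][j] = 1 - cos(angle between rows i and j)."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in data]
    n = len(data)
    return [
        [
            1.0 - sum(a * b for a, b in zip(data[i], data[j])) / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Rows 0 and 2 point in the same direction; row 1 is orthogonal to both.
data = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
distances = cosine_distance_matrix(data)
print(distances)
```

Rows that differ only in length have distance 0, orthogonal rows have distance 1, and the diagonal is 0.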

Correlation Distance Matrix

Parameters and semantics are described in oneAPI Data Analytics Library Correlation Distance.

Examples:

Single-Process Correlation Distance

class daal4py.correlation_distance

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the correlation distance algorithm, double or float
method (str) – [optional, default: “defaultDense”] Correlation distance computation method

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: correlation_distance_result

class daal4py.correlation_distance_result

Properties:

correlationDistance

Numpy array

Type:: type

Correlation and Variance-Covariance Matrices

Parameters and semantics are described in oneAPI Data Analytics Library Correlation and Variance-Covariance Matrices.

class daal4py.covariance

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of the correlation or variance-covariance matrix, double or float
method (str) – [optional, default: “defaultDense”] Computation method
outputMatrixType (str) – [optional, default: “”] Type of the computed matrix
distributed (bool) – [optional, default: False] enable distributed computation (SPMD)
streaming (bool) – [optional, default: False] enable streaming

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: covariance_result

class daal4py.covariance_result

Properties:

correlation

Numpy array

Type:: type

covariance

Numpy array

Type:: type

mean

Numpy array

Type:: type
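The three result properties (mean, covariance, correlation) are related in a simple way, which the following plain-Python sketch spells out. It assumes the sample covariance with an n - 1 denominator; whether that matches a given outputMatrixType or method setting should be checked against the oneDAL documentation. The helper names are hypothetical.

```python
def mean_and_covariance(data):
    """Return the column means and the p x p sample covariance matrix."""
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [
        [
            sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in data) / (n - 1)
            for b in range(p)
        ]
        for a in range(p)
    ]
    return mean, cov


def correlation_from_covariance(cov):
    """Scale each covariance entry by the two standard deviations involved."""
    p = len(cov)
    return [
        [cov[a][b] / (cov[a][a] * cov[b][b]) ** 0.5 for b in range(p)]
        for a in range(p)
    ]


# The second column is exactly twice the first, so the correlation is 1.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mean, cov = mean_and_covariance(data)
corr = correlation_from_covariance(cov)
print(mean, cov, corr)
```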

Data pre-processing

Normalization

Parameters and semantics are described in oneAPI Data Analytics Library Normalization.

Z-Score

Parameters and semantics are described in oneAPI Data Analytics Library Z-Score.

Examples:

Single-Process Z-Score Normalization

class daal4py.normalization_zscore

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the z-score normalization, double or float
method (str) – [optional, default: “defaultDense”] Z-score normalization computation method
resultsToCompute (str) – [optional, default: “”] Type of results to compute or to evaluate. Can pass one of "computeClassLabels", "computeClassProbabilities", "computeClassLogProbabilities"; or more than one by joining them with separator bars (e.g. "computeClassLabels|computeClassProbabilities"). Note that not all of these are supported on every class/method accepting this argument (see docs for oneDAL for details on what this specific class/method supports).
doScale (bool) – [optional, default: False] Boolean flag that selects the mode of computation: if true, the data is both centered and scaled; otherwise it is only centered.

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: normalization_zscore_result

class daal4py.normalization_zscore_result

Properties:

means

Numpy array

Type:: type

normalizedData

Numpy array

Type:: type

variances

Numpy array

Type:: type
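As a reference for the normalizedData, means and variances properties, the plain-Python sketch below centers each column on its mean and divides by its standard deviation. It assumes the sample standard deviation (n - 1 denominator) and always scales. The helper name is hypothetical, and the block uses only the standard library statistics module.

```python
import statistics


def zscore_normalize(data):
    """Return (normalized rows, column means, column variances)."""
    columns = list(zip(*data))
    means = [statistics.mean(col) for col in columns]
    variances = [statistics.variance(col) for col in columns]  # n - 1 denominator
    normalized = [
        [(x - m) / v ** 0.5 for x, m, v in zip(row, means, variances)]
        for row in data
    ]
    return normalized, means, variances


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized, means, variances = zscore_normalize(data)
print(normalized, means, variances)
```

Both columns end up with the same normalized values even though their original scales differ by a factor of ten.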

Min-Max

Parameters and semantics are described in oneAPI Data Analytics Library Min-Max.

Examples:

Single-Process Min-Max Normalization

class daal4py.normalization_minmax

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the min-max normalization, double or float
method (str) – [optional, default: “defaultDense”] Min-max normalization computation method
lowerBound (double) – [optional, default: get_nan64()] Lower bound of the feature values after normalization.
upperBound (double) – [optional, default: get_nan64()] Upper bound of the feature values after normalization.

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: normalization_minmax_result

class daal4py.normalization_minmax_result

Properties:

normalizedData

Numpy array

Type:: type
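
The sketch below illustrates how lowerBound and upperBound might be used. minmax_reference is a hypothetical NumPy helper (not part of daal4py) that assumes the usual column-wise min-max rescaling formula; minmax_daal4py requires daal4py to be installed and is only defined here, not called.

```python
import numpy as np


def minmax_reference(data, lower, upper):
    """NumPy reference: rescale each column linearly into [lower, upper].

    Assumption: every column has distinct minimum and maximum values.
    """
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    return lower + (data - col_min) / (col_max - col_min) * (upper - lower)


def minmax_daal4py(data, lower, upper):
    """Run min-max normalization through daal4py (needs daal4py installed)."""
    import daal4py as d4p

    algo = d4p.normalization_minmax(lowerBound=lower, upperBound=upper)
    return algo.compute(data).normalizedData


data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
scaled = minmax_reference(data, -1.0, 1.0)
```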

Statistics

Moments of Low Order

Parameters and semantics are described in oneAPI Data Analytics Library Moments of Low Order.

class daal4py.low_order_moments

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of the low order moments, double or float
method (str) – [optional, default: “defaultDense”] Computation method of the algorithm
estimatesToCompute (str) – [optional, default: “”] Estimates to be computed by the algorithm
distributed (bool) – [optional, default: False] enable distributed computation (SPMD)
streaming (bool) – [optional, default: False] enable streaming

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: low_order_moments_result

class daal4py.low_order_moments_result

Properties:

maximum

Numpy array

Type:: type

mean

Numpy array

Type:: type

minimum

Numpy array

Type:: type

secondOrderRawMoment

Numpy array

Type:: type

standardDeviation

Numpy array

Type:: type

sum

Numpy array

Type:: type

sumSquares

Numpy array

Type:: type

sumSquaresCentered

Numpy array

Type:: type

variance

Numpy array

Type:: type

variation

Numpy array

Type:: type
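
To make the result properties above more concrete, here is a hedged sketch. moments_reference is a hypothetical NumPy helper (not part of daal4py); it assumes that variance uses the n - 1 denominator, that secondOrderRawMoment is the mean of squares, and that variation is the standard deviation divided by the mean. Check the oneDAL documentation before relying on these definitions. moments_daal4py_streaming shows the streaming pattern; the finalize() call follows daal4py's streaming examples and is not described in this section. It requires daal4py and is not called here.

```python
import numpy as np


def moments_reference(data):
    """NumPy reference for the column-wise low order moments (assumed definitions)."""
    n = data.shape[0]
    mean = data.mean(axis=0)
    sum_squares = (data ** 2).sum(axis=0)
    sum_squares_centered = ((data - mean) ** 2).sum(axis=0)
    variance = sum_squares_centered / (n - 1)
    std = np.sqrt(variance)
    return {
        "minimum": data.min(axis=0),
        "maximum": data.max(axis=0),
        "sum": data.sum(axis=0),
        "sumSquares": sum_squares,
        "sumSquaresCentered": sum_squares_centered,
        "mean": mean,
        "secondOrderRawMoment": sum_squares / n,
        "variance": variance,
        "standardDeviation": std,
        "variation": std / mean,
    }


def moments_daal4py_streaming(chunks):
    """Feed row blocks one at a time, then finalize (needs daal4py installed)."""
    import daal4py as d4p

    algo = d4p.low_order_moments(streaming=True)
    for chunk in chunks:
        algo.compute(chunk)
    return algo.finalize()


data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
ref = moments_reference(data)
```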

Quantiles

Parameters and semantics are described in oneAPI Data Analytics Library Quantiles.

Examples:

Single-Process Quantiles

class daal4py.quantiles

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the quantile algorithms, double or float
method (str) – [optional, default: “defaultDense”] Quantiles computation method
quantileOrders (array) – [optional, default: None] Numeric table with quantile orders. Default value is 0.5 (median)

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: quantiles_result

class daal4py.quantiles_result

Properties:

quantiles

Numpy array

Type:: type
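
A small sketch of how quantileOrders might be supplied. It assumes the "numeric table" is passed as a two-dimensional 1 x k NumPy array, which should be verified against the oneDAL documentation. median_reference is a hypothetical NumPy helper covering only the default order 0.5 on columns with an odd number of rows, where the median is simply the middle order statistic. quantiles_daal4py requires daal4py and is not called here.

```python
import numpy as np


def median_reference(data):
    """NumPy reference for the default quantile order 0.5 (column-wise median)."""
    return np.median(data, axis=0)


def quantiles_daal4py(data, orders):
    """Compute column-wise quantiles through daal4py (needs daal4py installed)."""
    import daal4py as d4p

    # Assumption: quantile orders are given as a 1 x k two-dimensional array.
    quantile_orders = np.asarray(orders, dtype=np.float64).reshape(1, -1)
    algo = d4p.quantiles(quantileOrders=quantile_orders)
    return algo.compute(data).quantiles


data = np.array([[5.0, 1.0], [1.0, 3.0], [3.0, 2.0]])
medians = median_reference(data)
```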

Linear algebra

Cholesky Decomposition

Parameters and semantics are described in oneAPI Data Analytics Library Cholesky Decomposition.

Examples:

Single-Process Cholesky

class daal4py.cholesky

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Cholesky decomposition algorithm, double or float
method (str) – [optional, default: “defaultDense”] Cholesky decomposition computation method

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: cholesky_result

class daal4py.cholesky_result

Properties:

choleskyFactor

Numpy array

Type:: type
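
The following sketch illustrates the expected input and output, assuming the input is a symmetric positive-definite matrix and that choleskyFactor holds the lower triangular factor L with data = L L^T. np.linalg.cholesky serves as an independent reference; cholesky_daal4py requires daal4py and is not called here.

```python
import numpy as np


def cholesky_daal4py(matrix):
    """Factor a symmetric positive-definite matrix via daal4py (needs daal4py)."""
    import daal4py as d4p

    return d4p.cholesky().compute(matrix).choleskyFactor


matrix = np.array([[4.0, 2.0], [2.0, 3.0]])
# NumPy reference: lower triangular L such that L @ L.T equals the input.
factor = np.linalg.cholesky(matrix)
```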

QR Decomposition

Parameters and semantics are described in oneAPI Data Analytics Library QR Decomposition.

QR Decomposition (without pivoting)

Parameters and semantics are described in oneAPI Data Analytics Library QR Decomposition without pivoting.

class daal4py.qr

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the QR decomposition algorithm, double or float
method (str) – [optional, default: “defaultDense”] Computation method of the algorithm
distributed (bool) – [optional, default: False] enable distributed computation (SPMD)
streaming (bool) – [optional, default: False] enable streaming

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: qr_result

class daal4py.qr_result

Properties:

matrixQ

Numpy array

Type:: type

matrixR

Numpy array

Type:: type
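
As a hedged illustration of matrixQ and matrixR, the sketch below checks the two properties a QR decomposition is expected to satisfy: Q has orthonormal columns and Q R reproduces the input. check_qr is a hypothetical helper, and np.linalg.qr is used as a stand-in so the check can run without daal4py; the signs of individual columns may differ between implementations. qr_daal4py requires daal4py and is not called here.

```python
import numpy as np


def check_qr(data, q, r, tol=1e-10):
    """Return True if q has orthonormal columns and q @ r reproduces data."""
    identity = np.eye(q.shape[1])
    orthonormal = np.allclose(q.T @ q, identity, atol=tol)
    reproduces = np.allclose(q @ r, data, atol=tol)
    return orthonormal and reproduces


def qr_daal4py(data):
    """Compute the QR decomposition through daal4py (needs daal4py installed)."""
    import daal4py as d4p

    result = d4p.qr().compute(data)
    return result.matrixQ, result.matrixR


data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
q, r = np.linalg.qr(data)
ok = check_qr(data, q, r)
```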

Pivoted QR Decomposition

Parameters and semantics are described in oneAPI Data Analytics Library Pivoted QR Decomposition.

Examples:

Single-Process Pivoted QR

class daal4py.pivoted_qr

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of the pivoted QR algorithm, double or float
method (str) – [optional, default: “defaultDense”] Computation method
permutedColumns (array) – [optional, default: None] On entry, if the i-th element of permutedColumns != 0, the i-th column of the input matrix is moved to the beginning of Data * P before the computation and fixed in place during the computation. If the i-th element of permutedColumns = 0, the i-th column of the input data is a free column (that is, it may be interchanged during the computation with any other free column).

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: pivoted_qr_result

class daal4py.pivoted_qr_result

Properties:

matrixQ

Numpy array

Type:: type

matrixR

Numpy array

Type:: type

permutationMatrix

Numpy array

Type:: type

Singular Value Decomposition (SVD)

Parameters and semantics are described in oneAPI Data Analytics Library SVD.

class daal4py.svd

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the SVD algorithm, double or float
method (str) – [optional, default: “defaultDense”] SVD computation method
leftSingularMatrix (str) – [optional, default: “”] Format of the matrix of left singular vectors
rightSingularMatrix (str) – [optional, default: “”] Format of the matrix of right singular vectors
distributed (bool) – [optional, default: False] enable distributed computation (SPMD)
streaming (bool) – [optional, default: False] enable streaming

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: svd_result

class daal4py.svd_result

Properties:

leftSingularMatrix

Numpy array

Type:: type

rightSingularMatrix

Numpy array

Type:: type

singularValues

Numpy array

Type:: type
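
A hedged sketch of how the three result properties relate to the input. It assumes that leftSingularMatrix times diag(singularValues) times rightSingularMatrix reconstructs the data, that is, that rightSingularMatrix is stored in transposed form; verify this against the oneDAL documentation. reconstruct is a hypothetical helper, demonstrated with np.linalg.svd so it runs without daal4py. svd_daal4py requires daal4py and is not called here.

```python
import numpy as np


def reconstruct(left, singular_values, right):
    """Rebuild the input as left @ diag(singular_values) @ right."""
    return left @ np.diag(np.ravel(singular_values)) @ right


def svd_daal4py(data):
    """Compute the SVD through daal4py (needs daal4py installed)."""
    import daal4py as d4p

    result = d4p.svd().compute(data)
    return result.leftSingularMatrix, result.singularValues, result.rightSingularMatrix


data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
# NumPy stand-in: thin SVD, with the right factor already transposed.
u, s, vt = np.linalg.svd(data, full_matrices=False)
rebuilt = reconstruct(u, s, vt)
```

np.ravel lets the helper accept singular values either as a flat vector or as a 1 x p array.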

Random number generation

Random Number Engines

Parameters and semantics are described in oneAPI Data Analytics Library Engines.

class daal4py.engines_result

Properties:

randomNumbers

Numpy array

Type:: type

mt19937

Parameters and semantics are described in oneAPI Data Analytics Library mt19937.

class daal4py.engines_mt19937

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of mt19937 engine, double or float
method (str) – [optional, default: “defaultDense”] Computation method of the engine
seed (size_t) – [optional, default: -1] Seed used to initialize the engine

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:: tableToFill (data_or_file) – Input table to fill with random numbers
Return type:: engines_result

daal4py.engines_mt19937_result: alias of engines_result

mt2203

Parameters and semantics are described in oneAPI Data Analytics Library mt2203.

class daal4py.engines_mt2203

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of mt2203 engine, double or float
method (str) – [optional, default: “defaultDense”] Computation method of the engine
seed (size_t) – [optional, default: -1] Seed used to initialize the engine

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:: tableToFill (data_or_file) – Input table to fill with random numbers
Return type:: engines_result

daal4py.engines_mt2203_result: alias of engines_result

mcg59

Parameters and semantics are described in oneAPI Data Analytics Library mcg59.

class daal4py.engines_mcg59

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of mcg59 engine, double or float
method (str) – [optional, default: “defaultDense”] Computation method of the engine
seed (size_t) – [optional, default: -1] Seed used to initialize the engine

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:: tableToFill (data_or_file) – Input table to fill with random numbers
Return type:: engines_result

daal4py.engines_mcg59_result: alias of engines_result

mrg32k3a

class daal4py.engines_mrg32k3a

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of mrg32k3a engine, double or float
method (str) – [optional, default: “defaultDense”] Computation method of the engine
seed (size_t) – [optional, default: -1] Seed used to initialize the engine

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:: tableToFill (data_or_file) – Input table to fill with random numbers
Return type:: engines_result

daal4py.engines_mrg32k3a_result: alias of engines_result

philox4x32x10

class daal4py.engines_philox4x32x10

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of philox4x32x10 engine, double or float
method (str) – [optional, default: “defaultDense”] Computation method of the engine
seed (size_t) – [optional, default: -1] Seed used to initialize the engine

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:: tableToFill (data_or_file) – Input table to fill with random numbers
Return type:: engines_result

daal4py.engines_philox4x32x10_result: alias of engines_result

Distributions

Parameters and semantics are described in oneAPI Data Analytics Library Distributions.

Bernoulli

Parameters and semantics are described in oneAPI Data Analytics Library Bernoulli Distribution.

Examples:

Single-Process Bernoulli Distribution

class daal4py.distributions_bernoulli

Parameters:

p (double) – Success probability of a trial, value from [0.0; 1.0]
fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of bernoulli distribution, double or float
method (str) – [optional, default: “defaultDense”] Computation method of the distribution
engine (engines_batchbase__iface__) – [optional, default: None] Pointer to the engine

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:: tableToFill (data_or_file) – Input table to fill with random numbers
Return type:: distributions_result

daal4py.distributions_bernoulli_result: alias of distributions_result

Normal

Parameters and semantics are described in oneAPI Data Analytics Library Normal Distribution.

Examples:

Single-Process Normal Distribution

class daal4py.distributions_normal

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of normal distribution, double or float
method (str) – [optional, default: “defaultDense”] Computation method of the distribution
a (double) – [optional, default: get_nan64()] Mean
sigma (double) – [optional, default: get_nan64()] Standard deviation
engine (engines_batchbase__iface__) – [optional, default: None] Pointer to the engine

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:: tableToFill (data_or_file) – Input table to fill with random numbers
Return type:: distributions_result

daal4py.distributions_normal_result: alias of distributions_result

Uniform

Parameters and semantics are described in oneAPI Data Analytics Library Uniform Distribution.

Examples:

Single-Process Uniform Distribution

class daal4py.distributions_uniform

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of uniform distribution, double or float
method (str) – [optional, default: “defaultDense”] Computation method of the distribution
a (double) – [optional, default: get_nan64()] Left bound a
b (double) – [optional, default: get_nan64()] Right bound b
engine (engines_batchbase__iface__) – [optional, default: None] Pointer to the engine

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:: tableToFill (data_or_file) – Input table to fill with random numbers
Return type:: distributions_result

daal4py.distributions_uniform_result: alias of distributions_result
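
The engine and distribution classes are meant to be combined: an engine is created with a seed and passed to a distribution through the engine parameter. The sketch below follows that pattern. The randomNumbers property on the distribution result is taken from daal4py's distribution examples and is an assumption with respect to this section. uniform_numpy is a hypothetical NumPy analogue that only illustrates seeded reproducibility; NumPy's MT19937 is a separate implementation and is not expected to reproduce daal4py's streams. uniform_daal4py requires daal4py and is not called here.

```python
import numpy as np


def uniform_daal4py(n, seed=777, a=0.0, b=1.0):
    """Fill a 1 x n table with uniform numbers via daal4py (needs daal4py)."""
    import daal4py as d4p

    engine = d4p.engines_mt19937(seed=seed)
    algo = d4p.distributions_uniform(a=a, b=b, engine=engine)
    table = np.zeros((1, n))
    result = algo.compute(table)
    return result.randomNumbers


def uniform_numpy(n, seed=777, a=0.0, b=1.0):
    """NumPy analogue: a seeded Mersenne Twister feeding a uniform distribution."""
    rng = np.random.Generator(np.random.MT19937(seed))
    return rng.uniform(a, b, size=(1, n))


first = uniform_numpy(10)
second = uniform_numpy(10)
```

The same pattern should apply to distributions_bernoulli and distributions_normal, with their own parameters (p, or a and sigma) in place of the bounds.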

Sorting

Parameters and semantics are described in oneAPI Data Analytics Library Sorting.

Examples:

Single-Process Sorting

class daal4py.sorting

Parameters:

fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the sorting, double or float
method (str) – [optional, default: “defaultDense”] Sorting computation method

compute(data)

Do the actual computation on provided input data.

Parameters:: data (data_or_file) – Input data table
Return type:: sorting_result

class daal4py.sorting_result

Properties:

sortedData

Numpy array

Type:: type
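
A short sketch of the expected behavior, assuming the algorithm sorts the values of each column (feature) independently in ascending order. np.sort along axis 0 is used as a NumPy reference; sorting_daal4py requires daal4py and is not called here.

```python
import numpy as np


def sorting_daal4py(data):
    """Sort each column of the table via daal4py (needs daal4py installed)."""
    import daal4py as d4p

    return d4p.sorting().compute(data).sortedData


data = np.array([[3.0, 9.0], [1.0, 7.0], [2.0, 8.0]])
# NumPy reference: sort every column independently, ascending.
sorted_reference = np.sort(data, axis=0)
```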