Principal Components Analysis (PCA)

Principal Component Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors of possibly correlated features into a new set of uncorrelated features, called principal components. Principal components are the directions of largest variance, that is, the directions along which the data is most spread out.
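To make "directions of largest variance" concrete, here is a minimal, self-contained C++ sketch for the two-feature case (p = 2). It is not the oneDAL implementation: the names `Point2`, `Pca2`, and `pca_2d` are hypothetical, and the sketch assumes the covariance method with the sample (n - 1) normalization, solving the 2 x 2 eigenproblem in closed form.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// One feature vector with p = 2 features (hypothetical helper type).
struct Point2 { double x, y; };

// Eigenvalues in descending order and the matching unit eigenvectors
// (principal directions) of the 2x2 sample covariance matrix.
struct Pca2 {
    std::array<double, 2> eigenvalues;
    std::array<Point2, 2> directions;
};

// Covariance-method sketch for p = 2: center the data, build the sample
// covariance matrix [a b; b c], and eigen-decompose it in closed form.
Pca2 pca_2d(const std::vector<Point2>& data) {
    assert(data.size() >= 2);
    const double n = static_cast<double>(data.size());
    double mx = 0.0, my = 0.0;
    for (const auto& p : data) { mx += p.x; my += p.y; }
    mx /= n;
    my /= n;

    double a = 0.0, b = 0.0, c = 0.0;
    for (const auto& p : data) {
        const double dx = p.x - mx, dy = p.y - my;
        a += dx * dx;
        b += dx * dy;
        c += dy * dy;
    }
    a /= n - 1.0;
    b /= n - 1.0;
    c /= n - 1.0;

    // Eigenvalues of a symmetric 2x2 matrix: half_trace +/- r.
    const double half_trace = 0.5 * (a + c);
    const double r = std::sqrt(0.25 * (a - c) * (a - c) + b * b);
    const double l1 = half_trace + r, l2 = half_trace - r;

    // Eigenvector for l1 is (b, l1 - a); if b is (numerically) zero the
    // matrix is diagonal and the larger diagonal entry picks the axis.
    Point2 v1 = (std::abs(b) > 1e-12) ? Point2{b, l1 - a}
                                      : (a >= c ? Point2{1.0, 0.0} : Point2{0.0, 1.0});
    const double norm = std::hypot(v1.x, v1.y);
    v1 = {v1.x / norm, v1.y / norm};
    const Point2 v2{-v1.y, v1.x};  // the second direction is orthogonal to the first
    return {{l1, l2}, {v1, v2}};
}
```

For the points (0, 0), (1, 1), (2, 2), (3, 3), which all lie on the line y = x, the sketch returns eigenvalues 10/3 and 0 and the first direction (1/√2, 1/√2): all of the variance lies along that line.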

Operation	Computational methods	Function	Input type	Result type
Training	Covariance, SVD	train(…)	train_input	train_result
Inference	Covariance, SVD	infer(…)	infer_input	infer_result
Partial Training	Covariance, SVD	partial_train(…)	partial_train_input	partial_train_result
Finalize Training	Covariance, SVD	finalize_train(…)	partial_train_result	train_result

Mathematical formulation

Refer to Developer Guide: Principal Components Analysis.
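As a brief orientation only (the Developer Guide remains the authoritative definition), the textbook covariance-method formulation can be written as follows, assuming \(n\) rows \(x_i \in \mathbb{R}^p\), a mean vector \(\mu\), and \(r\) retained components. With z-score normalization, \(S\) is computed from standardized data and therefore becomes the correlation matrix. Stacking the eigenvectors as rows of \(V_r\) matches the \(r \times p\) layout of the eigenvectors table.

```latex
\begin{aligned}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
S = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{T}, \\
S v_k &= \lambda_k v_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \\
x_i' &= V_r (x_i - \mu), \qquad
V_r = \begin{pmatrix} v_1^{T} \\ \vdots \\ v_r^{T} \end{pmatrix} \in \mathbb{R}^{r \times p}.
\end{aligned}
```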

Programming Interface

All types and functions in this section are declared in the oneapi::dal::pca namespace and are available via inclusion of the oneapi/dal/algo/pca.hpp header file.

Enum classes

enum class normalization

normalization::none: No normalization is required, or the data is not normalized.
normalization::mean_center: Only mean centering is required, or the data is already mean-centered.
normalization::zscore: Z-score normalization (mean centering and scaling to unit variance) is required, or the data is already normalized.
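The hypothetical C++ sketch below shows what the two non-trivial modes do to a single feature column. The function names `mean_center` and `zscore` are illustrative, not oneDAL APIs, and the use of the sample standard deviation (divisor n - 1) for the z-score is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// mean_center: subtract the column mean so that the column has zero mean.
std::vector<double> mean_center(const std::vector<double>& column) {
    assert(!column.empty());
    double mean = 0.0;
    for (double v : column) mean += v;
    mean /= static_cast<double>(column.size());
    std::vector<double> out;
    out.reserve(column.size());
    for (double v : column) out.push_back(v - mean);
    return out;
}

// zscore: mean-center, then divide by the sample standard deviation,
// so that the column has zero mean and unit variance.
std::vector<double> zscore(const std::vector<double>& column) {
    assert(column.size() >= 2);
    std::vector<double> centered = mean_center(column);
    double sum_sq = 0.0;
    for (double v : centered) sum_sq += v * v;
    const double stddev = std::sqrt(sum_sq / static_cast<double>(column.size() - 1));
    assert(stddev > 0.0);  // a constant column cannot be scaled to unit variance
    for (double& v : centered) v /= stddev;
    return centered;
}
```

For the column {2, 4, 6}, `mean_center` yields {-2, 0, 2} and `zscore` yields {-1, 0, 1}.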

Descriptor

template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor

Template Parameters:

Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the algorithm. Can be method::cov or method::svd.
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

descriptor(std::int64_t component_count = 0): Creates a new instance of the class with the given component_count property value.

Public Methods

bool whiten() const

auto &set_whiten(bool value)

Properties

result_option_id result_options

Specifies which results are computed and returned.

Getter & Setter: result_option_id get_result_options() const

auto & set_result_options(const result_option_id &value)

std::int64_t component_count

The number of principal components \(r\). If it is zero, the algorithm computes the eigenvectors for all features, \(r = p\). Default value: 0.

Getter & Setter: std::int64_t get_component_count() const

auto & set_component_count(std::int64_t value)
Invariants: component_count >= 0

normalization data_normalization

Default value: normalization::none.

Getter & Setter: normalization get_data_normalization() const

auto & set_data_normalization(normalization value)

bool deterministic

Specifies whether the algorithm applies the sign-flip technique. If it is true, the directions of the eigenvectors are deterministic. Default value: true.
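An eigenvector \(v\) and its negation \(-v\) describe the same principal direction, so a sign-flip step picks one of the two in a fixed way. The hypothetical sketch below uses one common convention (make the entry with the largest absolute value positive); oneDAL's exact rule is not specified in this section and may differ. Eigenvectors are stored row-wise, as in the \(r \times p\) eigenvectors table.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Negate every eigenvector (row) whose entry of largest absolute value is
// negative, so repeated runs report the same sign for each direction.
void sign_flip(std::vector<std::vector<double>>& eigenvectors) {
    for (auto& row : eigenvectors) {
        assert(!row.empty());
        std::size_t max_idx = 0;
        for (std::size_t j = 1; j < row.size(); ++j) {
            if (std::abs(row[j]) > std::abs(row[max_idx])) max_idx = j;
        }
        if (row[max_idx] < 0.0) {
            for (double& v : row) v = -v;
        }
    }
}
```

For the rows (0.6, -0.8) and (0.8, 0.6), the first row is negated to (-0.6, 0.8) and the second is left unchanged.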

Getter & Setter: bool get_deterministic() const

auto & set_deterministic(bool value)

normalization normalization_mode

Default value: normalization::zscore.

Getter & Setter: normalization get_normalization_mode() const

auto & set_normalization_mode(normalization value)

Method tags

struct cov: Tag-type that denotes the Covariance computational method.

struct precomputed

struct svd: Tag-type that denotes the SVD computational method.

using by_default = cov: Alias tag-type for the Covariance computational method.

Task tags

struct dim_reduction: Tag-type that parameterizes entities used for solving the dimensionality reduction problem.

using by_default = dim_reduction: Alias tag-type for the dimensionality reduction task.

Model

template<typename Task = task::by_default> class model

Template Parameters: Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

model(): Creates a new instance of the class with the default property values.

Properties

const table &variances

Variances. Default value: table{}.

Getter & Setter: const table & get_variances() const

auto & set_variances(const table &value)

const table &means

Means. Default value: table{}.

Getter & Setter: const table & get_means() const

auto & set_means(const table &value)

const table &eigenvectors

An \(r \times p\) table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.

Getter & Setter: const table & get_eigenvectors() const

auto & set_eigenvectors(const table &value)

const table &eigenvalues

Eigenvalues. Default value: table{}.

Getter & Setter: const table & get_eigenvalues() const

auto & set_eigenvalues(const table &value)

Training train(...)

Input

template<typename Task = task::by_default> class train_input

Template Parameters: Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

train_input()

train_input(const table &data): Creates a new instance of the class with the given data property value.

Properties

const table &data

An \(n \times p\) table with the training data, where each row stores one feature vector. Default value: table{}.

Getter & Setter: const table & get_data() const

auto & set_data(const table &data)

Result and Finalize Result

template<typename Task = task::by_default> class train_result

Template Parameters: Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

train_result(): Creates a new instance of the class with the default property values.

Properties

const table &variances

A \(1 \times r\) table that contains the variances for the first r features. Default value: table{}.

Getter & Setter: const table & get_variances() const

auto & set_variances(const table &value)

const result_option_id &result_options

Result options that indicate the availability of the properties. Default value: default_result_options<Task>.

Getter & Setter: const result_option_id & get_result_options() const

auto & set_result_options(const result_option_id &value)

const table &singular_values

A \(1 \times r\) table that contains the singular values for the first r principal components. Default value: table{}.
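In the textbook formulation (assumed here: the data matrix \(X\) is centered and the covariance is scaled by \(1/(n-1)\)), singular values and eigenvalues are tied together as shown below; oneDAL's exact scaling of singular_values is not stated in this section. The columns of \(W\) are the eigenvectors of \(S\), which is why the SVD and Covariance methods yield the same principal directions.

```latex
X_c = X - \mathbf{1}\mu^{T} = U \Sigma W^{T}, \qquad
S = \frac{1}{n-1} X_c^{T} X_c = W \, \frac{\Sigma^{2}}{n-1} \, W^{T}
\quad\Longrightarrow\quad
\lambda_k = \frac{\sigma_k^{2}}{n-1}.
```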

Getter & Setter: const table & get_singular_values() const

auto & set_singular_values(const table &value)

const table &means

A \(1 \times r\) table that contains the mean values for the first r features. Default value: table{}.

Getter & Setter: const table & get_means() const

auto & set_means(const table &value)

const model<Task> &model

The trained PCA model. Default value: model<Task>{}.

Getter & Setter: const model< Task > & get_model() const

auto & set_model(const model< Task > &value)

const table &eigenvectors

An \(r \times p\) table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.

Getter & Setter: const table & get_eigenvectors() const

auto & set_eigenvectors(const table &value)
Invariants: eigenvectors == model.eigenvectors

const table &explained_variances_ratio

A \(1 \times r\) table that contains the explained variance ratios for the first r principal components. Default value: table{}.
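In the usual definition, the explained variance ratio of component \(k\) is \(\lambda_k / \sum_{j=1}^{p} \lambda_j\), its share of the total variance. The hypothetical helper below computes it from all \(p\) eigenvalues sorted in descending order; whether oneDAL normalizes by exactly this total is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ratio of each of the first r eigenvalues to the total variance, taken
// here as the sum of all p eigenvalues (assumed sorted in descending order).
std::vector<double> explained_variance_ratio(const std::vector<double>& all_eigenvalues,
                                             std::size_t r) {
    assert(r <= all_eigenvalues.size());
    double total = 0.0;
    for (double lambda : all_eigenvalues) total += lambda;
    assert(total > 0.0);
    std::vector<double> ratios;
    ratios.reserve(r);
    for (std::size_t k = 0; k < r; ++k) ratios.push_back(all_eigenvalues[k] / total);
    return ratios;
}
```

With eigenvalues {6, 3, 1} and r = 2, the ratios are {0.6, 0.3}: the two retained components explain 90% of the total variance.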

Getter & Setter: const table & get_explained_variances_ratio() const

auto & set_explained_variances_ratio(const table &value)

const table &eigenvalues

A \(1 \times r\) table that contains the eigenvalues for the first r principal components. Default value: table{}.

Getter & Setter: const table & get_eigenvalues() const

auto & set_eigenvalues(const table &value)

Operation

template<typename Descriptor> pca::train_result train(const Descriptor &desc, const pca::train_input &input)

Parameters:

desc – PCA algorithm descriptor pca::descriptor
input – Input data for the training operation

Preconditions: input.data.has_data == true

input.data.column_count >= desc.component_count

Postconditions: result.means.row_count == 1

result.means.column_count == desc.component_count

result.variances.row_count == 1

result.variances.column_count == desc.component_count

result.variances[i] >= 0.0

result.eigenvalues.row_count == 1

result.eigenvalues.column_count == desc.component_count

result.model.eigenvectors.row_count == desc.component_count

result.model.eigenvectors.column_count == input.data.column_count

Partial Training

Partial Input

template<typename Task = task::by_default> class partial_train_input

Constructors

partial_train_input()

partial_train_input(const table &data)

partial_train_input(const partial_train_result<Task> &prev, const table &data)

Properties

const partial_train_result<Task> &prev

Getter & Setter: const partial_train_result< Task > & get_prev() const

auto & set_prev(const partial_train_result< Task > &value)

const table &data

Getter & Setter: const table & get_data() const

auto & set_data(const table &value)

Partial Result and Finalize Input

template<typename Task = task::by_default> class partial_train_result

Constructors

partial_train_result()

Public Methods

std::int64_t get_auxiliary_table_count() const

Properties

const table &partial_sum

Per-feature sums over the rows processed so far. Default value: table{}.

Getter & Setter: const table & get_partial_sum() const

auto & set_partial_sum(const table &value)

const table &partial_n_rows

The number of observations (rows) processed so far. Default value: table{}.

Getter & Setter: const table & get_partial_n_rows() const

auto & set_partial_n_rows(const table &value)

const table &partial_crossproduct

The cross-product matrix accumulated over the rows processed so far. Default value: table{}.

Getter & Setter: const table & get_partial_crossproduct() const

auto & set_partial_crossproduct(const table &value)

const table &auxiliary_table

Getter & Setter: const table & get_auxiliary_table(const std::int64_t) const

auto & set_auxiliary_table(const table &value)
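Running totals like the row count, sums, and cross-product are what make online training possible: each block only updates the totals, and finalization turns them into a covariance matrix. The sketch below illustrates that idea in plain C++. `PartialState`, `accumulate_block`, and `finalize_covariance` are hypothetical names, the layout is not oneDAL's partial_train_result, and it accumulates the raw cross-product \(X^T X\), which is simpler but less numerically robust than accumulating centered cross-products.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical running state for online covariance-based PCA.
struct PartialState {
    std::size_t n_rows = 0;            // number of rows seen so far
    std::vector<double> sum;           // per-feature sums, length p
    std::vector<double> crossproduct;  // raw sum of x * x^T, p x p, row-major
    explicit PartialState(std::size_t p) : sum(p, 0.0), crossproduct(p * p, 0.0) {}
};

// "Partial training" step: fold one block of rows into the running state.
void accumulate_block(PartialState& s, const std::vector<std::vector<double>>& block) {
    const std::size_t p = s.sum.size();
    for (const auto& row : block) {
        assert(row.size() == p);
        ++s.n_rows;
        for (std::size_t i = 0; i < p; ++i) {
            s.sum[i] += row[i];
            for (std::size_t j = 0; j < p; ++j) s.crossproduct[i * p + j] += row[i] * row[j];
        }
    }
}

// "Finalize" step: S = (X^T X - sum * sum^T / n) / (n - 1), the sample
// covariance matrix that is then eigen-decomposed.
std::vector<double> finalize_covariance(const PartialState& s) {
    assert(s.n_rows >= 2);
    const std::size_t p = s.sum.size();
    const double n = static_cast<double>(s.n_rows);
    std::vector<double> cov(p * p);
    for (std::size_t i = 0; i < p; ++i)
        for (std::size_t j = 0; j < p; ++j)
            cov[i * p + j] = (s.crossproduct[i * p + j] - s.sum[i] * s.sum[j] / n) / (n - 1.0);
    return cov;
}
```

Feeding the rows (0, 0), (1, 1) as one block and (2, 2), (3, 3) as a second block gives a row count of 4 and a covariance matrix whose four entries all equal 5/3, the same result as processing all four rows at once.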

Finalize Training

Inference infer(...)

Input

template<typename Task = task::by_default> class infer_input

Template Parameters: Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

infer_input(const model<Task> &trained_model, const table &data): Creates a new instance of the class with the given model and data property values.

Properties

const model<Task> &model

The trained PCA model. Default value: model<Task>{}.

Getter & Setter: const model< Task > & get_model() const

auto & set_model(const model< Task > &value)

const table &data

The dataset for inference \(X'\). Default value: table{}.

Getter & Setter: const table & get_data() const

auto & set_data(const table &value)

Result

template<typename Task = task::by_default> class infer_result

Template Parameters: Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

infer_result(): Creates a new instance of the class with the default property values.

Properties

const table &transformed_data

An \(n \times r\) table that contains data projected to the r principal components. Default value: table{}.
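Row by row, the projection computes \(x_i' = V_r (x_i - \mu)\), where \(V_r\) is the \(r \times p\) eigenvectors table. The hypothetical sketch below covers mean centering only; any extra scaling (for example for z-score normalization or whitening) is deliberately left out, and `Matrix` and `project` are illustrative names, not oneDAL APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major, rows of equal length

// transformed[i][k] = sum_j eigenvectors[k][j] * (data[i][j] - means[j])
// data is n x p, eigenvectors is r x p (one eigenvector per row), result is n x r.
Matrix project(const Matrix& data, const Matrix& eigenvectors, const std::vector<double>& means) {
    Matrix out;
    out.reserve(data.size());
    for (const auto& row : data) {
        assert(row.size() == means.size());
        std::vector<double> projected(eigenvectors.size(), 0.0);
        for (std::size_t k = 0; k < eigenvectors.size(); ++k) {
            assert(eigenvectors[k].size() == row.size());
            for (std::size_t j = 0; j < row.size(); ++j)
                projected[k] += eigenvectors[k][j] * (row[j] - means[j]);
        }
        out.push_back(projected);
    }
    return out;
}
```

With identity eigenvectors and means (2, 3), the rows (1, 2) and (3, 4) map to (-1, -1) and (1, 1).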

Getter & Setter: const table & get_transformed_data() const

auto & set_transformed_data(const table &value)

Operation

template<typename Descriptor> pca::infer_result infer(const Descriptor &desc, const pca::infer_input &input)

Parameters:

desc – PCA algorithm descriptor pca::descriptor
input – Input data for the inference operation

Preconditions: input.data.has_data == true

input.model.eigenvectors.row_count == desc.component_count

input.model.eigenvectors.column_count == input.data.column_count

Postconditions: result.transformed_data.row_count == input.data.row_count

result.transformed_data.column_count == desc.component_count

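Usage Example

Training

#include "oneapi/dal/algo/pca.hpp"

namespace dal = oneapi::dal;
namespace pca = oneapi::dal::pca;
using oneapi::dal::table;

// print_table(name, table) is assumed to be a user-provided helper that prints a table
pca::model<> run_training(const table& data) {
   // Keep 5 principal components and make the eigenvector signs deterministic
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(5)
      .set_deterministic(true);

   const auto result = dal::train(pca_desc, data);

   print_table("means", result.get_means());
   print_table("variances", result.get_variances());
   print_table("eigenvalues", result.get_eigenvalues());
   print_table("eigenvectors", result.get_eigenvectors());

   return result.get_model();
}

Inference

table run_inference(const pca::model<>& model,
                    const table& new_data) {
   // The component count must match the number of eigenvectors (rows) stored in the model
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(model.get_eigenvectors().get_row_count());

   const auto result = dal::infer(pca_desc, model, new_data);

   print_table("transformed_data", result.get_transformed_data());
   return result.get_transformed_data();
}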

Examples

Batch Processing:

pca_cor_dense_batch.cpp

Online Processing:

pca_cor_dense_online.cpp
