Principal Components Analysis (PCA)

Principal Component Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors of possibly correlated features into a new set of uncorrelated features, called principal components. Principal components are the directions of the largest variance, that is, the directions in which the data is most spread out.
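The following sketch illustrates the idea for two features in plain C++; it does not use oneDAL. For a \(2 \times 2\) sample covariance matrix the eigenvalues have a closed form, and the eigenvector of the larger eigenvalue is the first principal component, the direction in which the data is most spread out. The names Pca2d and pca_2d are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Result of a two-feature PCA: eigenvalues in descending order and the
// unit-length direction of the first principal component.
struct Pca2d {
    double lambda1, lambda2;  // lambda1 >= lambda2
    double dir_x, dir_y;      // first principal direction
};

// Illustrative sketch: PCA for n points with two features (x, y).
Pca2d pca_2d(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    // Sample covariance matrix [[a, b], [b, c]] with divisor n - 1
    double a = 0.0, b = 0.0, c = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mx, dy = y[i] - my;
        a += dx * dx; b += dx * dy; c += dy * dy;
    }
    a /= n - 1; b /= n - 1; c /= n - 1;
    // Closed-form eigenvalues of a symmetric 2x2 matrix
    const double mid = 0.5 * (a + c);
    const double rad = std::sqrt(0.25 * (a - c) * (a - c) + b * b);
    Pca2d r{mid + rad, mid - rad, 1.0, 0.0};
    if (b != 0.0) {
        // (b, lambda1 - a) solves (A - lambda1 I) v = 0
        const double vx = b, vy = r.lambda1 - a;
        const double len = std::sqrt(vx * vx + vy * vy);
        r.dir_x = vx / len;
        r.dir_y = vy / len;
    } else if (c > a) {
        // Diagonal covariance: the y axis carries more variance
        r.dir_x = 0.0;
        r.dir_y = 1.0;
    }
    return r;
}
```

For points on the line y = x, such as (1, 1), (2, 2), (3, 3), (4, 4), the first direction is approximately (0.707, 0.707) and the second eigenvalue is zero, because the data has no spread across that line.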

Mathematical formulation

Refer to Developer Guide: Principal Components Analysis.

Programming Interface

All types and functions in this section are declared in the oneapi::dal::pca namespace and are available by including the oneapi/dal/algo/pca.hpp header file.

Enum classes

enum class normalization
normalization::none

No normalization is necessary, or the data is not normalized.

normalization::mean_center

Only mean centering is necessary, or the data is already mean-centered.

normalization::zscore

Z-score normalization (mean centering followed by scaling to unit variance) is necessary, or the data is already z-score normalized.
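As an illustration of what the three modes do to a data set, the following plain C++ sketch (not oneDAL code) applies them column by column to a row-major \(n \times p\) array. The Normalization enum and the normalize function are hypothetical, and dividing by the sample standard deviation (denominator \(n - 1\)) is an assumption of the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical mirror of the three normalization values (not the oneDAL type).
enum class Normalization { none, mean_center, zscore };

// Normalizes a row-major n x p data set in place, column by column.
// zscore divides by the sample standard deviation (divisor n - 1), which is
// an assumption of this sketch.
void normalize(std::vector<double>& data, std::size_t n, std::size_t p,
               Normalization mode) {
    assert(data.size() == n * p && n >= 2);
    if (mode == Normalization::none) return;
    for (std::size_t j = 0; j < p; ++j) {
        double mean = 0.0;
        for (std::size_t i = 0; i < n; ++i) mean += data[i * p + j];
        mean /= static_cast<double>(n);
        double ss = 0.0;  // sum of squared deviations
        for (std::size_t i = 0; i < n; ++i) {
            data[i * p + j] -= mean;  // mean centering
            ss += data[i * p + j] * data[i * p + j];
        }
        if (mode == Normalization::zscore) {
            const double sd = std::sqrt(ss / static_cast<double>(n - 1));
            if (sd > 0.0)  // leave constant columns at zero
                for (std::size_t i = 0; i < n; ++i) data[i * p + j] /= sd;
        }
    }
}
```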

Descriptor

template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor
Template Parameters:
  • Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

  • Method – Tag-type that specifies an implementation of the algorithm. Can be method::cov or method::svd.

  • Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.

Constructors

descriptor(std::int64_t component_count = 0)

Creates a new instance of the class with the given component_count property value.

Public Methods

bool whiten() const
auto &set_whiten(bool value)
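This section does not describe whiten. In PCA generally, whitening rescales the scores of each principal component by \(1 / \sqrt{\lambda_k}\) so that every component has unit variance. The sketch below assumes that meaning; it is plain C++ with a hypothetical whiten_scores function, not the oneDAL implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed meaning of whitening: divide the scores of component k by
// sqrt(lambda_k) so that each component has unit variance.
// `scores` is a row-major n x r array of projected data.
void whiten_scores(std::vector<double>& scores, std::size_t n, std::size_t r,
                   const std::vector<double>& eigenvalues) {
    assert(scores.size() == n * r && eigenvalues.size() == r);
    for (std::size_t k = 0; k < r; ++k) {
        // A component with zero variance cannot be rescaled to unit variance.
        assert(eigenvalues[k] > 0.0);
        const double scale = 1.0 / std::sqrt(eigenvalues[k]);
        for (std::size_t i = 0; i < n; ++i) scores[i * r + k] *= scale;
    }
}
```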

Properties

normalization normalization_mode

The normalization that the algorithm applies to the data. Default value: normalization::zscore.

Getter & Setter
normalization get_normalization_mode() const
auto & set_normalization_mode(normalization value)
normalization data_normalization

The normalization that has already been applied to the input data. Default value: normalization::none.

Getter & Setter
normalization get_data_normalization() const
auto & set_data_normalization(normalization value)
bool deterministic

Specifies whether the algorithm applies the sign-flip technique. If it is true, the directions of the eigenvectors are deterministic. Default value: true.

Getter & Setter
bool get_deterministic() const
auto & set_deterministic(bool value)
std::int64_t component_count

The number of principal components \(r\). If it is zero, the algorithm computes the eigenvectors for all features, \(r = p\). Default value: 0.

Getter & Setter
std::int64_t get_component_count() const
auto & set_component_count(std::int64_t value)
Invariants
component_count >= 0
result_option_id result_options

Chooses which results should be computed and returned.

Getter & Setter
result_option_id get_result_options() const
auto & set_result_options(const result_option_id &value)
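Eigenvectors are only defined up to sign, so \(v\) and \(-v\) are equally valid results. The sketch below shows one common sign-flip convention: an eigenvector is negated if its entry with the largest absolute value is negative. The exact rule oneDAL applies when deterministic is true is not stated here, so treat the convention and the hypothetical flip_signs function as assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One common sign convention (assumed, not necessarily oneDAL's rule):
// negate an eigenvector if its largest-magnitude entry is negative.
// `eigvecs` is a row-major r x p array with one eigenvector per row.
void flip_signs(std::vector<double>& eigvecs, std::size_t r, std::size_t p) {
    assert(p > 0 && eigvecs.size() == r * p);
    for (std::size_t k = 0; k < r; ++k) {
        double* row = eigvecs.data() + k * p;
        std::size_t arg = 0;  // index of the largest |entry| in this row
        for (std::size_t j = 1; j < p; ++j)
            if (std::fabs(row[j]) > std::fabs(row[arg])) arg = j;
        if (row[arg] < 0.0)
            for (std::size_t j = 0; j < p; ++j) row[j] = -row[j];
    }
}
```

Applied to \(v\) and to \(-v\), the function returns the same vector, which is what makes results from repeated runs comparable.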

Method tags

struct cov

Tag-type that denotes the Covariance computational method.

struct precomputed
struct svd

Tag-type that denotes the SVD computational method.

using by_default = cov

Alias tag-type for the Covariance computational method.
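The covariance method can be sketched as follows: compute the feature means, build the \(p \times p\) sample covariance matrix, decompose it into eigenpairs, and keep the \(r\) pairs with the largest eigenvalues (\(r = p\) when component_count is zero). The plain C++ sketch below is not oneDAL's implementation: it uses the cyclic Jacobi eigenvalue algorithm as a simple stand-in for the eigendecomposition, and the names EigenPairs and pca_cov are hypothetical. Eigenvectors are stored as rows of an \(r \times p\) array, matching the layout described for model.eigenvectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct EigenPairs {
    std::vector<double> values;   // r eigenvalues, largest first
    std::vector<double> vectors;  // row-major r x p, one eigenvector per row
};

// Covariance-method sketch for a row-major n x p data set `x`.
// r == 0 keeps all p components, mirroring component_count == 0.
EigenPairs pca_cov(const std::vector<double>& x, std::size_t n, std::size_t p,
                   std::size_t r = 0) {
    assert(x.size() == n * p && n >= 2 && p >= 1 && r <= p);
    if (r == 0) r = p;
    std::vector<double> mean(p, 0.0), a(p * p, 0.0), v(p * p, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < p; ++j) mean[j] += x[i * p + j] / n;
    // Sample covariance matrix A (p x p), divisor n - 1
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t k = 0; k < p; ++k)
                a[j * p + k] += (x[i * p + j] - mean[j]) * (x[i * p + k] - mean[k]) / (n - 1);
    for (std::size_t j = 0; j < p; ++j) v[j * p + j] = 1.0;  // V = identity

    // Cyclic Jacobi: rotate away each off-diagonal entry until A is diagonal.
    // The columns of V accumulate the rotations and become the eigenvectors.
    for (int sweep = 0; sweep < 100; ++sweep) {
        double off = 0.0;
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t k = j + 1; k < p; ++k) off += a[j * p + k] * a[j * p + k];
        if (off < 1e-30) break;
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t k = j + 1; k < p; ++k) {
                if (a[j * p + k] == 0.0) continue;
                // Angle that zeroes entry (j, k) of G^T A G
                const double theta = 0.5 * std::atan2(2.0 * a[j * p + k],
                                                      a[k * p + k] - a[j * p + j]);
                const double c = std::cos(theta), s = std::sin(theta);
                for (std::size_t m = 0; m < p; ++m) {  // A <- A G (columns j, k)
                    const double amj = a[m * p + j], amk = a[m * p + k];
                    a[m * p + j] = c * amj - s * amk;
                    a[m * p + k] = s * amj + c * amk;
                }
                for (std::size_t m = 0; m < p; ++m) {  // A <- G^T A (rows j, k)
                    const double ajm = a[j * p + m], akm = a[k * p + m];
                    a[j * p + m] = c * ajm - s * akm;
                    a[k * p + m] = s * ajm + c * akm;
                }
                for (std::size_t m = 0; m < p; ++m) {  // V <- V G
                    const double vmj = v[m * p + j], vmk = v[m * p + k];
                    v[m * p + j] = c * vmj - s * vmk;
                    v[m * p + k] = s * vmj + c * vmk;
                }
            }
    }

    // Sort eigenpairs by decreasing eigenvalue and keep the first r.
    std::vector<std::size_t> order(p);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(), [&](std::size_t i, std::size_t j) {
        return a[i * p + i] > a[j * p + j];
    });
    EigenPairs out;
    for (std::size_t t = 0; t < r; ++t) {
        const std::size_t k = order[t];
        out.values.push_back(a[k * p + k]);
        for (std::size_t j = 0; j < p; ++j) out.vectors.push_back(v[j * p + k]);
    }
    return out;
}
```

Jacobi rotations are easy to verify for small \(p\) but cost \(O(p^3)\) per sweep; a production library would call an optimized symmetric eigensolver instead.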

Task tags

struct dim_reduction

Tag-type that parameterizes entities used for solving the dimensionality reduction problem.

using by_default = dim_reduction

Alias tag-type for the dimensionality reduction task.

Model

template<typename Task = task::by_default>
class model
Template Parameters:

Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.

Constructors

model()

Creates a new instance of the class with the default property values.

Properties

const table &eigenvectors

An \(r \times p\) table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.

Getter & Setter
const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)
const table &eigenvalues

Eigenvalues. Default value: table{}.

Getter & Setter
const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)
const table &variances

Variances. Default value: table{}.

Getter & Setter
const table & get_variances() const
auto & set_variances(const table &value)
const table &means

Means. Default value: table{}.

Getter & Setter
const table & get_means() const
auto & set_means(const table &value)

Training train(...)

Input

template<typename Task = task::by_default>
class train_input
Template Parameters:

Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.

Constructors

train_input()
train_input(const table &data)

Creates a new instance of the class with the given data property value.

Properties

const table &data

An \(n \times p\) table with the training data, where each row stores one feature vector. Default value: table{}.

Getter & Setter
const table & get_data() const
auto & set_data(const table &data)

Result and Finalize Result

template<typename Task = task::by_default>
class train_result
Template Parameters:

Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.

Constructors

train_result()

Creates a new instance of the class with the default property values.

Properties

const table &eigenvectors

An \(r \times p\) table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.

Getter & Setter
const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)
Invariants
eigenvectors == model.eigenvectors
const table &singular_values

A \(1 \times r\) table that contains the singular values for the first r principal components. Default value: table{}.

Getter & Setter
const table & get_singular_values() const
auto & set_singular_values(const table &value)
const table &explained_variances_ratio

A \(1 \times r\) table that contains the explained variance ratios for the first r principal components. Default value: table{}.

Getter & Setter
const table & get_explained_variances_ratio() const
auto & set_explained_variances_ratio(const table &value)
const table &variances

A \(1 \times p\) table that contains the variances of the p features. Default value: table{}.

Getter & Setter
const table & get_variances() const
auto & set_variances(const table &value)
const table &means

A \(1 \times p\) table that contains the mean values of the p features. Default value: table{}.

Getter & Setter
const table & get_means() const
auto & set_means(const table &value)
const model<Task> &model

The trained PCA model. Default value: model<Task>{}.

Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
const table &eigenvalues

A \(1 \times r\) table that contains the eigenvalues for the first r principal components. Default value: table{}.

Getter & Setter
const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)
const result_option_id &result_options

Result options that indicate the availability of the properties. Default value: default_result_options<Task>.

Getter & Setter
const result_option_id & get_result_options() const
auto & set_result_options(const result_option_id &value)

Operation

template<typename Descriptor>
pca::train_result train(const Descriptor &desc, const pca::train_input &input)
Parameters:
  • desc – PCA algorithm descriptor pca::descriptor

  • input – Input data for the training operation

Preconditions
input.data.has_data == true
input.data.column_count >= desc.component_count
Postconditions
result.means.row_count == 1
result.means.column_count == input.data.column_count
result.variances.row_count == 1
result.variances.column_count == input.data.column_count
result.variances[i] >= 0.0
result.eigenvalues.row_count == 1
result.eigenvalues.column_count == desc.component_count
result.model.eigenvectors.row_count == desc.component_count
result.model.eigenvectors.column_count == input.data.column_count
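The singular_values and explained_variances_ratio properties of train_result relate to the eigenvalues through standard identities. Assuming the eigenvalues \(\lambda_k\) are those of the sample covariance matrix \(C = X_c^T X_c / (n - 1)\) of the mean-centered data \(X_c\), the singular values of \(X_c\) are \(s_k = \sqrt{(n - 1)\lambda_k}\), and the explained variance ratio of component \(k\) is \(\lambda_k / \sum_{j=1}^{p} \lambda_j\). The plain C++ sketch below computes both; whether oneDAL uses exactly these scalings is an assumption, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Singular values of the centered n x p data from covariance eigenvalues,
// assuming C = Xc^T Xc / (n - 1), so that s_k^2 = (n - 1) * lambda_k.
std::vector<double> singular_values_from_eigenvalues(
        const std::vector<double>& eigenvalues, std::size_t n) {
    assert(n >= 2);
    std::vector<double> s;
    for (double lambda : eigenvalues)
        // Clamp tiny negative round-off before taking the square root.
        s.push_back(std::sqrt(std::max(lambda, 0.0) * static_cast<double>(n - 1)));
    return s;
}

// Explained variance ratio of the first r components: lambda_k divided by
// the total variance, the sum of all p eigenvalues (sorted descending).
std::vector<double> explained_variance_ratio(
        const std::vector<double>& all_eigenvalues, std::size_t r) {
    assert(r <= all_eigenvalues.size());
    const double total = std::accumulate(all_eigenvalues.begin(),
                                         all_eigenvalues.end(), 0.0);
    assert(total > 0.0);
    std::vector<double> ratio;
    for (std::size_t k = 0; k < r; ++k) ratio.push_back(all_eigenvalues[k] / total);
    return ratio;
}
```

For example, with eigenvalues 6, 3 and 1, the first two components explain 60% and 30% of the total variance.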

Partial Training

Partial Input

template<typename Task = task::by_default>
class partial_train_input

Constructors

partial_train_input()
partial_train_input(const table &data)
partial_train_input(const partial_train_result<Task> &prev, const table &data)

Properties

const table &data
Getter & Setter
const table & get_data() const
auto & set_data(const table &value)
const partial_train_result<Task> &prev
Getter & Setter
const partial_train_result< Task > & get_prev() const
auto & set_prev(const partial_train_result< Task > &value)

Partial Result and Finalize Input

template<typename Task = task::by_default>
class partial_train_result

Constructors

partial_train_result()

Public Methods

std::int64_t get_auxiliary_table_count() const

Properties

const table &partial_crossproduct

The partial cross-product matrix. Default value: table{}.

Getter & Setter
const table & get_partial_crossproduct() const
auto & set_partial_crossproduct(const table &value)
const table &partial_sum

The partial sums. Default value: table{}.

Getter & Setter
const table & get_partial_sum() const
auto & set_partial_sum(const table &value)
const table &partial_n_rows

The number of observations (rows) accumulated so far. Default value: table{}.

Getter & Setter
const table & get_partial_n_rows() const
auto & set_partial_n_rows(const table &value)
const table &auxiliary_table
Getter & Setter
const table & get_auxiliary_table(const std::int64_t) const
auto & set_auxiliary_table(const table &value)
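The partial properties above are the kind of sufficient statistics that make online training possible: each block of rows updates a cross-product matrix, the column sums and a row count, so the raw data does not need to be kept. The plain C++ sketch below illustrates such an update with the hypothetical names PartialStats and update_stats. It accumulates the raw, uncentered cross-product \(X^T X\); whether oneDAL stores exactly this form in partial_crossproduct is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the statistics carried between partial_train
// calls: raw cross-product X^T X, column sums and the row count.
struct PartialStats {
    std::size_t p = 0;
    std::vector<double> crossproduct;  // row-major p x p
    std::vector<double> sum;           // length p
    std::size_t n_rows = 0;
};

// Folds one row-major block of `rows` x p data into the running statistics.
void update_stats(PartialStats& st, const std::vector<double>& block, std::size_t rows) {
    assert(st.p > 0 && block.size() == rows * st.p);
    const std::size_t p = st.p;
    if (st.sum.empty()) {  // first block: allocate zeroed accumulators
        st.sum.assign(p, 0.0);
        st.crossproduct.assign(p * p, 0.0);
    }
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < p; ++j) {
            st.sum[j] += block[i * p + j];
            for (std::size_t k = 0; k < p; ++k)
                st.crossproduct[j * p + k] += block[i * p + j] * block[i * p + k];
        }
    st.n_rows += rows;
}
```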

Finalize Training
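Finalizing online training can be sketched as turning the accumulated statistics into the sample covariance matrix \(C = (X^T X - s s^T / n) / (n - 1)\), where \(s\) is the vector of column sums and \(n\) is the row count; \(C\) is then decomposed into eigenpairs as in batch mode. The plain C++ function below, covariance_from_stats, is hypothetical and assumes that the accumulated cross-product is the raw, uncentered \(X^T X\).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the p x p sample covariance matrix from accumulated statistics:
//   C = (X^T X - s s^T / n) / (n - 1)
// `crossproduct` is the raw, uncentered X^T X (row-major), `sum` holds the
// column sums s, and `n_rows` is the total number of rows n.
std::vector<double> covariance_from_stats(const std::vector<double>& crossproduct,
                                          const std::vector<double>& sum,
                                          std::size_t n_rows) {
    const std::size_t p = sum.size();
    assert(crossproduct.size() == p * p && n_rows >= 2);
    const double n = static_cast<double>(n_rows);
    std::vector<double> cov(p * p);
    for (std::size_t j = 0; j < p; ++j)
        for (std::size_t k = 0; k < p; ++k)
            cov[j * p + k] = (crossproduct[j * p + k] - sum[j] * sum[k] / n) / (n - 1.0);
    return cov;
}
```

This one-pass formula is simple but can lose precision when the means are large relative to the spread of the data; numerically careful implementations update centered statistics instead.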

Inference infer(...)

Input

template<typename Task = task::by_default>
class infer_input
Template Parameters:

Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.

Constructors

infer_input(const model<Task> &trained_model, const table &data)

Creates a new instance of the class with the given model and data property values.

Properties

const table &data

The dataset for inference \(X'\). Default value: table{}.

Getter & Setter
const table & get_data() const
auto & set_data(const table &value)
const model<Task> &model

The trained PCA model. Default value: model<Task>{}.

Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

Result

template<typename Task = task::by_default>
class infer_result
Template Parameters:

Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.

Constructors

infer_result()

Creates a new instance of the class with the default property values.

Properties

const table &transformed_data

An \(n \times r\) table that contains data projected to the r principal components. Default value: table{}.

Getter & Setter
const table & get_transformed_data() const
auto & set_transformed_data(const table &value)
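The projection behind transformed_data can be sketched as \(Y = (X' - \mathbf{1}\mu^T) V^T\), where \(\mu\) holds the training means and \(V\) is the \(r \times p\) eigenvector table, which gives an \(n \times r\) result. The plain C++ sketch below applies only mean centering; a model trained with z-score normalization would presumably also divide each centered column by its standard deviation. The project function is hypothetical and is not the oneDAL implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Projects a row-major n x p data set onto r principal components:
//   Y = (X - 1 * mean^T) * V^T
// where `eigvecs` is the row-major r x p eigenvector table (one eigenvector
// per row). Returns the row-major n x r table of projected data.
std::vector<double> project(const std::vector<double>& x, std::size_t n, std::size_t p,
                            const std::vector<double>& means,
                            const std::vector<double>& eigvecs, std::size_t r) {
    assert(x.size() == n * p && means.size() == p && eigvecs.size() == r * p);
    std::vector<double> y(n * r, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < r; ++k)
            for (std::size_t j = 0; j < p; ++j)
                y[i * r + k] += (x[i * p + j] - means[j]) * eigvecs[k * p + j];
    return y;
}
```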

Operation

template<typename Descriptor>
pca::infer_result infer(const Descriptor &desc, const pca::infer_input &input)
Parameters:
  • desc – PCA algorithm descriptor pca::descriptor

  • input – Input data for the inference operation

Preconditions
input.data.has_data == true
input.model.eigenvectors.row_count == desc.component_count
input.model.eigenvectors.column_count == input.data.column_count
Postconditions
result.transformed_data.row_count == input.data.row_count
result.transformed_data.column_count == desc.component_count

Usage Example

Training

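#include <string>
#include "oneapi/dal/algo/pca.hpp"

namespace dal = oneapi::dal;
namespace pca = dal::pca;
using dal::table;

// Assumed user-provided helper that prints a named table.
void print_table(const std::string& name, const table& t);

pca::model<> run_training(const table& data) {
   // Keep 5 components and make the eigenvector signs deterministic
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(5)
      .set_deterministic(true);

   const auto result = dal::train(pca_desc, data);

   print_table("means", result.get_means());
   print_table("variances", result.get_variances());
   print_table("eigenvalues", result.get_eigenvalues());
   print_table("eigenvectors", result.get_eigenvectors());

   return result.get_model();
}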

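Inference

// Projects new_data onto the principal components of a trained model.
table run_inference(const pca::model<>& model,
                    const table& new_data) {
   // pca::model has no component count property: use the row count of
   // its r x p eigenvector table instead.
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(model.get_eigenvectors().get_row_count());

   const auto result = dal::infer(pca_desc, model, new_data);

   print_table("transformed data", result.get_transformed_data());
   return result.get_transformed_data();
}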

Examples

Batch Processing:

Online Processing: