Principal Components Analysis (PCA)#
Principal Component Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors of possibly correlated features into a new set of uncorrelated features, called principal components. Principal components are the directions of largest variance, that is, the directions along which the data is most spread out.
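To make "direction of largest variance" concrete, the following standalone sketch computes a sample covariance matrix and approximates its leading eigenvector with power iteration. It does not use oneDAL and is not how the library is implemented; the names matrix, covariance, and first_component are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using matrix = std::vector<std::vector<double>>;

// Sample covariance matrix (p x p) of row-major data with n rows and p columns.
matrix covariance(const matrix& data) {
    assert(data.size() > 1 && !data.front().empty());
    const std::size_t n = data.size();
    const std::size_t p = data.front().size();

    // Column means.
    std::vector<double> mean(p, 0.0);
    for (const auto& row : data)
        for (std::size_t j = 0; j < p; ++j)
            mean[j] += row[j] / static_cast<double>(n);

    // Accumulate centered cross-products, scaled by 1 / (n - 1).
    matrix cov(p, std::vector<double>(p, 0.0));
    for (const auto& row : data)
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t k = 0; k < p; ++k)
                cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) /
                             static_cast<double>(n - 1);
    return cov;
}

// Approximates the eigenvector with the largest eigenvalue by power iteration.
// Starts from the all-ones vector, so it assumes that vector is not orthogonal
// to the leading eigenvector.
std::vector<double> first_component(const matrix& cov, int iterations = 200) {
    assert(!cov.empty());
    const std::size_t p = cov.size();
    std::vector<double> v(p, 1.0);
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> w(p, 0.0);
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t k = 0; k < p; ++k)
                w[j] += cov[j][k] * v[k];

        double norm = 0.0;
        for (double x : w) norm += x * x;
        norm = std::sqrt(norm);
        if (norm == 0.0) break;  // zero covariance: no direction to report

        for (std::size_t j = 0; j < p; ++j) v[j] = w[j] / norm;
    }
    return v;
}
```

For points that lie on the line y = 2x, the returned unit vector is proportional to (1, 2), the direction along which the data varies.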
| Operation | Computational methods | Programming Interface |
Mathematical formulation#
Programming Interface#
All types and functions in this section are declared in the
oneapi::dal::pca namespace and are available by including the
oneapi/dal/algo/pca.hpp header file.
Enum classes#
- 
enum class normalization#
- normalization::none
- No normalization is required, or the data is not normalized. 
- normalization::mean_center
- Only mean centering is required, or the data is already centered. 
- normalization::zscore
- Z-score normalization is required, or the data is already normalized. 
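As a rough illustration of the three modes, the sketch below applies each one to a single feature column. It is standalone code, not the oneDAL implementation: the local normalization enum only mirrors the names above, normalize_column is a hypothetical helper, and the use of the sample standard deviation (divisor n - 1) for z-scoring is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Local stand-in for the enum documented above (not the library type).
enum class normalization { none, mean_center, zscore };

// Returns a normalized copy of one feature column.
std::vector<double> normalize_column(std::vector<double> column, normalization mode) {
    assert(!column.empty());
    if (mode == normalization::none) return column;  // data is used as-is

    // Mean centering: subtract the column mean from every value.
    const double n = static_cast<double>(column.size());
    const double mean = std::accumulate(column.begin(), column.end(), 0.0) / n;
    for (double& x : column) x -= mean;
    if (mode == normalization::mean_center) return column;

    // Z-score: additionally divide by the sample standard deviation
    // (assumed divisor n - 1). Constant columns are left centered.
    double sum_sq = 0.0;
    for (double x : column) sum_sq += x * x;
    const double stddev = column.size() > 1 ? std::sqrt(sum_sq / (n - 1.0)) : 0.0;
    if (stddev > 0.0)
        for (double& x : column) x /= stddev;
    return column;
}
```

With the column {2, 4, 6}, mean centering gives {-2, 0, 2} and z-scoring gives {-1, 0, 1}.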
 
Descriptor#
- 
template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
 class descriptor#
- Template Parameters:
- Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double. 
- Method – Tag-type that specifies the implementation of the algorithm. Can be method::cov or method::svd. 
- Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction. 
 
 - Constructors - 
descriptor(std::int64_t component_count = 0)#
- Creates a new instance of the class with the given component_count property value.
 - Public Methods - 
bool whiten() const#
 - 
auto &set_whiten(bool value)#
 - Properties - 
result_option_id result_options#
- Choose which results should be computed and returned.
- Getter & Setter
- result_option_id get_result_options() const
- auto & set_result_options(const result_option_id &value)
 
 - 
std::int64_t component_count#
- The number of principal components \(r\). If it is zero, the algorithm computes the eigenvectors for all features, \(r = p\). Default value: 0.
- Getter & Setter
- std::int64_t get_component_count() const
- auto & set_component_count(std::int64_t value)
- Invariants
- component_count >= 0
 
 - 
normalization data_normalization#
- Default value: normalization::none.
- Getter & Setter
- normalization get_data_normalization() const
- auto & set_data_normalization(normalization value)
 
 - 
bool deterministic#
- Specifies whether the algorithm applies the sign-flip technique. If true, the directions of the eigenvectors are deterministic. Default value: true.
- Getter & Setter
- bool get_deterministic() const
- auto & set_deterministic(bool value)
 
 - 
normalization normalization_mode#
- Default value: normalization::zscore.
- Getter & Setter
- normalization get_normalization_mode() const
- auto & set_normalization_mode(normalization value)
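The deterministic property above refers to a sign-flip technique. An eigenvector v and its negation -v describe the same direction, so a solver may return either; a sign convention removes that ambiguity. The sketch below shows one common convention, making the entry with the largest absolute value positive. Whether oneDAL uses this exact rule is not stated here, so treat it as an assumption; sign_flip is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Normalizes the sign of an eigenvector in place so that its entry with the
// largest absolute value is positive (assumed convention, for illustration).
void sign_flip(std::vector<double>& eigenvector) {
    assert(!eigenvector.empty());
    const auto largest = std::max_element(
        eigenvector.begin(), eigenvector.end(),
        [](double a, double b) { return std::abs(a) < std::abs(b); });
    if (*largest < 0.0)
        for (double& x : eigenvector) x = -x;
}
```

Applied to both (0.6, -0.8) and (-0.6, 0.8), the function yields (-0.6, 0.8), so the two equivalent solver outputs collapse to a single representation.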
 
 
Model#
- 
template<typename Task = task::by_default>
 class model#
- Template Parameters:
- Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction. 
 - Constructors - 
model()#
- Creates a new instance of the class with the default property values. 
 - Properties - 
const table &variances#
- Variances. Default value: table{}.
- Getter & Setter
- const table & get_variances() const
- auto & set_variances(const table &value)
 
 - 
const table &means#
- Means. Default value: table{}.
- Getter & Setter
- const table & get_means() const
- auto & set_means(const table &value)
 
 
Training train(...)#
Input#
- 
template<typename Task = task::by_default>
 class train_input#
- Template Parameters:
- Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction. 
 - Constructors - 
train_input()#
 - 
train_input(const table &data)#
- Creates a new instance of the class with the given data property value.
 - Properties 
Result and Finalize Result#
- 
template<typename Task = task::by_default>
 class train_result#
- Template Parameters:
- Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction. 
 - Constructors - 
train_result()#
- Creates a new instance of the class with the default property values. 
 - Properties - 
const table &variances#
- A \(1 \times r\) table that contains the variances for the first r features. Default value: table{}.
- Getter & Setter
- const table & get_variances() const
- auto & set_variances(const table &value)
 
 - 
const result_option_id &result_options#
- Result options that indicate which properties are available. Default value: default_result_options<Task>.
- Getter & Setter
- const result_option_id & get_result_options() const
- auto & set_result_options(const result_option_id &value)
 
 - 
const table &singular_values#
- A \(1 \times r\) table that contains the singular values for the first r features. Default value: table{}.
- Getter & Setter
- const table & get_singular_values() const
- auto & set_singular_values(const table &value)
 
 - 
const table &means#
- A \(1 \times r\) table that contains the mean values for the first r features. Default value: table{}.
- Getter & Setter
- const table & get_means() const
- auto & set_means(const table &value)
 
 - 
const model<Task> &model#
- The trained PCA model. Default value: model<Task>{}.
- Getter & Setter
- const model<Task> & get_model() const
- auto & set_model(const model<Task> &value)
 
 - 
const table &eigenvectors#
- An \(r \times p\) table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.
- Getter & Setter
- const table & get_eigenvectors() const
- auto & set_eigenvectors(const table &value)
- Invariants
- eigenvectors == model.eigenvectors
 
 
Operation#
- 
template<typename Descriptor>
 pca::train_result train(const Descriptor &desc, const pca::train_input &input)#
- Parameters:
- desc – PCA algorithm descriptor pca::descriptor 
- input – Input data for the training operation 
 
 - Preconditions
- Postconditions
- result.means.row_count == 1
- result.means.column_count == desc.component_count
- result.variances.row_count == 1
- result.variances.column_count == desc.component_count
- result.variances[i] >= 0.0
- result.eigenvalues.row_count == 1
- result.eigenvalues.column_count == desc.component_count
- result.model.eigenvectors.row_count == 1
- result.model.eigenvectors.column_count == desc.component_count
 
Partial Training#
Partial Input#
- 
template<typename Task = task::by_default>
 class partial_train_input#
- Constructors - 
partial_train_input()#
 - 
partial_train_input(const partial_train_result<Task> &prev, const table &data)#
 - Properties - 
const partial_train_result<Task> &prev#
- Getter & Setter
- const partial_train_result<Task> & get_prev() const
- auto & set_prev(const partial_train_result<Task> &value)
 
 
Partial Result and Finalize Input#
- 
template<typename Task = task::by_default>
 class partial_train_result#
- Constructors - 
partial_train_result()#
 - Public Methods - 
std::int64_t get_auxiliary_table_count() const#
 - Properties - 
const table &partial_sum#
- Sums. Default value: table{}.
- Getter & Setter
- const table & get_partial_sum() const
- auto & set_partial_sum(const table &value)
 
 - 
const table &partial_n_rows#
- The number of observations (nobs) processed so far. Default value: table{}.
- Getter & Setter
- const table & get_partial_n_rows() const
- auto & set_partial_n_rows(const table &value)
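The partial_sum and partial_n_rows properties suggest how training can proceed block by block: each block adds to running column sums and a running row count, and the means are derived only at the end. The standalone sketch below shows that idea for the means alone. It does not use oneDAL, and partial_state, update, and finalize_means are hypothetical names; the library's partial result also carries auxiliary tables that this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Running state carried between data blocks (illustrative only).
struct partial_state {
    std::vector<double> sum;   // per-column sums over all rows seen so far
    std::int64_t n_rows = 0;   // number of rows seen so far
};

// Folds one block of rows into the previous state and returns the new state.
partial_state update(partial_state prev, const std::vector<std::vector<double>>& block) {
    for (const auto& row : block) {
        if (prev.sum.empty()) prev.sum.assign(row.size(), 0.0);
        assert(row.size() == prev.sum.size());
        for (std::size_t j = 0; j < row.size(); ++j) prev.sum[j] += row[j];
        ++prev.n_rows;
    }
    return prev;
}

// Turns the accumulated sums into column means.
std::vector<double> finalize_means(const partial_state& state) {
    assert(state.n_rows > 0);
    std::vector<double> means(state.sum);
    for (double& m : means) m /= static_cast<double>(state.n_rows);
    return means;
}
```

Feeding the blocks {{1, 10}, {3, 30}} and {{5, 50}} one after the other produces the same means, (3, 30), as processing all three rows at once.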
 
 
Finalize Training#
Inference infer(...)#
Input#
- 
template<typename Task = task::by_default>
 class infer_input#
- Template Parameters:
- Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction. 
 - Constructors - 
infer_input(const model<Task> &trained_model, const table &data)#
- Creates a new instance of the class with the given model and data property values.
 - Properties 
Result#
- 
template<typename Task = task::by_default>
 class infer_result#
- Template Parameters:
- Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction. 
 - Constructors - 
infer_result()#
- Creates a new instance of the class with the default property values. 
 - Properties 
Operation#
- 
template<typename Descriptor>
 pca::infer_result infer(const Descriptor &desc, const pca::infer_input &input)#
- Parameters:
- desc – PCA algorithm descriptor pca::descriptor 
- input – Input data for the inference operation 
 
 - Preconditions
- Postconditions
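Conceptually, PCA inference projects each observation onto the eigenvectors stored in the model. The standalone sketch below shows that projection for a single observation, using the \(r \times p\) row-per-eigenvector layout described for the training result. It assumes the observation is mean-centered before projection, which may not match what the library does for every normalization setting; project is a hypothetical helper and no oneDAL types are used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Projects one observation with p features onto r eigenvectors.
// eigenvectors holds r rows of length p; the result has r entries.
std::vector<double> project(const std::vector<double>& x,
                            const std::vector<double>& means,
                            const std::vector<std::vector<double>>& eigenvectors) {
    assert(x.size() == means.size());
    std::vector<double> transformed;
    transformed.reserve(eigenvectors.size());
    for (const auto& v : eigenvectors) {
        assert(v.size() == x.size());
        // Dot product of the centered observation with one eigenvector.
        double dot = 0.0;
        for (std::size_t j = 0; j < x.size(); ++j)
            dot += v[j] * (x[j] - means[j]);
        transformed.push_back(dot);
    }
    return transformed;
}
```

For x = (2, 3), means (1, 1), and a single eigenvector (0.6, 0.8), the projection is 0.6 * 1 + 0.8 * 2 = 2.2, so a two-feature observation is reduced to one value.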
 
Usage Example#
Training#
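// Trains a PCA model on `data` and returns it.
// print_table is assumed to be a helper provided by the surrounding example code.
pca::model<> run_training(const table& data) {
   // Request 5 principal components with deterministic eigenvector signs.
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(5)
      .set_deterministic(true);
   const auto result = train(pca_desc, data);
   // Inspect the training results.
   print_table("means", result.get_means());
   print_table("variances", result.get_variances());
   print_table("eigenvalues", result.get_eigenvalues());
   print_table("eigenvectors", result.get_eigenvectors());
   return result.get_model();
}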
Inference#
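// Projects `new_data` onto the principal components of a trained model.
table run_inference(const pca::model<>& model,
                    const table& new_data) {
   // Use as many components as the model was trained with.
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(model.get_component_count());
   const auto result = infer(pca_desc, model, new_data);
   print_table("transformed data", result.get_transformed_data());
   return result.get_transformed_data();
}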
