Batch Processing
Input
Centroid initialization for K-Means clustering accepts the input
described below. Pass the Input ID as a parameter to the methods
that provide input for your algorithm.
| Input ID | Input |
|---|---|
| | Pointer to the \(n \times p\) numeric table with the data to be clustered. |
Note
The input can be an object of any class derived from NumericTable.
Parameters
The following table lists the parameters of centroid initialization for K-Means clustering. The method column shows the values of the initialization method parameter, method, to which each parameter applies.
| Parameter | method | Default Value | Description |
|---|---|---|---|
| | any | | The floating-point type that the algorithm uses for intermediate computations. Can be |
| | Not applicable | | Available initialization methods for K-Means clustering: For CPU: For GPU: |
| | any | Not applicable | The number of clusters. Required. |
| | | \(1\) | The number of trials used to generate each cluster except the first initial cluster. For details, see [Arthur2007], section 5. |
| | | \(0.5\) | The fraction of nClusters sampled in each of the nRounds rounds of parallel K-Means++: L = nClusters * oversamplingFactor points are sampled in a round. For details, see [Bahmani2012], section 3.3. |
| | | \(5\) | The number of rounds for parallel K-Means++. The product L * nRounds must be greater than nClusters. For details, see [Bahmani2012], section 3.3. |
| | any | SharedPtr<engines::mt19937::Batch>() | Pointer to the random number generator engine that is used internally for random number generation. |
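The relationship between nClusters, oversamplingFactor, and nRounds for parallel K-Means++ can be checked with simple arithmetic before running the algorithm. The sketch below is plain Python and does not call the library. The function name parallel_plus_sample_size is hypothetical, and truncating L to an integer is an assumption, since the table does not state how a fractional L is rounded. With the default values and nClusters = 10, L = 5 and L * nRounds = 25, which satisfies the requirement.

```python
def parallel_plus_sample_size(n_clusters, oversampling_factor=0.5, n_rounds=5):
    """Return L, the number of points sampled per round.

    Raises ValueError if the documented requirement
    L * nRounds > nClusters is not met.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be a positive integer")

    # L = nClusters * oversamplingFactor.
    # Assumption: a fractional L is truncated toward zero.
    points_per_round = int(n_clusters * oversampling_factor)

    # Documented requirement: L * nRounds must be greater than nClusters.
    if points_per_round * n_rounds <= n_clusters:
        raise ValueError(
            "L * nRounds = %d is not greater than nClusters = %d"
            % (points_per_round * n_rounds, n_clusters)
        )
    return points_per_round


if __name__ == "__main__":
    # Default oversamplingFactor (0.5) and nRounds (5) with 10 clusters.
    print(parallel_plus_sample_size(10))
```

A check of this kind is useful mainly when oversamplingFactor or nRounds is lowered from its default, because small values can leave fewer candidate points than requested clusters.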
Output
Centroid initialization for K-Means clustering calculates the
result described below. Pass the Result ID as a parameter to the
methods that access the results of your algorithm.
| Result ID | Result |
|---|---|
| | Pointer to the \(nClusters \times p\) numeric table with the cluster centroids. |
Note
By default, this result is an object of the HomogenNumericTable class,
but you can define the result as an object of any class derived from NumericTable
except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
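To make the input and output shapes concrete, the following is a minimal sketch of K-Means++ seeding with several trials per step, in the spirit of the greedy variant described in [Arthur2007], section 5. It is plain Python over lists of lists, not the library's implementation. The function names, the seed argument, and the uniform fallback for degenerate data are illustrative assumptions. It takes an n x p table and returns an nClusters x p table of centroids.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two rows of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus_init(data, n_clusters, n_trials=1, seed=0):
    """Choose n_clusters rows of data (n x p) as initial centroids."""
    if not 1 <= n_clusters <= len(data):
        raise ValueError("n_clusters must be between 1 and the number of rows")
    if n_trials < 1:
        raise ValueError("n_trials must be a positive integer")

    rng = random.Random(seed)  # seeded generator for reproducibility
    centroids = [list(rng.choice(data))]  # first centroid: uniform choice

    # Squared distance from every row to its nearest chosen centroid.
    nearest = [squared_distance(row, centroids[0]) for row in data]

    while len(centroids) < n_clusters:
        best = None  # (cost, row index, updated distances)
        for _ in range(n_trials):
            if sum(nearest) > 0:
                # Sample a candidate with probability proportional to D^2.
                idx = rng.choices(range(len(data)), weights=nearest, k=1)[0]
            else:
                # Assumption: fall back to a uniform choice when every
                # row coincides with an already chosen centroid.
                idx = rng.randrange(len(data))
            updated = [
                min(d, squared_distance(row, data[idx]))
                for d, row in zip(nearest, data)
            ]
            cost = sum(updated)
            # Keep the candidate that lowers the total cost the most.
            if best is None or cost < best[0]:
                best = (cost, idx, updated)
        centroids.append(list(data[best[1]]))
        nearest = best[2]
    return centroids


if __name__ == "__main__":
    table = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [20.0, 0.0]]
    result = kmeans_plus_plus_init(table, n_clusters=3, n_trials=2, seed=1)
    print(len(result), "x", len(result[0]))  # nClusters x p
```

With n_trials equal to 1 the sketch reduces to ordinary K-Means++ seeding. Larger values trade extra distance computations for a lower initial clustering cost.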