Supported input types
Just like scikit-learn estimators, estimators from the Extension for Scikit-learn* are able to work with different classes of input data, including:
NumPy arrays.
Note: masked arrays are also supported, but just like in stock scikit-learn, the underlying array values are used without the mask.
Other array classes implementing the Array API protocol (see Array API support for details).
SciPy sparse arrays and sparse matrices (depending on the estimator - see Supported Algorithms).
Pandas DataFrame and Series classes.
Other DataFrame classes recognized by data validators from scikit-learn, such as the ones from Polars.
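The note on masked arrays can be illustrated with NumPy alone. The sketch below assumes that input validation behaves like np.asarray, which is an assumption made for illustration and not a statement about the library's internals. The array names are hypothetical. It shows that converting a masked array to a plain array keeps the underlying values and discards the mask, so masked entries still contribute to any computation.

```python
import numpy as np

# A small masked array: the value 2.0 is flagged as masked.
values = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[False, True], [False, False]])
masked = np.ma.masked_array(values, mask=mask)

# Assumption for illustration: validation acts like np.asarray, which
# returns a plain ndarray holding the underlying values without the mask.
plain = np.asarray(masked)

# The masked mean skips the masked entry; the plain mean includes it.
masked_mean = masked.mean()
plain_mean = plain.mean()
```

If masked entries should not influence a model, they need to be handled explicitly (for example by imputing or dropping them) before the data is passed to an estimator.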
In addition, Extension for Scikit-learn* also supports dpnp.ndarray arrays (with and without array API mode) in estimators with GPU support (see also Array API support).
Extension for Scikit-learn* currently does not offer accelerated routines for input types not listed here. When receiving an unsupported class, estimators will either convert it to a supported class under some circumstances (e.g. PyArrow tables might get converted to NumPy arrays when passed to data validators from stock scikit-learn), throw an error (e.g. when passed a data format that is not recognized by scikit-learn), or fall back to stock scikit-learn to handle it (when array API is enabled but the input is unsupported).
Warning
In some cases, data passed to estimators might be copied/duplicated during calls to methods such as fit/predict. The affected cases are listed below:
Non-contiguous NumPy array - i.e. where strides are wider than one element across both rows and columns.
For SciPy CSR matrix / array, index arrays are always copied. Note that sparse matrices in formats other than CSR will be converted to CSR, which implies more than just data copying.
Heterogeneous NumPy array.
If a SYCL queue is used for the Target offload option with a device that lacks float64 support but the data is float64, the data will be converted to float32.
If a dpnp.ndarray array on GPU is used as input without Array API support being enabled, then data will be transferred to CPU for validations and then back to GPU for computations (see GPU support for details).
If a dpnp.ndarray array on GPU is passed to an estimator with GPU support but the requested operation is not supported on GPU, and if array API is not enabled or scikit-learn does not support array API for the requested operation, then data might be transferred to CPU and the operation done there (see Configuration Contexts and Global Options).
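Two of the NumPy-related cases above can be checked on the caller's side before data reaches an estimator. The following is a minimal sketch using only NumPy. The helper name is_contiguous is hypothetical and not part of any library. It detects an array whose strides skip elements along both rows and columns, and prepares a contiguous float32 copy ahead of time. Whether doing so avoids an internal copy in a given estimator is an assumption, not something this sketch verifies.

```python
import numpy as np

def is_contiguous(arr: np.ndarray) -> bool:
    """Hypothetical helper: True if arr is C- or Fortran-contiguous."""
    return bool(arr.flags.c_contiguous or arr.flags.f_contiguous)

base = np.arange(24, dtype=np.float64).reshape(4, 6)

# Stepping along both axes leaves gaps between consecutive elements in
# rows and in columns, so this view is neither C- nor F-contiguous.
strided = base[::2, ::2]

# An explicit contiguous copy, made once under the caller's control.
compact = np.ascontiguousarray(strided)

# Casting up front mirrors the float64 -> float32 conversion described
# for devices without float64 support.
compact32 = compact.astype(np.float32)
```

Inspecting arr.flags and arr.dtype in this way makes it easier to anticipate when a call to fit or predict might duplicate data.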