Supported input types
Just like scikit-learn estimators, estimators from the Extension for Scikit-learn* are able to accept and work with different classes of input data, including:
NumPy arrays. Note: masked arrays are also supported, but just like in stock scikit-learn, the underlying array values are used without the mask.
Other array classes implementing the Array API protocol (see Array API support for details).
SciPy sparse arrays and sparse matrices (depending on the estimator).
Pandas DataFrame and Series classes.
In addition, Extension for Scikit-learn* also supports dpnp.ndarray arrays, which are particularly useful for GPU computations.
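The note on masked arrays can be illustrated with NumPy alone. The following is a minimal sketch, assuming that input validation converts a masked array roughly the way `np.asarray` does; it does not call the extension itself, and the variable names are purely illustrative.

```python
import numpy as np

# A masked array whose middle entry is flagged as invalid.
masked = np.ma.masked_array([1.0, 10.0, 4.0], mask=[False, True, False])

# Converting to a plain ndarray keeps the underlying values and drops the mask.
# (Assumption: this mirrors how the data reach an estimator.)
plain = np.asarray(masked)

print(type(plain).__name__)  # ndarray
print(plain.tolist())        # [1.0, 10.0, 4.0]

# The mask is honoured here: only 1.0 and 4.0 contribute.
print(float(masked.mean()))  # 2.5

# The mask is gone here: the masked value 10.0 is included.
print(float(plain.mean()))   # 5.0
```

If masked entries must be excluded, they need to be filtered or imputed before the data are passed to an estimator.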
Stock scikit-learn estimators, depending on the version, might offer support for additional
input types beyond this list, such as DataFrame and Series classes from other libraries
like Polars.
Extension for Scikit-learn* currently does not offer accelerated routines for input types not listed here. When receiving an unsupported class, estimators will either convert it to a supported class under some circumstances (e.g. PyArrow tables might get converted to NumPy arrays when passed to data validators from stock scikit-learn), throw an error (e.g. when passed a data format that is not recognized by scikit-learn), or fall back to stock scikit-learn to handle it (when array API is enabled and the input is unsupported).
Warning
In some cases data passed to estimators might be copied/duplicated during calls to methods such as fit/predict. The affected cases are listed below.
Non-contiguous NumPy array - i.e. where strides are wider than one element across both rows and columns
For SciPy CSR matrix / array, index arrays are always copied. Note that sparse matrices in formats other than CSR will be converted to CSR, which implies more than just data copying.
Heterogeneous NumPy array
If a SYCL queue is provided for a device without float64 support but the data are float64, the data are copied with reduced precision.
If Array API is not enabled, data from GPU devices are always copied to the host device, and the result table (for applicable methods) is then copied to the source device.
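Two of the cases listed above can be checked on the NumPy side before calling an estimator. The sketch below uses only standard NumPy calls and shows what "non-contiguous across both rows and columns" and "reduced precision" mean at the array level. It is an illustration under those assumptions, not a description of the extension's internal copy logic.

```python
import numpy as np

base = np.arange(24, dtype=np.float64).reshape(4, 6)

# Stepping over both rows and columns yields a view whose strides are
# wider than one element in both dimensions.
view = base[::2, ::2]
print(view.flags.c_contiguous, view.flags.f_contiguous)  # False False

# Making such a view contiguous requires a copy of the data.
packed = np.ascontiguousarray(view)
print(np.shares_memory(packed, base))  # False

# An array that is already contiguous is passed through without copying.
same = np.ascontiguousarray(base)
print(np.shares_memory(same, base))  # True

# Casting float64 to float32 always produces a new, lower-precision buffer.
reduced = base.astype(np.float32)
print(reduced.dtype, np.shares_memory(reduced, base))  # float32 False
```

Preparing data as contiguous arrays in the precision supported by the target device ahead of time is one way to keep such copies out of repeated fit/predict calls.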