MissingIndicator#

class sklearn.impute.MissingIndicator(*, missing_values=nan, features='missing-only', sparse='auto', error_on_new=True)[source]#

Binary indicators for missing values.

Note that this component typically should not be used in a vanilla Pipeline consisting of transformers and a classifier, but rather could be added using a FeatureUnion or ColumnTransformer.

See also

SimpleImputer: Univariate imputation of missing values.
IterativeImputer: Multivariate imputation of missing values.

Examples

>>> import numpy as np
>>> from sklearn.impute import MissingIndicator
>>> X1 = np.array([[np.nan, 1, 3],
...                [4, 0, np.nan],
...                [8, 1, 0]])
>>> X2 = np.array([[5, 1, np.nan],
...                [np.nan, 2, 3],
...                [2, 4, 0]])
>>> indicator = MissingIndicator()
>>> indicator.fit(X1)
MissingIndicator()
>>> X2_tr = indicator.transform(X2)
>>> X2_tr
array([[False,  True],
       [ True, False],
       [False, False]])

fit(X, y=None)[source]#

Fit the transformer on X.

Parameters:

X{array-like, sparse matrix} of shape (n_samples, n_features): Input data, where n_samples is the number of samples and n_features is the number of features.
yIgnored: Not used, present for API consistency by convention.

Returns:

selfobject: Fitted estimator.

fit_transform(X, y=None)[source]#

Generate missing values indicator for X.

Parameters:

X{array-like, sparse matrix} of shape (n_samples, n_features): The input data to complete.
yIgnored: Not used, present for API consistency by convention.

Returns:

Xt{ndarray, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_features_with_missing): The missing indicator for input data. The data type of Xt will be boolean.

get_feature_names_out(input_features=None)[source]#

Get output feature names for transformation.

Parameters:

input_featuresarray-like of str or None, default=None

Input features.

If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_outndarray of str objects: Transformed feature names.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

Added in version 1.4: "polars" option was added.

Returns:

selfestimator instance: Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

transform(X)[source]#

Generate missing values indicator for X.

Parameters:

X{array-like, sparse matrix} of shape (n_samples, n_features): The input data to complete.

Returns:

Xt{ndarray, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_features_with_missing): The missing indicator for input data. The data type of Xt will be boolean.