SimpleImputer

class sklearn.impute.SimpleImputer(*, missing_values=nan, strategy='mean', fill_value=None, copy=True, add_indicator=False, keep_empty_features=False)

Univariate imputer for completing missing values with simple strategies.

Replace missing values using a descriptive statistic (e.g. mean, median, or most frequent) along each column, or using a constant value.

See also

IterativeImputer: Multivariate imputer that estimates values to impute for each feature with missing values from all the others.
KNNImputer: Multivariate imputer that estimates missing features using nearest samples.

Notes

Columns which only contained missing values at fit are discarded upon transform if strategy is not "constant".

In a prediction context, simple imputation usually performs poorly when paired with a weak learner. With a powerful learner, however, it can perform as well as or better than more complex imputation methods such as IterativeImputer or KNNImputer.

Examples

>>> import numpy as np
>>> from sklearn.impute import SimpleImputer
>>> imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
>>> imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
SimpleImputer()
>>> X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
>>> print(imp_mean.transform(X))
[[ 7.   2.   3. ]
 [ 4.   3.5  6. ]
 [10.   3.5  9. ]]

For a more detailed example see Imputing missing values before building an estimator.
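As a complement to the mean-based example, the sketch below compares the other strategies on a small made-up array. It reads the learned per-column values from the statistics_ attribute and uses fill_value=-1 purely as an illustrative constant.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative data: observed values are 1, 1, 7 in column 0 and 2, 2, 5 in column 1.
X = np.array([[1.0, 2.0],
              [np.nan, 2.0],
              [1.0, np.nan],
              [7.0, 5.0]])

# Fit one imputer per strategy and record the per-column fill statistic.
stats = {}
for strategy in ("mean", "median", "most_frequent"):
    imp = SimpleImputer(strategy=strategy).fit(X)
    stats[strategy] = imp.statistics_
    print(strategy, imp.statistics_)

# strategy="constant" ignores the data and fills with fill_value instead.
imp_const = SimpleImputer(strategy="constant", fill_value=-1)
X_const = imp_const.fit_transform(X)
print(X_const)
```

Here the mean is 3 for both columns, while the median and the most frequent value are 1 and 2 respectively, so the choice of strategy changes the imputed values noticeably on skewed data.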

fit(X, y=None)

Fit the imputer on X.

Parameters:

X : {array-like, sparse matrix}, shape (n_samples, n_features): Input data, where n_samples is the number of samples and n_features is the number of features.
y : Ignored: Not used, present here for API consistency by convention.

Returns:

self : object: Fitted estimator.
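A short sketch of what fitting does in practice: fit returns the estimator itself, and the statistics learned from the training data are then reused on new data. X_train and X_new are made-up arrays for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative training data with one missing entry per column.
X_train = np.array([[1.0, 10.0],
                    [np.nan, 20.0],
                    [3.0, np.nan]])

imp = SimpleImputer(strategy="mean")
fitted = imp.fit(X_train)      # fit returns the same estimator instance
print(fitted is imp)           # True
print(imp.statistics_)         # [ 2. 15.], the per-column means

# New data is filled with the statistics learned from X_train.
X_new = np.array([[np.nan, np.nan]])
X_new_imputed = imp.transform(X_new)
print(X_new_imputed)           # [[ 2. 15.]]
```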

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X : array-like of shape (n_samples, n_features): Input samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None: Target values (None for unsupervised transformations).
**fit_params : dict: Additional fit parameters.

Returns:

X_new : ndarray of shape (n_samples, n_features_new): Transformed array.
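For SimpleImputer, calling fit_transform should give the same result as calling fit and then transform on the same data. The sketch below checks this on a small made-up array with the median strategy.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative data with one missing entry per column.
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])

# One call that fits and transforms.
one_step = SimpleImputer(strategy="median").fit_transform(X)

# The equivalent two-step form.
two_step = SimpleImputer(strategy="median").fit(X).transform(X)

same = np.array_equal(one_step, two_step)
print(same)      # True
print(one_step)  # missing entries replaced by column medians 2.0 and 6.0
```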

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:

input_features : array-like of str or None, default=None

Input features.

If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_out : ndarray of str objects: Transformed feature names.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for details on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.
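As a brief illustration, the returned dictionary mirrors the constructor arguments, so it can be used to inspect a configuration or to build an identically configured estimator. The variable name twin is just for this sketch.

```python
from sklearn.impute import SimpleImputer

imp = SimpleImputer(strategy="median", add_indicator=True)

# Dictionary of constructor parameters and their current values.
params = imp.get_params()
print(params["strategy"])       # median
print(params["add_indicator"])  # True

# The same dictionary can configure a new, unfitted estimator.
twin = SimpleImputer(**params)
print(twin.strategy)            # median
```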

inverse_transform(X)

Convert the data back to the original representation.

Inverts the transform operation performed on an array. This operation can only be performed after SimpleImputer is instantiated with add_indicator=True.

Note that inverse_transform can only invert the transform in features that have binary indicators for missing values. If a feature has no missing values at fit time, the feature won’t have a binary indicator, and the imputation done at transform time won’t be inverted.

Added in version 0.24.

Parameters:

X : array-like of shape (n_samples, n_features + n_features_missing_indicator): The imputed data to be reverted to original data. It has to be an augmented array of imputed data and the missing indicator mask.

Returns:

X_original : ndarray of shape (n_samples, n_features): The original X with missing values as it was prior to imputation.
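A minimal round-trip sketch, assuming default settings apart from add_indicator=True. The array X is made up: only its second column has a missing value at fit time, so a single indicator column is appended to the imputed data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative data: one missing value, in the second column.
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [5.0, 6.0]])

imp = SimpleImputer(strategy="mean", add_indicator=True)

# Augmented output: the imputed features followed by the indicator column.
X_aug = imp.fit_transform(X)
print(X_aug.shape)  # (3, 3)

# The indicator tells inverse_transform where to restore missing values.
X_back = imp.inverse_transform(X_aug)
print(X_back)
```

Because the first column had no missing values at fit time, it has no indicator, and any imputation applied to it at transform time would not be inverted.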

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform : {"default", "pandas", "polars"}, default=None

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

Added in version 1.4: "polars" option was added.

Returns:

self : estimator instance: Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.
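To illustrate the <component>__<parameter> form, the sketch below updates an imputer directly and then inside a Pipeline. The step names "imputer" and "scaler" are hypothetical and chosen only for this example.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Direct use on a simple estimator.
imp = SimpleImputer()
imp.set_params(strategy="median")
print(imp.strategy)  # median

# Nested use: "<step name>__<parameter>" reaches into each pipeline step.
pipe = Pipeline([("imputer", SimpleImputer()),
                 ("scaler", StandardScaler())])
pipe.set_params(imputer__strategy="most_frequent", scaler__with_mean=False)
print(pipe.named_steps["imputer"].strategy)  # most_frequent
```

The same nested names are what a parameter grid would use when tuning the imputation strategy as part of a larger pipeline.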

transform(X)

Impute all missing values in X.

Parameters:

X : {array-like, sparse matrix}, shape (n_samples, n_features): The input data to complete.

Returns:

X_imputed : {ndarray, sparse matrix} of shape (n_samples, n_features_out): X with imputed values.

Gallery examples

Release Highlights for scikit-learn 1.5

Release Highlights for scikit-learn 1.1

Release Highlights for scikit-learn 0.23

Combine predictors using stacking

Permutation Importance vs Random Forest Feature Importance (MDI)

Displaying Pipelines

Displaying estimators and complex pipelines

Introducing the set_output API

Imputing missing values before building an estimator

Imputing missing values with variants of IterativeImputer

Column Transformer with Mixed Types