PartialDependenceDisplay#
- class sklearn.inspection.PartialDependenceDisplay(pd_results, *, features, feature_names, target_idx, deciles, kind='average', subsample=1000, random_state=None, is_categorical=None)[source]#
- Partial Dependence Plot (PDP) and Individual Conditional Expectation (ICE). - It is recommended to use - from_estimatorto create a- PartialDependenceDisplay. All parameters are stored as attributes.- For general information regarding - scikit-learnvisualization tools, see the Visualization Guide. For guidance on interpreting these plots, refer to the Inspection Guide.- For an example on how to use this class, see the following example: Advanced Plotting With Partial Dependence. - Added in version 0.22. - Parameters:
- pd_resultslist of Bunch
- Results of - partial_dependencefor- features.
- featureslist of (int,) or list of (int, int)
- Indices of features for a given plot. A tuple of one integer will plot a partial dependence curve of one feature. A tuple of two integers will plot a two-way partial dependence curve as a contour plot. 
- feature_nameslist of str
- Feature names corresponding to the indices in - features.
- target_idxint
- In a multiclass setting, specifies the class for which the PDPs should be computed. Note that for binary classification, the positive class (index 1) is always used. 
- In a multioutput setting, specifies the task for which the PDPs should be computed. 
 - Ignored in binary classification or classical regression settings. 
- decilesdict
- Deciles for feature indices in - features.
- kind{‘average’, ‘individual’, ‘both’} or list of such str, default=’average’
- Whether to plot the partial dependence averaged across all the samples in the dataset or one line per sample or both. - kind='average'results in the traditional PD plot;
- kind='individual'results in the ICE plot;
- kind='both'results in plotting both the ICE and PD on the same plot.
 - A list of such strings can be provided to specify - kindon a per-plot basis. The length of the list should be the same as the number of interaction requested in- features.- Note - ICE (‘individual’ or ‘both’) is not a valid option for 2-ways interactions plot. As a result, an error will be raised. 2-ways interaction plots should always be configured to use the ‘average’ kind instead. - Note - The fast - method='recursion'option is only available for- kind='average'and- sample_weights=None. Computing individual dependencies and doing weighted averages requires using the slower- method='brute'.- Added in version 0.24: Add - kindparameter with- 'average',- 'individual', and- 'both'options.- Added in version 1.1: Add the possibility to pass a list of string specifying - kindfor each plot.
- subsamplefloat, int or None, default=1000
- Sampling for ICE curves when - kindis ‘individual’ or ‘both’. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to be used to plot ICE curves. If int, represents the maximum absolute number of samples to use.- Note that the full dataset is still used to calculate partial dependence when - kind='both'.- Added in version 0.24. 
- random_stateint, RandomState instance or None, default=None
- Controls the randomness of the selected samples when subsamples is not - None. See Glossary for details.- Added in version 0.24. 
- is_categoricallist of (bool,) or list of (bool, bool), default=None
- Whether each target feature in - featuresis categorical or not. The list should be same size as- features. If- None, all features are assumed to be continuous.- Added in version 1.2. 
 
- Attributes:
- bounding_ax_matplotlib Axes or None
- If - axis an axes or None, the- bounding_ax_is the axes where the grid of partial dependence plots are drawn. If- axis a list of axes or a numpy array of axes,- bounding_ax_is None.
- axes_ndarray of matplotlib Axes
- If - axis an axes or None,- axes_[i, j]is the axes on the i-th row and j-th column. If- axis a list of axes,- axes_[i]is the i-th item in- ax. Elements that are None correspond to a nonexisting axes in that position.
- lines_ndarray of matplotlib Artists
- If - axis an axes or None,- lines_[i, j]is the partial dependence curve on the i-th row and j-th column. If- axis a list of axes,- lines_[i]is the partial dependence curve corresponding to the i-th item in- ax. Elements that are None correspond to a nonexisting axes or an axes that does not include a line plot.
- deciles_vlines_ndarray of matplotlib LineCollection
- If - axis an axes or None,- vlines_[i, j]is the line collection representing the x axis deciles of the i-th row and j-th column. If- axis a list of axes,- vlines_[i]corresponds to the i-th item in- ax. Elements that are None correspond to a nonexisting axes or an axes that does not include a PDP plot.- Added in version 0.23. 
- deciles_hlines_ndarray of matplotlib LineCollection
- If - axis an axes or None,- vlines_[i, j]is the line collection representing the y axis deciles of the i-th row and j-th column. If- axis a list of axes,- vlines_[i]corresponds to the i-th item in- ax. Elements that are None correspond to a nonexisting axes or an axes that does not include a 2-way plot.- Added in version 0.23. 
- contours_ndarray of matplotlib Artists
- If - axis an axes or None,- contours_[i, j]is the partial dependence plot on the i-th row and j-th column. If- axis a list of axes,- contours_[i]is the partial dependence plot corresponding to the i-th item in- ax. Elements that are None correspond to a nonexisting axes or an axes that does not include a contour plot.
- bars_ndarray of matplotlib Artists
- If - axis an axes or None,- bars_[i, j]is the partial dependence bar plot on the i-th row and j-th column (for a categorical feature). If- axis a list of axes,- bars_[i]is the partial dependence bar plot corresponding to the i-th item in- ax. Elements that are None correspond to a nonexisting axes or an axes that does not include a bar plot.- Added in version 1.2. 
- heatmaps_ndarray of matplotlib Artists
- If - axis an axes or None,- heatmaps_[i, j]is the partial dependence heatmap on the i-th row and j-th column (for a pair of categorical features) . If- axis a list of axes,- heatmaps_[i]is the partial dependence heatmap corresponding to the i-th item in- ax. Elements that are None correspond to a nonexisting axes or an axes that does not include a heatmap.- Added in version 1.2. 
- figure_matplotlib Figure
- Figure containing partial dependence plots. 
 
 - See also - partial_dependence
- Compute Partial Dependence values. 
- PartialDependenceDisplay.from_estimator
- Plot Partial Dependence. 
 - Examples - >>> import numpy as np >>> import matplotlib.pyplot as plt >>> from sklearn.datasets import make_friedman1 >>> from sklearn.ensemble import GradientBoostingRegressor >>> from sklearn.inspection import PartialDependenceDisplay >>> from sklearn.inspection import partial_dependence >>> X, y = make_friedman1() >>> clf = GradientBoostingRegressor(n_estimators=10).fit(X, y) >>> features, feature_names = [(0,)], [f"Features #{i}" for i in range(X.shape[1])] >>> deciles = {0: np.linspace(0, 1, num=5)} >>> pd_results = partial_dependence( ... clf, X, features=0, kind="average", grid_resolution=5) >>> display = PartialDependenceDisplay( ... [pd_results], features=features, feature_names=feature_names, ... target_idx=0, deciles=deciles ... ) >>> display.plot(pdp_lim={1: (-1.38, 0.66)}) <...> >>> plt.show()   - classmethod from_estimator(estimator, X, features, *, sample_weight=None, categorical_features=None, feature_names=None, target=None, response_method='auto', n_cols=3, grid_resolution=100, percentiles=(0.05, 0.95), custom_values=None, method='auto', n_jobs=None, verbose=0, line_kw=None, ice_lines_kw=None, pd_line_kw=None, contour_kw=None, ax=None, kind='average', centered=False, subsample=1000, random_state=None)[source]#
- Partial dependence (PD) and individual conditional expectation (ICE) plots. - Partial dependence plots, individual conditional expectation plots, or an overlay of both can be plotted by setting the - kindparameter. This method generates one plot for each entry in- features. The plots are arranged in a grid with- n_colscolumns. For one-way partial dependence plots, the deciles of the feature values are shown on the x-axis. For two-way plots, the deciles are shown on both axes and PDPs are contour plots.- For general information regarding - scikit-learnvisualization tools, see the Visualization Guide. For guidance on interpreting these plots, refer to the Inspection Guide.- For an example on how to use this class method, see Partial Dependence and Individual Conditional Expectation Plots. - Note - PartialDependenceDisplay.from_estimatordoes not support using the same axes with multiple calls. To plot the partial dependence for multiple estimators, please pass the axes created by the first call to the second call:- >>> from sklearn.inspection import PartialDependenceDisplay >>> from sklearn.datasets import make_friedman1 >>> from sklearn.linear_model import LinearRegression >>> from sklearn.ensemble import RandomForestRegressor >>> X, y = make_friedman1() >>> est1 = LinearRegression().fit(X, y) >>> est2 = RandomForestRegressor().fit(X, y) >>> disp1 = PartialDependenceDisplay.from_estimator(est1, X, ... [1, 2]) >>> disp2 = PartialDependenceDisplay.from_estimator(est2, X, [1, 2], ... ax=disp1.axes_) - Warning - For - GradientBoostingClassifierand- GradientBoostingRegressor, the- 'recursion'method (used by default) will not account for the- initpredictor of the boosting process. In practice, this will produce the same values as- 'brute'up to a constant offset in the target response, provided that- initis a constant estimator (which is the default). However, if- initis not a constant estimator, the partial dependence values are incorrect for- 'recursion'because the offset will be sample-dependent. It is preferable to use the- 'brute'method. Note that this only applies to- GradientBoostingClassifierand- GradientBoostingRegressor, not to- HistGradientBoostingClassifierand- HistGradientBoostingRegressor.- Added in version 1.0. - Parameters:
- estimatorBaseEstimator
- A fitted estimator object implementing predict, predict_proba, or decision_function. Multioutput-multiclass classifiers are not supported. 
- X{array-like, dataframe} of shape (n_samples, n_features)
- Xis used to generate a grid of values for the target- features(where the partial dependence will be evaluated), and also to generate values for the complement features when the- methodis- 'brute'.
- featureslist of {int, str, pair of int, pair of str}
- The target features for which to create the PDPs. If - features[i]is an integer or a string, a one-way PDP is created; if- features[i]is a tuple, a two-way PDP is created (only supported with- kind='average'). Each tuple must be of size 2. If any entry is a string, then it must be in- feature_names.
- sample_weightarray-like of shape (n_samples,), default=None
- Sample weights are used to calculate weighted means when averaging the model output. If - None, then samples are equally weighted. If- sample_weightis not- None, then- methodwill be set to- 'brute'. Note that- sample_weightis ignored for- kind='individual'.- Added in version 1.3. 
- categorical_featuresarray-like of shape (n_features,) or shape (n_categorical_features,), dtype={bool, int, str}, default=None
- Indicates the categorical features. - None: no feature will be considered categorical;
- boolean array-like: boolean mask of shape - (n_features,)indicating which features are categorical. Thus, this array has the same shape has- X.shape[1];
- integer or string array-like: integer indices or strings indicating categorical features. 
 - Added in version 1.2. 
- feature_namesarray-like of shape (n_features,), dtype=str, default=None
- Name of each feature; - feature_names[i]holds the name of the feature with index- i. By default, the name of the feature corresponds to their numerical index for NumPy array and their column name for pandas dataframe.
- targetint, default=None
- In a multiclass setting, specifies the class for which the PDPs should be computed. Note that for binary classification, the positive class (index 1) is always used. 
- In a multioutput setting, specifies the task for which the PDPs should be computed. 
 - Ignored in binary classification or classical regression settings. 
- response_method{‘auto’, ‘predict_proba’, ‘decision_function’}, default=’auto’
- Specifies whether to use predict_proba or decision_function as the target response. For regressors this parameter is ignored and the response is always the output of predict. By default, predict_proba is tried first and we revert to decision_function if it doesn’t exist. If - methodis- 'recursion', the response is always the output of decision_function.
- n_colsint, default=3
- The maximum number of columns in the grid plot. Only active when - axis a single axis or- None.
- grid_resolutionint, default=100
- The number of equally spaced points on the axes of the plots, for each target feature. This parameter is overridden by - custom_valuesif that parameter is set.
- percentilestuple of float, default=(0.05, 0.95)
- The lower and upper percentile used to create the extreme values for the PDP axes. Must be in [0, 1]. This parameter is overridden by - custom_valuesif that parameter is set.
- custom_valuesdict
- A dictionary mapping the index of an element of - featuresto an array of values where the partial dependence should be calculated for that feature. Setting a range of values for a feature overrides- grid_resolutionand- percentiles.- Added in version 1.7. 
- methodstr, default=’auto’
- The method used to calculate the averaged predictions: - 'recursion'is only supported for some tree-based estimators (namely- GradientBoostingClassifier,- GradientBoostingRegressor,- HistGradientBoostingClassifier,- HistGradientBoostingRegressor,- DecisionTreeRegressor,- RandomForestRegressorbut is more efficient in terms of speed. With this method, the target response of a classifier is always the decision function, not the predicted probabilities. Since the- 'recursion'method implicitly computes the average of the ICEs by design, it is not compatible with ICE and thus- kindmust be- 'average'.
- 'brute'is supported for any estimator, but is more computationally intensive.
- 'auto': the- 'recursion'is used for estimators that support it, and- 'brute'is used otherwise. If- sample_weightis not- None, then- 'brute'is used regardless of the estimator.
 - Please see this note for differences between the - 'brute'and- 'recursion'method.
- n_jobsint, default=None
- The number of CPUs to use to compute the partial dependences. Computation is parallelized over features specified by the - featuresparameter.- Nonemeans 1 unless in a- joblib.parallel_backendcontext.- -1means using all processors. See Glossary for more details.
- verboseint, default=0
- Verbose output during PD computations. 
- line_kwdict, default=None
- Dict with keywords passed to the - matplotlib.pyplot.plotcall. For one-way partial dependence plots. It can be used to define common properties for both- ice_lines_kwand- pdp_line_kw.
- ice_lines_kwdict, default=None
- Dictionary with keywords passed to the - matplotlib.pyplot.plotcall. For ICE lines in the one-way partial dependence plots. The key value pairs defined in- ice_lines_kwtakes priority over- line_kw.
- pd_line_kwdict, default=None
- Dictionary with keywords passed to the - matplotlib.pyplot.plotcall. For partial dependence in one-way partial dependence plots. The key value pairs defined in- pd_line_kwtakes priority over- line_kw.
- contour_kwdict, default=None
- Dict with keywords passed to the - matplotlib.pyplot.contourfcall. For two-way partial dependence plots.
- axMatplotlib axes or array-like of Matplotlib axes, default=None
- If a single axis is passed in, it is treated as a bounding axes and a grid of partial dependence plots will be drawn within these bounds. The - n_colsparameter controls the number of columns in the grid.
- If an array-like of axes are passed in, the partial dependence plots will be drawn directly into these axes. 
- If - None, a figure and a bounding axes is created and treated as the single axes case.
 
- kind{‘average’, ‘individual’, ‘both’}, default=’average’
- Whether to plot the partial dependence averaged across all the samples in the dataset or one line per sample or both. - kind='average'results in the traditional PD plot;
- kind='individual'results in the ICE plot.
 - Note that the fast - method='recursion'option is only available for- kind='average'and- sample_weights=None. Computing individual dependencies and doing weighted averages requires using the slower- method='brute'.
- centeredbool, default=False
- If - True, the ICE and PD lines will start at the origin of the y-axis. By default, no centering is done.- Added in version 1.1. 
- subsamplefloat, int or None, default=1000
- Sampling for ICE curves when - kindis ‘individual’ or ‘both’. If- float, should be between 0.0 and 1.0 and represent the proportion of the dataset to be used to plot ICE curves. If- int, represents the absolute number samples to use.- Note that the full dataset is still used to calculate averaged partial dependence when - kind='both'.
- random_stateint, RandomState instance or None, default=None
- Controls the randomness of the selected samples when subsamples is not - Noneand- kindis either- 'both'or- 'individual'. See Glossary for details.
 
- Returns:
- displayPartialDependenceDisplay
 
- display
 - See also - partial_dependence
- Compute Partial Dependence values. 
 - Examples - >>> import matplotlib.pyplot as plt >>> from sklearn.datasets import make_friedman1 >>> from sklearn.ensemble import GradientBoostingRegressor >>> from sklearn.inspection import PartialDependenceDisplay >>> X, y = make_friedman1() >>> clf = GradientBoostingRegressor(n_estimators=10).fit(X, y) >>> PartialDependenceDisplay.from_estimator(clf, X, [0, (0, 1)]) <...> >>> plt.show()   
 - plot(*, ax=None, n_cols=3, line_kw=None, ice_lines_kw=None, pd_line_kw=None, contour_kw=None, bar_kw=None, heatmap_kw=None, pdp_lim=None, centered=False)[source]#
- Plot partial dependence plots. - Parameters:
- axMatplotlib axes or array-like of Matplotlib axes, default=None
- If a single axis is passed in, it is treated as a bounding axes
- and a grid of partial dependence plots will be drawn within these bounds. The - n_colsparameter controls the number of columns in the grid.
 
- If an array-like of axes are passed in, the partial dependence
- plots will be drawn directly into these axes. 
 
- If None, a figure and a bounding axes is created and treated
- as the single axes case. 
 
- If 
 
- n_colsint, default=3
- The maximum number of columns in the grid plot. Only active when - axis a single axes or- None.
- line_kwdict, default=None
- Dict with keywords passed to the - matplotlib.pyplot.plotcall. For one-way partial dependence plots.
- ice_lines_kwdict, default=None
- Dictionary with keywords passed to the - matplotlib.pyplot.plotcall. For ICE lines in the one-way partial dependence plots. The key value pairs defined in- ice_lines_kwtakes priority over- line_kw.- Added in version 1.0. 
- pd_line_kwdict, default=None
- Dictionary with keywords passed to the - matplotlib.pyplot.plotcall. For partial dependence in one-way partial dependence plots. The key value pairs defined in- pd_line_kwtakes priority over- line_kw.- Added in version 1.0. 
- contour_kwdict, default=None
- Dict with keywords passed to the - matplotlib.pyplot.contourfcall for two-way partial dependence plots.
- bar_kwdict, default=None
- Dict with keywords passed to the - matplotlib.pyplot.barcall for one-way categorical partial dependence plots.- Added in version 1.2. 
- heatmap_kwdict, default=None
- Dict with keywords passed to the - matplotlib.pyplot.imshowcall for two-way categorical partial dependence plots.- Added in version 1.2. 
- pdp_limdict, default=None
- Global min and max average predictions, such that all plots will have the same scale and y limits. - pdp_lim[1]is the global min and max for single partial dependence curves.- pdp_lim[2]is the global min and max for two-way partial dependence curves. If- None(default), the limit will be inferred from the global minimum and maximum of all predictions.- Added in version 1.1. 
- centeredbool, default=False
- If - True, the ICE and PD lines will start at the origin of the y-axis. By default, no centering is done.- Added in version 1.1. 
 
- Returns:
- displayPartialDependenceDisplay
- Returns a - PartialDependenceDisplayobject that contains the partial dependence plots.
 
- display
 
 
Gallery examples#
 
Partial Dependence and Individual Conditional Expectation Plots
 
     
 
 
 
 
 
