ComplementNB
- class sklearn.naive_bayes.ComplementNB(*, alpha=1.0, force_alpha=True, fit_prior=True, class_prior=None, norm=False)
- The Complement Naive Bayes classifier described in Rennie et al. (2003).
- The Complement Naive Bayes classifier was designed to correct the “severe assumptions” made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets.
- Read more in the User Guide.
- Added in version 0.20.
- Parameters:
- alpha : float or array-like of shape (n_features,), default=1.0
- Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True for no smoothing).
- force_alpha : bool, default=True
- If False and alpha is less than 1e-10, alpha is set to 1e-10. If True, alpha is left unchanged, which may cause numerical errors if alpha is too close to 0.
- Added in version 1.2.
- Changed in version 1.4: The default value of force_alpha changed to True.
- fit_prior : bool, default=True
- Only used in the edge case of a single class in the training set.
- class_prior : array-like of shape (n_classes,), default=None
- Prior probabilities of the classes. Not used.
- norm : bool, default=False
- Whether or not a second normalization of the weights is performed. The default behavior mirrors the implementations found in Mahout and Weka, which do not follow the full algorithm described in Table 9 of the paper.
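A small sketch of how alpha and norm show up in the fitted weights, assuming scikit-learn 1.2 or later and a random count matrix made up for illustration. Based on our reading of the implementation, with norm=False the weights are negated log complement probabilities (hence non-negative), while norm=True divides each class's row of log complement probabilities by its own sum, so each row of feature_log_prob_ sums to 1. The last check assumes that a per-feature alpha with identical entries behaves like the equivalent scalar.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Toy count matrix (e.g. term counts): 12 documents, 6 features, 3 classes.
rng = np.random.RandomState(0)
X = rng.randint(5, size=(12, 6))
y = np.arange(12) % 3

plain = ComplementNB().fit(X, y)             # norm=False, alpha=1.0
normed = ComplementNB(norm=True).fit(X, y)   # second normalization of the weights

# norm=False: weights are negated log complement probabilities, so all >= 0.
print(plain.feature_log_prob_.min() >= 0)
# norm=True: each class's row of weights is rescaled to sum to 1.
print(np.allclose(normed.feature_log_prob_.sum(axis=1), 1.0))

# A per-feature alpha with identical entries matches the scalar alpha.
scalar = ComplementNB(alpha=0.5).fit(X, y)
per_feature = ComplementNB(alpha=np.full(X.shape[1], 0.5)).fit(X, y)
print(np.allclose(scalar.feature_log_prob_, per_feature.feature_log_prob_))
```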
 
- Attributes:
- class_count_ : ndarray of shape (n_classes,)
- Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.
- class_log_prior_ : ndarray of shape (n_classes,)
- Smoothed empirical log probability for each class. Only used in the edge case of a single class in the training set.
- classes_ : ndarray of shape (n_classes,)
- Class labels known to the classifier.
- feature_all_ : ndarray of shape (n_features,)
- Total count of each feature over all classes accumulated during fitting (the column sums of feature_count_). This value is weighted by the sample weight when provided.
- feature_count_ : ndarray of shape (n_classes, n_features)
- Sum of each feature's values over the samples of each class (for example, total term counts per class) accumulated during fitting. This value is weighted by the sample weight when provided.
- feature_log_prob_ : ndarray of shape (n_classes, n_features)
- Empirical weights for class complements.
- n_features_in_ : int
- Number of features seen during fit.
- Added in version 0.24.
- feature_names_in_ : ndarray of shape (n_features_in_,)
- Names of features seen during fit. Defined only when X has feature names that are all strings.
- Added in version 1.0.
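The relationship between the count attributes and feature_log_prob_ can be sketched in NumPy. This mirrors our reading of the scikit-learn source for the default norm=False case and is an illustration under that assumption, not a replacement for the estimator: for each class c, the smoothed counts of every feature in all other classes (the complement of c) are normalized into probabilities, and the negated logs of those probabilities become the weights.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

rng = np.random.RandomState(1)
X = rng.randint(5, size=(20, 8))   # non-negative counts, as ComplementNB requires
y = np.arange(20) % 3              # three classes, all present

alpha = 1.0
clf = ComplementNB(alpha=alpha).fit(X, y)

# Per-class feature totals (feature_count_) and totals over all classes (feature_all_).
feature_count = np.vstack([X[y == c].sum(axis=0) for c in clf.classes_])
feature_all = feature_count.sum(axis=0)

# Smoothed complement counts: how often each feature occurs outside class c.
comp_count = feature_all + alpha - feature_count
comp_prob = comp_count / comp_count.sum(axis=1, keepdims=True)

# With norm=False the weights are the negated log complement probabilities.
weights = -np.log(comp_prob)

print(np.allclose(feature_count, clf.feature_count_))
print(np.allclose(feature_all, clf.feature_all_))
print(np.allclose(weights, clf.feature_log_prob_))
```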
 
- See also:
- BernoulliNB
- Naive Bayes classifier for multivariate Bernoulli models. 
- CategoricalNB
- Naive Bayes classifier for categorical features. 
- GaussianNB
- Gaussian Naive Bayes. 
- MultinomialNB
- Naive Bayes classifier for multinomial models. 
- References:
- Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003). Tackling the poor assumptions of naive Bayes text classifiers. In ICML (Vol. 3, pp. 616-623). https://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf
- Examples:
>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import ComplementNB
>>> clf = ComplementNB()
>>> clf.fit(X, y)
ComplementNB()
>>> print(clf.predict(X[2:3]))
[3]

- fit(X, y, sample_weight=None)
- Fit Naive Bayes classifier according to X, y.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- Training vectors, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
- Target values.
- sample_weight : array-like of shape (n_samples,), default=None
- Weights applied to individual samples (1. for unweighted).

- Returns:
- self : object
- Returns the instance itself.
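Integer sample weights act like repeated rows: as far as we can tell from the implementation, the per-class counts are weighted sums of the rows of X, so giving a sample weight 2 should produce the same fitted counts as duplicating that sample. A quick check on made-up data:

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

X = np.array([[2, 0, 1], [0, 3, 1], [1, 1, 0], [0, 0, 4]])
y = np.array([0, 1, 0, 1])

# Weight the first sample twice as heavily ...
weighted = ComplementNB().fit(X, y, sample_weight=[2, 1, 1, 1])
# ... which should match fitting on data where that sample is duplicated.
duplicated = ComplementNB().fit(np.vstack([X, X[:1]]), np.append(y, y[0]))

print(weighted.class_count_)  # [3. 2.]
print(np.allclose(weighted.feature_count_, duplicated.feature_count_))
print(np.allclose(weighted.feature_log_prob_, duplicated.feature_log_prob_))
```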
 
 
- get_metadata_routing()
- Get metadata routing of this object.
- Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
- A MetadataRequest encapsulating routing information.
 
 
- get_params(deep=True)
- Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
- If True, will return the parameters for this estimator and contained subobjects that are estimators.

- Returns:
- params : dict
- Parameter names mapped to their values.
 
 
- partial_fit(X, y, classes=None, sample_weight=None)
- Incremental fit on a batch of samples.
- This method is expected to be called several times consecutively on different chunks of a dataset so as to implement out-of-core or online learning.
- This is especially useful when the whole dataset is too big to fit in memory at once.
- This method has some performance overhead, so it is better to call partial_fit on chunks of data that are as large as possible (as long as they fit in the memory budget) to hide the overhead.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- Training vectors, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
- Target values.
- classes : array-like of shape (n_classes,), default=None
- List of all the classes that can possibly appear in the y vector.
- Must be provided at the first call to partial_fit, can be omitted in subsequent calls.
- sample_weight : array-like of shape (n_samples,), default=None
- Weights applied to individual samples (1. for unweighted).

- Returns:
- self : object
- Returns the instance itself.
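A minimal out-of-core sketch: the data is fed to partial_fit in chunks of 10 rows (standing in for batches too large to hold at once), with classes passed only on the first call. Because the fitted state is built from accumulated counts, the streamed model is expected to match a single fit on the full, made-up dataset.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

rng = np.random.RandomState(0)
X = rng.randint(5, size=(30, 10))
y = np.arange(30) % 3

# Stream the data in chunks of 10 rows, as if it did not fit in memory at once.
online = ComplementNB()
for start in range(0, len(X), 10):
    chunk = slice(start, start + 10)
    # `classes` is required on the first call only.
    classes = np.unique(y) if start == 0 else None
    online.partial_fit(X[chunk], y[chunk], classes=classes)

batch = ComplementNB().fit(X, y)
print(np.allclose(online.feature_count_, batch.feature_count_))
print(np.array_equal(online.predict(X), batch.predict(X)))
```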
 
 
- predict(X)
- Perform classification on an array of test vectors X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- The input samples.

- Returns:
- C : ndarray of shape (n_samples,)
- Predicted target values for X.
 
 
- predict_joint_log_proba(X)
- Return joint log probability estimates for the test vector X.
- For each row x of X and class y, the joint log probability is given by log P(x, y) = log P(y) + log P(x|y), where log P(y) is the class prior probability and log P(x|y) is the class-conditional probability.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- The input samples.

- Returns:
- C : ndarray of shape (n_samples, n_classes)
- Returns the joint log-probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.
 
 
- predict_log_proba(X)
- Return log-probability estimates for the test vector X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- The input samples.

- Returns:
- C : array-like of shape (n_samples, n_classes)
- Returns the log-probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.
 
 
- predict_proba(X)
- Return probability estimates for the test vector X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- The input samples.

- Returns:
- C : array-like of shape (n_samples, n_classes)
- Returns the probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.
 
 
- score(X, y, sample_weight=None)
- Return the mean accuracy on the given test data and labels.
- In multi-label classification, this is the subset accuracy, a harsh metric because it requires the entire label set of each sample to be predicted correctly.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
- True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
- Sample weights.

- Returns:
- score : float
- Mean accuracy of self.predict(X) w.r.t. y.
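A short sketch of what score computes on made-up data: the fraction of rows whose predicted label equals the true label, or, with sample_weight, the weighted average of those hits. The random weights are purely illustrative.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

rng = np.random.RandomState(4)
X = rng.randint(5, size=(30, 8))
y = np.arange(30) % 2

clf = ComplementNB().fit(X, y)
acc = clf.score(X, y)

# score is the fraction of rows whose predicted label equals the true label.
print(np.isclose(acc, np.mean(clf.predict(X) == y)))

# With sample_weight, each row's hit or miss counts in proportion to its weight.
w = rng.uniform(0.5, 2.0, size=len(y))
print(np.isclose(clf.score(X, y, sample_weight=w),
                 np.average(clf.predict(X) == y, weights=w)))
```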
 
 
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> ComplementNB
- Configure whether metadata should be requested to be passed to the fit method.
- Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
- The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
- The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
- Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
- Metadata routing for sample_weight parameter in fit.

- Returns:
- self : object
- The updated object.
 
 
- set_params(**params)
- Set the parameters of this estimator.
- The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
- Estimator parameters.

- Returns:
- self : estimator instance
- Estimator instance.
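A minimal sketch of the <component>__<parameter> syntax on a text pipeline. The four documents and their labels are made up for illustration; make_pipeline names each step after its lowercased class name, which is why the ComplementNB step is addressed as complementnb.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline

docs = ["cheap pills online", "meeting at noon", "cheap cheap offer", "lunch meeting today"]
labels = ["spam", "ham", "spam", "ham"]

# make_pipeline names each step after its lowercased class name.
pipe = make_pipeline(CountVectorizer(), ComplementNB())

# Update nested parameters with the <component>__<parameter> syntax.
pipe.set_params(complementnb__alpha=0.5, complementnb__norm=True)
pipe.fit(docs, labels)

print(pipe.get_params()["complementnb__alpha"])  # 0.5
print(pipe.named_steps["complementnb"].norm)     # True
print(pipe.predict(["cheap offer online"]))
```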
 
 
- set_partial_fit_request(*, classes: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') -> ComplementNB
- Configure whether metadata should be requested to be passed to the partial_fit method.
- Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
- The options for each parameter are:
- True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
- The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
- Added in version 1.3.
- Parameters:
- classes : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
- Metadata routing for classes parameter in partial_fit.
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
- Metadata routing for sample_weight parameter in partial_fit.

- Returns:
- self : object
- The updated object.
 
 
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> ComplementNB
- Configure whether metadata should be requested to be passed to the score method.
- Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
- The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
- The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
- Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
- Metadata routing for sample_weight parameter in score.

- Returns:
- self : object
- The updated object.
 
 
 
Gallery examples
 
Sample pipeline for text feature extraction and evaluation
 
Classification of text documents using sparse features
