fetch_kddcup99
- sklearn.datasets.fetch_kddcup99(*, subset=None, data_home=None, shuffle=False, random_state=None, percent10=True, download_if_missing=True, return_X_y=False, as_frame=False, n_retries=3, delay=1.0)
 Load the kddcup99 dataset (classification).
Download it if necessary.
Classes: 23
Samples total: 4898431
Dimensionality: 41
Features: discrete (int) or continuous (float)
Read more in the User Guide.
Added in version 0.18.
- Parameters:
- subset : {‘SA’, ‘SF’, ‘http’, ‘smtp’}, default=None
 Which of the classical subsets of kddcup 99 to return. If None, return the entire kddcup 99 dataset.
- data_home : str or path-like, default=None
 Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.
Added in version 0.19.
- shuffle : bool, default=False
 Whether to shuffle the dataset.
- random_state : int, RandomState instance or None, default=None
 Determines random number generation for dataset shuffling and for selection of abnormal samples if subset='SA'. Pass an int for reproducible output across multiple function calls. See Glossary.
- percent10 : bool, default=True
 Whether to load only 10 percent of the data.
- download_if_missing : bool, default=True
 If False, raise an OSError if the data is not locally available instead of trying to download the data from the source site.
- return_X_y : bool, default=False
 If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Added in version 0.20.
- as_frame : bool, default=False
 If True, returns a pandas DataFrame for the data and target objects in the returned Bunch object; the Bunch object will also have a frame member.
Added in version 0.24.
- n_retries : int, default=3
 Number of retries when HTTP errors are encountered.
Added in version 1.5.
- delay : float, default=1.0
 Number of seconds between retries.
Added in version 1.5.
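To illustrate how data_home and download_if_missing interact, the sketch below tries to load the SA subset from a local cache only and falls back to None when nothing is cached. The helper name load_kddcup99_cached and the temporary directory standing in for an empty cache are assumptions made for this example; they are not part of scikit-learn.

```python
import tempfile

from sklearn.datasets import fetch_kddcup99


def load_kddcup99_cached(data_home, subset=None, seed=0):
    """Hypothetical helper: return (X, y) from the local cache, or None.

    With download_if_missing=False the fetcher raises OSError instead of
    contacting the source site when the data is not cached in data_home.
    """
    try:
        return fetch_kddcup99(
            subset=subset,
            data_home=data_home,
            shuffle=True,
            random_state=seed,  # fixed seed for a reproducible shuffle
            percent10=True,
            download_if_missing=False,
            return_X_y=True,
        )
    except OSError:
        return None


# A fresh temporary directory contains no cached copy, so no data is loaded
# and no network access is attempted.
with tempfile.TemporaryDirectory() as empty_cache:
    result = load_kddcup99_cached(empty_cache, subset="SA")
```

Pointing data_home at a folder that already holds the dataset (for example one filled by an earlier call with download_if_missing=True) would return the (data, target) tuple instead.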
- Returns:
- data : Bunch
 Dictionary-like object, with the following attributes. The shapes listed correspond to the default settings (subset=None, percent10=True).
- data : {ndarray, dataframe} of shape (494021, 41)
 The data matrix to learn. If as_frame=True, data will be a pandas DataFrame.
- target : {ndarray, series} of shape (494021,)
 The classification target for each sample. If as_frame=True, target will be a pandas Series.
- frame : dataframe of shape (494021, 42)
 Only present when as_frame=True. Contains data and target.
- DESCR : str
 The full description of the dataset.
- feature_names : list
 The names of the dataset columns.
- target_names : list
 The names of the target columns.
- (data, target) : tuple if return_X_y is True
 A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features), with each row representing one sample and each column representing one feature. The second is an array of shape (n_samples,) containing the target of each sample.
Added in version 0.20.
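The returned target is often reduced to a binary normal/attack label for anomaly detection. The sketch below shows one way to do that, assuming the raw labels are byte strings such as b"normal." (as in recent scikit-learn releases). The helper to_anomaly_labels and the small hand-made array are illustrative stand-ins, not part of the library and not real rows of the dataset.

```python
import numpy as np


def to_anomaly_labels(target):
    """Hypothetical helper: map raw labels to 0 (normal) or 1 (attack).

    Assumes each label is a bytes object and that normal traffic is
    labelled b"normal.".
    """
    target = np.asarray(target, dtype=object)
    return (target != b"normal.").astype(int)


# Hand-made stand-in for a few target values; not taken from the dataset.
sample_target = np.array(
    [b"normal.", b"smurf.", b"normal.", b"neptune."], dtype=object
)
labels = to_anomaly_labels(sample_target)
```

With the real data, the same helper could be applied to the second element of the tuple returned by fetch_kddcup99(return_X_y=True).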