imblearn.under_sampling.CondensedNearestNeighbour
class imblearn.under_sampling.CondensedNearestNeighbour(ratio='auto', return_indices=False, random_state=None, size_ngh=None, n_neighbors=None, n_seeds_S=1, n_jobs=1)
Class to perform under-sampling based on the condensed nearest neighbour method.

Parameters:

ratio : str, dict, or callable, optional (default='auto')
    Ratio to use for resampling the data set.
    - If str, has to be one of: (i) 'minority': resample the minority class; (ii) 'majority': resample the majority class; (iii) 'not minority': resample all classes apart from the minority class; (iv) 'all': resample all classes; and (v) 'auto': corresponds to 'all' for over-sampling methods and 'not minority' for under-sampling methods. The targeted classes will be over-sampled or under-sampled to reach the same number of samples as the majority or minority class.
    - If dict, the keys correspond to the targeted classes. The values correspond to the desired number of samples.
    - If callable, a function taking y and returning a dict. The keys correspond to the targeted classes. The values correspond to the desired number of samples.
return_indices : bool, optional (default=False)
    Whether or not to return the indices of the samples randomly selected from the majority class.

random_state : int, RandomState instance or None, optional (default=None)
    If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

size_ngh : int, optional (default=None)
    Size of the neighbourhood to consider to compute the average distance to the minority point samples.
    Deprecated since version 0.2: size_ngh is deprecated from 0.2 and will be replaced in 0.4. Use n_neighbors instead.

n_neighbors : int or object, optional (default=KNeighborsClassifier(n_neighbors=1))
    If int, size of the neighbourhood to consider to compute the average distance to the minority point samples. If object, an object inherited from sklearn.neighbors.KNeighborsClassifier should be passed.

n_seeds_S : int, optional (default=1)
    Number of samples to extract in order to build the set S.

n_jobs : int, optional (default=1)
    The number of threads to open if possible.

Notes

The method is based on [R35].

Supports multi-class resampling.

References

[R35] P. Hart, “The condensed nearest neighbor rule,” IEEE Transactions on Information Theory, vol. 14(3), pp. 515-516, 1968.

Examples

>>> from collections import Counter
>>> from sklearn.datasets import fetch_mldata
>>> from imblearn.under_sampling import CondensedNearestNeighbour
>>> pima = fetch_mldata('diabetes_scale')
>>> X, y = pima['data'], pima['target']
>>> print('Original dataset shape {}'.format(Counter(y)))
Original dataset shape Counter({1: 500, -1: 268})
>>> cnn = CondensedNearestNeighbour(random_state=42)
>>> X_res, y_res = cnn.fit_sample(X, y)
>>> print('Resampled dataset shape {}'.format(Counter(y_res)))
Resampled dataset shape Counter({-1: 268, 1: 227})

Methods

fit(X, y)
    Find the class statistics before performing sampling.
fit_sample(X, y)
    Fit the statistics and resample the data directly.
get_params([deep])
    Get parameters for this estimator.
sample(X, y)
    Resample the dataset.
set_params(**params)
    Set the parameters of this estimator.
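The callable form of the ratio parameter is described only as a function that takes y and returns a dict of target classes and desired sample counts. The following is a minimal sketch of such a callable, assuming the goal is to bring every larger class down to the minority class size. The name `balance_to_minority` is hypothetical and not part of imblearn.

```python
from collections import Counter


def balance_to_minority(y):
    """Hypothetical ``ratio`` callable: takes y and returns a dict.

    Keys are the classes to resample; values are the desired number of
    samples for each of those classes.
    """
    counts = Counter(y)
    n_minority = min(counts.values())
    # Target every class that is larger than the minority class.
    return {label: n_minority
            for label, n in counts.items() if n > n_minority}


# Same class distribution as the example data set above.
y = [1] * 500 + [-1] * 268
print(balance_to_minority(y))  # {1: 268}
```

Under the signature documented above, such a function would be passed as `CondensedNearestNeighbour(ratio=balance_to_minority)`. How closely the sampler honours the requested counts is not covered by this sketch.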
__init__(ratio='auto', return_indices=False, random_state=None, size_ngh=None, n_neighbors=None, n_seeds_S=1, n_jobs=1)
fit(X, y)
    Find the class statistics before performing sampling.

    Parameters:
    X : ndarray, shape (n_samples, n_features)
        Matrix containing the data to be sampled.
    y : ndarray, shape (n_samples, )
        Corresponding label for each sample in X.

    Returns:
    self : object
        Return self.
fit_sample(X, y)
    Fit the statistics and resample the data directly.

    Parameters:
    X : ndarray, shape (n_samples, n_features)
        Matrix containing the data to be sampled.
    y : ndarray, shape (n_samples, )
        Corresponding label for each sample in X.

    Returns:
    X_resampled : ndarray, shape (n_samples_new, n_features)
        The array containing the resampled data.
    y_resampled : ndarray, shape (n_samples_new, )
        The corresponding labels of X_resampled.
get_params(deep=True)
    Get parameters for this estimator.

    Parameters:
    deep : boolean, optional
        If True, will return the parameters for this estimator and contained subobjects that are estimators.

    Returns:
    params : mapping of string to any
        Parameter names mapped to their values.
sample(X, y)
    Resample the dataset.

    Parameters:
    X : ndarray, shape (n_samples, n_features)
        Matrix containing the data to be sampled.
    y : ndarray, shape (n_samples, )
        Corresponding label for each sample in X.

    Returns:
    X_resampled : ndarray, shape (n_samples_new, n_features)
        The array containing the resampled data.
    y_resampled : ndarray, shape (n_samples_new, )
        The corresponding labels of X_resampled.
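To give a rough idea of what resampling with the condensed nearest neighbour rule of [R35] involves, here is a minimal pure-Python sketch. It is not imblearn's implementation. The names `condensed_undersample`, `nearest_label` and `squared_distance` are hypothetical, and the sketch assumes a single majority class, Euclidean distance, at least one seed, and repeated passes until no further sample is added.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_label(point, store):
    """Label of the stored sample closest to ``point`` (1-NN rule)."""
    return min(store, key=lambda item: squared_distance(point, item[0]))[1]


def condensed_undersample(X, y, majority_label, n_seeds=1, seed=None):
    """Illustrative condensed nearest neighbour under-sampling.

    Keeps every non-majority sample, seeds the store S with ``n_seeds``
    random majority samples, then adds each majority sample that the
    current store misclassifies with a 1-NN rule.
    """
    rng = random.Random(seed)
    pairs = list(zip(X, y))
    store = [(x, label) for x, label in pairs if label != majority_label]
    majority = [(x, label) for x, label in pairs if label == majority_label]
    rng.shuffle(majority)
    store.extend(majority[:n_seeds])
    candidates = majority[n_seeds:]

    changed = True
    while changed:
        changed = False
        remaining = []
        for x, label in candidates:
            if nearest_label(x, store) != label:
                # Misclassified by the store, so the store needs it.
                store.append((x, label))
                changed = True
            else:
                remaining.append((x, label))
        candidates = remaining

    X_res = [x for x, _ in store]
    y_res = [label for _, label in store]
    return X_res, y_res


# Two well-separated clusters: a single majority seed is enough.
X = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,)]
y = [0, 0, 0, 0, 1, 1]
X_res, y_res = condensed_undersample(X, y, majority_label=0, seed=0)
print(sorted(y_res))  # [0, 1, 1]
```

In this toy data set the majority class is reduced to its single seed, because that seed already classifies the other majority samples correctly. Overlapping classes would keep more majority samples.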
 