Release history
Changelog
Bug fixes
- Fixed a bug in under_sampling.NearMiss version 3. The indices returned were wrong. By Guillaume Lemaitre.
- Fixed a bug in ensemble.BalanceCascade, combine.SMOTEENN, and combine.SMOTETomek. By Guillaume Lemaitre.
New features
- Turn off steps in pipeline.Pipeline using the None object. By Christos Aridas.
- Add a fetching function, datasets.fetch_datasets, to get some imbalanced datasets useful for benchmarking. By Guillaume Lemaitre.
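The step-disabling entry can be pictured with a small stand-alone sketch. It is not the code of pipeline.Pipeline: MiniPipeline, AddOne, and Double are hypothetical names used only for illustration, and the sketch assumes that a step set to None is simply skipped.

```python
class AddOne:
    """Hypothetical step: adds 1 to every value."""

    def transform(self, values):
        return [v + 1 for v in values]


class Double:
    """Hypothetical step: doubles every value."""

    def transform(self, values):
        return [v * 2 for v in values]


class MiniPipeline:
    """Illustrative stand-in, not the library's Pipeline."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, step-or-None) pairs

    def transform(self, values):
        for _name, step in self.steps:
            if step is None:  # assumption: a None step is skipped
                continue
            values = step.transform(values)
        return values


# Same pipeline definition, with the first step switched off in the second case.
full = MiniPipeline([("add", AddOne()), ("double", Double())])
partial = MiniPipeline([("add", None), ("double", Double())])

full_result = full.transform([1, 2])
partial_result = partial.transform([1, 2])
```

Keeping the step name in place while setting its value to None makes it possible to toggle a step without rebuilding the list of steps.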
Enhancement
- All the unit tests have been refactored and a check_estimators has been derived from scikit-learn. By Guillaume Lemaitre.
- Script for automatic build of conda packages and uploading. By Guillaume Lemaitre.
- Remove the seaborn dependency and improve the examples. By Guillaume Lemaitre.
- Adapt all classes to multi-class resampling. By Guillaume Lemaitre.
API changes summary
- __init__ has been removed from base.SamplerMixin to create a real mixin class. By Guillaume Lemaitre.
- Creation of a module exceptions to handle consistent raising of errors. By Guillaume Lemaitre.
- Creation of a module utils.validation to check recurrent patterns. By Guillaume Lemaitre.
- Move the under-sampling methods into the prototype_selection and prototype_generation submodules to make a clearer distinction. By Guillaume Lemaitre.
- Change ratio so that it can adapt to multi-class problems. By Guillaume Lemaitre.
Deprecation
- Deprecate the use of a float as ratio in favor of a dictionary, string, or callable. By Guillaume Lemaitre.
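A single float can describe only one proportion between two classes, which is a plausible reason it does not carry over to multi-class problems. The sketch below illustrates the dictionary and callable forms under stated assumptions: resolve_ratio and balance_to_majority are hypothetical helpers, not library functions, and the sketch assumes a dictionary maps each class label to the desired number of samples and a callable receives the targets and returns such a dictionary. String values are left out because this changelog does not list the accepted strings.

```python
from collections import Counter


def resolve_ratio(ratio, y):
    """Hypothetical helper: return {class_label: desired_sample_count}."""
    if callable(ratio):
        # Assumption: a callable receives the targets and returns a dict.
        return dict(ratio(y))
    if isinstance(ratio, dict):
        unknown = set(ratio) - set(y)
        if unknown:
            raise ValueError(f"classes not present in y: {sorted(unknown)}")
        return dict(ratio)
    raise TypeError("this sketch only handles dict and callable ratios")


def balance_to_majority(y):
    """Hypothetical callable: bring every class up to the largest class size."""
    counts = Counter(y)
    biggest = max(counts.values())
    return {label: biggest for label in counts}


# Three classes with 6, 3, and 1 samples.
y = [0] * 6 + [1] * 3 + [2] * 1

from_dict = resolve_ratio({1: 6, 2: 6}, y)
from_callable = resolve_ratio(balance_to_majority, y)
```

With three classes, the dictionary form can name a separate target for each class, which a single float cannot express.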
Version 0.2
Changelog
Bug fixes
- Fixed a bug in under_sampling.NearMiss, which was not picking the right samples during under-sampling for method 3. By Guillaume Lemaitre.
- Fixed a bug in ensemble.EasyEnsemble: corrected the random_state generation. By Guillaume Lemaitre and Christos Aridas.
- Fixed a bug in under_sampling.RepeatedEditedNearestNeighbours: added an additional stopping criterion to prevent the minority class from becoming a majority class or a class from disappearing. By Guillaume Lemaitre.
- Fixed a bug in under_sampling.AllKNN: added stopping criteria to prevent the minority class from becoming a majority class or a class from disappearing. By Guillaume Lemaitre.
- Fixed a bug in under_sampling.CondensedNearestNeighbour: corrected the list of indices returned. By Guillaume Lemaitre.
- Fixed a bug in ensemble.BalanceCascade: resolved the issue of obtaining a single array if desired. By Guillaume Lemaitre.
- Fixed a bug in pipeline.Pipeline: a Pipeline can now be embedded in another Pipeline. By Christos Aridas.
- Fixed a bug in pipeline.Pipeline: resolved the issue of putting two samplers in the same Pipeline. By Christos Aridas.
- Fixed a bug in under_sampling.CondensedNearestNeighbour: corrected the shape of sel_x when only one sample is selected. By Aliaksei Halachkin.
- Fixed a bug in under_sampling.NeighbourhoodCleaningRule, selecting neighbours instead of minority class misclassified samples. By Aleksandr Loskutov.
- Fixed a bug in over_sampling.ADASYN: corrected the creation of a new sample so that the new sample lies between the minority sample and the nearest neighbour. By Rafael Wampfler.
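The ADASYN entry above describes a geometric constraint: the synthetic sample should lie on the segment joining a minority sample and its nearest neighbour. A minimal sketch of that interpolation follows. It is not the library's implementation, and the function name interpolate and the sample values are made up for illustration.

```python
import random


def interpolate(sample, neighbour, rng):
    """Return a point on the segment between sample and neighbour."""
    gap = rng.random()  # uniform in [0, 1): 0 gives the sample itself
    return [s + gap * (n - s) for s, n in zip(sample, neighbour)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
minority_sample = [1.0, 2.0]
nearest_neighbour = [3.0, 6.0]

new_sample = interpolate(minority_sample, nearest_neighbour, rng)
```

Because the same gap is applied to every coordinate, the new point moves along the straight line towards the neighbour and never overshoots it.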
New features
- Added the AllKNN under-sampling technique. By Dayvid Oliveira.
- Added a module metrics implementing some specific scoring functions for the problem of balancing. By Guillaume Lemaitre and Christos Aridas.
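The entry does not list which scoring functions the metrics module contains. As one example of a score commonly used for imbalanced problems, the sketch below computes the geometric mean of sensitivity and specificity in plain Python. The function names are illustrative and are not claimed to match the module's API.

```python
from math import sqrt


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary problem."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def geometric_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    return sqrt(sens * spec)


# 8 majority samples (label 0) and 2 minority samples (label 1).
y_true = [0] * 8 + [1] * 2

# A classifier that always predicts the majority class.
always_majority = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
g_mean_trivial = geometric_mean(y_true, always_majority)

# A classifier that finds one of the two minority samples.
mixed = [0] * 6 + [1] * 2 + [1, 0]
g_mean_mixed = geometric_mean(y_true, mixed)
```

The always-majority classifier reaches 80% accuracy on this data while its geometric mean is zero, which shows why plain accuracy can be misleading when classes are imbalanced.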
Enhancement
- Added support for bumpversion. By Guillaume Lemaitre.
- Validate the type of target in binary samplers. A warning is raised for the moment. By Guillaume Lemaitre and Christos Aridas.
- Change from the cross_validation module to the model_selection module, following the sklearn deprecation cycle. By Dayvid Oliveira and Christos Aridas.
API changes summary
- size_ngh has been deprecated in combine.SMOTEENN. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.EditedNearestNeighbours. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.CondensedNearestNeighbour. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.OneSidedSelection. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.NeighbourhoodCleaningRule. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.RepeatedEditedNearestNeighbours. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.AllKNN. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- Two base classes, BaseBinaryclassSampler and BaseMulticlassSampler, have been created to handle the target type and raise a warning in case of abnormality. By Guillaume Lemaitre and Christos Aridas.
- Move random_state to be assigned in the SamplerMixin initialization. By Guillaume Lemaitre.
- Provide estimators instead of parameters in combine.SMOTEENN and combine.SMOTETomek. Therefore, the list of parameters has been deprecated. By Guillaume Lemaitre and Christos Aridas.
- k has been deprecated in over_sampling.ADASYN. Use n_neighbors instead. By Guillaume Lemaitre.
- k and m have been deprecated in over_sampling.SMOTE. Use k_neighbors and m_neighbors instead. By Guillaume Lemaitre.
- n_neighbors accepts a KNeighborsMixin-based object for under_sampling.EditedNearestNeighbours, under_sampling.CondensedNearestNeighbour, under_sampling.NeighbourhoodCleaningRule, under_sampling.RepeatedEditedNearestNeighbours, and under_sampling.AllKNN. By Guillaume Lemaitre.
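Several entries above rename a parameter while keeping the old name available for a deprecation period. The changelog does not say how the library implements this. The sketch below shows one common pattern using the standard warnings module, with a hypothetical ToySampler class that is not part of the library.

```python
import warnings


class ToySampler:
    """Hypothetical sampler, used only to illustrate a parameter rename."""

    def __init__(self, n_neighbors=3, size_ngh=None):
        if size_ngh is not None:
            # Old name still accepted, but the caller is told to migrate.
            warnings.warn(
                "'size_ngh' is deprecated; use 'n_neighbors' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            n_neighbors = size_ngh
        self.n_neighbors = n_neighbors


# Old-style call: works, but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = ToySampler(size_ngh=5)

# New-style call: no warning.
new_style = ToySampler(n_neighbors=5)
```

Callers migrating from the old name only need to change the keyword; the resulting object is configured identically in both cases.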
Documentation changes
- Replace some remaining UnbalancedDataset occurrences. By Francois Magimel.
- Added doctest in the documentation. By Guillaume Lemaitre.
Version 0.1
Changelog
API
- First release of the stable API. By Fernando Nogueira, Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
New methods
- Under-sampling
  - Random majority under-sampling with replacement
  - Extraction of majority-minority Tomek links
  - Under-sampling with Cluster Centroids
  - NearMiss-(1 & 2 & 3)
  - Condensed Nearest Neighbour
  - One-Sided Selection
  - Neighbourhood Cleaning Rule
  - Edited Nearest Neighbours
  - Instance Hardness Threshold
  - Repeated Edited Nearest Neighbours
- Over-sampling
  - Random minority over-sampling with replacement
  - SMOTE - Synthetic Minority Over-sampling Technique
  - bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
  - SVM SMOTE - Support Vectors SMOTE
  - ADASYN - Adaptive synthetic sampling approach for imbalanced learning
- Over-sampling followed by under-sampling
  - SMOTE + Tomek links
  - SMOTE + ENN
- Ensemble sampling
  - EasyEnsemble
  - BalanceCascade
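As an illustration of the simplest method in the list, random majority under-sampling with replacement, here is a minimal stand-alone sketch. It is not the library's implementation: random_under_sample is a hypothetical function, and the sketch assumes every larger class is reduced to the size of the smallest class.

```python
import random
from collections import Counter


def random_under_sample(X, y, rng):
    """Shrink every larger class to the size of the smallest class."""
    counts = Counter(y)
    target = min(counts.values())
    X_res, y_res = [], []
    for label in sorted(counts):
        indices = [i for i, lab in enumerate(y) if lab == label]
        if len(indices) > target:
            # Draw with replacement: the same sample may be picked twice.
            indices = rng.choices(indices, k=target)
        X_res.extend(X[i] for i in indices)
        y_res.extend(y[i] for i in indices)
    return X_res, y_res


rng = random.Random(42)  # fixed seed so the sketch is reproducible
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [1.0], [1.1]]
y = [0] * 6 + [1] * 2

X_res, y_res = random_under_sample(X, y, rng)
```

After resampling, both classes have two samples each, and every retained row is an unmodified row of the original data.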