Multiclass classification with under-sampling¶

Some balancing methods allow for balancing dataset with multiples classes. We provide an example to illustrate the use of those methods which do not differ from the binary case.

Out:

pre       rec       spe        f1       geo       iba       sup

          0       1.00      1.00      1.00      1.00      1.00      1.00         8
          1       1.00      0.73      1.00      0.84      0.93      0.88        11
          2       0.80      1.00      0.84      0.89      0.89      0.78        12

avg / total       0.92      0.90      0.94      0.90      0.94      0.87        31

# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>
# License: MIT

from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

from imblearn.under_sampling import NearMiss
from imblearn.pipeline import make_pipeline
from imblearn.metrics import classification_report_imbalanced

print(__doc__)

RANDOM_STATE = 42

# Create a folder to fetch the dataset
iris = load_iris()
# Make the dataset imbalanced
# Select only half of the first class
iris.data = iris.data[25:-1, :]
iris.target = iris.target[25:-1]

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    random_state=RANDOM_STATE)

# Create a pipeline
pipeline = make_pipeline(NearMiss(version=2, random_state=RANDOM_STATE),
                         LinearSVC(random_state=RANDOM_STATE))
pipeline.fit(X_train, y_train)

# Classify and report the results
print(classification_report_imbalanced(y_test, pipeline.predict(X_test)))

Total running time of the script: ( 0 minutes 0.294 seconds)

Generated by Sphinx-Gallery