imbalanced-learn
0.3.0.dev0

Tomek links

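An illustration of under-sampling by removing Tomek links. A Tomek link is a pair of samples from different classes that are each other's nearest neighbours.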

../../_images/sphx_glr_plot_tomek_links_001.png
# Authors: Andreas Mueller
#          Christos Aridas
#          Guillaume Lemaitre <g.lemaitre58@gmail.com>
# License: MIT


import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

from imblearn.under_sampling import TomekLinks

print(__doc__)

rng = np.random.RandomState(0)
n_samples_1 = 500
n_samples_2 = 50
X_syn = np.r_[1.5 * rng.randn(n_samples_1, 2),
              0.5 * rng.randn(n_samples_2, 2) + [2, 2]]
y_syn = np.array([0] * (n_samples_1) + [1] * (n_samples_2))
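# seed the shuffle and the split so that the example is reproducible
X_syn, y_syn = shuffle(X_syn, y_syn, random_state=rng)
# NOTE: the train/test split is not used in the rest of this example
X_syn_train, X_syn_test, y_syn_train, y_syn_test = train_test_split(
    X_syn, y_syn, random_state=rng)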

# remove Tomek links; return_indices=True additionally returns the indices
# (into X_syn) of the samples that were kept
tl = TomekLinks(return_indices=True)
X_resampled, y_resampled, idx_resampled = tl.fit_sample(X_syn, y_syn)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

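# indices of the samples dropped by the sampler: every index not kept
idx_samples_removed = np.setdiff1d(np.arange(X_syn.shape[0]),
                                   idx_resampled)
# boolean mask selecting the class 0 samples among the kept ones
idx_class_0 = y_resampled == 0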
plt.scatter(X_resampled[idx_class_0, 0], X_resampled[idx_class_0, 1],
            alpha=.8, label='Class #0')
plt.scatter(X_resampled[~idx_class_0, 0], X_resampled[~idx_class_0, 1],
            alpha=.8, label='Class #1')
plt.scatter(X_syn[idx_samples_removed, 0], X_syn[idx_samples_removed, 1],
            alpha=.8, label='Removed samples')

# tidy up the axes: hide the top/right spines and offset the remaining ones
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
ax.spines['left'].set_position(('outward', 10))
ax.spines['bottom'].set_position(('outward', 10))
ax.set_xlim([-5, 5])
ax.set_ylim([-5, 5])
plt.yticks(range(-5, 6))
plt.xticks(range(-5, 6))

plt.title('Under-sampling removing Tomek links')
plt.legend()

plt.show()
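
To make the idea behind the plot concrete, the following is a minimal sketch of how Tomek links can be detected, written in plain Python so that the definition is easy to follow. It is not the implementation used by `TomekLinks`; the helper names `nearest_neighbour` and `find_tomek_links` and the tiny dataset are hypothetical and chosen only for illustration. Two samples form a Tomek link when they are each other's nearest neighbour and carry different class labels.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def nearest_neighbour(i, points):
    """Return the index of the point closest to points[i], excluding i."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def find_tomek_links(points, labels):
    """Return index pairs (i, j) with i < j that form Tomek links."""
    nn = [nearest_neighbour(i, points) for i in range(len(points))]
    links = []
    for i, j in enumerate(nn):
        # Tomek link: mutual nearest neighbours with different labels.
        if i < j and nn[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links


# Hand-made toy data (illustrative values, not taken from the example above).
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0)]
labels = [0, 1, 0, 0, 1]

links = find_tomek_links(points, labels)
print(links)
```

In this toy dataset, samples 0 and 1 are mutual nearest neighbours with different labels, so they form the only link; samples 2 and 3 are also mutual nearest neighbours but share a label. Which member of each link is dropped by an under-sampler depends on how the sampler is configured.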

Total running time of the script: ( 0 minutes 0.130 seconds)


© Copyright 2016, G. Lemaitre, F. Nogueira, D. Oliveira, C. Aridas.
