PyConDE & PyData Berlin 2022
Inspect and try to interpret your scikit-learn machine-learning models
Abstract: This tutorial is subdivided into three parts. First, we focus on
the family of linear models and present the common pitfalls to be aware of when
interpreting the coefficients of such models. Then, we look at a larger range
of models (e.g. gradient-boosting) and put into practice the inspection
techniques available in scikit-learn. Finally, we present other tools to
interpret models (e.g. shap), not currently available in scikit-learn but
widely used in practice.
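As a rough illustration of the second part, the sketch below applies one of the inspection tools shipped with scikit-learn, permutation importance, to a gradient-boosting model. The synthetic dataset, the choice of GradientBoostingRegressor and all parameter values are assumptions made for this example, not material taken from the tutorial.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): with shuffle=False,
# only the first two columns carry signal, the other three are noise.
X, y = make_regression(
    n_samples=500,
    n_features=5,
    n_informative=2,
    noise=1.0,
    shuffle=False,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn on held-out data and measure how much
# the score (R^2 for a regressor) drops.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Print features from most to least important.
for idx in result.importances_mean.argsort()[::-1]:
    mean = result.importances_mean[idx]
    std = result.importances_std[idx]
    print(f"feature {idx}: {mean:.3f} +/- {std:.3f}")
```

Computing the importances on a held-out set rather than on the training set is a deliberate choice here: it measures what the model relies on to generalize, not what it memorized.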
PyLadies Paris 2022
Inspecting your predictive model in Python
Abstract: This presentation covers the tools available to inspect your
predictive model in Python. We will first briefly present what we mean by a
predictive model and what it implies when one wants to explain the decisions
of such a model. We will then provide a quick taxonomy of the current methods
for explaining predictive models. Finally, we will give an overview of the
tools available in scikit-learn and shap.
Euler Hermes 2019
Learning from imbalanced datasets: state of the art
Abstract: This presentation gives an overview of the state of the art of predictive modelling with imbalanced datasets.
Euroscipy 2019
Rapid Analytics & Model Prototyping (RAMP)
Abstract: We will give an overview of the RAMP framework, which provides a platform to organize reproducible and transparent data challenges. RAMP workflow is a Python package used to define and formalize the data science problem to be solved. It can be used as a standalone package and allows a user to prototype different solutions. In addition to RAMP workflow, a set of packages has been developed to share and collaborate on the solutions that users develop. RAMP database provides a database structure to store the solutions of different users and the performance of these solutions. RAMP engine is the package that runs the user solutions (possibly in the cloud) and populates the database. Finally, RAMP frontend is the web frontend where users can upload their solutions and which shows the leaderboard of the challenge. The project is open-source and can be deployed on any local server. The framework has been used at the Paris-Saclay Center for Data Science for setting up and solving about twenty scientific problems, for organizing collaborative data challenges, for organizing scientific sub-communities around these events, and for training novice data scientists.
Slides RAMP board RAMP workflow
Introduction to scikit-learn: from model fitting to model interpretation
Abstract: Our introduction to scikit-learn will be subdivided into two parts. We will first give a general introduction to scikit-learn, presenting basic concepts around cross-validation, pipeline estimators, and hyperparameter search. Then, we will focus on model interpretation, presenting the challenges and the available tools to understand a trained machine-learning model: partial dependence plots, feature importance, LIME, Shapley values, etc.
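The three concepts named in the first part (pipeline, hyperparameter search, cross-validation) can be combined in a few lines. The following is only a minimal sketch on synthetic data. The dataset, the logistic-regression model and the grid of C values are assumptions chosen for illustration and do not come from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Pipeline: the scaler is fitted on the training folds only, which avoids
# leaking test-fold statistics into the model.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Hyperparameter search over the regularization strength C.
# make_pipeline names each step after its lowercased class name.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid=param_grid, cv=5)

# Outer cross-validation evaluates the whole "search + fit" procedure
# (nested cross-validation).
scores = cross_val_score(search, X, y, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Refit on all the data to see which C the search selects.
search.fit(X, y)
print("selected parameters:", search.best_params_)
```

Wrapping the search inside cross_val_score keeps the reported score independent of the data used to pick C, at the cost of more model fits.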
Euroscipy 2018
Imbalanced-learn: A scikit-learn-contrib to tackle learning from imbalanced data set
Abstract: The curse of imbalanced data sets refers to data sets in which the
number of samples in one class is lower than in the others. This issue is often
encountered in real-world data sets such as medical imaging applications
(e.g. cancer detection), fraud detection, etc. In such conditions, machine
learning algorithms learn sub-optimal models which generally favor the class
with the largest number of samples. In this talk, we review the different
strategies available to learn a statistical model under those specific
conditions. Then, we will present the imbalanced-learn package and the new
features which will be released in version 0.4.
CDS Pitching Day 2017
RAMP on predicting autism from resting-state functional MRI and anatomical MRI
Abstract: This talk will present the ongoing preparation of a RAMP aiming at distinguishing subjects with Autism Spectrum Disorder (ASD) from typical control subjects. This analysis will use the Autism Brain Imaging Data Exchange (ABIDE I & II) database and data from Robert Debre Hospital based on R-fMRI and anatomical MRI. We will particularly focus on presenting the problem, the typical pipeline addressing it, and the current status of this RAMP. This work is in collaboration with the Pasteur Institute (Neuroanatomy group of the Unit of Human Genetics and Cognitive Functions).
Euroscipy 2017
Leverage knowledge from under-represented classes in machine learning: imbalanced-learn release 0.3.0
Abstract: The curse of imbalanced data sets refers to data sets in which the number of samples in one class is lower than in the others. This issue is often encountered in real-world data sets such as medical imaging applications (e.g. cancer detection), fraud detection, etc. In such conditions, machine learning algorithms learn sub-optimal models which generally favor the class with the largest number of samples. In this talk, we present the new features available in release 0.3.0.
PyParis 2017
Leverage knowledge from under-represented classes in machine learning: an introduction to imbalanced-learn
Abstract: The curse of imbalanced data sets refers to data sets in which the number of samples in one class is lower than in the others. This issue is often encountered in real-world data sets such as medical imaging applications (e.g. cancer detection), fraud detection, etc. In such conditions, machine learning algorithms learn sub-optimal models which generally favor the class with the largest number of samples. In this talk, we will present the imbalanced-learn package, which implements some of the state-of-the-art algorithms tackling the class imbalance problem.
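The bias toward the majority class described above can be reproduced on synthetic data. Note that the sketch below does not use imbalanced-learn: it swaps in the class_weight="balanced" option of scikit-learn as a simple stand-in for a rebalancing strategy. The dataset, the 95/5 class ratio and the model are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic problem with roughly 5% of samples in the minority class (label 1).
X, y = make_classification(
    n_samples=5000,
    n_features=10,
    weights=[0.95, 0.05],
    class_sep=0.8,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Baseline: every sample has the same weight, so errors on the majority
# class dominate the loss.
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Reweighted: each class is weighted inversely to its frequency.
weighted = LogisticRegression(max_iter=1000, class_weight="balanced")
weighted.fit(X_train, y_train)

# Recall on the minority class: the fraction of minority samples detected.
recall_plain = recall_score(y_test, plain.predict(X_test))
recall_weighted = recall_score(y_test, weighted.predict(X_test))
print(f"minority recall without reweighting: {recall_plain:.3f}")
print(f"minority recall with class_weight='balanced': {recall_weighted:.3f}")
```

Reweighting typically trades some precision for recall on the minority class, so both metrics are worth checking before choosing a strategy.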