Introduction to machine learning#

This tutorial introduces how to use scikit-learn to craft predictive models using machine learning.

Static version of the course#

You can preview the course as a JupyterBook. The table of contents is available below. If you want to execute the notebooks yourself, refer to the next section.

Table of contents#

Hyperparameter tuning

Predicting with uncertainty

Data challenge problem

Executable version of the course#

The following dependencies are required for the course:

  • jupyterlab

  • jupytext

  • notebook

  • numpy

  • scipy

  • scikit-learn

  • skrub

  • pandas

  • pyarrow

  • matplotlib

  • seaborn
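Once an environment is set up, the list above can be checked from Python. The following is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical and not part of the course material, and the distribution names are assumed to match the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the dependency list above.
COURSE_DEPENDENCIES = [
    "jupyterlab",
    "jupytext",
    "notebook",
    "numpy",
    "scipy",
    "scikit-learn",
    "skrub",
    "pandas",
    "pyarrow",
    "matplotlib",
    "seaborn",
]


def installed_versions(packages=COURSE_DEPENDENCIES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, found in installed_versions().items():
        print(f"{name:<14} {found or 'MISSING'}")
```

Any line reporting `MISSING` points to a dependency that still needs to be installed in the active environment.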

We offer several ways to run the course. Depending on your preferences and your favorite package manager, you can use one of the following options:

  • JupyterLite: if you want to avoid installing anything on your computer.

  • pixi: if you want the latest cutting-edge technology.

  • conda: if you want to stick to a more traditional approach.

  • pip: if you want to use the standard Python package manager.

Use JupyterLite#

JupyterLite is a JupyterLab distribution that runs entirely in the browser, using the Pyodide kernel. Click on the badge below to start the course in your browser. The lecture notes are located in content/notebooks.

Launch JupyterLite

Here, we describe the pros and cons of this approach.

Pros:

  • No installation required

  • Fast to start

  • No need to configure a Python environment

Cons:

  • The first cell is always slow to execute because it may need to download packages and initialize the kernel.

  • The notebooks need to call %pip install to install a couple of packages, in addition to the usual import statements.

  • We need to use pyodide-http to load some datasets when fetching from the internet.

  • Some imports must be made defensively when a package is an optional dependency of another library, e.g. importing matplotlib explicitly before using the pandas plotting API.
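The last three points can be sketched as a small setup cell. This is an illustration rather than the course's actual code: the helper names are hypothetical, and it assumes that Pyodide reports `sys.platform` as `"emscripten"` and that `pyodide_http.patch_all()` is the call used to patch the HTTP libraries. The `%pip install` magic appears only as a comment because it is valid in a notebook cell, not in a plain Python script.

```python
import importlib
import sys


def running_in_pyodide():
    """Return True when the interpreter runs under Pyodide (e.g. in JupyterLite)."""
    return sys.platform == "emscripten"


def defensive_import(name):
    """Import a module explicitly; return None if it is unavailable."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


if running_in_pyodide():
    # In a notebook cell, this step is preceded by a magic such as:
    #   %pip install pyodide-http
    pyodide_http = defensive_import("pyodide_http")
    if pyodide_http is not None:
        # Patch the HTTP libraries so datasets can be fetched from the browser.
        pyodide_http.patch_all()

# pandas delegates plotting to matplotlib, which is an optional dependency.
# Importing it explicitly ensures it is loaded before any `.plot()` call.
matplotlib = defensive_import("matplotlib")
```

Outside JupyterLite, the Pyodide branch is skipped, so the same cell can be kept in notebooks that are also run locally.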

Use pixi, conda or pip#

Prerequisites#

First clone the repository:

git clone https://github.com/glemaitre/traces-sklearn.git

Alternatively, download an archive at the following link.

Install the package manager#

For pixi, refer to the official website for installation.

For conda, download and install the latest version of miniforge from the official website.

For pip, it is already installed if you have Python.

Install the dependencies#

For pixi, you don’t need to do anything. The dependencies will be installed automatically in the next step.

For conda, you can install the dependencies using the environment.yml file:

conda env create --file environment.yml

For pip, you can install the dependencies using the requirements.txt file:

pip install -r requirements.txt

Launching Jupyter Lab#

With pixi, launch Jupyter Lab by running the following command:

pixi run jupyter lab

The Python environment and necessary packages will be automatically installed for you.

For conda, you need to activate the environment:

conda activate traces-sklearn

Then, for conda and pip, you can launch Jupyter Lab with:

jupyter lab

Opening lecture notes#

The lecture notes are available in the content/python_files directory. To open a Python file as a notebook, right-click on the file and select Open with -> Notebook. This relies on jupytext to interpret those files as notebooks.

Alternatively, you can convert those files into notebooks.

With pixi, you can run:

pixi run -e docs convert-to-notebooks

With conda and pip, you can run the jupytext command:

jupytext --to notebook ./content/python_files/*.py
mkdir -p ./content/notebooks
mv ./content/python_files/*.ipynb ./content/notebooks
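The same conversion can be sketched with the jupytext Python API, which may be convenient on platforms without a POSIX shell. This is a sketch under assumptions: the helper names are hypothetical, the directory layout is taken from the commands above, and `jupytext.read` and `jupytext.write` are left to infer the format from the file extension.

```python
from pathlib import Path


def notebook_path(python_file, output_dir):
    """Return the .ipynb path in output_dir corresponding to a .py file."""
    return Path(output_dir) / (Path(python_file).stem + ".ipynb")


def convert_to_notebooks(
    source_dir="content/python_files", output_dir="content/notebooks"
):
    """Convert every .py file in source_dir into a notebook in output_dir."""
    # Imported lazily so that the path helper works without jupytext installed.
    import jupytext

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)  # same role as `mkdir -p`

    written = []
    for python_file in sorted(Path(source_dir).glob("*.py")):
        notebook = jupytext.read(str(python_file))
        target = notebook_path(python_file, output_dir)
        jupytext.write(notebook, str(target))
        written.append(target)
    return written
```

Calling `convert_to_notebooks()` from the root of the repository writes the notebooks directly into content/notebooks, instead of creating them next to the Python files and moving them afterwards.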