ML4 - Advanced course: S2

Record linkage with supervised ML


In this session, we will introduce the taxonomy of supervised ML methods, with a focus on binary classification. As in ML3, we will present a real-world scenario, this time based on an analysis of CIDRZ data. Unlike ML3, the focus will be on three key steps rather than the full pipeline: schema matching, full name sorting, and comparison vector classification. The Python code (but not the data) will be shared in full with participants through the course GitHub repository.

Learning outcomes

A single session cannot cover the full range of supervised ML methods available today. Instead, participants will see in detail, using a real case, how three supervised ML methods are applied to record linkage (RL) and how the results can be validated.
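
To give a flavour of comparison vector classification and its validation, here is a minimal sketch using only the Python standard library. It is not the course code and does not use CIDRZ data. The field names (`name`, `birth_year`, `sex`), the toy records, and the choice of a small hand-written logistic regression are all assumptions made for illustration.

```python
import math
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def comparison_vector(rec_a, rec_b):
    """Compare two records field by field (field names are hypothetical)."""
    return [
        similarity(rec_a["name"], rec_b["name"]),
        1.0 if rec_a["birth_year"] == rec_b["birth_year"] else 0.0,
        1.0 if rec_a["sex"] == rec_b["sex"] else 0.0,
    ]


def score(weights, bias, x):
    """Estimated probability that a comparison vector describes a match."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a logistic regression by plain stochastic gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            error = score(weights, bias, x) - label
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Binary decision: 1 = match, 0 = non-match."""
    return 1 if score(weights, bias, x) >= threshold else 0


def precision_recall(y_true, y_pred):
    """Validation metrics for the match class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def rec(name, birth_year, sex):
    return {"name": name, "birth_year": birth_year, "sex": sex}


# Invented labelled record pairs: (record A, record B, 1 = same person).
train_pairs = [
    (rec("Mary Banda", 1985, "F"), rec("Mary Bandah", 1985, "F"), 1),
    (rec("John Phiri", 1990, "M"), rec("Jon Phiri", 1990, "M"), 1),
    (rec("Grace Zulu", 1978, "F"), rec("Grace Zullu", 1978, "F"), 1),
    (rec("Mary Banda", 1985, "F"), rec("John Phiri", 1990, "M"), 0),
    (rec("Grace Zulu", 1978, "F"), rec("Peter Mwale", 1978, "M"), 0),
    (rec("John Phiri", 1990, "M"), rec("Grace Zulu", 1978, "F"), 0),
]
test_pairs = [
    (rec("Peter Mwale", 1982, "M"), rec("Peter Mwalle", 1982, "M"), 1),
    (rec("Peter Mwale", 1982, "M"), rec("Mary Banda", 1985, "F"), 0),
]

X_train = [comparison_vector(a, b) for a, b, _ in train_pairs]
y_train = [label for _, _, label in train_pairs]
weights, bias = train_logistic(X_train, y_train)

# Validate on pairs that were held out from training.
y_test = [label for _, _, label in test_pairs]
y_pred = [predict(weights, bias, comparison_vector(a, b)) for a, b, _ in test_pairs]
print("precision, recall:", precision_recall(y_test, y_pred))
```

Each record pair is reduced to a numeric comparison vector, a binary classifier decides match or non-match, and held-out labelled pairs provide precision and recall. A real analysis would need far more labelled pairs and a more careful validation scheme than this two-pair hold-out.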