Day 1 Breakout

Activities for the Day 1 Breakout Session

Pre-activity 1: Ice-breaker

Take a few minutes to get to know the members of your breakout group. Take turns to cover the following points about yourself:

  1. Your name, research institution and field of research.

  2. How you currently use computational tools for your research, if any, and specifically AI-based tools.

  3. How you think data science tools can contribute to your research going forward.

Lastly, select a scribe for your breakout group, as well as a second person who will report back on your group’s discussion of the next two tasks during the feedback session.

Task 1: Data Cleaning

The Community for Open-Antimicrobial Drug Discovery (CO-ADD) has curated a database of compounds that have been tested for activity against a set of infectious bacteria known as the ESKAPE pathogens. This is a great source of data for training models that can predict a compound’s activity against bacteria. However, we will need to clean the raw data first.

This activity is split into three parts:

  1. Follow the Task 1 guidance (day 1 breakout slides) to download a dataset from CO-ADD.

  2. Answer questions 1 to 5.

  3. Deliverable: Compile the list of data issues (question 4) that your group identified in the raw dataset and that need to be addressed to clean it.

Questions:

  1. What are the assay(s) in this dataset measuring?

  2. Why should datasets have each of the following properties before we use them to train models?

    1. Completeness

    2. Consistency

    3. Accuracy

    4. Relevancy

  3. What problem(s) might result from a dataset that does not have each of these properties when used to train a model?

  4. Deliverable: Find examples of data issues that prevent the dataset from being complete, consistent, accurate, or relevant for modelling A. baumannii activity. How could you address each of the issues you found to clean the dataset?

  5. Were there any other problems that you thought to look for but that turned out not to be a problem in this dataset?
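
If your group wants to check the raw file programmatically rather than by eye, the properties in question 2 can be turned into simple counts. The sketch below is only an illustration under stated assumptions. The column names (`compound_id`, `organism`, `activity_value`, `activity_unit`) and the five rows are made up for this example and are not the actual CO-ADD export format. It covers completeness (missing values), consistency (duplicate rows, mixed units, mixed organism spellings) and relevancy (organisms other than the target). It does not check accuracy, which usually needs domain knowledge.

```python
import csv
import io

# Made-up rows standing in for a raw export.
# The column names and values are assumptions, not the real CO-ADD schema.
RAW = """compound_id,organism,activity_value,activity_unit
C1,Acinetobacter baumannii,32,ug/mL
C2,A. baumannii,,ug/mL
C3,Escherichia coli,16,ug/mL
C1,Acinetobacter baumannii,32,ug/mL
C4,Acinetobacter baumannii,8,uM
"""


def audit(rows):
    """Summarise simple data-quality issues in a list of row dicts."""
    seen = set()
    report = {
        "missing_value": 0,   # completeness: rows with no activity value
        "duplicate": 0,       # consistency: exact repeats of an earlier row
        "organisms": set(),   # consistency/relevancy: distinct organism labels
        "units": set(),       # consistency: distinct activity units
    }
    for row in rows:
        key = tuple(row.values())
        if key in seen:
            report["duplicate"] += 1
        seen.add(key)
        if not row["activity_value"].strip():
            report["missing_value"] += 1
        report["organisms"].add(row["organism"])
        report["units"].add(row["activity_unit"])
    return report


rows = list(csv.DictReader(io.StringIO(RAW)))
report = audit(rows)
print(report)
```

In this toy data the report flags one missing value, one duplicated row, two different units and three organism labels, two of which are different spellings of the same species. Each of these would be a candidate entry for the question 4 list.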

Deliverable:

Place your answers to question 4 in a Word document. You will email these answers after adding the Task 2 deliverable to this document.

Task 2: Chemical Space Exploration

Let’s imagine we have trained a model to predict antiplasmodial activity, and we have two compound libraries that we could now virtually screen for new chemical hits. We want to start by selecting just one of these two libraries, based on how relevant our model’s training data is to each library.

Steps to complete this activity:

  1. Follow the Task 2 guidance (day 1 breakout slides) to download a dataset from ChEMBL.

  2. Answer questions 1 to 6.

  3. Deliverable (question 5): a) a screenshot of your UMAP chemical space, b) the library you have chosen to screen and a short reason why you chose this library.

Questions:

  1. What do you understand by the term ‘chemical space’?

  2. How does this differ from the concept of ‘drug-like molecules’?

  3. Why should the compounds we use to train models be similar to the compounds we want to obtain predictions for?

  4. What type of training data is needed for each of the following scenarios?

    1. Virtual screening of broad chemical space for novel chemical hits?

    2. Ranking closely related analogues within a chemical series to prioritize compounds for synthesis?

  5. Deliverable: Follow the guidance for task 2 to download and plot the chemical space of several antiplasmodial screening libraries. Do you think that a model that has been trained on the St Jude 3D7 dataset would be better at predicting activity in the MMV Malaria Box or the Open Source Malaria libraries? Why?

  6. How could we improve our model once we’ve experimentally tested the first set of compounds that were selected from the virtual screening?
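
The UMAP plot in question 5 gives a visual answer. A complementary numeric check, which is not part of the Task 2 guidance, is to ask how similar each library compound is to its closest training compound. The sketch below assumes each compound has already been converted to a set of fingerprint "on" bits, for example with a cheminformatics toolkit. The bit indices and the names `training`, `library_a` and `library_b` are invented toy values, not real St Jude, MMV or Open Source Malaria data.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def mean_nearest_similarity(library, training):
    """Average similarity of each library compound to its closest training compound."""
    nearest = [max(tanimoto(c, t) for t in training) for c in library]
    return sum(nearest) / len(nearest)


# Toy fingerprints: invented on-bit indices, for illustration only.
training = [{1, 2, 3, 4}, {2, 3, 5}]
library_a = [{1, 2, 3}, {2, 3, 5, 6}]   # shares many bits with the training set
library_b = [{7, 8, 9}, {1, 9, 10}]     # shares almost none

score_a = mean_nearest_similarity(library_a, training)
score_b = mean_nearest_similarity(library_b, training)
print(f"library A: {score_a:.2f}, library B: {score_b:.2f}")
```

A higher score suggests that the library sits closer to the chemistry the model was trained on, which is the same intuition as overlapping regions in the UMAP plot. It is a rough heuristic rather than a guarantee of prediction quality.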

Deliverable:

Add the following to your deliverable document from Task 1: a) a screenshot of your UMAP chemical space, b) the library you have chosen to screen and a short reason why you chose this library. Then email this document to the facilitator by the end of the breakout session.
