Day 2 Breakout
Activities for the Day 2 Breakout Session
Introduction
A virtual screening cascade lets us mimic in the computer some of the experimental steps needed to identify new drug leads. By filtering out molecules with low predicted activity or undesired side effects, we can lower the cost and time required to find new drug candidates. Ideally, we would build virtual screening cascades from our own data, but for many assays we do not have readily available experimental data. In these situations, we can leverage models developed by third parties and apply them to our problem.
Virtual screening cascades are not meant to substitute for experimental testing, but to act as a decision-making support tool.
A.baumannii activity prediction
In this activity, we will replicate the work described in Liu et al., 2023, where the authors build an ML model to identify novel A.baumannii inhibitors and use it to filter the Drug Repurposing Hub.
During the skills development session, we took a deep dive into ML model building using the A.baumannii model as an example. Now, you will download the list of compounds available in the Drug Repurposing Hub and continue the "virtual screening" along the lines of the original authors' work. To that end, we suggest running predictions for A.baumannii activity, plus a few accessory models available through Ersilia, to select the best candidates. In short, the steps to follow are:
Download the Drug Repurposing Hub data for your group from this link.
Look at the pre-selected Ersilia Model Hub models available online on our platform.
Select which models would be relevant to your exercise and why, and take notes on model interpretation, expected results, priority level etc.
Run predictions for the Drug Repurposing Hub molecules using the selected models.
Select the best molecule candidates based on your defined filters of activity, ADME properties and other considerations.
To simplify the exercise, we have prepared 5 subsets of data from the Drug Repurposing Hub. Each group should use only its assigned dataset.
Task 1: Model selection
To keep the exercise manageable, please restrict your screening to the following models:
A.baumannii Activity: eos3804
General Antibiotic Activity: eos4e40
Cardiotoxicity: eos43at
Synthetic Accessibility: eos9ei3
ADME properties: eos7d58
Natural Product Likeness: eos9yui
For each model, think about the following questions:
What type of model is it (classification or regression)?
What is the training dataset? (refer to the original publication listed above)
What is the interpretation of the model outcome?
What cut-off, if any, should we use for that particular model?
In addition, think about the following concepts:
Does the outcome of the model make sense? If it does not make sense, perhaps we have the wrong interpretation of the model output.
Is the cut-off I have selected too stringent (i.e., am I losing too many molecules, and should I be more permissive)?
Is this model very relevant for the current dataset (e.g., is A.baumannii activity as important as natural product likeness)?
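One way to judge whether a cut-off is too stringent is to check what share of the library survives at a few candidate values. The snippet below is a minimal sketch of that idea using only the Python standard library. The function name `fraction_retained`, the example scores, and the candidate cut-offs are all hypothetical and chosen for illustration; they do not come from any of the models listed above, and whether higher or lower values are desirable depends on how each model's output is interpreted.

```python
def fraction_retained(scores, cutoff, higher_is_better=True):
    """Return the fraction of molecules that pass a single cut-off."""
    if not scores:
        raise ValueError("scores must not be empty")
    if higher_is_better:
        kept = [s for s in scores if s >= cutoff]
    else:
        kept = [s for s in scores if s <= cutoff]
    return len(kept) / len(scores)


# Made-up scores for illustration only; these are NOT real model outputs.
example_scores = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95]

# Compare how many molecules survive at a few hypothetical cut-offs.
for cutoff in (0.3, 0.5, 0.7):
    share = fraction_retained(example_scores, cutoff)
    print(f"cut-off {cutoff:.1f}: {share:.0%} of molecules retained")
```

If a cut-off removes nearly the whole library, that is a hint to revisit either the threshold or the interpretation of the model output.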
Deliverable
Fill in the following Excel table and send it to gemma[at]ersilia.io.
Model information can be found in the metadata of the model available through Ersilia's GitHub repository, but it's best to read the original publication. Publications are available in this folder with highlighted sections to facilitate model understanding.
Task 2: Molecule prioritization
Next, let's use the models we have discussed to run some predictions!
Download the compound library corresponding to your group from here.
Go to the Ersilia GUI and run evaluations.
Download all CSV files into a working directory. In case a model evaluation fails, feel free to download precalculations from here.
Use this simple app to merge your CSV files or, alternatively, use Excel to concatenate the CSV files.
Open the downloaded Excel file and use the Legends tab to learn more about each column.
Select up to 5 compounds, based on bioactivity, ADME and toxicity profiles.
Make a short slide deck. Use this template:
Model relevance and why: did you use all models? Which ones or which columns from each model do you consider more relevant?
Overview of the screening results. Were the results as expected?
Selected compounds and rationale. Which compounds did you choose, and why? Did you identify an optimal compound? What would you do next?
Present!
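For groups that prefer scripting over Excel, the merging step above can also be sketched in a few lines of Python. This is a rough illustration using only the standard library, not the implementation behind the workshop's merging app. It assumes each model's CSV file has one row per molecule and shares an identifier column, here called `input`; that column name, the function names, and the file names in the usage comment are assumptions, so check your downloaded files and adjust accordingly.

```python
import csv
from pathlib import Path


def merge_predictions(csv_paths, key="input"):
    """Merge per-model CSV files into one row per molecule, joined on `key`."""
    merged = {}
    for path in csv_paths:
        prefix = Path(path).stem  # e.g. "eos3804" for a file named "eos3804.csv"
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                record = merged.setdefault(row[key], {key: row[key]})
                for column, value in row.items():
                    if column != key:
                        # Prefix each column with the file name so that
                        # identically named columns from two models do not clash.
                        record[f"{prefix}_{column}"] = value
    return list(merged.values())


def write_merged(rows, out_path, key="input"):
    """Write merged rows to CSV; molecules missing from a file get empty cells."""
    columns = [key] + sorted({c for row in rows for c in row if c != key})
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical usage (file names are placeholders):
# rows = merge_predictions(["eos3804.csv", "eos43at.csv"])
# write_merged(rows, "merged_predictions.csv")
```

Molecules for which a model evaluation failed simply end up with empty cells in that model's columns, which makes gaps easy to spot before selecting candidates.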
Deliverable
List of top 5 candidates, with their molecular structures visualised in the PowerPoint. Send your files to miquel[at]ersilia.io. Name your file after your group: ai_workshop_green.pptx, ai_workshop_yellow.pptx, etc.
Feel free to use these tools:
AI Workshop GPT to ask about unclear concepts
Ersilia Model Hub to browse models
Simple app to merge Ersilia output files
Extension: if you finish the proposed activity, have a look at what else is available in the Ersilia Model Hub and think about what you would like to see deployed online!