# Day 2 Breakout

## Introduction

A virtual screening cascade lets us mimic in the computer some of the experimental steps needed to identify new drug leads. By filtering out molecules with low predicted activity or undesired side effects, we can lower the cost and time needed to find new drug candidates. Ideally, we would build virtual screening cascades based on our own data, but for many assays we do not have readily available experimental data. In these situations, we can leverage models developed by third parties and apply them to our problem.

{% hint style="warning" %}
Virtual screening cascades are not meant to substitute experimental testing, but act as a decision-making support tool.
{% endhint %}

### ***A.baumannii* activity prediction**

In this activity, we will replicate the work described in Liu et al., 2023, where the authors build an ML model to identify novel *A.baumannii* inhibitors and use it to filter the [Drug Repurposing Hub](https://www.broadinstitute.org/drug-repurposing-hub).

During the skills development session, we did a deep dive into ML model building using the *A.baumannii* model as an example. Now, you will download the list of compounds available in the Drug Repurposing Hub and carry out a "virtual screening" similar to the original authors' work. To that end, we suggest running predictions for *A.baumannii* activity, plus a few accessory models available through Ersilia, to select the best candidates. In short, the steps to follow are:

1. Download the Drug Repurposing Hub data for your group from this [link](https://drive.google.com/drive/folders/121MigvZot9Dx0SCgw9cmGUseENDXkqjP?usp=drive_link).
2. Look at the pre-selected [Ersilia Model Hub](https://ersilia.io/model-hub) models available online in [our platform](https://hub.ersilia.io/).
3. Select which models would be relevant to your exercise and why, and take notes on model interpretation, expected results, priority level etc.
4. [Run predictions](https://hub.ersilia.io/) for the Drug Repurposing Hub molecules using the selected models.
5. Select the best molecule candidates based on your defined filters of activity, ADME properties and other considerations.

{% hint style="info" %}
To simplify the exercise, we have prepared 5 subsets of data from the Drug Repurposing Hub. Each group should use its assigned dataset only.
{% endhint %}

## **Task 1: Model selection**

To keep the exercise manageable, please limit your screening to the following models:

* A.baumannii Activity: eos3804
* General Antibiotic Activity: eos4e40
* Cardiotoxicity: eos43at
* Synthetic Accessibility: eos9ei3
* ADME properties: eos7d58
* Natural Product Likeness: eos9yui

For each model, think about the following questions:

* What type of model is it (classification or regression)?
* What is the training dataset? (refer to the original publication listed above)
* What is the interpretation of the model outcome?
* What cut-off, if any, should we use for that particular model?

In addition, think about the following concepts:

* Does the outcome of the model make sense? If it does not make sense, perhaps we have the wrong interpretation of the model output.
* Is the cut-off I have selected too stringent (i.e., am I losing too many molecules, and should I be more permissive)?
* Is this model very relevant for the current dataset (e.g., is *A.baumannii* activity equally important as natural product likeness)?
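
One way to judge whether a cut-off is too stringent is to count how many molecules survive at a few candidate values. The sketch below is only an illustration: the helper names (`count_passing`, `cutoff_sweep`) and the scores are made up for this example and are not actual Ersilia outputs. It assumes a model that returns one numeric score per molecule.

```python
def count_passing(scores, cutoff, higher_is_better=True):
    """Count how many molecules pass a given cut-off."""
    if higher_is_better:
        return sum(1 for s in scores if s >= cutoff)
    return sum(1 for s in scores if s <= cutoff)


def cutoff_sweep(scores, cutoffs, higher_is_better=True):
    """Return (cutoff, n_passing, fraction_passing) for each candidate cut-off."""
    total = len(scores)
    rows = []
    for cutoff in cutoffs:
        n = count_passing(scores, cutoff, higher_is_better)
        rows.append((cutoff, n, n / total if total else 0.0))
    return rows


# Made-up probabilities for illustration only, not real model output.
example_scores = [0.02, 0.10, 0.35, 0.48, 0.51, 0.77, 0.90]

for cutoff, n, fraction in cutoff_sweep(example_scores, [0.3, 0.5, 0.7]):
    print(f"cut-off {cutoff:.1f}: {n} molecules kept ({fraction:.0%})")
```

For models where a lower value is preferable (for example, a toxicity probability), pass `higher_is_better=False` so that molecules at or below the cut-off are the ones kept.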

### **Deliverable**

Fill in the following [Excel table](https://docs.google.com/spreadsheets/d/1Yq39mA3Xy2u7JqdwPHgEHfe2AEPziPHr_ztQAnN-JQg/edit?usp=sharing) and send it to gemma\[at]ersilia.io.

{% hint style="info" %}
Model information can be found in the metadata of the model available through Ersilia's GitHub repository, but it's best to read the original publication. Publications are available in [this folder](https://drive.google.com/drive/folders/1Fs_jYG_jh1g19P2T2wV7UBbg0lec1yHb) with highlighted sections to facilitate model understanding.
{% endhint %}

## **Task 2: Molecule prioritization**

Next, let's use the models we have discussed to run some predictions!

1. Download the compound library corresponding to your group from [here](https://drive.google.com/drive/folders/121MigvZot9Dx0SCgw9cmGUseENDXkqjP?usp=drive_link).
2. Go to the [Ersilia GUI](https://hub.ersilia.io/) and run evaluations.
3. Download all CSV files into a working directory. In case a model evaluation fails, feel free to download precalculations from [here](https://drive.google.com/drive/folders/1yrQfvWExZCMWTfWN6vYIgEcRVe2eT2ZH?usp=drive_link).
4. Use [this simple app](https://ai2050-dd-day3b.streamlit.app/) to merge your CSV files or, alternatively, use Excel to concatenate the CSV files.
5. Open the downloaded Excel file and use the Legends tab to learn more about each column.
6. Select up to 5 compounds, based on bioactivity, ADME and toxicity profiles.
7. Make a short slide deck. Use this [template](https://docs.google.com/presentation/d/1oiGrAVlOkcYRDfNpWCv4zyTcHxLmNWvQ/edit?usp=sharing\&ouid=117577258963070079301\&rtpof=true\&sd=true):
   * Model relevance and why: did you use all models? Which ones or which columns from each model do you consider more relevant?
   * Overview of the screening results. Were the results as expected?
   * Selected compounds and rationale. Which compounds did you choose, and why? Did you identify an optimal compound? What would you do next?
8. Present!
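
If you prefer scripting to the app or Excel, the merging and shortlisting steps above (steps 4 and 6) could be sketched in Python as follows. This is a minimal illustration using only the standard library, not the workshop's official tooling. The column names (`input`, `score`), the prefixes, the thresholds and the three molecules are all assumptions made up for this example; check the header of your own CSV files and the Legends tab before adapting it.

```python
import csv
import io


def read_predictions(handle, key_column, prefix):
    """Read one prediction CSV into {molecule: {prefixed_column: value}}."""
    table = {}
    for row in csv.DictReader(handle):
        key = row[key_column]
        table[key] = {
            f"{prefix}_{name}": value
            for name, value in row.items()
            if name != key_column
        }
    return table


def merge_predictions(tables):
    """Outer-merge several per-model tables on the molecule key."""
    merged = {}
    for table in tables:
        for key, values in table.items():
            merged.setdefault(key, {}).update(values)
    return merged


def shortlist(merged, filters, rank_by, top_n=5):
    """Keep molecules passing every filter, then rank by one column (descending)."""
    kept = []
    for key, values in merged.items():
        try:
            passes = all(test(float(values[col])) for col, test in filters.items())
            score = float(values[rank_by])
        except (KeyError, ValueError):
            continue  # skip molecules with missing or non-numeric predictions
        if passes:
            kept.append((key, score))
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_n]


# Tiny in-memory stand-ins for two downloaded CSV files (values are invented).
activity_csv = io.StringIO("input,score\nCCO,0.91\nCCN,0.40\nc1ccccc1,0.75\n")
cardiotox_csv = io.StringIO("input,score\nCCO,0.20\nCCN,0.10\nc1ccccc1,0.85\n")

merged = merge_predictions([
    read_predictions(activity_csv, "input", "activity"),
    read_predictions(cardiotox_csv, "input", "cardiotox"),
])

# Hypothetical thresholds: predicted active, and predicted non-cardiotoxic.
filters = {
    "activity_score": lambda p: p >= 0.5,
    "cardiotox_score": lambda p: p < 0.5,
}
top = shortlist(merged, filters, rank_by="activity_score")
print(top)
```

With real files, replace each `io.StringIO(...)` with `open(path, newline="")`. Prefixing every column with a model label keeps identically named output columns from overwriting each other during the merge.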

### **Deliverable**

A list of your top 5 candidates, with their molecular structures visualised in the PowerPoint. Send your files to miquel\[at]ersilia.io. Name your file according to your group: `ai_workshop_green.pptx`, `ai_workshop_yellow.pptx`, etc.

Feel free to use these tools:

* [AI Workshop GPT](https://chatgpt.com/g/g-686d06e62bdc8191999ee66c42f082cc-ai2050-drug-discovery-workshop) to ask about unclear concepts
* [Ersilia Model Hub](https://ersilia.io/) to browse models
* [Simple app](https://ai2050-dd-day3b.streamlit.app/) to merge Ersilia output files

{% hint style="success" %}
Extension: if you finish the proposed activity, have a look at what else is available in the [Ersilia Model Hub](https://ersilia.io/model-hub) and think about what you would like to see deployed online!
{% endhint %}