Monday afternoon (breakout)

Hands-on exercise using AI/ML models from the Ersilia Model Hub

In this session, we will do a group exercise using AI/ML models from the Ersilia Model Hub to predict compound properties and potential antimicrobial activity.

We will split into groups. Each group will be assigned a pathogen (Acinetobacter baumannii or Staphylococcus aureus) and a screening compound library of ~1000 small molecules from the ChemDiv Anti-Infective Library. At the end of the activity, each group will present its compounds of choice.

Groups

Below is the pathogen of interest assigned to each group.

Acinetobacter baumannii

Green
Yellow
Blue

Staphylococcus aureus

Orange
Pink

Steps

1. Create your group and nominate a scribe.
2. Download the compound library corresponding to your group from here.
3. Look at the table below and select up to 4 models relevant to your task.
4. Discuss the publications related to each model and note down what kind of model it is (classifier or regressor), which output it gives and how to interpret it.
5. Go to the Ersilia GUI and run evaluations for the selected models.
6. Download all CSV files into a working directory. If a model evaluation fails, feel free to download precalculations from here.
7. Use Excel to concatenate the CSV files and this app to visualize compounds.
8. Select up to 5 compounds.
9. Make a short slide deck. We suggest 3 slides:
   - Selected models and rationale. Why is your pathogen relevant? Which models and columns did you choose, and why?
   - Overview of the screening results. Were the results as expected?
   - Selected compounds and rationale. Which compounds did you choose, and why? Did you identify an optimal compound? What would you do next?
10. Present! Send your slides to miquel@ersilia.io. Name your file after your group: ai_workshop_green.pptx, ai_workshop_yellow.pptx, etc.
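If you prefer a script to Excel for combining the CSV files, the following is a minimal sketch using only the Python standard library. It assumes that every downloaded file shares two identifier columns named key and input, and that each file is named after its model ID (for example eos3804.csv); both are assumptions, so check your files and adjust KEY_COLUMNS if needed. The function names are illustrative, not part of Ersilia.

```python
import csv
from pathlib import Path

# Assumption: identifier columns present in every downloaded file.
KEY_COLUMNS = ("key", "input")


def merge_rows(tables):
    """Combine per-model rows into one row per compound.

    `tables` maps a model ID to a list of dict rows (as produced by
    csv.DictReader). Output columns are prefixed with the model ID so
    that same-named columns from different models do not clash.
    """
    merged = {}
    for model_id, rows in tables.items():
        for row in rows:
            ident = tuple(row[k] for k in KEY_COLUMNS)
            target = merged.setdefault(ident, dict(zip(KEY_COLUMNS, ident)))
            for column, value in row.items():
                if column not in KEY_COLUMNS:
                    target[f"{model_id}_{column}"] = value
    return list(merged.values())


def merge_csv_files(directory, destination):
    """Merge every CSV in `directory` (file stem = model ID) into `destination`."""
    tables = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with open(path, newline="") as handle:
            tables[path.stem] = list(csv.DictReader(handle))
    rows = merge_rows(tables)
    outputs = sorted({c for r in rows for c in r} - set(KEY_COLUMNS))
    with open(destination, "w", newline="") as handle:
        # restval fills cells for compounds missing from a given model's file.
        writer = csv.DictWriter(handle, fieldnames=list(KEY_COLUMNS) + outputs, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A call such as merge_csv_files("results", "merged.csv") would then write one row per compound. Keep the destination file outside the input directory so that it is not picked up as a model output on a second run.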

Materials

Relevant models from the Ersilia Model Hub

Below is a list of relevant models from the Ersilia Model Hub. Click the model identifier to access the model repository, where you can find more information about each of the models, including a description of the columns.

To view information about each of the columns of a given model, you have to (a) access the model repository using the link in the table below, and (b) open the file /model/framework/columns/run_columns.csv.
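Once downloaded, the file can also be read programmatically. The sketch below uses Python's csv module and assumes the header contains name and description fields; the sample row is invented for illustration, so check the header of the actual file before relying on it.

```python
import csv
import io


def describe_columns(handle):
    """Return {column name: description} from an open run_columns.csv-style file."""
    return {row["name"]: row["description"] for row in csv.DictReader(handle)}


# Illustrative content only; the real file comes from the model repository.
sample = io.StringIO(
    "name,type,direction,description\n"
    "activity,float,high,Hypothetical probability of growth inhibition\n"
)
for name, description in describe_columns(sample).items():
    print(f"{name}: {description}")
```

With a real file, replace the StringIO sample with open("run_columns.csv", newline="").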

| Model ID | Slug | Title | Columns | Description |
| --- | --- | --- | --- | --- |
| eos3804 | chemprop-abaumannii | Inhibition of Acinetobacter baumannii growth | 1 column | This model is a Chemprop neural network trained with a growth inhibition dataset. The authors screened ~7,500 molecules for those that inhibited the growth of A. baumannii in vitro. They discovered abaucin, an antibacterial compound with narrow-spectrum activity against A. baumannii. |
| eos42ez | antibiotics-ai-cytotox | Human cytotoxicity endpoints | 3 columns | The authors tested the dataset of 39,312 compounds used to train the antibiotics-ai model (eos18ie) against several cytotoxicity endpoints: human liver carcinoma cells (HepG2), human primary skeletal muscle cells (HSkMCs) and human lung fibroblast cells (IMR-90). Cellular viability was measured after 2-3 days of treatment with each compound at 10 μM, and activities were binarized using a 90% cell viability cut-off. 3,341 (8.5%), 1,490 (3.8%) and 3,447 (8.8%) compounds were classified as cytotoxic for HepG2 cells, HSkMCs and IMR-90 cells, respectively. |
| eos2db3 | chemical-space-projections-chemdiv | Chemical space 2D projections against ChemDiv | 8 columns | This tool performs PCA, UMAP and tSNE projections taking a 100k ChemDiv diversity set as a chemical space of reference. The Ersilia Compound Embeddings are used as descriptors. Four PCA components and two UMAP and tSNE components are returned. |
| eos18ie | antibiotics-ai-saureus | Antibiotic activity prediction against Staphylococcus aureus | 1 column | The authors use a mid-size dataset (more than 30k compounds) to train an explainable graph-based model to identify potential antibiotics with low cytotoxicity. The model uses a substructure-based approach to explore the chemical space. Using this method, they were able to screen 283 compounds and identify a candidate active against methicillin-resistant S. aureus (MRSA) and vancomycin-resistant enterococci. |
| eos37l0 | chembl-kpneumoniae | Klebsiella pneumoniae activity prediction | 22 columns | Klebsiella pneumoniae activity prediction based on phenotypic ChEMBL data. Each column corresponds to a specific bioactivity dataset derived from ChEMBL, encompassing multiple assays and binarization cut-offs. The global consensus score summarizes the probability of being active. Model developed by Ersilia. |
| eos5dti | chembl-abaumannii | Acinetobacter baumannii activity prediction | 26 columns | Acinetobacter baumannii activity prediction based on phenotypic ChEMBL data. Each column corresponds to a specific bioactivity dataset derived from ChEMBL, encompassing multiple assays and binarization cut-offs. The global consensus score summarizes the probability of being active. Model developed by Ersilia. |
| eos2m0f | chembl-saureus | Staphylococcus aureus activity prediction | 51 columns | Staphylococcus aureus activity prediction based on phenotypic ChEMBL data. Each column corresponds to a specific bioactivity dataset derived from ChEMBL, encompassing multiple assays and binarization cut-offs. The global consensus score summarizes the probability of being active. Model developed by Ersilia. |
| eos7m30 | admet-ai-exact | ADMET properties prediction | 49 columns | ADMET AI is a framework for carrying out fast batch predictions of ADMET properties. It is based on an ensemble of five Chemprop-RDKit models and has been trained on 41 tasks from the ADMET group in Therapeutics Data Commons (v0.4.1). Of these 41 tasks, 31 are classification tasks and 10 are regression tasks. In addition, the output contains 8 physicochemical properties: molecular weight, logP, hydrogen bond acceptors, hydrogen bond donors, Lipinski's Rule of 5, QED, stereo centers, and topological polar surface area. eos7d58 contains an implementation of the model that also produces the percentile based on DrugBank approved drugs. |
| eos9ei3 | sa-score | Synthetic accessibility score | 1 column | Estimation of the synthetic accessibility score (SAScore) of drug-like molecules based on molecular complexity and fragment contributions. The fragment contributions are based on a 1M sample from PubChem, and the molecular complexity is based on the presence/absence of non-standard structural features. It has been validated by comparing the SAScore with the estimates of expert medicinal chemists for 40 molecules (r² = 0.89). The SAScore has been contributed to the RDKit package. |
| eos9yui | natural-product-likeness | Natural product likeness score | 1 column | The model is a derivation of the natural product fingerprint (eos6tg8). In addition to generating specific natural product fingerprints, the activation value of the neuron that predicts whether a molecule is a natural product can be used as an NP-likeness score. The method outperforms the NP_Score implemented in RDKit. |
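When combining outputs from several of the models listed above to pick compounds, one possible approach is to filter on a liability column (for example a cytotoxicity prediction) and then rank by an activity column. The sketch below is only one way to do this: the function name, the column names passed in, and the 0.5 threshold are all hypothetical and should be replaced by the columns and cut-offs your group agrees on.

```python
def shortlist(rows, activity_column, toxicity_column, max_toxicity=0.5, limit=5):
    """Keep rows whose toxicity score is at most `max_toxicity`, then
    return the `limit` rows with the highest activity score.

    `rows` is a list of dicts with string or numeric values, such as the
    rows read from a merged CSV file with csv.DictReader.
    """
    safe = [r for r in rows if float(r[toxicity_column]) <= max_toxicity]
    return sorted(safe, key=lambda r: float(r[activity_column]), reverse=True)[:limit]
```

A single ranking like this is a starting point for discussion rather than a final answer; other columns, such as synthetic accessibility or ADMET properties, may change which compounds look most attractive.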

To explore the full list of models, visit the Ersilia Model Hub browser.
