Monday afternoon (breakout)

Hands-on exercise using AI/ML models from the Ersilia Model Hub

In this session, we will do a group exercise using AI/ML models from the Ersilia Model Hub to predict compound properties and their potential antimicrobial activity.

We will split into groups. Each group will be assigned a pathogen (Acinetobacter baumannii or Staphylococcus aureus) and a screening library of ~1000 small molecules from the ChemDiv Anti-Infective Library. At the end of the activity, each group will be asked to present their compounds of choice.

Groups

Below is the pathogen of interest assigned to each group.

Acinetobacter baumannii

  • Green

  • Yellow

  • Blue

Staphylococcus aureus

  • Orange

  • Pink

Steps

  1. Group creation (nominate a scribe)

  2. Download the compound library corresponding to your group from here.

  3. Look at the table below and select up to 4 models relevant for your task.

  4. Discuss the publications related to each model and note down what kind of model it is (classifier or regressor), what output it gives and how to interpret it.

  5. Go to the Ersilia GUI and run evaluations for the selected models.

  6. Download all CSV files into a working directory. If a model evaluation fails, feel free to download precalculated results from here.

  7. Use Excel to concatenate the CSV files, and use this app to visualize the compounds.

  8. Select up to 5 compounds.

  9. Make a short slide deck. We suggest 3 slides:

    • Selected models and rationale. Why is your pathogen relevant? Which models and columns did you choose, and why?

    • Overview of the screening results. Were the results as expected?

    • Selected compounds and rationale. Which compounds did you choose, and why? Did you identify an optimal compound? What would you do next?

  10. Present! Send your slides to miquel@ersilia.io. Name your file: ai_workshop_green.pptx, ai_workshop_yellow.pptx, etc.
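For groups that prefer a script to Excel, steps 7 and 8 can be sketched in Python with the standard library only. This is a minimal sketch under stated assumptions, not the workshop's prescribed method: it assumes every downloaded CSV has a `key` column identifying the compound and an `input` column holding the SMILES, and the score column names (`activity`, `hepg2`) and values below are made up for illustration. Check the actual headers of your files before adapting it.

```python
import csv
import io


def read_scores(handle, prefix):
    """Read one model's CSV into {compound key: {prefixed column: value}}."""
    table = {}
    for row in csv.DictReader(handle):
        key = row["key"]  # assumed identifier column shared by all files
        columns = {
            f"{prefix}.{name}": value
            for name, value in row.items()
            if name not in ("key", "input")
        }
        columns["input"] = row["input"]  # assumed SMILES column
        table[key] = columns
    return table


def merge_tables(tables):
    """Combine several per-model tables on the compound key (outer join)."""
    merged = {}
    for table in tables:
        for key, columns in table.items():
            merged.setdefault(key, {}).update(columns)
    return merged


def top_compounds(merged, score_column, n=5):
    """Return the keys of the n compounds with the highest score_column value."""
    scored = [
        (float(columns[score_column]), key)
        for key, columns in merged.items()
        if columns.get(score_column) not in (None, "")
    ]
    scored.sort(reverse=True)
    return [key for _, key in scored[:n]]


# Hypothetical in-memory files; in practice use open("your_file.csv", newline="").
activity_csv = io.StringIO("key,input,activity\nA,CCO,0.91\nB,CCN,0.12\nC,CCC,0.55\n")
cytotox_csv = io.StringIO("key,input,hepg2\nA,CCO,0.80\nB,CCN,0.05\nC,CCC,0.10\n")

merged = merge_tables([
    read_scores(activity_csv, "act"),
    read_scores(cytotox_csv, "tox"),
])
best = top_compounds(merged, "act.activity", n=2)
print(best)
```

Prefixing each column with a short model label keeps columns from different models apart after merging. Ranking on a single column is only a starting point; in the made-up data the top-ranked compound also has a high cytotoxicity score, which is exactly the kind of trade-off to discuss in your group.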

Materials

Relevant models from the Ersilia Model Hub

Below is a list of relevant models from the Ersilia Model Hub. Click the model identifier to access the model repository, where you can find more information about each of the models, including a description of the columns.

To view information about each output column of a given model, (a) access the model repository using the link in the table below, and (b) open the file /model/framework/columns/run_columns.csv.
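If you have saved a copy of a model's run_columns.csv, a few lines of Python can print it in a readable form. The sketch below assumes the file has a header row that includes a `name` column; the other headers and the sample row are hypothetical and only stand in for a real file.

```python
import csv
import io

# Hypothetical excerpt; replace with open("run_columns.csv", newline="") for a real file.
sample = io.StringIO(
    "name,type,direction,description\n"
    "activity,float,high,Probability of growth inhibition\n"
)


def describe_columns(handle):
    """Return one readable line per row of a run_columns.csv-style file."""
    lines = []
    for row in csv.DictReader(handle):
        # Show every field except the column name itself as field=value pairs.
        fields = ", ".join(f"{k}={v}" for k, v in row.items() if k != "name")
        lines.append(f"{row['name']}: {fields}")
    return lines


summary = describe_columns(sample)
for line in summary:
    print(line)
```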

Model ID
Slug
Title
Columns
Description

chemprop-abaumannii

Inhibition of Acinetobacter baumannii growth

This model is a Chemprop neural network trained with a growth inhibition dataset. Authors screened ~7,500 molecules for those that inhibited the growth of A. baumannii in vitro. They discovered abaucin, an antibacterial compound with narrow-spectrum activity against A. baumannii.

antibiotics-ai-cytotox

Human cytotoxicity endpoints

The authors tested the dataset of 39,312 compounds used to train the antibiotics-ai model (eos18ie) against several cytotoxicity endpoints: human liver carcinoma cells (HepG2), human primary skeletal muscle cells (HSkMCs) and human lung fibroblast cells (IMR-90). Cellular viability was measured after 2–3 days of treatment with each compound at 10 μM, and activities were binarized using a 90% cell viability cut-off. In total, 3,341 (8.5%), 1,490 (3.8%) and 3,447 (8.8%) compounds were classified as cytotoxic for HepG2 cells, HSkMCs and IMR-90 cells, respectively.

chemical-space-projections-chemdiv

Chemical space 2D projections against ChemDiv

This tool performs PCA, UMAP and tSNE projections taking a 100k ChemDiv diversity set as a chemical space of reference. The Ersilia Compound Embeddings are used as descriptors. Four PCA components and two UMAP and tSNE components are returned.

antibiotics-ai-saureus

Antibiotic activity prediction against Staphylococcus aureus

The authors use a mid-size dataset (more than 30k compounds) to train an explainable graph-based model to identify potential antibiotics with low cytotoxicity. The model uses a substructure-based approach to explore the chemical space. Using this method, they were able to screen 283 compounds and identify a candidate active against methicillin-resistant S. aureus (MRSA) and vancomycin-resistant enterococci.

chembl-kpneumoniae

Klebsiella pneumoniae activity prediction

Klebsiella pneumoniae activity prediction based on phenotypic ChEMBL data. Each column corresponds to a specific bioactivity dataset derived from ChEMBL, encompassing multiple assays and binarization cut-offs. The global consensus score summarizes the probability of being active. Model developed by Ersilia.

chembl-abaumannii

Acinetobacter baumannii activity prediction

Acinetobacter baumannii activity prediction based on phenotypic ChEMBL data. Each column corresponds to a specific bioactivity dataset derived from ChEMBL, encompassing multiple assays and binarization cut-offs. The global consensus score summarizes the probability of being active. Model developed by Ersilia.

chembl-saureus

Staphylococcus aureus activity prediction

Staphylococcus aureus activity prediction based on phenotypic ChEMBL data. Each column corresponds to a specific bioactivity dataset derived from ChEMBL, encompassing multiple assays and binarization cut-offs. The global consensus score summarizes the probability of being active. Model developed by Ersilia.

admet-ai-exact

ADMET properties prediction

ADMET AI is a framework for carrying out fast batch predictions of ADMET properties. It is based on an ensemble of five Chemprop-RDKit models and has been trained on 41 tasks from the ADMET group in Therapeutics Data Commons (v0.4.1). Of these 41 tasks, 31 are classification tasks and 10 are regression tasks. The output also contains 8 physicochemical properties, namely molecular weight, logP, hydrogen bond acceptors, hydrogen bond donors, Lipinski's Rule of 5, QED, stereo centers, and topological polar surface area. eos7d58 contains an implementation of the model that also produces the percentile based on DrugBank approved drugs.

sa-score

Synthetic accessibility score

Estimation of the synthetic accessibility score (SAScore) of drug-like molecules based on molecular complexity and fragment contributions. The fragment contributions are based on a 1M sample from PubChem, and the molecular complexity is based on the presence or absence of non-standard structural features. It has been validated by comparing the SAScore with the estimates of expert medicinal chemists for 40 molecules (r² = 0.89). The SAScore has been contributed to the RDKit package.

natural-product-likeness

Natural product likeness score

The model is a derivation of the natural product fingerprint (eos6tg8). In addition to generating specific natural product fingerprints, the activation value of the neuron that predicts if a molecule is a natural product or not can be used as a NP-likeness score. The method outperforms the NP_Score implemented in RDKit.
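The admet-ai-exact entry above lists Lipinski's Rule of 5 among its physicochemical outputs. As an aid to interpreting such columns, here is a minimal sketch of the rule in its standard textbook form (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors). The function name and example values are hypothetical, and this is not necessarily how the model encodes its own Lipinski column; consult its run_columns.csv for that.

```python
def rule_of_five_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of Lipinski's four criteria a compound violates."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # too lipophilic
        h_donors > 5,       # too many hydrogen bond donors
        h_acceptors > 10,   # too many hydrogen bond acceptors
    ]
    return sum(checks)


# Hypothetical property values for two compounds.
small = rule_of_five_violations(350.4, 2.1, 2, 5)
large = rule_of_five_violations(720.0, 6.3, 2, 12)
print(small, large)
```

Many antibiotics, natural products in particular, fall outside the Rule of 5, so a violation is a point for discussion rather than an automatic reason to discard a compound.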

To explore the full list of models, visit the Ersilia Model Hub browser.
