Monday afternoon (breakout)
Hands-on exercise using AI/ML models from the Ersilia Model Hub
In this session, we will do a group exercise using AI/ML models from the Ersilia Model Hub to predict compound properties and potential antimicrobial activity.
We will split into groups. Each group will be assigned a pathogen (Acinetobacter baumannii or Staphylococcus aureus) and a screening library of ~1,000 small molecules from the ChemDiv Anti-Infective Library. At the end of the activity, each group will be asked to present their compounds of choice.
Groups
Below is the pathogen of interest assigned to each group.
Acinetobacter baumannii
Green
Yellow
Blue
Staphylococcus aureus
Orange
Pink
Steps
Group creation (nominate a scribe)
Download the compound library corresponding to your group from here.
Look at the table below and select up to 4 models relevant for your task.
Discuss the publications related to each model and note down what kind of model it is (classifier or regressor), which output it gives and how to interpret it.
Go to the Ersilia GUI and run evaluations for the selected models.
Download all CSV files into a working directory. If a model evaluation fails, download the precalculated results from here.
Use Excel to concatenate the CSV files and this app to visualize compounds.
Select up to 5 compounds.
Make a short slide deck. We suggest 3 slides:
Selected models and rationale. Why is your pathogen relevant? Which models and columns did you choose, and why?
Overview of the screening results. Were the results as expected?
Selected compounds and rationale. Which compounds did you choose, and why? Did you identify an optimal compound? What would you do next?
Present! Send your slides to miquel@ersilia.io. Name your file: ai_workshop_green.pptx, ai_workshop_yellow.pptx, etc.
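As an alternative to Excel for the concatenation step, the per-model CSV files can be joined with a short script. The sketch below uses only the Python standard library and assumes that every CSV shares a column named `input` holding the compound (this column name is an assumption, so check your own files), and that scores are numeric. The function names and the `label.column` prefixing scheme are illustrative, not part of Ersilia. The ranking helper sorts on a single score only; it is a starting point for the discussion, not a substitute for the group's own rationale.

```python
import csv


def read_rows(path):
    """Read one model output CSV into a list of dicts (one per compound)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def merge_by_key(tables, key="input"):
    """Join several model outputs column-wise on a shared key column.

    `tables` maps a model label to its rows. Output columns are prefixed
    with the label so that equally named columns do not collide.
    The key column name is an assumption; adapt it to your files.
    """
    merged = {}
    for label, rows in tables.items():
        for row in rows:
            record = merged.setdefault(row[key], {key: row[key]})
            for column, value in row.items():
                if column != key:
                    record[f"{label}.{column}"] = value
    return list(merged.values())


def top_compounds(rows, score_column, n=5):
    """Return the n rows with the highest value in score_column.

    Rows with a missing or empty score are skipped.
    """
    scored = [r for r in rows if r.get(score_column) not in (None, "")]
    return sorted(scored, key=lambda r: float(r[score_column]), reverse=True)[:n]


def write_rows(rows, path):
    """Write merged rows to CSV, using the union of all column names."""
    fieldnames = []
    for row in rows:
        for column in row:
            if column not in fieldnames:
                fieldnames.append(column)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

To use it, build a dictionary such as `{"model_a": read_rows("model_a.csv"), "model_b": read_rows("model_b.csv")}` (hypothetical file names), pass it to `merge_by_key`, and save the result with `write_rows(merged, "merged.csv")`.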
Materials
Relevant models from the Ersilia Model Hub
Below is a list of relevant models from the Ersilia Model Hub. Click the model identifier to access the model repository, where you can find more information about each of the models, including a description of the columns.
To view information for each of the output columns of a given model, you have to (a) access the model repository using the link in the table below, and (b) open the file /model/framework/columns/run_columns.csv.
chemprop-abaumannii
Inhibition of Acinetobacter baumannii growth
This model is a Chemprop neural network trained on a growth inhibition dataset. The authors screened ~7,500 molecules for those that inhibited the growth of A. baumannii in vitro. They discovered abaucin, an antibacterial compound with narrow-spectrum activity against A. baumannii.
antibiotics-ai-cytotox
Human cytotoxicity endpoints
The authors tested the dataset of 39,312 compounds used to train the antibiotics-ai model (eos18ie) against several cytotoxicity endpoints: human liver carcinoma cells (HepG2), human primary skeletal muscle cells (HSkMCs) and human lung fibroblast cells (IMR-90). Cellular viability was measured after 2–3 days of treatment with each compound at 10 μM, and activities were binarized using a 90% cell viability cut-off. In total, 3,341 (8.5%), 1,490 (3.8%) and 3,447 (8.8%) compounds were classified as cytotoxic for HepG2 cells, HSkMCs and IMR-90 cells, respectively.
chemical-space-projections-chemdiv
Chemical space 2D projections against ChemDiv
This tool performs PCA, UMAP and tSNE projections taking a 100k ChemDiv diversity set as a chemical space of reference. The Ersilia Compound Embeddings are used as descriptors. Four PCA components and two UMAP and tSNE components are returned.
antibiotics-ai-saureus
Antibiotic activity prediction against Staphylococcus aureus
The authors use a mid-size dataset (more than 30k compounds) to train an explainable graph-based model to identify potential antibiotics with low cytotoxicity. The model uses a substructure-based approach to explore the chemical space. Using this method, they were able to screen 283 compounds and identify a candidate active against methicillin-resistant S. aureus (MRSA) and vancomycin-resistant enterococci.
chembl-kpneumoniae
Klebsiella pneumoniae activity prediction
Klebsiella pneumoniae activity prediction based on phenotypic ChEMBL data. Each column corresponds to a specific bioactivity dataset derived from ChEMBL, encompassing multiple assays and binarization cut-offs. The global consensus score summarizes the probability of being active. Model developed by Ersilia.
chembl-abaumannii
Acinetobacter baumannii activity prediction
Acinetobacter baumannii activity prediction based on phenotypic ChEMBL data. Each column corresponds to a specific bioactivity dataset derived from ChEMBL, encompassing multiple assays and binarization cut-offs. The global consensus score summarizes the probability of being active. Model developed by Ersilia.
chembl-saureus
Staphylococcus aureus activity prediction
Staphylococcus aureus activity prediction based on phenotypic ChEMBL data. Each column corresponds to a specific bioactivity dataset derived from ChEMBL, encompassing multiple assays and binarization cut-offs. The global consensus score summarizes the probability of being active. Model developed by Ersilia.
admet-ai-exact
ADMET properties prediction
ADMET AI is a framework for carrying out fast batch predictions of ADMET properties. It is based on an ensemble of five Chemprop-RDKit models and has been trained on 41 tasks from the ADMET group in Therapeutics Data Commons (v0.4.1). Of these 41 tasks, 31 are classification tasks and 10 are regression tasks. In addition, the output contains 8 physicochemical properties, namely molecular weight, logP, hydrogen bond acceptors, hydrogen bond donors, Lipinski's Rule of 5, QED, stereo centers, and topological polar surface area. eos7d58 contains an implementation of the model that also produces the percentile based on DrugBank approved drugs.
sa-score
Synthetic accessibility score
Estimation of the synthetic accessibility score (SAScore) of drug-like molecules based on molecular complexity and fragment contributions. The fragment contributions are based on a 1M sample from PubChem and the molecular complexity is based on the presence/absence of non-standard structural features. It has been validated by comparing the SAScore with the estimates of medicinal chemist experts for 40 molecules (r² = 0.89). The SAScore has been contributed to the RDKit package.
natural-product-likeness
Natural product likeness score
The model is a derivation of the natural product fingerprint (eos6tg8). In addition to generating specific natural product fingerprints, the activation value of the neuron that predicts whether a molecule is a natural product can be used as an NP-likeness score. The method outperforms the NP_Score implemented in RDKit.
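Several of the models above are classifiers trained on binarized labels, such as the cytotoxicity endpoints obtained with a 90% cell viability cut-off. When noting down how to interpret a model's output, it can help to see what such a binarization looks like. The sketch below assumes that viability is expressed as a fraction of the untreated control and that values below the cut-off are labelled cytotoxic; both the direction of the comparison and the function names are assumptions for illustration, and the example values are made up.

```python
def binarize_viability(viability, cutoff=0.9):
    """Label a compound 1 (cytotoxic) when relative viability is below the cut-off."""
    return 1 if viability < cutoff else 0


def cytotoxic_fraction(viabilities, cutoff=0.9):
    """Fraction of compounds labelled cytotoxic in a list of viability values."""
    labels = [binarize_viability(v, cutoff) for v in viabilities]
    return sum(labels) / len(labels)


# Made-up viability values (fraction of untreated control), for illustration only
example = [1.02, 0.95, 0.88, 0.40]
print([binarize_viability(v) for v in example])  # prints [0, 0, 1, 1]
print(cytotoxic_fraction(example))  # prints 0.5
```

A classifier trained on labels like these returns, for each compound, a score related to the probability of belonging to the positive (here, cytotoxic) class, so higher values are less desirable for this endpoint.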
To explore the full list of models, visit the Ersilia Model Hub browser.