Ersilia's ecosystem

Learn about the work of Ersilia and where to start using or contributing to our tools

Ersilia develops and implements AI/ML tools for infectious disease research. This documentation will be useful if you are:

  • A chemist or biologist looking to use some of our AI/ML platforms for your projects.

  • An open-source developer aiming to contribute to our tools.

  • A data scientist developing AI/ML tools and wishing to incorporate them in our platform.

  • An Ersilia enthusiast eager to learn more about our work.

All of our work is openly available through our GitHub organisation page. Below, we summarise the main repositories and where to find the most important software tools. For a complete catalogue of all our organisation repositories, please see this table.

Tool repositories

The Ersilia Model Hub

The Ersilia Model Hub is our main platform. It serves ready-to-use AI models related to the drug discovery cascade. Models can be browsed on our website and run locally (see Installation instructions). We also offer a selection of them for online inference (select those marked as available Online on our website), as well as an open service based on GitHub.

Detailed information about the Ersilia Model Hub, its components, how to use it, and how to contribute to its backend or add new models can be found in this section. Developers may consult the API documentation for an in-depth view of the code.

The repositories linked to the Ersilia Model Hub are:

  • ersilia: The main repository, which provides a command-line interface (CLI) to fetch and run models locally.

  • eos-template: Template for incorporating new models. This template uses the GitHub Actions workflows specified in ersilia-model-workflows.

  • ersilia-stats: Collection of statistics around the Hub and its usage, such as scientific publications, disease areas covered, etc.

  • ersilia-maintenance: GitHub Actions-based repository to check the integrity of the models within Ersilia.

  • ersilia-self-service: GitHub Actions-based online inference for all models. Data and results are publicly available through GitHub issues. Please do not submit IP-sensitive data.

  • ersilia-assistant: LLM-based interface to easily interact with the Ersilia Model Hub.

  • ersilia-pack: Model packaging for serving through FastAPI.

  • ersilia-maintained-inputs: Standardised inputs for model testing.

  • model-inference-pipeline: Pipeline to store model inference results in AWS, creating an open database of pre-calculations (cache).

  • eos repositories: Repositories labelled with an Ersilia (eos) identifier contain individual models. A full list of models, their identifiers and other relevant information is available in this table.

ZairaChem

ZairaChem is an automated pipeline for ML model training. Read more about it in its dedicated section, as well as in the associated publication and code repository (zaira-chem). Coupled to ZairaChem, we have developed Olinda, a model distillation framework that converts the high-performing but heavy ZairaChem models into portable ONNX models amenable to large-scale calculations and online deployment (olinda).

ChemSampler

ChemSampler is a pipeline based on the generative AI models available in the Ersilia Model Hub. Given a starting molecule, it performs several rounds of generative chemistry and produces a list of candidate molecules. ChemSampler can be constrained using several parameters. Please read its dedicated GitBook section or check the code repository (chem-sampler).

Workshops and courses

As part of our mission, we provide training in AI and data science to researchers across the Global South. All our training materials are documented and freely available. If you are interested, check out the Training Materials section and have a look at the following code repositories:

  • AI2050 courses: A 2-hour introduction to drug discovery (ai2050-h3d-symposium-workshop) and a full-week course for more advanced students (ai2050-dd-workshop), developed in collaboration with the H3D Foundation.

  • Event Fund: A one-week course developed in collaboration with the H3D Centre, with the support of the Wellcome Trust and Code for Science and Society.

  • Python 101: An introduction to the Python programming language geared towards scientists, focusing on data analysis, plotting and basic Pythonic operations (python101). Inspired by the Carpentries!

Research-associated analyses

In addition to our software tools, we maintain a number of repositories related to scientific research projects. These repositories typically contain the data and code needed to reproduce an analysis reported in a research paper. For a full overview of our research projects and publications, please have a look at our website. Below are a few example projects, either completed or currently in development:

  • ADDA4TB: Targeted protein degradation for Mycobacterium tuberculosis, in collaboration with Stellenbosch University (mtb-targeted-protein-degradation).

  • GRADIENT Pharmacogenetics in Africa: Analysis of potential pharmacogenes related to antimalarial and anti-TB drugs, in collaboration with H3D (pharmacogx-embeddings).

  • SARS-CoV-2 Chemical Space: Analysis of the chemical space associated with curated COVID-19 therapy data, in collaboration with UB-CeDD (sars-cov-2-chemical-space).
