Ersilia Book

Outreachy Summer 2022

Specific instructions for the summer Ersilia interns (June 2022 - September 2022)

Follow the Topic of the Week

Every Monday by 9 pm CET, one team member will suggest a topic, and the rest of us will prioritize the search for models on this topic during the week. Topics can be related to a task of biomedical relevance, or to a particular family of algorithms. Valid topics could be:
By biomedical relevance:
  • Antimalarial activity prediction
  • Broad spectrum antibiotic activity prediction
  • Drug toxicity prediction
  • Synthetic accessibility of compounds
By algorithmic relevance:
  • Graph neural networks
  • Reinforcement learning methods
  • Chemistry language transformer models
Please note that Topics of the Week should be taken as a soft guideline. Discovery and selection of models should not be blocked by the existence of the Topic of the Week. If you find an interesting model that is unrelated to the Topic of the Week, feel free to select it and work on it. It is still a valid choice!

Announcing the Topic of the Week

In the Slack #internships channel, one person (for example @Miquel) will write a message like this:
@channel This is the topic of the week!
Topic: Antimalarial activity prediction
Why? We are currently working with Medicines for Malaria Venture and they have asked for predictions on some antimalarial candidates.
Next: @Gemma
In this case, @Miquel should pin the message so that everybody can find it easily during the week.
It would be great if the rest of the @channel could comment, ask questions, or give feedback about the topic choice, or even just confirm that you've read the message.
Note that @Miquel has nominated @Gemma, so @Gemma will be responsible for selecting the topic next week. If you are eager to suggest a topic, simply contact the current responsible person so that they can nominate you.
The Why? bullet point should be short, and can be anything, really. "I haven't found any model of this kind in the hub" is a perfectly valid statement, as is "I want to learn about this family of models", or "I've read somewhere that there is no drug against this pathogen", etc.

Notify and keep track of models

It is important that you communicate your research to the rest of the team. We suggest the following three steps:

Ersilia Slack #literature channel

First, write a quick note in the Ersilia Slack #literature channel. As soon as you discover the model, simply copy the link to the publication, or even a link to a tweet. Before the link, add the designated emoji so that we know it is about a model. For example:
Compound price prediction with deep learning! [link]
The Slack #literature channel contains models and much more. We encourage you to check this channel regularly and be active in it! This is the best way to share our collective knowledge and avoid duplication of efforts. Naturally, this channel will not be very structured, and we feel it shouldn't be. Rather, it should be a space to quickly share findings and find inspiration for your research.

Ersilia Model Hub Spreadsheet

Once you've posted the model in the #literature channel, read about it in more detail and try to figure out if code is available. Then, add the model as a new entry in the Ersilia Model Hub Spreadsheet. Please request edit rights from @Gemma if you don't have them.
In the Spreadsheet, you will find two sheets:
  • Hub: Contains complete documentation about the models, including an Ersilia Open Source identifier (e.g. eos4e40), a slug (e.g. chemprop-antibiotic) and a status (e.g. Done).
  • Raw List: Contains a backlog of models that could be of interest to Ersilia. This sheet contains minimal information about the models. There is an Approved and a Selected tickbox. Approved means that Ersilia is willing to incorporate this model. Selected means that someone is already working on the model or the model has been successfully incorporated in the Ersilia Model Hub.
You should start with the Raw List sheet. You are always free to add models there (relevant to the Topic of the Week or not). @Miquel is responsible for curating this list and approving the models. He will use the Slack #internships channel to ask questions or discuss the relevance of a model before approving it.
Start by adding your model to the Raw List sheet and wait for approval. As soon as the model has been approved, you are ready to move forward. Please tick the Selected column so that nobody else picks your model of interest.
Now you are ready to start filling in the Hub sheet. Fill in as much information as possible, and if something is missing, make sure you provide relevant links and enough information for others to understand what the model is about.
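The approval and selection rules for the Raw List can be summarized in a short Python sketch. This is only an illustration of the workflow described above, not code used by Ersilia: the class `RawListEntry` and its methods are hypothetical names, and the real sheet is edited by hand.

```python
from dataclasses import dataclass


@dataclass
class RawListEntry:
    """Hypothetical stand-in for one row of the Raw List sheet."""

    title: str
    approved: bool = False  # ticked by the curator once Ersilia wants the model
    selected: bool = False  # ticked by the contributor who takes the model

    def approve(self) -> None:
        self.approved = True

    def select(self) -> None:
        # A model can only be selected after it has been approved,
        # and only if nobody else has already picked it.
        if not self.approved:
            raise ValueError(f"'{self.title}' has not been approved yet")
        if self.selected:
            raise ValueError(f"'{self.title}' is already selected")
        self.selected = True


entry = RawListEntry(title="Compound price prediction")
entry.approve()  # the curator approves the model
entry.select()   # the contributor ticks the Selected box
```

Trying to select an entry before it is approved, or after someone else has selected it, raises an error in this sketch, which mirrors the "wait for approval, then tick Selected" order.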
Please write To do, In progress or Done in the Status column:
  • To do means that you have filled in the information but you have not started the model incorporation per se. In other words, you have not started to work on the coding part yet.
  • In progress means that you are already working on the code.
  • Done means that the model has been successfully incorporated in the Ersilia Model Hub.
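The three statuses can be read as a simple linear progression. The following Python sketch illustrates that reading; the names `Status`, `NEXT_STATUS` and `advance` are hypothetical, and the forward-only rule is an assumption, since the text does not say whether a model may ever move backwards.

```python
from enum import Enum


class Status(Enum):
    TO_DO = "To do"
    IN_PROGRESS = "In progress"
    DONE = "Done"


# Allowed forward moves, following the list above (assumption: no step
# back from a later status to an earlier one).
NEXT_STATUS = {
    Status.TO_DO: Status.IN_PROGRESS,
    Status.IN_PROGRESS: Status.DONE,
}


def advance(status: Status) -> Status:
    """Return the next status, or raise if the model is already Done."""
    if status not in NEXT_STATUS:
        raise ValueError(f"No status follows '{status.value}'")
    return NEXT_STATUS[status]


status = Status.TO_DO
status = advance(status)  # the contributor starts coding
print(status.value)  # In progress
```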
Try to provide a Title and a Description, and perhaps suggest a Slug. The Ersilia Open Source identifiers are already predefined, so there is no need to worry about them. Don't forget to write your name in the Contributor column.
@Gemma and @Ife are the maintainers of the Spreadsheet. Please reach out to them if you have questions or suggestions.
The goal of the Contributor column is simply to know who the person of reference is for each model. Some models are more difficult to add than others, so you should not be stressed about the number of models that you contribute. Ersilia is a safe and collaborative space, and we do not monitor this kind of metric.

Ersilia Model Hub AirTable

For now, @Miquel will be responsible for curating the models listed in the Spreadsheet. He will reach out to you if he has questions or needs more information about the model. He will copy the information from the Spreadsheet to an AirTable Base. This Base is accessed programmatically by the Ersilia CLI and our (provisional) hub interface.
You don't have to worry about the AirTable base for now. This database is fully managed by @Miquel.

Start coding!

As soon as you feel ready to start coding, you should change the Status in the Spreadsheet Hub sheet from To do to In progress. Please go to the next page to learn more about how to Incorporate models to the hub.


In brief, this is a suggested routine that you can follow:
  1. Check the Topic of the Week.
  2. Search for models related to the topic.
  3. Post your findings in the #literature channel.
  4. Choose one or a few models and add them to the backlog (Raw List sheet of the Ersilia Model Hub Spreadsheet).
  5. Wait for approval.
  6. Select an approved model.
  7. Move to the Hub sheet of the Spreadsheet and fill in as much information as you can. For now, set the Status to To do.
  8. When you are ready to start coding, change the Status to In progress.
  9. When the model is successfully added, change the Status to Done!