Outreachy Summer 2022

Specific instructions for the summer Ersilia interns (June 2022 - September 2022)

Follow the Topic of the Week

Every Monday by 9 pm CET, one team member will suggest a topic, and the rest of us will prioritize searching for models on that topic during the week. Topics can relate to a task of biomedical relevance or to a particular family of algorithms. Valid topics could be:

By biomedical relevance:

  • Antimalarial activity prediction

  • Broad spectrum antibiotic activity prediction

  • Drug toxicity prediction

  • Synthetic accessibility of compounds

By algorithmic relevance:

  • Graph neural networks

  • Reinforcement learning methods

  • Chemistry language transformer models

Please note that the Topic of the Week should be taken as a soft guideline. Discovery and selection of models should not be blocked by the existence of the Topic of the Week. If you find an interesting model that is unrelated to the Topic of the Week, feel free to select it and work on it. It is still a valid choice!

Announcing the Topic of the Week

In the Slack #internships channel, one person (for example @Miquel) will post a message announcing the Topic of the Week.

Notify and keep track of models

It is important that you communicate your research to the rest of the team. We suggest the following three steps:

Ersilia Slack #literature channel

Ersilia Model Hub Spreadsheet

Once you've posted the model in the #literature channel, read about it in more detail and try to figure out whether code is available. Then, add the model as a new entry in the Ersilia Model Hub Spreadsheet. Please request edit rights from @Gemma if you don't have them.

In the Spreadsheet, you will find two sheets:

  • Hub: Contains complete documentation about the models, including an Ersilia Open Source identifier (e.g. eos4e40), a slug (e.g. chemprop-antibiotic) and a status (e.g. Done).

  • Raw List: Contains a backlog of models that could be of interest to Ersilia. This sheet contains minimal information about the models. There is an Approved and a Selected tickbox. Approved means that Ersilia is willing to incorporate this model. Selected means that someone is already working on the model or the model has been successfully incorporated in the Ersilia Model Hub.

You should start with the Raw List sheet. You are always free to add models there (relevant to the Topic of the Week or not). @Miquel is responsible for curating this list and approving the models. He will use the Slack #internships channel to ask questions or discuss the relevance of a model before approving it.

Start by adding your model to the Raw List sheet and wait for approval. As soon as the model has been approved, you are ready to move forward. Please tick the Selected column so that nobody else picks your model of interest.
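The Approved and Selected tickboxes act as a simple gate: a model can only be picked once it is approved, and only by one person. The following is a minimal sketch of that rule in Python. The `RawListEntry` class and `select_model` function are hypothetical names used for illustration only; they are not part of the Ersilia CLI or of the Spreadsheet itself.

```python
from dataclasses import dataclass


@dataclass
class RawListEntry:
    """One row of the Raw List sheet (hypothetical in-memory representation)."""
    name: str
    approved: bool = False   # ticked by the curator once Ersilia wants the model
    selected: bool = False   # ticked by the contributor who picks the model


def select_model(entry: RawListEntry) -> RawListEntry:
    """Tick Selected, but only for an approved model that nobody has picked yet."""
    if not entry.approved:
        raise ValueError(f"{entry.name!r} has not been approved yet")
    if entry.selected:
        raise ValueError(f"{entry.name!r} has already been selected")
    entry.selected = True
    return entry
```

In practice you tick the boxes by hand in the Spreadsheet; the sketch only spells out the order of the checks.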

Now you are ready to start filling in the Hub sheet. Fill in as much information as possible, and if something is missing, make sure you provide relevant links and enough information for others to understand what the model is about.

Please write To do, In progress or Done in the Status column:

  • To do means that you have filled in the information but have not started the model incorporation per se. In other words, you have not started to work on the coding part yet.

  • In progress means that you are already working on the code.

  • Done means that the model has been successfully incorporated in the Ersilia Model Hub.
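The three Status values can be read as a small state machine. The sketch below assumes the statuses only move forward (To do, then In progress, then Done); the source does not state this explicitly, and the `Status` enum and `advance` function are hypothetical names introduced for illustration.

```python
from enum import Enum


class Status(Enum):
    """Values accepted in the Status column of the Hub sheet."""
    TO_DO = "To do"
    IN_PROGRESS = "In progress"
    DONE = "Done"


# Assumed forward-only transitions; Done has no successor.
_NEXT = {
    Status.TO_DO: Status.IN_PROGRESS,
    Status.IN_PROGRESS: Status.DONE,
}


def advance(status: Status) -> Status:
    """Return the next status, or raise if the model is already Done."""
    if status not in _NEXT:
        raise ValueError(f"{status.value!r} is a final status")
    return _NEXT[status]
```

For example, `advance(Status("To do"))` returns `Status.IN_PROGRESS`, which corresponds to the moment you start working on the code.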

Try to provide a Title and a Description, and perhaps suggest a Slug. The Ersilia Open Source identifiers are already predefined, so there is no need to worry about them. Don't forget to write your name in the Contributor column.

@Gemma and @Ife are the maintainers of the Spreadsheet. Please reach out to them if you have questions or suggestions.

Ersilia Model Hub AirTable

For now, @Miquel will be responsible for curating the models listed in the Spreadsheet. He will reach out to you if he has questions or needs more information about the model. He will copy the information from the Spreadsheet to an AirTable Base. This Base is accessed programmatically by the Ersilia CLI and our (provisional) hub interface.

You don't have to worry about the AirTable base for now. This database is fully managed by @Miquel.

As soon as you feel ready to start coding, you should change the Status in the Spreadsheet Hub sheet from To do to In progress. Please go to the next page to learn more about how to Incorporate models to the hub.

TL;DR

In brief, this is a suggested routine that you can follow:

  1. Search models related to the topic.

  2. Post your findings in the #literature channel.

  3. Choose one or a few models and add them to the backlog (Raw List sheet of the Ersilia Model Hub Spreadsheet).

  4. Wait for approval.

  5. Select an approved model.

  6. Move to the Hub sheet of the Spreadsheet and fill in as much information as you can. For now, set the Status to To do.

  7. When you are ready to start coding, change the Status to In progress.
