Outreachy Winter 2023
Please find here the guidelines for the Outreachy contribution period running from 2nd October 2023 to 27th October 2023.
The Ersilia Model Hub is an open source platform of ready-to-use AI/ML models for biomedical research. With it, scientists can browse a collection of models, select the ones relevant to their research and run predictions, for example to predict whether a molecule will be active against malaria. The internship project will focus on increasing the collection of available models in the Hub.
Especially important to Ersilia is the contribution period. We list below a number of tasks that must be completed in order. We will not judge interns on how many tasks they complete, but on the quality of each contribution and their interest in learning, participating and helping others in the community. The mentors are there and willing to help, but they also have a dedicated time slot for reviewing contributions, so please be patient if it takes a few hours to get back to you. Almost all the information you will need to successfully complete the contribution period is in this document, so please read it all before asking questions.
Please note that this internship requires knowledge of the Python programming language.
Also, all text written by interns will be reviewed to ensure that no plagiarism and no AI writing tools have been involved. While AI tools can be helpful in certain circumstances, here we need to learn about you in your own words. Using ChatGPT or similar tools for writing letters of interest, summaries of publications or similar will immediately disqualify you from further participation in the program.
Ersilia can only accept interns who have been approved by Outreachy and who comply with the necessary requirements. Please check that your availability for the internship period fulfills Outreachy's requirements.
The contribution period runs from October 2nd to October 27th. During this time, interested applicants are welcome to contribute to Ersilia's project following the guidelines in this document.
The contribution period is organised into 4 weeks. Each week has a set of specific goals, so that mentors can evaluate the intern's experience, interest in the community and team-building work. Once the week's objectives have been met, please focus on:
- Improving your contribution (there are always more publications to read, better bug reports to write, etc.)
- Helping out other contributors (we really value group work)
Link this issue to your Outreachy application to submit the first contribution
The first week is focused on getting to know the Ersilia community, our mission and how we work to achieve it. We will be having tons of interactions during the contribution period, so the best is to get to know your peers, your mentors and a bit more about Ersilia.
Slack: we use Slack as our main communication platform, both for the contribution period and afterwards to work with the selected interns. If you have never used this tool, don't worry, it is quite intuitive!
- 2. Introduce yourself in the #general channel. The general channel is for random questions, interactions with other fellow contributors, writing tips and suggestions...
- 3. Use the dedicated channels for questions about specific topics.
- 4. Contribute to your peers' questions. This is about helping each other, and we really value interns who work with the community.
We try to work as openly as possible, so we encourage all contributors to post in the open channels rather than in private conversations.
Please use a Slack name that is easy to identify with your GitHub handle to make it easy for mentors to review contributions and tag people.
- 📖 Getting familiar with the repository structure
- 🐛 Checking the issues to see what the community has been working on
- 👀 Watching the repository to receive notifications if you are mentioned
- ⭐ Starring the repository if you like the work Ersilia is doing!
Ersilia is an Open Source community with active contributing members. Please respect the work of others and, especially if an issue is assigned to someone else, give them the time and space to work on it.
We will be using GitHub issues a lot, so if you have never worked with GitHub before, make sure you understand how issues work!
Community call: we will hold a community call on Friday 6th October at 5:00 pm CET to go over the contribution period tasks and answer any questions you might have! The link will be shared via Slack. Attendance is not compulsory. We have tried to find a time that is acceptable for most time zones, and we apologise in advance if it means an early start or a late end to your day.
Code of Conduct: Ersilia adheres to the Contributor Covenant Code of Conduct. Any breach of the code of conduct, especially harassment or lack of respect for fellow contributors, will mean disqualification as an applicant.
Each intern will have to open an issue on the Ersilia Model Hub repository using the Outreachy Winter 23 issue template. Use this issue as your personal thread of the contribution period. Everything you want the mentors to review, comment or take into account for the contribution period should be in that issue. You can also tag people to ask for help, debug together...
The issue has a set of tasks, mark them as complete as you go!
We will be using the Ersilia Model Hub throughout the internship. The software is in beta testing, so you might encounter errors while running it. Don't worry, this is why we are here!
Please follow the installation instructions. If you have a UNIX machine (Linux or macOS), you can install Ersilia directly. If you are using a Windows machine, you will need a Virtual Machine or the Windows Subsystem for Linux (WSL).
For Windows users, we recommend using WSL with Visual Studio Code to access it.
A common mistake is to forget to install Git-LFS, which is required for many models. Please make sure it is installed!
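As a quick sanity check before moving on, the prerequisites can be looked up on the `PATH` from Python. This is only a minimal sketch: the list of executable names (`git`, `git-lfs`, `conda`, `ersilia`) is an assumption about a typical setup, and the helper names `check_tools` and `format_report` are hypothetical. Note that Git-LFS can be installed in a way that does not expose a `git-lfs` executable on the `PATH`, so running `git lfs version` remains the more reliable test.

```python
import shutil

# Executable names are assumptions: adjust them to match your own setup.
REQUIRED_TOOLS = ["git", "git-lfs", "conda", "ersilia"]


def check_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to True if an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


def format_report(status):
    """Render the check as plain text that can be pasted into a GitHub issue."""
    return "\n".join(
        f"{tool}: {'found' if found else 'MISSING'}" for tool, found in status.items()
    )
```

Calling `print(format_report(check_tools()))` lists each tool with its status, which is a convenient starting point when asking for help.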
- 1. Testing that Ersilia works: we will first make sure ersilia works by running the following command:
ersilia --help #this should output the command options for ersilia
- 2. Once we are sure ersilia is recognised in the CLI, we will test a very simple model:
ersilia -v fetch eos3b5e
ersilia serve eos3b5e
ersilia -v api run -i "CCCC"
This model calculates the molecular weight of the input molecules. The output should be printed in your CLI and look like:
These tests do not work, what now?! Write down the challenges you are facing in your GitHub issue, and ask your peers for support through the Slack channel.
These work just fine! Perfect, report the tasks as completed in your specific GitHub issue and help your peers achieve it as well!
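If you prefer to script the three test commands and keep their output for your GitHub issue, something along these lines could work. It is a sketch under assumptions, not an official Ersilia tool: the function name `run_smoke_test` and the `runner` parameter are hypothetical, and the commands are simply the ones listed above, run in the same order.

```python
import subprocess

# The three commands from the steps above, in the order they must run.
SMOKE_TEST_COMMANDS = [
    ["ersilia", "-v", "fetch", "eos3b5e"],
    ["ersilia", "serve", "eos3b5e"],
    ["ersilia", "-v", "api", "run", "-i", "CCCC"],
]


def run_smoke_test(commands=SMOKE_TEST_COMMANDS, runner=subprocess.run):
    """Run each command in order and stop at the first failure.

    Returns a list of (command_string, return_code, captured_output) tuples,
    one per command that was attempted.
    """
    results = []
    for cmd in commands:
        try:
            proc = runner(cmd, capture_output=True, text=True)
            code = proc.returncode
            output = (proc.stdout or "") + (proc.stderr or "")
        except FileNotFoundError:
            # Raised when the executable itself cannot be found.
            code, output = 127, f"{cmd[0]} not found on PATH"
        results.append((" ".join(cmd), code, output))
        if code != 0:
            break  # later commands depend on the earlier ones succeeding
    return results
```

Each tuple holds the command, its exit code and everything it printed, so a failing step can be copied into your issue together with its full log.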
Write, in a thread in your issue, your motivation for joining Outreachy and, in particular, why you are interested in working at Ersilia. A good motivation letter will explain your current skills, your reasons to work on Ersilia's project, how this would advance your career and what your plans are during and after the internship.
If what you have seen and learnt in this first week is appealing and you want to continue working with us, open an application to Ersilia through the Outreachy website, and link the issue you have opened to track your contributions.
We will not be providing further feedback on your tasks if you do not record the first contribution on the Outreachy website. This is to ensure we focus our limited support capacity on those candidates who want to make a final application to the organisation.
The task of this week is to successfully install an ML model in your system (using third-party code). The goal of this week is to:
- Demonstrate your Python knowledge
- Practise following documentation and implementing third-party code
- Work with dependencies and conda environments
This task is very similar to what you will work on during the internship: finding open source code, testing it, debugging if necessary and, finally, implementing it within Ersilia.
We propose one of the following models to be implemented. Please select one and explain in the GitHub issue why you have selected it (interest in the application, in the ML algorithm used...?)
These are Open Source models developed by academics mostly, and they have nothing to do with Ersilia or the Outreachy program, so please refrain from contacting the original authors without asking one of the Ersilia mentors first.
Follow the instructions on the specific model repository to install and run the model in your system. Report the issues you find along the process and how you solve them in the GitHub issue.
A good task here will detail each and every one of the steps taken to successfully run and implement the model in your local machine. If you simply mark the task as complete, the mentors will not be able to evaluate your knowledge.
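One detail worth including in such a report is the environment the model was run in, since many installation problems come down to package versions. The sketch below collects that information using only the Python standard library. The function name `describe_environment` is hypothetical, and the package names you pass in depend entirely on the model you chose.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Return a plain-text summary of the interpreter and selected packages."""
    lines = [
        f"Python {platform.python_version()} on {platform.system()} {platform.machine()}",
        f"Interpreter: {sys.executable}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)
```

Run it from inside the activated conda environment, for example `print(describe_environment(["numpy", "pandas"]))`, so that the versions reported are the ones the model actually used.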
Showing that you are not only able to run code but actually understand the outputs of the ML models is critical for the internship
Ersilia can run by downloading models from GitHub (using Git-LFS), from S3 buckets (our AWS backend) and by downloading models as Docker containers. Please read the instructions carefully and comment on your GitHub issue how you are currently running the models.
If you have not used Docker before, this is the moment to go for it: install Docker on your system and try to run a model from DockerHub! Report in the GitHub issue whether you successfully complete this task!
The models we suggest are already incorporated in the Ersilia Model Hub. Look for them, fetch them, run predictions and compare the output of the Ersilia Model Hub version with the output of the original code.
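For numerical predictions, the comparison can be made systematic rather than done by eye. The following is a minimal sketch that assumes you have already parsed both outputs into dictionaries mapping each input SMILES string to a single float. The real output formats differ from model to model, and the function name `compare_predictions` and the tolerance values are illustrative choices, not Ersilia conventions.

```python
import math


def compare_predictions(original, ersilia, rel_tol=1e-4, abs_tol=1e-6):
    """Return the molecules whose predictions disagree.

    Both arguments map a SMILES string to a float. A molecule is reported
    when its two values differ beyond the tolerances, or when it is missing
    from one side (the missing value is shown as None).
    """
    mismatches = {}
    for smiles in sorted(set(original) | set(ersilia)):
        a = original.get(smiles)
        b = ersilia.get(smiles)
        if a is None or b is None or not math.isclose(
            a, b, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            mismatches[smiles] = (a, b)
    return mismatches
```

An empty result means the two implementations agree within tolerance on every molecule tested. Small differences are expected from floating-point rounding, which is why exact equality is not used.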
A big part of what we do at Ersilia is to screen the scientific literature in search of new models and datasets of interest to our community. We are always looking for models that can help speed up drug discovery against infectious and neglected diseases.
In this week, focus on diving into the scientific literature, trying to find studies of interest to our community. To that end, we suggest using websites like PubMed, Google Scholar, BioRxiv and Papers with Code.
Find one publication that describes an open source ML model that could be of interest to Ersilia (activity against a specific pathogen, cytotoxicity, side effects...) and link it in the thread.
Explain what the model does (in your own words), why it would be relevant to Ersilia and how you would implement it (look at its code and whether it is ready to be used, e.g. are the model checkpoints provided, is the underlying data available?).
Do not open a new model request issue, or add it to our backlog. Only do so if mentors review the model suggestion and ask you to do it.
Continue your model search and suggest up to three studies/models/publications of interest, each one with detailed explanations.
Focus the last week ONLY on writing your final application to Outreachy. Mentors will not review any contributions during the last week, only final applications, to ensure we can provide feedback on them.
Final applications must be submitted through the Outreachy website on time. We will not be able to provide help or support for last-minute internet connection problems, late submissions and other issues. Please make sure you fill it in with time to spare.