
Skills: using OS models

This section serves as a guideline for Skills Development Session 4.


Last updated 2 years ago

GitHub

GitHub is an internet hosting service for software development and version control using Git. It lets several developers collaborate easily with each other and across organisations. Ersilia, like most organisations and computational laboratories, centralises its open-source code on GitHub.

Each project in GitHub is stored in its own repository. Most academic publications cite code that is deposited in GitHub. When you land on a new GitHub repository, it is important to look at:

  • License file: whether the code is released under an approved open-source (OS) license and can be applied to our problem of interest

  • Readme file: featured on the landing page of the repository, it highlights the main information about the code you can find there, and often also contains links to relevant publications and to instructions on how to cite the software

You can read more about how to work with GitHub, writing issues to authors and cloning repositories in the extra section about Git and GitHub and its associated presentation.
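As an illustration of what to check first when exploring a repository, here is a small Python sketch. The `inspect_repo` helper is hypothetical (not part of any library); it simply looks for license and readme files in a local checkout:

```python
from pathlib import Path
import tempfile

def inspect_repo(repo_dir):
    """Report whether a checked-out repository contains the two files
    worth reading first: a license file and a readme file."""
    found = {"license": None, "readme": None}
    for path in Path(repo_dir).iterdir():
        name = path.name.lower()
        if name.startswith("license"):
            found["license"] = path.name
        elif name.startswith("readme"):
            found["readme"] = path.name
    return found

# Demo on a throwaway directory standing in for a cloned repository
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "README.md").write_text("# My project")
    (Path(tmp) / "LICENSE").write_text("MIT License")
    print(inspect_repo(tmp))
```

In a real checkout you would then open both files and confirm the license is OSI-approved before reusing the code.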

Accessing the Ersilia Model Hub

There is extensive documentation on how to install and use all the models in the Ersilia Model Hub through the published Python package and CLI. It requires a UNIX system (macOS or Linux); Windows users need to install WSL.
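Since the package targets UNIX systems, a quick way to check your environment from Python is the standard library's `platform` module. This is a minimal sketch (the `check_ersilia_platform` function is our own illustration, not part of the Ersilia package); note that WSL reports itself as Linux, so it passes this check:

```python
import platform

def check_ersilia_platform():
    """Return True if the current OS can run the Ersilia CLI natively.
    platform.system() returns 'Linux', 'Darwin' (macOS) or 'Windows';
    WSL identifies as 'Linux'."""
    return platform.system() in ("Linux", "Darwin")

print(check_ersilia_platform())
```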

We have prepared a simpler environment in a Google Colab notebook that provides an easy-to-use API to prepare your data and access several predictions from the Hub.

The main steps featured in the notebook are:

  1. Connection to Google Drive (where we will centralise our data)

  2. Standardisation of the SMILES according to ChEMBL rules using the standardiser package (more information on the steps performed by the standardiser can be found in its documentation)

  3. Selection of the model and running the basic commands:

    1. Fetch

    2. Serve

    3. Predict

  4. Visualise the model output in tabular format and, if possible, the distribution of the output variable in a histogram.
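The steps above can be sketched end-to-end in Python. In this sketch, `standardise` and `predict` are placeholder functions invented for illustration only: the actual notebook uses the standardiser package (which does far more than trim whitespace, e.g. applying ChEMBL standardisation rules) and real Ersilia models via fetch, serve and predict.

```python
def standardise(smiles):
    """Placeholder: the real notebook applies ChEMBL standardisation
    rules via the standardiser package; here we only trim whitespace."""
    return smiles.strip()

def predict(smiles_list):
    """Placeholder: the real notebook fetches, serves and runs an
    Ersilia model; here we return a dummy score per molecule."""
    return [0.5 for _ in smiles_list]

raw = [" CCO", "c1ccccc1 "]           # step 1: input SMILES, e.g. read from Drive
clean = [standardise(s) for s in raw]  # step 2: standardisation
scores = predict(clean)                # step 3: model prediction
for s, y in zip(clean, scores):        # step 4: tabular view of the output
    print(f"{s}\t{y}")
```

In the notebook, step 4 additionally plots a histogram of the output variable when the model returns a continuous score.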

If the runtime disconnects, remember to re-run all the cells.

When writing paths and names (strings in Python), you must take upper and lower case into account and watch out for misspellings.
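For example, string comparison in Python is case-sensitive, so a path that differs only in capitalisation points to a different file. The Drive path below is hypothetical:

```python
import os

# Strings are compared character by character, so case matters:
print("Data.csv" == "data.csv")   # False: different strings, different files
print("data.csv" == "data.csv")   # True: exact match

# A quick guard before loading a file in the notebook
# (hypothetical Drive path for illustration):
path = "/content/drive/MyDrive/data.csv"
if not os.path.exists(path):
    print(f"Check spelling and capitalisation of: {path}")
```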
