2025, Ersilia Open Source Initiative

BioModels annotation

This page describes how to annotate Ersilia models in the BioModels Tool, contributing towards their FAIRness.

Background

Sharing machine learning (ML) models, particularly in the field of drug discovery, helps create a FAIReR (Findable, Accessible, Interoperable, Reusable, and Reproducible) collection of models that are easier to reproduce and reuse. This reduces the need to rebuild models from scratch, increases their usefulness in various applications, and speeds up progress in drug discovery.

Making ML models FAIR and shareable involves following standards and protocols, which include sharing the model training code, dataset information, reproduced figures, model evaluation metrics, trained models, Docker files, and model metadata, as well as FAIR dissemination. Here, we focus on the model metadata and its annotation.

Model Metadata

To share ML models effectively, it’s important to provide relevant information about them. This information is the metadata: organised information that describes, explains, or helps find, use, or manage a resource. In the context of Ersilia models, metadata is data about the model, classified into three categories: Biological Metadata, Computational Metadata, and Descriptive Metadata. The metadata makes models findable and accessible to other researchers and modellers based on their specific characteristics.

  • Biological Metadata

    Biological relevance is an important aspect of a model. It covers the model's bioactivity, the biological processes it explains, the biological system it was trained on, the tissue or cell type involved, the assay type, the biological entity, the Ersilia model theme (ranging from infectious disease to ADME properties), and the compartment in which the biological process happens.

  • Computational Metadata

    This metadata identifies the model by its computational characteristics, such as the type of ML algorithm used, the modelling approach, the evaluation metrics, and functional properties such as the input data type and the model output.

  • Descriptive Metadata

    A model is described by its publication, a code base such as GitHub, a data repository such as Zenodo, and its deployment, which could be in the form of a web server.

Why is Annotation Important?

An annotation is an association between a metadata item and an ontology term. Annotating a model consists of mapping the identified model metadata to terms from controlled vocabularies and to entries in data resources.

Annotating model metadata is crucial to:

  • Precisely identify model categories

  • Improve understanding of the model's structure

  • Make it easier to compare different models

  • Simplify model integration

  • Enable efficient searches

  • Add meaningful context to the model

  • Enhance understanding of the biology behind the model

  • Allow conversion and reuse of the model

  • Facilitate integration of the model with biological knowledge

Controlled Vocabularies and Ontology

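Controlled vocabularies contain set terms that describe concepts in a specific domain. These terms have definitions that help us understand and agree on their meanings, and they also help with indexing and easy retrieval. Ontologies use controlled vocabularies to describe concepts and how they are related in a structured and computable format.

The following ontologies are preferred:

  • NCI Thesaurus OBO Edition NCIT
  • BioAssay Ontology BAO
  • Bioinformatics Concept EDAM
  • STATO: the statistical methods ontology
  • The BRENDA Tissue Ontology (BTO)
  • Chemical Entities of Biological Interest CHEBI
  • Chemical Information Ontology (cheminf)
  • Chemical Methods Ontology CHMO
  • Drug-drug Interaction and Drug-drug Interaction Evidence Ontology (DIDEO)
  • Gene Ontology GO
  • Infectious Disease Ontology IDO
  • Model Card Ontology MCRO
  • Molecular Interactions MI
  • Mondo Disease Ontology MONDO
  • NCBI Taxonomy
  • OBCS: Ontology of Biological and Clinical Statistics
  • Ontology for Biomedical Investigations OBI
  • Ontology for MIRNA Target OMIT
  • Ontology for Parasite Lifecycle OPL
  • PATO - the Phenotype And Trait Ontology
  • PRotein Ontology PR
  • Uber-anatomy ontology UBERON
  • Experimental Factor Ontology EFO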

How do we Annotate?

This is the process of curating an annotation file with all essential information. Annotation is done by linking each metadata item to the right ontology term (a cross-reference), and by adding values and qualifiers that explicitly define the relationship between the model metadata and the linked resources.

  1. Metadata is data about data; here, the data is the model. The model is referred to as the entity in the annotation.

    • Entity (e.g. Model)

  2. Extract all information available for each metadata category.

  3. Each annotation is linked to external data resources and values, e.g. EDAM, STATO. An external data resource could be a database of ontologies or the ontology itself.

    • This improves the model quality

    • Essential for the model search criteria

  4. A value enhances accessibility and integrates a metadata item with other data resources using a compact identifier.

    • Metadata (e.g. Machine learning)

    • Ontology (Bioinformatics Concept EDAM: edam:topic_3474)

    • Value (https://identifiers.org/bptl/edam:topic_3474)

  5. Append qualifiers to each annotation. Qualifiers explain the relationship between a metadata item and the model itself.

    • Qualifier (e.g. bqbiol and bqmodel)

    • A relationship is either biological (bqbiol) or computational (bqmodel).

    The following qualifiers are used to describe relationships in the annotation:

    • bqbiol:hasTaxon - describes a relationship between a model and an organism

    • bqbiol:occursIn - a compartment where a process occurs

    • bqbiol:hasProperty - general biological property

    • bqmodel:hasProperty - all model properties

    • bqmodel:isDescribedBy - the model resources

    • bqbiol:hasInput - model input data

    • bqbiol:hasDataset - the model training data

    • bqbiol:hasOutput - biological output of the model

  6. DOME annotation gives more context to the computational metadata.

    • D - Data (this could be the data source or the type of input data)

    • O - Optimization (each model has its own algorithm)

    • M - Model (the model source code and its executable form)

    • E - Evaluation metrics (the performance of every model is evaluated)
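
The pieces described in this section (entity, metadata, ontology, value, qualifier) can be pictured as one record per annotation. The following is only a minimal sketch: the `Annotation` class, its field names, and the prefix check are hypothetical and are not part of any BioModels or Ersilia API.

```python
from dataclasses import dataclass

# Qualifier prefixes named on this page: biological vs. computational relationships.
KNOWN_PREFIXES = ("bqbiol", "bqmodel")


@dataclass(frozen=True)
class Annotation:
    """One annotation record (hypothetical structure, for illustration only)."""

    entity: str     # what is being annotated, e.g. "Model"
    metadata: str   # free-text metadata, e.g. "Machine learning"
    ontology: str   # ontology name and term accession
    value: str      # resolvable URL for the term
    qualifier: str  # e.g. "bqmodel:hasProperty"

    def __post_init__(self):
        # Reject qualifiers that are neither biological nor computational.
        prefix = self.qualifier.split(":", 1)[0]
        if prefix not in KNOWN_PREFIXES:
            raise ValueError(f"unexpected qualifier prefix: {prefix!r}")

    def is_biological(self) -> bool:
        return self.qualifier.startswith("bqbiol:")


# The "Machine learning" example used in the list above.
example = Annotation(
    entity="Model",
    metadata="Machine learning",
    ontology="Bioinformatics Concept EDAM: edam:topic_3474",
    value="https://identifiers.org/bptl/edam:topic_3474",
    qualifier="bqmodel:hasProperty",
)
```

Keeping the qualifier prefix check in the record makes it harder to mix up biological and computational relationships while curating by hand.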

Tools for Annotation

Ontology Lookup Service

The Ontology Lookup Service (OLS) is a search and visualisation service that hosts 260+ biological and biomedical ontologies in one place.

Zooma

Zooma is an ontology mapping tool that can be used to automatically map free text to ontology terms.

Steps to Annotating a Model - An Example

1. Identify a Model and its associated Information

Associated information includes the model publication, its repository, and its source code.

Read the Publication

All model metadata are contained in its publication, so it’s important to read the publication to understand the biological or chemical processes the model addresses, its bioactivity, its algorithm, its training data, and the validation performed, among other information. Go through the repository to validate the information in the publication.

Special Scenario

Some models are built by fine-tuning other large models on different datasets to perform several tasks. In this case, it’s important to understand the base models and all their properties.

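Case-study

For the purpose of this example, we're working with an antimalarial model with the tag eos4zfy from the Ersilia Model Hub. This model comes from a collaborative project between EMBL-EBI and several pharmaceutical companies and institutes. Individual QSAR models, each built on a private dataset to identify novel molecules that may have antimalarial properties, were merged to develop MAIP, a free web platform for mass prediction of potential malaria-inhibiting compounds.

To identify all model metadata associated with this model, we go through the publication, the web platform (to understand its pipeline), its source code, and the Ersilia implementation process. This template can be adapted to individual use and a sample visual can be seen below.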

2. Assign the Metadata Entity

The metadata entity is the source of the metadata, that is, the thing the metadata describes, which here is the model. Adding the metadata entity makes the table look like this:

| Entity |
| --- |
| Model |
| Model |
| Model |

3. Extract all information available for each metadata category.

Biological Metadata

These metadata are extracted from the abstract of the publication and include the disease, causative agents, data classification, and the biological output of the model.

Computational Metadata

These include the algorithm of the model, its evaluation method, and its data type.

Descriptive Metadata

Each model is described by a publication, its source code, and its implementation.

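| Entity | Model Metadata Categories | Metadata |
| --- | --- | --- |
| Model | Biological Metadata | Homo sapiens |
| Model | Biological Metadata | Plasmodium falciparum |
| Model | Biological Metadata | Malaria |
| Model | Biological Metadata | Antimalarial properties |
| Model | Biological Metadata | Active |
| Model | Biological Metadata | Inactive |
| Model | Biological Metadata | Antimalarial compounds prediction |
| Model | Computational Metadata | Classification models |
| Model | Computational Metadata | Naïve Bayesian model |
| Model | Computational Metadata | AUC–ROC |
| Model | Computational Metadata | 5-fold cross validation |
| Model | Computational Metadata | Smiles descriptors |
| Model | Computational Metadata | malaria dataset |
| Model | Descriptive Metadata | MAIP web platform - Source code |
| Model | Descriptive Metadata | Ersilia Incorporation URL |
| Model | Descriptive Metadata | PubMed URL |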

For the purpose of this example, these are sample metadata from this model and their classification.

P.S: Column 2 (Model Metadata Categories) is for descriptive purposes only. It is not part of the annotation.
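
The extraction step can be pictured as grouping free-text metadata by category and then flattening it into annotation rows. This is a minimal sketch under that assumption; the dictionary layout and the `to_rows` helper are hypothetical. The category is used only for grouping and is dropped from the rows, since it is not part of the annotation.

```python
# Sample metadata from the example model, grouped by category.
extracted = {
    "Biological Metadata": [
        "Homo sapiens",
        "Plasmodium falciparum",
        "Malaria",
        "Antimalarial properties",
        "Active",
        "Inactive",
        "Antimalarial compounds prediction",
    ],
    "Computational Metadata": [
        "Classification models",
        "Naïve Bayesian model",
        "AUC–ROC",
        "5-fold cross validation",
        "Smiles descriptors",
        "malaria dataset",
    ],
    "Descriptive Metadata": [
        "MAIP web platform - Source code",
        "Ersilia Incorporation URL",
        "PubMed URL",
    ],
}


def to_rows(entity, grouped):
    """Flatten grouped metadata into (entity, metadata) rows, dropping the category."""
    return [(entity, item) for items in grouped.values() for item in items]


rows = to_rows("Model", extracted)
```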

Special Scenario

Some models are validated experimentally after they are built, either in vivo or in vitro: a compound predicted by the model is tested experimentally to confirm its bioactivity. Where such validation exists, it is important and should be annotated for the model.

4. Map the Metadata to the right Ontology

This is the main process of annotation: associating each metadata item with the right ontology. Ontologies can be identified through the Ontology Lookup Service (OLS), a repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions.

To ensure standardization and interoperability, it's crucial to identify relevant ontologies through OLS. These ontologies help in annotating the model components accurately. Search for your term, for example Machine Learning, in the search bar, and select the right term in the preferred ontology. If it is not found in a preferred ontology, look through the other available options for a term with the right meaning.

Sometimes the exact term isn't found in OLS. In this case, the closest term can be used to represent the metadata.

When choosing the right ontology for a metadata item, there are two important things to consider:

  1. The ontology whose term best matches the meaning of the metadata

  2. Whether it is one of the preferred ontologies, for better indexing
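
The two criteria can be read as a ranking rule: best meaning first, with a preferred ontology breaking ties. The sketch below illustrates that reading only. The `match` score is a hypothetical number the curator assigns after reading the term definitions, `PREFERRED` holds just a few short names from the preferred list on this page, and the candidate terms other than `edam:topic_3474` are placeholders, not real accessions.

```python
# Short names of some ontologies from the preferred list on this page.
PREFERRED = {"NCIT", "BAO", "EDAM", "STATO", "CHEBI", "CHEMINF", "OBI", "EFO"}


def choose_term(candidates, preferred=PREFERRED):
    """Pick the candidate with the best meaning; a preferred ontology breaks ties."""
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda c: (c["match"], c["ontology"] in preferred),
    )


# Candidate terms for the metadata "Machine learning".
candidates = [
    {"ontology": "OTHER", "term": "OTHER:placeholder", "match": 1.0},
    {"ontology": "EDAM", "term": "edam:topic_3474", "match": 1.0},
    {"ontology": "STATO", "term": "STATO:placeholder", "match": 0.5},
]
best = choose_term(candidates)
```

Here the first two candidates match the meaning equally well, so the one from a preferred ontology (EDAM) is selected.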

After mapping the metadata to the right ontology, we have a table like this:

| Entity | Preferred Ontology | Metadata |
| --- | --- | --- |
| Model | NCBI Taxonomy | Homo sapiens |
| Model | NCBI Taxonomy | Plasmodium falciparum |
| Model | Experimental Factor Ontology EFO | Malaria |
| Model | NCI Thesaurus OBO Edition NCIT | Antimalarial properties |
| Model | NCI Thesaurus OBO Edition NCIT | Active |
| Model | NCI Thesaurus OBO Edition NCIT | Inactive |
| Model | Chemical Entities of Biological Interest CHEBI | Antimalarial compounds prediction |
| Model | STATO: the statistical methods ontology | Classification models |
| Model | STATO: the statistical methods ontology | Naïve Bayesian model |
| Model | STATO: the statistical methods ontology | AUC–ROC |
| Model | Ontology for Biomedical Investigations OBI | 5-fold cross validation |
| Model | Chemical information ontology (cheminf) | Smiles descriptors |
| Model | NCI Thesaurus OBO Edition NCIT | malaria dataset |
| Model | Special cases | MAIP web platform - Source code |
| Model | Special cases | Ersilia Incorporation URL |
| Model | Special cases | PubMed URL |

Note that the descriptive metadata, like the PubMed URL of the model or the specific Ersilia GitHub repository where the model is hosted, do not have an ontology (special cases).

5. Link the Ontology to their values

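Values enhance accessibility and integrate metadata with other data resources in the form of a URL (Uniform Resource Locator).

Each ontology term has an accession identifier, and a value is formed by combining that identifier with a compact identifier resolver. The compact identifier resolver is a resolution service that provides consistent access in the form of https://identifiers.org/ followed by the accession:

Value = https://identifiers.org/ + NCIT:C176231

Value = https://identifiers.org/NCIT:C176231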
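
A value is the resolver prefix followed by the term's accession. The sketch below assumes the https://identifiers.org/ prefix that the values on this page use; `build_value` is a hypothetical helper name, not an existing tool.

```python
IDENTIFIERS_ORG = "https://identifiers.org/"


def build_value(accession: str) -> str:
    """Join the resolver prefix and an accession such as 'NCIT:C176231'."""
    accession = accession.strip()
    if not accession:
        raise ValueError("accession must not be empty")
    return IDENTIFIERS_ORG + accession


value = build_value("NCIT:C176231")
```

The helper does not check that the accession exists; opening the resulting URL in a browser is a simple way to confirm that it resolves to the intended term.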

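Each ontology term is linked to its respective value using the formula above, and the table looks like this:

| Entity | Preferred Ontology | Values | Metadata |
| --- | --- | --- | --- |
| Model | NCBI Taxonomy | https://identifiers.org/taxonomy/9606 | Homo sapiens |
| Model | NCBI Taxonomy | https://identifiers.org/taxonomy/5833 | Plasmodium falciparum |
| Model | Experimental Factor Ontology EFO | https://identifiers.org/EFO:0001068 | Malaria |
| Model | NCI Thesaurus OBO Edition NCIT | https://identifiers.org/NCIT:C271 | Antimalarial properties |
| Model | NCI Thesaurus OBO Edition NCIT | https://identifiers.org/NCIT:C45329 | Active |
| Model | NCI Thesaurus OBO Edition NCIT | https://identifiers.org/NCIT:C154407 | Inactive |
| Model | Chemical Entities of Biological Interest CHEBI | https://identifiers.org/CHEBI:38068 | Antimalarial compounds prediction |
| Model | STATO: the statistical methods ontology | https://identifiers.org/STATO:0000031 | Classification models |
| Model | STATO: the statistical methods ontology | https://identifiers.org/STATO:0000530 | Naïve Bayesian model |
| Model | STATO: the statistical methods ontology | https://identifiers.org/STATO:0000274 | AUC–ROC |
| Model | Ontology for Biomedical Investigations OBI | https://identifiers.org/obi:OBI_0200032 | 5-fold cross validation |
| Model | Chemical information ontology (cheminf) | https://identifiers.org/CHEMINF:000018 | Smiles descriptors |
| Model | NCI Thesaurus OBO Edition NCIT | https://identifiers.org/NCIT:C47824 | malaria dataset |
| Model | Online Web server | https://www.ebi.ac.uk/chembl/maip/ | MAIP web platform - Source code |
| Model | Ersilia Model Hub | https://github.com/ersilia-os/eos4zfy | Ersilia Incorporation URL |
| Model | PubMed Identification Number PMID | https://identifiers.org/pubmed:33618772 | PubMed URL |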

6. Associate the right qualifier to each annotation

As previously explained, each metadata item is a biological, computational, or descriptive component of the model. Here, we annotate each metadata item according to the category it falls into.

For example:

| Metadata Category | Metadata | Qualifier |
| --- | --- | --- |
| Biological Metadata | Malaria | bqbiol:hasProperty |
| Computational Metadata | naïve Bayesian model | bqmodel:hasProperty |
| Descriptive Metadata | Ersilia Incorporation URL | bqmodel:isDescribedBy |
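
The example pairs each category with a general-purpose qualifier. A minimal sketch of that pairing follows; `DEFAULT_QUALIFIER` and `pick_qualifier` are hypothetical names, and a more specific qualifier such as bqbiol:hasTaxon is assumed to override the default whenever one applies.

```python
# General-purpose qualifier for each category, taken from the example above.
DEFAULT_QUALIFIER = {
    "Biological Metadata": "bqbiol:hasProperty",
    "Computational Metadata": "bqmodel:hasProperty",
    "Descriptive Metadata": "bqmodel:isDescribedBy",
}


def pick_qualifier(category, specific=None):
    """Return the specific qualifier if one is given, else the category default."""
    return specific or DEFAULT_QUALIFIER[category]


malaria = pick_qualifier("Biological Metadata")
homo_sapiens = pick_qualifier("Biological Metadata", "bqbiol:hasTaxon")
```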

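After adding qualifiers to each metadata item, the table looks like this:

| Entity | Qualifiers | Preferred Ontology | Values | Metadata |
| --- | --- | --- | --- | --- |
| Model | bqbiol:hasTaxon | NCBI Taxonomy | https://identifiers.org/taxonomy/9606 | Homo sapiens |
| Model | bqbiol:hasTaxon | NCBI Taxonomy | https://identifiers.org/taxonomy/5833 | Plasmodium falciparum |
| Model | bqbiol:hasProperty | Experimental Factor Ontology EFO | https://identifiers.org/EFO:0001068 | Malaria |
| Model | bqbiol:hasProperty | NCI Thesaurus OBO Edition NCIT | https://identifiers.org/NCIT:C271 | Antimalarial properties |
| Model | bqbiol:hasProperty | NCI Thesaurus OBO Edition NCIT | https://identifiers.org/NCIT:C45329 | Active |
| Model | bqbiol:hasProperty | NCI Thesaurus OBO Edition NCIT | https://identifiers.org/NCIT:C154407 | Inactive |
| Model | bqbiol:hasOutput | Chemical Entities of Biological Interest CHEBI | https://identifiers.org/CHEBI:38068 | Antimalarial compounds prediction |
| Model | bqmodel:hasProperty | STATO: the statistical methods ontology | https://identifiers.org/STATO:0000031 | Classification models |
| Model | bqmodel:hasProperty | STATO: the statistical methods ontology | https://identifiers.org/STATO:0000530 | Naïve Bayesian model |
| Model | bqmodel:hasProperty | STATO: the statistical methods ontology | https://identifiers.org/STATO:0000274 | AUC–ROC |
| Model | bqmodel:hasProperty | Ontology for Biomedical Investigations OBI | https://identifiers.org/obi:OBI_0200032 | 5-fold cross validation |
| Model | bqbiol:hasInput | Chemical information ontology (cheminf) | https://identifiers.org/CHEMINF:000018 | Smiles descriptors |
| Model | bqbiol:hasDataset | NCI Thesaurus OBO Edition NCIT | https://identifiers.org/NCIT:C47824 | malaria dataset |
| Model | bqmodel:isDescribedBy | Online Web server | https://www.ebi.ac.uk/chembl/maip/ | MAIP web platform - Source code |
| Model | bqmodel:isDescribedBy | Ersilia Model Hub | https://github.com/ersilia-os/eos4zfy | Ersilia Incorporation URL |
| Model | bqmodel:isDescribedBy | PubMed Identification Number PMID | https://identifiers.org/pubmed:33618772 | PubMed URL |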
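
Once every row has a qualifier, the table can be saved as a tab-separated annotation file. The sketch below shows one way to do this with Python's standard `csv` module. The column layout simply follows the table on this page and is an assumption, not the official annotation template, and only three of the example rows are included for brevity.

```python
import csv
import io

COLUMNS = ["Entity", "Qualifiers", "Preferred Ontology", "Values", "Metadata"]

# Three rows from the example annotation, one per metadata category.
rows = [
    ("Model", "bqbiol:hasTaxon", "NCBI Taxonomy",
     "https://identifiers.org/taxonomy/9606", "Homo sapiens"),
    ("Model", "bqmodel:hasProperty", "STATO: the statistical methods ontology",
     "https://identifiers.org/STATO:0000530", "Naïve Bayesian model"),
    ("Model", "bqmodel:isDescribedBy", "PubMed Identification Number PMID",
     "https://identifiers.org/pubmed:33618772", "PubMed URL"),
]


def to_tsv(rows):
    """Serialise annotation rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = to_tsv(rows)
```

Writing `tsv_text` to a file gives a table that opens directly in a spreadsheet for review.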

7. Contextualize the Computational Metadata by adding DOME

The DOME annotation provides more context for the computational metadata by identifying which part of the modelling process each metadata item belongs to.

Adding DOME to the table shows this:

| Metadata | DOME |
| --- | --- |
| classification models | Optimization-Algorithm |
| naïve Bayesian model | Optimization-Algorithm |
| AUC–ROC | Evaluation-Performance Measure |
| 5-fold cross validation | Evaluation-Method |
| MAIP web platform - Source code | Model-Executable form |
| Ersilia Incorporation URL | Model-Executable form |
| Smiles descriptors | Data-Input |
| malaria dataset | Data-Source |
| predictions of potential Antimalarial compounds | Model-Output; Classification |
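
Each DOME tag in the example has the form component-detail, so the metadata can be grouped by DOME component to check that Data, Optimization, Model, and Evaluation are all covered. This is a minimal sketch; the dictionary and the `group_by_component` helper are hypothetical names.

```python
from collections import defaultdict

# DOME tags from the example, keyed by metadata.
DOME_TAGS = {
    "classification models": "Optimization-Algorithm",
    "naïve Bayesian model": "Optimization-Algorithm",
    "AUC–ROC": "Evaluation-Performance Measure",
    "5-fold cross validation": "Evaluation-Method",
    "MAIP web platform - Source code": "Model-Executable form",
    "Ersilia Incorporation URL": "Model-Executable form",
    "Smiles descriptors": "Data-Input",
    "malaria dataset": "Data-Source",
    "predictions of potential Antimalarial compounds": "Model-Output; Classification",
}


def group_by_component(tags):
    """Group metadata by the DOME component that precedes the first hyphen."""
    grouped = defaultdict(list)
    for metadata, tag in tags.items():
        component = tag.split("-", 1)[0]
        grouped[component].append(metadata)
    return dict(grouped)


grouped = group_by_component(DOME_TAGS)
missing = {"Data", "Optimization", "Model", "Evaluation"} - set(grouped)
```

An empty `missing` set means every DOME component has at least one annotated metadata item.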

Resources & References

  • The Use Case Published Annotation
  • BioModels Annotation SOP
  • Dome Annotation
  • Ersilia Model for Use Case
  • Associated Publication for the Use Case
  • Annotation Template
  • BioModelsML Publication