Ersilia Book
  • 🤗Welcome to Ersilia!
    • The Ersilia Open Source Initiative
    • Ten principles
    • Ersilia's ecosystem
  • 🚀Ersilia Model Hub
    • Getting started
    • Online inference
    • Local inference
    • Model contribution
      • Model template
      • Model incorporation workflow
      • Troubleshooting models
      • BioModels annotation
    • For developers
      • Command line interface
      • CI/CD workflows
      • Test command
      • Testing playground
      • Model packaging
      • Inputs
      • Codebase quality and consistency
      • Results caching
  • 💊Chemistry tools
    • Automated activity prediction models
      • Light-weight AutoML with LazyQSAR
      • Accurate AutoML with ZairaChem
      • Model distillation with Olinda
    • Sampling the chemical space
    • Encryption of AI/ML models
  • AMR chemical collections
  • 🙌Contributors
    • Communication channels
    • Tech stack
    • Internships
      • Outreachy Summer 2025
      • Outreachy Winter 2024
      • Outreachy Summer 2024
      • Outreachy Winter 2023
      • Outreachy Summer 2023
      • Outreachy Winter 2022
      • Outreachy Summer 2022
  • 📑Training materials
    • AI2050 intro workshop
    • AI2050 AI for Drug Discovery
    • Introduction to ML for Drug Discovery
    • Python 101
    • External resources
  • 🎨Styles
    • Brand guidelines
    • Slide and document templates
    • Scientific figures with Stylia
    • Coding style
  • 🌍About Us
    • Where to find us?
    • Diversity and inclusion statement
    • Code of conduct
    • Open standards and best practices
    • Ersilia privacy notice
    • Strategic Plan 2025-2027
    • Ersilia, the Invisible City
Powered by GitBook

2025, Ersilia Open Source Initiative

On this page
  • Getting started
  • How the distillation works?
  • Distillation customization

Was this helpful?

  1. Chemistry tools
  2. Automated activity prediction models

Model distillation with Olinda

Olinda is a model distillation tool for chemistry data

PreviousAccurate AutoML with ZairaChemNextSampling the chemical space

Last updated 2 years ago

Was this helpful?

is a generic cheminformatics model distillation library. It can automatically distill models from Pytorch, Tensorflow, ONNX and models fetched directly from the .

Getting started

Olinda is available on PyPi and can be installed using pip.

from olinda import distill

student_model = distill(your_model)

How the distillation works?

The distillation function first downloads a reference SMILES dataset if it is not already present. It then generates featurized inputs using the reference SMILES dataset for training the student model. Next it uses the provided teacher model to generate input-output pairs. The input-output pairs together with the featurized inputs constitute the training dataset for the student model. Finally a suitable architecture for the student model is selected using heuristics and the selected model is trained using the training dataset.

  1. Generate reference SMILES dataset

  2. Generate a training dataset using the given teacher model

  3. Search a suitable architecture for the student model

  4. Train student model

During the distillation process, helpful messages and progress bars are printed to keep the user informed. In the case of a crash or process interruption the distillation process can be resumed automatically. It caches all the intermediate results in a local directory (xdg_home() / olinda).

Distillation customization

The distillation API is very flexible and covers a wide varietry of use cases. User can easily customize the distillation behavior by passing parameters to the distill function.

def distill(
    model: Any,
    featurizer: Optional[Featurizer],
    working_dir: Path,
    clean: bool = False,
    tuner: ModelTuner = AutoKerasTuner(),
    reference_smiles_dm: Optional[ReferenceSmilesDM] = None,
    featurized_smiles_dm: Optional[FeaturizedSmilesDM] = None,
    generic_output_dm: Optional[GenericOutputDM] = None,
) -> pl.LightningModule:
    """Distill models.

    Args:
        model (Any): Teacher Model.
        featurizer (Optional[Featurizer]): Featurizer to use.
        working_dir (Path): Path to model workspace directory.
        clean (bool): Clean workspace before starting.
        tuner (ModelTuner): Tuner to use for selecting and optimizing student model.
        reference_smiles_dm (Optional[ReferenceSmilesDM]): Reference SMILES datamodules.
        featurized_smiles_dm (Optional[FeaturizedSmilesDM]): Reference Featurized SMILES datamodules.
        generic_output_dm (Optional[GenericOutputDM]): Precalculated training dataset for student model.

    Returns:
        pl.LightningModule: Student Model.
    """

💊
Olinda
Ersilia Model Hub