Model distillation with Olinda
Olinda is a model distillation tool for chemistry data.
Olinda is a generic cheminformatics model distillation library. It can automatically distill models from PyTorch, TensorFlow, and ONNX, as well as models fetched directly from the Ersilia Model Hub.
Getting started
Olinda is available on PyPi and can be installed using pip.
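As a quick check that the installation works, a minimal run might look like the sketch below. The top-level import path and the toy teacher network are assumptions for illustration; the documentation only guarantees that distillation happens through the distill function.

```python
# Minimal usage sketch. The import path `from olinda import distill`
# is an assumption; adjust it to the installed package layout.
import torch.nn as nn

from olinda import distill

# A toy network standing in for a real trained teacher; any model that
# maps featurized molecule inputs to predictions would take its place.
teacher = nn.Sequential(nn.Linear(1024, 128), nn.ReLU(), nn.Linear(128, 1))

student = distill(teacher)  # distill the teacher into a compact student model
```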
How does the distillation work?
The distillation function first downloads a reference SMILES dataset if it is not already present. It then generates featurized inputs from the reference SMILES dataset for training the student model. Next, it runs the provided teacher model on these inputs to produce input-output pairs, which together with the featurized inputs constitute the training dataset for the student model. Finally, a suitable architecture for the student model is selected using heuristics, and the selected model is trained on this dataset. In short, the pipeline is:
1. Generate the reference SMILES dataset.
2. Generate a training dataset using the given teacher model.
3. Search for a suitable architecture for the student model.
4. Train the student model.
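To make the four steps concrete, here is a self-contained sketch of the same loop in plain Python. The character-hashing featurizer and the fixed ridge-regression student are stand-ins chosen for illustration; Olinda uses real cheminformatics featurizers and searches over student architectures rather than hard-coding one.

```python
from typing import Callable, Sequence

import numpy as np
from sklearn.linear_model import Ridge


def featurize(smiles: str, n_bits: int = 32) -> np.ndarray:
    """Toy featurizer: hash characters into a fixed-length count vector.
    A stand-in for the real cheminformatics featurizers Olinda uses."""
    v = np.zeros(n_bits)
    for ch in smiles:
        v[ord(ch) % n_bits] += 1.0
    return v


def distill_sketch(teacher: Callable[[np.ndarray], float],
                   reference_smiles: Sequence[str]):
    # Step 1: Olinda downloads the reference SMILES; here they are passed in.
    # Featurize them to obtain the student's training inputs.
    X = np.stack([featurize(s) for s in reference_smiles])
    # Step 2: label every input with the teacher's prediction.
    y = np.array([teacher(x) for x in X])
    # Steps 3-4: pick a student architecture (fixed here) and train it.
    return Ridge().fit(X, y)


# Example: a synthetic "teacher" and a tiny reference set.
student = distill_sketch(lambda x: float(x.sum()), ["CCO", "c1ccccc1", "CC(=O)O"])
```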
During the distillation process, helpful messages and progress bars are printed to keep the user informed. All intermediate results are cached in a local directory (xdg_home() / "olinda"), so in the case of a crash or process interruption the distillation can be resumed automatically.
Distillation customization
The distillation API is very flexible and covers a wide variety of use cases. Users can easily customize the distillation behavior by passing parameters to the distill function.
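For example, a customized call might select a featurizer or redirect the cache directory. The keyword arguments below are illustrative assumptions, not a documented signature; check help(distill) for the parameters the installed version actually accepts.

```python
from olinda import distill  # import path assumed, as in the quickstart above

# Hypothetical keyword arguments, shown only to illustrate the pattern of
# customizing distillation through parameters of `distill`.
student = distill(
    teacher,                    # the teacher model from the quickstart
    featurizer="morgan",        # assumed: choose the input featurization
    working_dir="/tmp/olinda",  # assumed: override the default cache location
)
```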