git clone https://github.com/ersilia-os/lazy-qsar.git
cd lazy-qsar
python -m pip install -e .
from tdc.single_pred import Tox
data = Tox(name = 'hERG')
split = data.get_split()
Here we are selecting the hERG blockade toxicity dataset. Let's refactor data for convenience.
smiles_train = list(split["train"]["Drug"])
y_train = list(split["train"]["Y"])
smiles_valid = list(split["valid"]["Drug"])
y_valid = list(split["valid"]["Y"])
Now we can train (and validate) a model based on Morgan fingerprints.
import lazyqsar as lq
# train
model = lq.MorganBinaryClassifier()
model.fit(smiles_train, y_train)
# validate
from sklearn.metrics import roc_curve, auc
y_hat = model.predict_proba(smiles_valid)[:,1]
fpr, tpr, _ = roc_curve(y_valid, y_hat)
print("AUROC", auc(fpr, tpr))