# Glossary

## Day 1

**Bioavailability:** fraction of an administered dose of a drug that reaches the systemic circulation unaltered, in its active form.
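
For a drug given orally, absolute bioavailability $$F$$ is commonly estimated by comparing the area under the plasma concentration-time curve (AUC) with that of an intravenous (IV) dose, which by definition reaches the circulation completely. The doses $$D$$ correct for the two routes being given different amounts:

$$
F = \frac{\mathrm{AUC}_{\text{oral}} \cdot D_{\text{IV}}}{\mathrm{AUC}_{\text{IV}} \cdot D_{\text{oral}}}
$$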

**Chemical space:** virtual space spanned by all possible molecules and chemical compounds adhering to a given set of construction principles and boundary conditions. The term is also often used to refer to a group of molecules belonging to a particular subspace, characterized by their physicochemical and structural features.

**Dimensionality reduction:** transformation of data from a high-dimensional space (for example, a vector of 10,000 values per molecule) to a low-dimensional space (for example, 2 values per molecule) so that the low-dimensional representation still retains some meaningful properties of the original data.
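
Principal component analysis (PCA) is one of the simplest dimensionality reduction techniques; UMAP is a common non-linear alternative. The following is a minimal PCA sketch in NumPy that projects each row (molecule) onto the two directions of largest variance. The function name `pca_2d` and the random placeholder data are hypothetical, for illustration only.

```python
import numpy as np

def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project the rows of X onto their first two principal components."""
    # Centre each feature (column) at zero mean
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing variance
    _, _, Vt = np.linalg.svd(X_centred, full_matrices=False)
    # Keep only the two directions that capture the most variance
    return X_centred @ Vt[:2].T

rng = np.random.default_rng(0)
# 100 hypothetical molecules x 1024 features (random placeholder data)
X = rng.random((100, 1024))
X_2d = pca_2d(X)
print(X_2d.shape)  # (100, 2)
```

Each molecule is now described by 2 numbers instead of 1,024, which makes it possible to plot the dataset.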

## Day 2

**Classification:** type of supervised ML modelling where the model learns to categorize each input into a specific class.

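**Contingency table:** a table showing how observations are distributed across the categories of two or more variables. For a binary classification ML model, a contingency table cross-tabulates two variables for each molecule (its real class and its predicted class), giving the counts of true positives, false positives, true negatives and false negatives. In this setting it is also known as a confusion matrix.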
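
As a minimal sketch, the four counts can be obtained by tallying (real, predicted) label pairs, assuming labels are encoded as 1 (positive) and 0 (negative). The function name `contingency_table` and the example labels are hypothetical.

```python
from collections import Counter

def contingency_table(y_true, y_pred):
    """Count (real, predicted) label pairs for a binary classifier (1 = positive, 0 = negative)."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "TP": counts[(1, 1)],  # real positive, predicted positive
        "FN": counts[(1, 0)],  # real positive, predicted negative
        "FP": counts[(0, 1)],  # real negative, predicted positive
        "TN": counts[(0, 0)],  # real negative, predicted negative
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # real classes
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]  # predicted classes
print(contingency_table(y_true, y_pred))  # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
```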

**Featurization:** in cheminformatics, the process of converting a molecule (usually represented by its SMILES string) into a vector of numbers that can be passed to the algorithm. The more faithfully a molecule is represented as a numerical vector (i.e., the less information is lost), the more informative the model can be.
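
Real pipelines typically rely on established featurizers, such as RDKit's Morgan (circular) fingerprints. The sketch here is a hypothetical toy that only counts a handful of atom symbols in a SMILES string with a regular expression. It ignores bonds, rings, charges and bracket atoms, so it loses a lot of information, which is exactly the trade-off described above.

```python
import re

# Hypothetical toy vocabulary of atom symbols. "Cl" and "Br" are listed first so the
# regex matches "Cl" as chlorine rather than as "C" followed by "l".
VOCAB = ["Cl", "Br", "C", "N", "O", "S", "F", "c", "n", "o", "s"]
TOKEN_RE = re.compile("|".join(VOCAB))

def featurize(smiles: str) -> list[int]:
    """Return a fixed-length vector counting each vocabulary atom symbol in a SMILES string."""
    tokens = TOKEN_RE.findall(smiles)
    return [tokens.count(symbol) for symbol in VOCAB]

# Aspirin: 3 aliphatic C, 4 O and 6 aromatic c
print(featurize("CC(=O)Oc1ccccc1C(=O)O"))  # [0, 0, 3, 0, 4, 0, 0, 6, 0, 0, 0]
```

Every molecule maps to a vector of the same length, which is what ML algorithms expect as input.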

**Input:** data you pass to an ML model (by convention represented by X).

**Output:** data you get from an ML model (by convention represented by Y).

**Precision:** measure of the performance of a classification ML model. It is the proportion of predicted positives that are actually positive.
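
Written in terms of the counts of true positives ($$TP$$) and false positives ($$FP$$) from the contingency table:

$$
\text{Precision} = \frac{TP}{TP + FP}
$$

For example, a model that predicts 4 molecules as active, 3 of which truly are, has a precision of $$3/4 = 0.75$$.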

**Probability:** likelihood that a proposition is true. Applied to the output of a classification ML model, it expresses how likely it is that a molecule belongs to a specific class.

**Recall:** measure of the performance of a classification ML model. It is the proportion of actual positives that the model correctly identifies.
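
With $$FN$$ denoting false negatives (actual positives that the model predicted as negative):

$$
\text{Recall} = \frac{TP}{TP + FN}
$$

For example, if a dataset contains 5 active molecules and the model identifies 3 of them, recall is $$3/5 = 0.6$$. Recall is the same quantity as the True Positive Rate (TPR).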

**Reinforcement Learning:** a subcategory of machine learning where the algorithm (agent) learns through a process of trial and error, guided by rewards. In cheminformatics, RL is often used to steer a generative model towards new molecules with desired properties: the agent is rewarded when the molecules it generates score well.

**Regression:** type of supervised machine learning where the algorithm learns to predict a continuous outcome.
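
A minimal regression sketch: fitting a straight line by least squares with NumPy's `np.polyfit`. The data are made-up toy numbers (exactly $$y = 2x + 1$$) standing in for a single molecular descriptor and a continuous property.

```python
import numpy as np

# Toy data: one input feature x and a continuous target y (made-up numbers, y = 2x + 1)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Fit y ≈ slope * x + intercept by least squares (coefficients are returned highest degree first)
slope, intercept = np.polyfit(x, y, deg=1)

# The prediction for a new input is a continuous value, not a class
prediction = slope * 5.0 + intercept
print(prediction)  # ≈ 11.0
```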

**ROC curve:** graph that shows the performance of a classification ML model across all classification thresholds. Each point on the ROC curve is the pair of False Positive Rate (FPR) and True Positive Rate (TPR) obtained at a specific threshold.
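
As a sketch of how the points are obtained, assuming the model outputs a probability of the positive class for each molecule and a molecule is predicted positive when that probability is at or above the threshold: sweep the threshold over every distinct score and compute (FPR, TPR) at each one. The function name `roc_points` and the example values are hypothetical; in practice, scikit-learn's `sklearn.metrics.roc_curve` computes these points.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, for a binary classifier (1 = positive)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    # Try every distinct score as a threshold, plus +inf so the curve starts at (0, 0)
    for threshold in [float("inf")] + sorted(set(scores), reverse=True):
        # A molecule is predicted positive when its score is >= threshold
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # predicted probabilities of the positive class
for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold can only add predicted positives, so both rates increase monotonically from (0, 0) to (1, 1).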

**Unsupervised machine learning:** subcategory of machine learning where the algorithm is trained on unlabelled data with the goal of identifying patterns in it. For example, projecting a chemistry dataset into two dimensions with UMAP is an unsupervised machine learning technique.

**Supervised machine learning:** subcategory of machine learning where the algorithm is trained on a dataset of input-output pairs (labelled data) and learns, from this training set, to map each input to the corresponding output.
