Test command
The Ersilia test command is designed to automate model testing and validation at incorporation time and during routine maintenance of the models.
TL;DR
The test command is a CLI command on the Ersilia Model Hub that automatically performs several checks on an individual model.
To run the test command you need to install Ersilia with the extra packages required for testing:
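For example, installing from PyPI with the optional testing dependencies could look like the sketch below; the extras name test is an assumption and should be checked against the installation instructions:

```bash
# Install Ersilia together with the extra packages needed for testing.
# The "test" extras name is assumed here; adjust it if the installation docs differ.
pip install "ersilia[test]"
```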
The test command has four levels of complexity:
Basic: high-level tests to ensure all the necessary files are available and the metadata is compliant with Ersilia standards. It does not actually run the model, and is designed as a quick maintenance check for models.
Surface: performs all basic tests and a simple run of the model fetched from the specified source through Ersilia's CLI. Designed to be run as a quick check during model maintenance.
Shallow: runs all the tests from the surface command and then tests the model more thoroughly, covering all input types (string, list, csv) and output types (csv, json, h5). It also ensures the consistency of results between runs. Designed to be run by model contributors prior to incorporating the model and whenever a change is introduced in the model.
Deep: performs all shallow tests and, in addition, calculates performance metrics for the model. Designed to be used only by Ersilia maintainers when a new model is incorporated or significant changes are introduced.
Hence, a test command could look like:
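The example below is illustrative; eosxxxx is a placeholder for a real model identifier:

```bash
# Run the shallow level of tests on a model downloaded from GitHub.
# Replace eosxxxx with the identifier of the model you want to test.
ersilia test eosxxxx --shallow --from_github
```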
Inspect test
Usage
The model is downloaded (not fetched) from its online storage on S3 or GitHub. By default, if no flag is specified, the model is downloaded from GitHub. In addition, the basic tests can also be performed from a local directory, for example when a contributor is incorporating a new model.
| Flag | Default | Description |
| --- | --- | --- |
| --from_github | True | Downloads the model from its repository in the ersilia-os organisation. |
| --from_s3 | False | Downloads the model from its storage on the cloud (S3 bucket). |
| --from_dir [path/to/dir] | False | Uses a model stored locally at the indicated path. |
| --from_dockerhub | False | Defaults back to --from_github, as the model needs to be downloaded. |
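For instance, a contributor incorporating a new model could run the checks against a local copy of the repository. The sketch below assumes that omitting a level flag (--surface, --shallow, --deep) runs only the basic checks; eosxxxx and the path are placeholders:

```bash
# Basic checks on a locally stored model; no level flag is passed,
# which is assumed here to run only the basic (inspect) tests.
ersilia test eosxxxx --from_dir ./eosxxxx
```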
Tests performed
Metadata checks: the model metadata is available in the metadata.json or metadata.yml file and is in the correct format. If the model is not yet fully incorporated in Ersilia, fields like the S3 URL or Docker Architecture will not exist; importantly, those are marked as Not Passed instead of Failed.
Model file checks: all required model files exist for either of Ersilia's packaging modes:
BentoML packaging: Dockerfile, metadata.json, run.sh, service.py, pack.py, README.md, LICENSE.
FastAPI packaging: install.yml, metadata.yml, README.md and LICENSE files, model/framework/examples/run_input.csv, model/framework/examples/run_output.csv, model/framework/columns/run_columns.csv and model/framework/run.sh.
File validity check: checks that the following files have the expected structure:
Columns: the columns specified in run_columns.csv and those in run_output.csv coincide.
Model directory size: calculates the size of the model directory (including model checkpoints).
Outputs
The terminal will print four tables, one for each type of test specified above, indicating whether each check has PASSED or FAILED. In the .json file, the tests appear as True (passed) or False (failed). The -v flag can always be used to see more information in the terminal.
Surface test
Usage
The model is downloaded (not fetched) from its online storage on S3 or GitHub, and then fetched via the specified source (including DockerHub). By default, if no flag is specified, the model is fetched from GitHub. In addition, the surface tests can also be performed from a local directory, for example when a contributor is incorporating a new model.
| Flag | Default | Description |
| --- | --- | --- |
| --from_github | True | Downloads the model from its repository in the ersilia-os organisation. |
| --from_s3 | False | Downloads the model from its storage on the cloud (S3 bucket). |
| --from_dir [path/to/dir] | False | Uses a model stored locally at the indicated path. |
| --from_dockerhub | False | Downloads the model from GitHub and fetches it via DockerHub for the simple run. |
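As an illustration, a quick maintenance check of a released model could be run against the DockerHub image (eosxxxx is a placeholder):

```bash
# Surface test: basic checks plus a single run of the model fetched via DockerHub.
ersilia test eosxxxx --surface --from_dockerhub
```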
Tests performed
All the basic tests and, in addition:
Model size check: also calculates the environment size (if fetched --from_github, --from_s3 or --from_dir) or the image size (if fetched --from_dockerhub).
Model run check: fetches the model through Ersilia's CLI from the specified source and then runs the run_input.csv file. Outputs the result in .csv format and ensures that the result is not all None's and that the columns match the results in run_output.csv.
Outputs
The terminal will print five tables, one for each type of test specified above, indicating whether each check has PASSED or FAILED. In the .json file, the tests appear as True (passed) or False (failed). The -v flag can always be used to see more information in the terminal.
Shallow test
Usage
Similar to the surface test, but performs more tests once the model is fetched.
| Flag | Default | Description |
| --- | --- | --- |
| --from_github | True | Downloads the model from its repository in the ersilia-os organisation and then fetches it from the created folder. |
| --from_s3 | False | Downloads the model from its storage on the cloud (S3 bucket) and then fetches it from the created folder. |
| --from_dir [path/to/dir] | False | Fetches a model stored locally at the indicated path. |
| --from_dockerhub | False | Fetches the model from DockerHub and, in parallel, downloads it from GitHub to perform the basic tests. |
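For example, a contributor could run the shallow tests on a local copy of the model before submitting changes (eosxxxx and the path are placeholders):

```bash
# Shallow test: surface checks plus input/output and consistency checks,
# run against a locally stored model directory.
ersilia test eosxxxx --shallow --from_dir ./eosxxxx
```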
Tests performed
In addition to the surface tests:
Input-output check: the test module performs several runs and ensures that the input can be passed in all accepted formats (single molecule, list or .csv) and the output saved in all available formats (.csv, .json, .h5).
Model output consistency check: the output is consistent between runs (for the same molecule, the same result or a small divergence in non-stochastic models). Consistency is calculated for both string and numerical outputs using scores like rmse and spearmanr (Spearman correlation coefficient with associated p-value). The example/run_input.csv file is used for this check.
Consistency summary between Ersilia and bash execution: the output is consistent between running the model via Ersilia and running it directly from the run.sh bash file in model/framework. The example/run_input.csv file is used for this check.
The --shallow test will first run all --surface checks. If the simple run fails, the test command will exit early to save time, as the subsequent tests would also fail.
Outputs
The terminal will print eight tables, one for each type of test specified above, indicating whether each check has PASSED or FAILED. In the .json file, the tests appear as True (passed) or False (failed). The -v flag can always be used to see more information in the terminal.
Deep test
Model source
The command will perform all surface and shallow tests and, in addition, run predictions for increasingly large sets of inputs to assess computational performance.
| Flag | Default | Description |
| --- | --- | --- |
| --from_github | True | Downloads the model from its repository in the ersilia-os organisation and then fetches it from the created folder. |
| --from_s3 | False | Downloads the model from its storage on the cloud (S3 bucket) and then fetches it from the created folder. |
| --from_dir [path/to/dir] | False | Fetches a model stored locally at the indicated path. |
| --from_dockerhub | False | Fetches the model from DockerHub and, in parallel, downloads it from GitHub to perform the basic tests. |
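An illustrative invocation, with eosxxxx as a placeholder model identifier:

```bash
# Deep test: surface and shallow checks plus computational performance metrics.
# The -v flag prints more detailed information to the terminal.
ersilia test eosxxxx --deep --from_github -v
```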
Tests performed
Computational performance assessment: after serving the model, it is executed for 1, 10, 100, 1,000 and 10,000 inputs. Model performance is recorded and reported as wall-clock time in seconds. By default, a deterministic flag is enabled for the example command, ensuring all models use the same list of molecules for the test command.
The --deep test will first run all --surface and --shallow checks. If the simple run or the consistency checks fail, the test command will exit early.
Outputs
The same tables as in the basic, surface and shallow tests, plus an additional model performance table reporting the time (in seconds) taken to run inputs of each size.
Detailed Methods
The mechanism involves several services and classes working together to ensure the model's functionality and reliability. The main components are:
RunnerService: manages the execution of model tests and checks.
InspectService: inspects models and their configurations.
CheckService: performs the various high-level checks mentioned above on the model.
IOService: handles input/output operations related to model testing.
ModelTester: a high-level class that orchestrates the testing process.
The process typically involves:
Setting up the environment and fetching the model repository.
Running various checks to ensure the model's integrity.
Generating tables and logs of the results.
Cleaning up temporary files and directories (the model directory is removed only if the --remove flag is enabled).
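As an example, a shallow run that also cleans up the downloaded model directory afterwards could look like this (eosxxxx is a placeholder):

```bash
# Run the shallow tests from GitHub and remove the model directory when finished.
ersilia test eosxxxx --shallow --from_github --remove
```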