Precalculation Store
Isaura is Ersilia's pre-calculation store: it saves model outputs in Ersilia output format, keeps them efficiently in object storage, and serves them back later via exact or approximate lookup.
Benchmark: see Benchmarks.
How it works (mechanism): see How Isaura Works.
Quick start guide
Isaura uses uv for fast Python dependency management.
Start all services
Prerequisites
Docker installed and running
Docker Compose installed
Ubuntu: follow Docker's install docs
macOS:
brew install docker-compose
Fastest way
isaura engine --start
Optional: Install MinIO Client (mc)
The MinIO Client (mc) is a command-line tool to manage MinIO/S3 storage.
Install (Linux, amd64 binary)
curl -O https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/
Or with Homebrew (macOS)
brew install minio/stable/mc
Configure mc
mc alias set local http://localhost:9000 minioadmin123 minioadmin1234
Example: list projects (buckets)
mc ls local
MinIO web console:
Username: minioadmin123
Password: minioadmin1234
More on mc: https://github.com/minio/mc?tab=readme-ov-file
Cloud functionality
Export the following environment variables.
Public cloud bucket (read/write)
export MINIO_CLOUD_AK=<Key here> # access key
export MINIO_CLOUD_SK=<Key here> # secret key
Private cloud bucket (read/write)
export MINIO_PRIV_CLOUD_AK=<Key here> # access key
export MINIO_PRIV_CLOUD_SK=<Key here> # secret key
How Isaura works
An overview of how Isaura stores and fetches calculations (more details in How Isaura Works). It relies on four distinct but related services.
Storage & query engines (summary)
DuckDB: used as the write engine (ingestion into Parquet chunks) and query engine (exact retrieval from Parquet).
MinIO: object storage backend for Parquet data + indexes.
Milvus: used only for approximate lookup (top-1 nearest input).
NNS: Go-based REST API server for high-performance Milvus ingest/query (see NN-Search API).
DuckDB object paths ("DuckDB URL")
Isaura organizes artifacts in MinIO, and DuckDB reads and writes those objects through S3-compatible access.
Logical paths:
Parquet data:
s3://<bucket>/<eos_id>/<version>/data/chunk_*.parquet
Bloom filter:
s3://<bucket>/<eos_id>/<version>/bloom.pkl
Access metadata:
s3://<bucket>/<eos_id>/<version>/access.json
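The logical paths above can be assembled mechanically. The following is a minimal sketch, not Isaura's own code: the `ArtifactPaths` class and its method names are hypothetical, and only the path layout itself comes from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactPaths:
    """Hypothetical helper: object locations for one model/version in a project."""

    bucket: str
    eos_id: str
    version: str

    @property
    def prefix(self) -> str:
        # Common prefix shared by every artifact of this model/version.
        return f"s3://{self.bucket}/{self.eos_id}/{self.version}"

    @property
    def parquet_glob(self) -> str:
        # Glob pattern matching all Parquet chunks.
        return f"{self.prefix}/data/chunk_*.parquet"

    @property
    def bloom(self) -> str:
        return f"{self.prefix}/bloom.pkl"

    @property
    def access(self) -> str:
        return f"{self.prefix}/access.json"


paths = ArtifactPaths("isaura-public", "eos8a4x", "v2")
print(paths.parquet_glob)  # s3://isaura-public/eos8a4x/v2/data/chunk_*.parquet
print(paths.bloom)         # s3://isaura-public/eos8a4x/v2/bloom.pkl
```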
With MinIO, DuckDB is configured with an S3 endpoint (e.g. http://localhost:9000) and credentials; the object keys remain under s3://....
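As an illustration of what such a configuration can look like, the sketch below generates the DuckDB `httpfs` settings for a MinIO endpoint. The setting names (`s3_endpoint`, `s3_access_key_id`, `s3_secret_access_key`, `s3_use_ssl`, `s3_url_style`) are standard DuckDB options; whether Isaura issues exactly these statements is an assumption, and `duckdb_s3_settings` is a hypothetical helper name.

```python
from typing import List
from urllib.parse import urlparse


def _sql_quote(value: str) -> str:
    """Quote a SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def duckdb_s3_settings(endpoint_url: str, access_key: str, secret_key: str) -> List[str]:
    """Build the SQL statements that point DuckDB's httpfs extension at MinIO."""
    parsed = urlparse(endpoint_url)
    use_ssl = "true" if parsed.scheme == "https" else "false"
    return [
        "INSTALL httpfs;",
        "LOAD httpfs;",
        # DuckDB expects host:port here, without the URL scheme.
        f"SET s3_endpoint={_sql_quote(parsed.netloc)};",
        f"SET s3_access_key_id={_sql_quote(access_key)};",
        f"SET s3_secret_access_key={_sql_quote(secret_key)};",
        f"SET s3_use_ssl={use_ssl};",
        # MinIO is normally addressed path-style (endpoint/bucket/key).
        "SET s3_url_style='path';",
    ]


statements = duckdb_s3_settings("http://localhost:9000", "ACCESS_KEY", "SECRET_KEY")
for statement in statements:
    print(statement)
```

With the `duckdb` Python package installed and the services running, each statement could be passed to `duckdb.connect().execute(...)`, after which a query such as `SELECT count(*) FROM read_parquet('s3://isaura-public/eos8a4x/v2/data/chunk_*.parquet')` would read the chunks directly from MinIO.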
Projects (buckets) and access
Isaura stores calculations in projects (MinIO buckets).
Default projects:
isaura-public
isaura-private
Folder structure
Example (following the logical paths above):
<project>/
  <eos_id>/
    <version>/
      data/
        chunk_{idx}.parquet
      bloom.pkl
      access.json
Each chunk_{idx}.parquet contains up to 2,000,000 rows (2M max) for DuckDB performance (row grouping / scanning efficiency).
bloom.pkl enables fast membership checks ("does this input exist?").
access.json stores inputs and their access classification (public/private).
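To make the chunk limit and the membership check concrete, here is a small generic sketch. It is not Isaura's implementation: the contents of bloom.pkl and its parameters are not described on this page, so the `BloomFilter` class, its sizes, and the `num_chunks` helper are illustrative assumptions. Only the 2,000,000-row limit comes from the text above.

```python
import hashlib
from typing import Iterator

CHUNK_ROWS = 2_000_000  # maximum rows per Parquet chunk, as stated above


def num_chunks(total_rows: int) -> int:
    """Number of chunk files needed for total_rows (ceiling division)."""
    return -(-total_rows // CHUNK_ROWS)


class BloomFilter:
    """Toy Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive all positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter()
seen.add("CCO")
print("CCO" in seen)          # True: an added input is always reported as present
print(num_chunks(4_500_000))  # 3
```

A negative answer from such a filter is definitive, so a lookup can skip the Parquet scan entirely for inputs that were never stored; a positive answer still needs to be confirmed against the data.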
Copying from custom projects into defaults
When you copy calculations for a given project + model + version:
Isaura reads <custom_project>/<eos_id>/<version>/access.json.
It routes each input/output into:
isaura-public if access is public
isaura-private if access is private
Bloom filter(s) are updated accordingly.
Inputs are registered into Milvus at copy time (so they become available for approximate search).
Important: Milvus registration happens when copying from a custom project to the default projects, not necessarily at initial write into the custom project.
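The routing step can be sketched as follows. The layout of access.json is not specified on this page, so the sketch assumes a flat mapping from input to "public" or "private"; `route_by_access` is a hypothetical name, and the Bloom filter and Milvus updates are left out.

```python
import json
from typing import Dict, List

# Default projects named in this document.
DEFAULT_PROJECTS = {"public": "isaura-public", "private": "isaura-private"}


def route_by_access(access: Dict[str, str]) -> Dict[str, List[str]]:
    """Group inputs by the default project their access level maps to."""
    routed: Dict[str, List[str]] = {bucket: [] for bucket in DEFAULT_PROJECTS.values()}
    for input_value, level in access.items():
        if level not in DEFAULT_PROJECTS:
            raise ValueError(f"Unknown access level {level!r} for input {input_value!r}")
        routed[DEFAULT_PROJECTS[level]].append(input_value)
    return routed


# Assumed access.json content: {"<input>": "public" | "private", ...}
access_json = '{"CCO": "public", "c1ccccc1": "private", "CCN": "public"}'
print(route_by_access(json.loads(access_json)))
# {'isaura-public': ['CCO', 'CCN'], 'isaura-private': ['c1ccccc1']}
```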
Approximate search (Milvus)
Approximate search is enabled when you request ANN/nearest-neighbor behavior.
Current behavior:
For each query input, Milvus returns the top-1 most similar stored input
Similarity metric: Jaccard similarity
Current input type supported for ANN: CPD inputs
Representation: 1024-bit Morgan fingerprints
Collection name:
{ersilia_eos_id}_{version}
Milvus stores input representations used for matching (not full model outputs). After the nearest stored input is found, Isaura fetches the corresponding cached outputs via DuckDB + MinIO and returns results in Ersilia output format.
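The matching step can be illustrated without Milvus. The sketch below does a brute-force top-1 search with Jaccard similarity over bit vectors stored as Python integers. The fingerprints are toy values (real ones would be 1024-bit Morgan fingerprints from a cheminformatics toolkit), Milvus uses an index rather than a linear scan, and all function names are hypothetical. Only the collection naming pattern and the metric come from the text above.

```python
from typing import Dict, Iterable, Optional, Tuple


def collection_name(eos_id: str, version: str) -> str:
    """Collection naming pattern described above: {ersilia_eos_id}_{version}."""
    return f"{eos_id}_{version}"


def to_bits(on_bits: Iterable[int], n_bits: int = 1024) -> int:
    """Pack on-bit positions into an int used as a fixed-width bit vector."""
    value = 0
    for pos in on_bits:
        if not 0 <= pos < n_bits:
            raise ValueError(f"bit position {pos} outside 0..{n_bits - 1}")
        value |= 1 << pos
    return value


def jaccard(a: int, b: int) -> float:
    """Jaccard similarity |A and B| / |A or B| of two bit vectors."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def top1(query: int, stored: Dict[str, int]) -> Optional[Tuple[str, float]]:
    """Return (key, similarity) of the most similar stored fingerprint."""
    best: Optional[Tuple[str, float]] = None
    for key, fingerprint in stored.items():
        score = jaccard(query, fingerprint)
        if best is None or score > best[1]:
            best = (key, score)
    return best


stored = {"A": to_bits([1, 5, 9, 700]), "B": to_bits([1, 5, 300])}
print(collection_name("eos8a4x", "v2"))  # eos8a4x_v2
print(top1(to_bits([1, 5, 9]), stored))  # ('A', 0.75)
```

The key returned by the nearest-neighbor step is then what an exact lookup would use to fetch the cached outputs.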
Commands at a glance
Projects are MinIO buckets (storage directories) that hold model calculations.
write
Alias: none
Required options: -i/--input-file, -m/--model
Optional options: -pn/--project-name, --access [public|private]

read
Alias: none
Required options: -i/--input-file, -m/--model
Optional options: -pn/--project-name, --access, -v/--version, -o/--output-file, -nn
Description: Read/download results for inputs in a CSV and optionally save them as CSV/HDF5. Use -nn for approximate search (ANN).

copy
Alias: cp
Required options: -m/--model, -v/--version, -pn/--project-name, -o/--output-dir
Optional options: none
Description: Copy all artifacts for a model/version from a project to a local directory. If -o is omitted, it only logs counts; with -o, it writes files.

move
Alias: mv
Required options: -m/--model, -v/--version, -pn/--project-name
Optional options: none
Description: Move/relocate server-side artifacts for a model/version within the project space.

remove
Alias: rm
Required options: -m/--model, -v/--version, -pn/--project-name, -y/--yes
Optional options: none
Description: Permanently delete artifacts for a model/version from a project. Safety-guarded by --yes.

inspect
Alias: none
Required options: -m/--model, -v/--version, -o/--output-file
Optional options: -pn/--project-name, --access, -i/--input-file, --cloud
Description: Inspect available items or validate inputs. With -i, validates inputs and writes a report; without -i, lists available entries.

catalog
Alias: none
Required options: -pn/--project-name
Optional options: --cloud
Description: List models present in a project (bucket).
Brief CLI usage examples
Write calculation results
isaura write -i data/ersilia_output.csv -m eos8a4x -v v2 -pn myproject --access public
Upload/write outputs (input column must be input) for a model + version using a CSV as input.
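Because the CSV must carry a column named input, a quick local check before calling `isaura write` can catch a wrong header early. This is a sketch of such a pre-check using only the standard library; `has_input_column` is a hypothetical helper and says nothing about how the CLI itself validates files.

```python
import csv
import io
from typing import TextIO


def has_input_column(handle: TextIO) -> bool:
    """Return True if the CSV header contains a column named 'input'."""
    header = next(csv.reader(handle), [])
    return "input" in [name.strip() for name in header]


good = io.StringIO("key,input,feature_0\nabc,CCO,0.12\n")
bad = io.StringIO("smiles,feature_0\nCCO,0.12\n")
print(has_input_column(good))  # True
print(has_input_column(bad))   # False

# For a file on disk:
# with open("data/ersilia_output.csv", newline="") as fh:
#     assert has_input_column(fh), "CSV needs an 'input' column"
```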
Read results (exact)
isaura read -i data/inputs.csv -m eos8a4x -v v2 -pn myproject -o data/outputs.csv
Read results for inputs and save to an output CSV file.
Read results (approximate)
isaura read -i data/inputs.csv -m eos8a4x -v v2 -pn myproject -o data/outputs.csv -nn
Fetch results using approximate search (Milvus top-1 similar input).
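When the read step is driven from a script, the two variants above differ only in the -nn flag. The sketch below builds the argument list using only the options shown on this page; `build_read_command` is a hypothetical wrapper, not part of Isaura.

```python
from typing import List, Optional


def build_read_command(
    input_csv: str,
    model: str,
    version: str,
    project: str,
    output_csv: Optional[str] = None,
    approximate: bool = False,
) -> List[str]:
    """Assemble the argv for `isaura read` from the flags documented above."""
    cmd = ["isaura", "read", "-i", input_csv, "-m", model, "-v", version, "-pn", project]
    if output_csv:
        cmd += ["-o", output_csv]
    if approximate:
        cmd.append("-nn")  # switch from exact lookup to approximate search
    return cmd


cmd = build_read_command(
    "data/inputs.csv", "eos8a4x", "v2", "myproject", "data/outputs.csv", approximate=True
)
print(" ".join(cmd))
# isaura read -i data/inputs.csv -m eos8a4x -v v2 -pn myproject -o data/outputs.csv -nn

# To execute it (requires Isaura installed and the services running):
# import subprocess
# subprocess.run(cmd, check=True)
```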
Copy buckets
isaura copy -m eos8a4x -v v1 -pn myproject-private -o ~/Documents/files/
Copy all model artifacts from a project to a local directory.
Move buckets
isaura move -m eos9876 -v v1 -pn myproject-private
Move or relocate artifacts for a model/version within the project.
Remove buckets
isaura remove -m eos8a4x -v v1 -pn myproject-private --yes
Permanently delete artifacts for a model/version from a project.
Inspect inputs (validate)
isaura inspect inputs -m eos8a4x -v v1 -pn myproject -i data/inputs.csv -o reports/inspect_report.csv
Validate input data and output a report.
List available model results
isaura inspect -m eos8a4x -v v1 -o reports/available.csv
List all available inputs/files for a model/version.
Catalog project models
isaura catalog -pn myproject
Display all models within a project.
API usage examples