Precalculation Store

Isaura is Ersilia’s precalculation store: it saves model outputs in Ersilia output format, keeps them efficiently in object storage, and serves them back later via exact or approximate lookup.


Quick start guide

Isaura uses uv for fast Python dependency management.

1. Clone and set up

git clone https://github.com/ersilia-os/isaura.git
cd isaura
uv sync
source .venv/bin/activate

2. Start all services

Prerequisites

  • Docker installed and running

  • Docker Compose installed

    • Ubuntu: follow Docker’s install docs

    • macOS: brew install docker-compose
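Before starting the services, it can help to confirm that the prerequisite executables are on your PATH. The following is a minimal sketch, not part of Isaura: check_prerequisites is a hypothetical helper, and it only looks for the docker and docker-compose binaries. It does not verify that the Docker daemon is running, and it will not detect Compose when it is installed only as the `docker compose` plugin.

```python
import shutil


def check_prerequisites() -> dict[str, bool]:
    """Report which prerequisite executables are found on PATH.

    Hypothetical helper for illustration; Isaura does not ship this function.
    """
    tools = ("docker", "docker-compose")
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```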

Fastest way

isaura engine --start

3. Optional: Install MinIO Client (mc)

The MinIO Client (mc) is a command-line tool to manage MinIO/S3 storage.

Install (Linux, amd64)

curl -O https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/

Or with Homebrew (macOS)

brew install minio/stable/mc

Configure mc

mc alias set local http://localhost:9000 minioadmin123 minioadmin1234

Example: list projects (buckets)

mc ls local

MinIO web console:

More on mc: https://github.com/minio/mc?tab=readme-ov-file


Cloud functionality

To use the cloud buckets, export the following environment variables.

Public cloud bucket (read/write)

export MINIO_CLOUD_AK=<Key here>   # access key
export MINIO_CLOUD_SK=<Key here>   # secret key

Private cloud bucket (read/write)

export MINIO_PRIV_CLOUD_AK=<Key here>   # access key
export MINIO_PRIV_CLOUD_SK=<Key here>   # secret key
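
A script that depends on these variables can check them up front rather than failing later with an authentication error. This is a minimal sketch under stated assumptions: the variable names are the ones listed above, while missing_cloud_vars and the "public"/"private" grouping are hypothetical and not part of Isaura.

```python
import os

# Variable names come from the section above; the grouping is an assumption.
CLOUD_ENV_VARS = {
    "public": ("MINIO_CLOUD_AK", "MINIO_CLOUD_SK"),
    "private": ("MINIO_PRIV_CLOUD_AK", "MINIO_PRIV_CLOUD_SK"),
}


def missing_cloud_vars(scope: str, env=None) -> list[str]:
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; pass a dict to test.
    """
    env = os.environ if env is None else env
    return [name for name in CLOUD_ENV_VARS[scope] if not env.get(name)]


if __name__ == "__main__":
    for scope in CLOUD_ENV_VARS:
        missing = missing_cloud_vars(scope)
        print(scope, "ok" if not missing else f"missing: {', '.join(missing)}")
```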

How Isaura works

This section gives an overview of how Isaura stores and fetches calculations (more details in How Isaura Works). It relies on four distinct but related services.

Storage & query engines (summary)

  • DuckDB: used as the write engine (ingestion into Parquet chunks) and query engine (exact retrieval from Parquet).

  • MinIO: object storage backend for Parquet data + indexes.

  • Milvus: used only for approximate lookup (top-1 nearest input).

  • NNS: Go-based REST API server for high-performance Milvus ingest/query (see NN-Search API).

DuckDB object paths (“DuckDB URL”)

Isaura organizes artifacts in MinIO, and DuckDB reads and writes those objects through S3-compatible access.

Logical paths:

  • Parquet data:

    • s3://<bucket>/<eos_id>/<version>/data/chunk_*.parquet

  • Bloom filter:

    • s3://<bucket>/<eos_id>/<version>/bloom.pkl

  • Access metadata:

    • s3://<bucket>/<eos_id>/<version>/access.json

With MinIO, DuckDB is configured with an S3 endpoint (e.g. http://localhost:9000) and credentials; the object keys remain under s3://....
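
The path layout and endpoint configuration above can be sketched as follows. The path templates are taken from the list above. The SET statements use setting names from DuckDB's httpfs extension, but whether Isaura configures DuckDB in exactly this way is an assumption, and artifact_paths and duckdb_s3_settings are hypothetical helpers. The credentials in the usage lines are placeholders.

```python
from urllib.parse import urlparse


def artifact_paths(bucket: str, eos_id: str, version: str) -> dict[str, str]:
    """Build the logical S3 paths for one model version."""
    base = f"s3://{bucket}/{eos_id}/{version}"
    return {
        "data": f"{base}/data/chunk_*.parquet",
        "bloom": f"{base}/bloom.pkl",
        "access": f"{base}/access.json",
    }


def duckdb_s3_settings(endpoint_url: str, access_key: str, secret_key: str) -> list[str]:
    """Return DuckDB SET statements pointing S3 access at a MinIO endpoint.

    Assumption: httpfs-style settings; s3_endpoint takes host:port, no scheme.
    """
    parsed = urlparse(endpoint_url)
    use_ssl = "true" if parsed.scheme == "https" else "false"
    return [
        f"SET s3_endpoint='{parsed.netloc}';",
        f"SET s3_use_ssl={use_ssl};",
        "SET s3_url_style='path';",  # MinIO is usually addressed path-style
        f"SET s3_access_key_id='{access_key}';",
        f"SET s3_secret_access_key='{secret_key}';",
    ]


if __name__ == "__main__":
    print(artifact_paths("isaura-public", "eos8a4x", "v2")["data"])
    for stmt in duckdb_s3_settings("http://localhost:9000", "<access key>", "<secret key>"):
        print(stmt)
```

Note that the object keys keep the s3:// scheme; only the endpoint setting changes when you point DuckDB at MinIO instead of another S3-compatible service.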


Projects (buckets) and access

Isaura stores calculations in projects (MinIO buckets).

Default projects:

  • isaura-public

  • isaura-private

Folder structure

Each <bucket>/<eos_id>/<version>/ folder contains the following artifacts:

  • Each chunk_{idx}.parquet contains at most 2,000,000 rows, a cap chosen for DuckDB performance (row grouping and scanning efficiency).

  • bloom.pkl enables fast membership checks (“does this input exist?”).

  • access.json stores inputs and their access classification (public / private).
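
To make the chunking rule and the membership check concrete, here is a small illustrative sketch. It is not Isaura's implementation: chunk_names and BloomFilter are hypothetical, the starting chunk index of 0 is an assumption, and the toy filter does not reproduce the actual contents or format of bloom.pkl.

```python
import hashlib
import math

MAX_ROWS_PER_CHUNK = 2_000_000  # per-chunk row limit stated above


def chunk_names(total_rows: int, start_index: int = 0) -> list[str]:
    """Names of the Parquet chunks needed to hold `total_rows` rows.

    Assumption: chunk indices start at 0 and increase by 1.
    """
    n_chunks = math.ceil(total_rows / MAX_ROWS_PER_CHUNK)
    return [f"chunk_{start_index + i}.parquet" for i in range(n_chunks)]


class BloomFilter:
    """Toy Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, n_bits: int = 8192, n_hashes: int = 4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive n_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    print(chunk_names(4_500_000))
    seen = BloomFilter()
    seen.add("CCO")
    print("CCO" in seen)
```

A negative answer from the filter means the input is definitely not stored, so the Parquet scan can be skipped; a positive answer still has to be confirmed against the data.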

Copying from custom projects into defaults

When you copy calculations for a given project + model + version:

  1. Isaura reads <custom_project>/<eos_id>/<version>/access.json

  2. It routes each input/output into:

    • isaura-public if access is public

    • isaura-private if access is private

  3. Bloom filter(s) are updated accordingly.

  4. Inputs are registered into Milvus at copy time (so they become available for approximate search).

Important: Milvus registration happens when copying from a custom project to the default projects, not necessarily at initial write into the custom project.
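
The routing step can be sketched as below. The structure of access.json is not documented in this page, so the sketch assumes a simple mapping from input to "public" or "private"; route_by_access is a hypothetical helper, and the Bloom filter update and Milvus registration steps are omitted.

```python
import json
from collections import defaultdict

# Default project names from the section above.
DEFAULT_PROJECTS = {"public": "isaura-public", "private": "isaura-private"}


def route_by_access(access: dict[str, str]) -> dict[str, list[str]]:
    """Group inputs by the default project their access level maps to.

    Assumption: `access` maps each input to "public" or "private".
    """
    routed = defaultdict(list)
    for input_value, level in access.items():
        if level not in DEFAULT_PROJECTS:
            raise ValueError(f"unknown access level {level!r} for {input_value!r}")
        routed[DEFAULT_PROJECTS[level]].append(input_value)
    return dict(routed)


if __name__ == "__main__":
    # Hypothetical access.json content, for illustration only.
    example = json.loads('{"CCO": "public", "c1ccccc1": "private", "CC(=O)O": "public"}')
    print(route_by_access(example))
```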


Approximate search (Milvus)

Approximate search runs only when you request ANN/nearest-neighbor behavior, for example by passing -nn to isaura read.

Current behavior:

  • For each query input, Milvus returns the top-1 most similar stored input

  • Similarity metric: Jaccard similarity

  • Current input type supported for ANN: CPD inputs

  • Representation: 1024-bit Morgan fingerprints

  • Collection name:

    • {ersilia_eos_id}_{version}

Milvus stores input representations used for matching (not full model outputs). After the nearest stored input is found, Isaura fetches the corresponding cached outputs via DuckDB + MinIO and returns results in Ersilia output format.
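
The matching logic can be illustrated without Milvus. In this sketch a fingerprint is represented as the set of its on-bit indices (0 to 1023), Jaccard similarity is the size of the intersection over the size of the union, and the top-1 match is found by brute force. The fingerprints are made-up toy values rather than real Morgan fingerprints, and jaccard, top1, and collection_name are hypothetical helpers; Milvus performs the search over an index instead of a linear scan.

```python
from typing import Optional


def jaccard(a: frozenset[int], b: frozenset[int]) -> float:
    """Jaccard similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def top1(query: frozenset[int], stored: dict[str, frozenset[int]]) -> Optional[tuple[str, float]]:
    """Return (key, similarity) of the most similar stored fingerprint."""
    if not stored:
        return None
    best = max(stored, key=lambda key: jaccard(query, stored[key]))
    return best, jaccard(query, stored[best])


def collection_name(eos_id: str, version: str) -> str:
    """Collection naming pattern given above: {ersilia_eos_id}_{version}."""
    return f"{eos_id}_{version}"


if __name__ == "__main__":
    # Toy fingerprints keyed by input; not real chemistry.
    stored = {
        "input_a": frozenset({1, 2, 3, 4}),
        "input_b": frozenset({10, 20, 30}),
    }
    print(collection_name("eos8a4x", "v2"))
    print(top1(frozenset({1, 2, 3, 5}), stored))
```

The key returned by the nearest-neighbor step is what the exact lookup then uses to fetch cached outputs.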


Commands at a glance

Projects are MinIO buckets (storage directories) that hold model calculations.

| Command | Alias | Required options | Optional options | What it does |
| --- | --- | --- | --- | --- |
| write | none | -i/--input-file, -m/--model | -pn/--project-name, --access [public\|private], … | … |
| read | none | -i/--input-file, -m/--model | -pn/--project-name, --access, -v/--version, -o/--output-file, -nn | Read/download results for inputs in a CSV and optionally save as CSV/HDF5. Use -nn for approximate search (ANN). |
| copy | cp | -m/--model, -v/--version, -pn/--project-name, -o/--output-dir | none | Copy all artifacts for a model/version from a project to a local directory. If -o is omitted, logs counts; with -o it writes files. |
| move | mv | -m/--model, -v/--version, -pn/--project-name | none | Move/relocate server-side artifacts for a model/version within the project space. |
| remove | rm | -m/--model, -v/--version, -pn/--project-name, -y/--yes | none | Permanently delete artifacts for a model/version from a project. Safety-guarded by --yes. |
| inspect | none | -m/--model, -v/--version, -o/--output-file | -pn/--project-name, --access, -i/--input-file, --cloud | Inspect available items or validate inputs. With -i, validates inputs and writes a report; without -i, lists available entries. |
| catalog | none | -pn/--project-name | --cloud | List models present in a project (bucket). |


Brief CLI usage examples

| Example | Command | Description |
| --- | --- | --- |
| 🧾 Write calculation results | isaura write -i data/ersilia_output.csv -m eos8a4x -v v2 -pn myproject --access public | Upload/write outputs for a model + version using a CSV as input (the input column must be named input). |
| 📥 Read results (exact) | isaura read -i data/inputs.csv -m eos8a4x -v v2 -pn myproject -o data/outputs.csv | Read results for inputs and save to an output CSV file. |
| 🔍 Read results (approximate) | isaura read -i data/inputs.csv -m eos8a4x -v v2 -pn myproject -o data/outputs.csv -nn | Fetch results using approximate search (Milvus top-1 similar input). |
| 📂 Copy buckets | isaura copy -m eos8a4x -v v1 -pn myproject-private -o ~/Documents/files/ | Copy all model artifacts from a project to a local directory. |
| 🚚 Move buckets | isaura move -m eos9876 -v v1 -pn myproject-private | Move or relocate artifacts for a model/version within the project. |
| 🗑️ Remove buckets | isaura remove -m eos8a4x -v v1 -pn myproject-private --yes | Permanently delete artifacts for a model/version from a project. |
| 🔎 Inspect inputs (validate) | isaura inspect inputs -m eos8a4x -v v1 -pn myproject -i data/inputs.csv -o reports/inspect_report.csv | Validate input data and output a report. |
| 📋 List available model results | isaura inspect -m eos8a4x -v v1 -o reports/available.csv | List all available inputs/files for a model/version. |
| 📚 Catalog project models | isaura catalog -pn myproject | Display all models within a project. |
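
When calling the CLI from a script, building the argument list programmatically avoids quoting mistakes. This is a minimal sketch that uses only the read flags documented above; build_read_command is a hypothetical helper, not part of Isaura, and running the command requires isaura to be installed and its services started.

```python
import shlex
from typing import Optional


def build_read_command(
    input_file: str,
    model: str,
    version: Optional[str] = None,
    project: Optional[str] = None,
    output_file: Optional[str] = None,
    approximate: bool = False,
) -> list[str]:
    """Assemble an `isaura read` argument list from the documented flags."""
    cmd = ["isaura", "read", "-i", input_file, "-m", model]
    if version:
        cmd += ["-v", version]
    if project:
        cmd += ["-pn", project]
    if output_file:
        cmd += ["-o", output_file]
    if approximate:
        cmd.append("-nn")  # approximate (ANN) lookup
    return cmd


if __name__ == "__main__":
    cmd = build_read_command(
        "data/inputs.csv", "eos8a4x", version="v2", project="myproject",
        output_file="data/outputs.csv", approximate=True,
    )
    print(shlex.join(cmd))
    # To execute it: import subprocess; subprocess.run(cmd, check=True)
```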


API usage examples
