Evaluation library for latent representations

Project description

Latentverse: Evaluation library for latent representations

Latentverse is a library for evaluating the quality and reliability of latent representations. It includes a variety of evaluation tests that measure the following properties of latent representations:

Clusterability: How well the representations form distinct clusters
Predictability: The ability to use representations for downstream prediction tasks
Disentanglement: The extent to which latent dimensions capture independent factors of variation
Robustness: The resilience of representations under perturbations
Expressiveness: How well latent representations capture relevant information

Installation

Latentverse is available on PyPI. You can install it using:

pip install ml4h-latentverse

Alternatively, if you are developing or modifying the package, clone the repository and install it in editable mode:

git clone https://github.com/broadinstitute/ml4h-latentverse.git
cd ml4h-latentverse
pip install -e .

Setting up the environment

It is recommended to use a virtual environment:

python -m venv venv
source venv/bin/activate  # for macOS/Linux
venv\Scripts\activate  # for Windows

Then, install the required dependencies:

pip install -r requirements.txt

Evaluating representations

1. Clusterability Test

This test evaluates how well representations cluster given a specified number of clusters. The test computes metrics such as Normalized Mutual Information (NMI), Silhouette Score, Davies-Bouldin Index, and Cluster Learnability.

Example usage:

from ml4h_latentverse.tests.clustering import run_clustering

representations = ...  # Load or generate your latent representations
labels = ...  # Corresponding labels for evaluation

results = run_clustering(representations=representations, labels=labels, num_clusters=2, plots=True)
print(results)

Expected inputs:

representations (ndarray): Feature representations for clustering
num_clusters (int, optional): Number of clusters (ignored if labels are provided)
labels (ndarray, optional): True labels (used for evaluation)
plots (bool, optional): Whether to generate clustering visualization
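
To make the NMI metric mentioned above concrete, here is a minimal NumPy sketch of normalized mutual information between two labelings. It is an illustration only, not Latentverse's implementation: the function name normalized_mutual_info and the choice to normalize by the arithmetic mean of the two entropies are assumptions made for this example.

```python
import numpy as np


def normalized_mutual_info(labels_a, labels_b):
    """NMI between two labelings, normalized by the mean of their entropies.

    Hypothetical helper for illustration; not part of ml4h_latentverse.
    """
    # Map arbitrary label values to integer codes 0..K-1.
    a = np.unique(np.asarray(labels_a).ravel(), return_inverse=True)[1]
    b = np.unique(np.asarray(labels_b).ravel(), return_inverse=True)[1]

    # Joint distribution: fraction of samples falling in each (a, b) cell.
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)
    joint /= joint.sum()
    p_a = joint.sum(axis=1)
    p_b = joint.sum(axis=0)

    # Mutual information, skipping empty cells to avoid log(0).
    nonzero = joint > 0
    expected = np.outer(p_a, p_b)
    mutual_info = np.sum(joint[nonzero] * np.log(joint[nonzero] / expected[nonzero]))

    entropy_a = -np.sum(p_a * np.log(p_a))
    entropy_b = -np.sum(p_b * np.log(p_b))
    normalizer = 0.5 * (entropy_a + entropy_b)
    # If both labelings contain a single cluster, treat them as agreeing.
    return float(mutual_info / normalizer) if normalizer > 0 else 1.0


true_labels = [0, 0, 1, 1]
same_up_to_renaming = [1, 1, 0, 0]
unrelated = [0, 1, 0, 1]
nmi_same = normalized_mutual_info(true_labels, same_up_to_renaming)
nmi_unrelated = normalized_mutual_info(true_labels, unrelated)
```

A value of 1 means the cluster assignments match the labels up to renaming, and a value near 0 means the two labelings carry no information about each other.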

2. Disentanglement Test

This test measures how well the latent dimensions capture independent factors of variation. It computes metrics such as DCI Disentanglement, Mutual Information Gap (MIG), Total Correlation (TC), and SAP Score.

Example usage:

from ml4h_latentverse.tests.disentanglement import run_disentanglement

data = ...  # Load or generate latent representations, shape (N, D)
labels = ...  # Corresponding ground-truth labels, shape (N,)
results = run_disentanglement(data, labels)
print(results)

Expected inputs:

representations: An (N, D) array of latent representations
labels: A (N,) array of ground truth labels
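
As a rough illustration of one of the metrics listed above, the sketch below computes a Mutual Information Gap for a single discrete factor: each latent dimension is binned, its mutual information with the factor is estimated, and the gap between the two most informative dimensions is divided by the factor's entropy. The function names, the 20 equal-width bins, and the single-factor restriction are assumptions for this example and may differ from what run_disentanglement does internally.

```python
import numpy as np


def discrete_mutual_info(x, y):
    """Mutual information (in nats) between two integer-coded 1-D arrays."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1.0)
    joint /= joint.sum()
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / expected[nonzero])))


def mutual_information_gap(latents, factor, n_bins=20):
    """MIG for one discrete factor (hypothetical helper, not the library API).

    Requires at least two latent dimensions and a factor with two or more values.
    """
    latents = np.asarray(latents, dtype=float)
    factor_codes = np.unique(np.asarray(factor).ravel(), return_inverse=True)[1]

    mi_per_dim = []
    for dim in range(latents.shape[1]):
        # Discretize the latent dimension into equal-width bins (codes 0..n_bins-1).
        edges = np.histogram_bin_edges(latents[:, dim], bins=n_bins)
        codes = np.digitize(latents[:, dim], edges[1:-1])
        mi_per_dim.append(discrete_mutual_info(codes, factor_codes))

    p = np.bincount(factor_codes) / factor_codes.size
    factor_entropy = -np.sum(p * np.log(p))
    top, second = sorted(mi_per_dim, reverse=True)[:2]
    return (top - second) / factor_entropy


rng = np.random.default_rng(0)
factor = rng.integers(0, 2, size=2000)
# Dimension 0 encodes the factor; the other three dimensions are pure noise.
informative = factor + 0.05 * rng.normal(size=2000)
latents = np.column_stack([informative, rng.normal(size=(2000, 3))])
mig = mutual_information_gap(latents, factor)
```

In this synthetic setup only one dimension carries the factor, so the gap should be close to 1; a representation that spreads the factor over several dimensions would score lower.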

3. Expressiveness Test

This test evaluates how much information the representations contain about labels. It assesses the impact of removing highly correlated features on prediction performance using AUC or R² scores.

Example usage:

from ml4h_latentverse.tests.expressiveness import run_expressiveness

data = ...  # Load or generate representations, shape (N, D)
labels = ...  # Corresponding target labels, shape (N, P)
results = run_expressiveness(data, labels, percent_to_remove_list=[0, 10, 20, 50], plots=True)
print(results)

Expected inputs:

representations: (N, D) array of feature representations
labels: (N, P) array of target labels (P phenotypes)
folds: Number of cross-validation folds
train_ratio: Ratio of data used for training
percent_to_remove_list: List of percentages of highly correlated dimensions to remove
verbose: If True, prints training details
plots: If True, generates and saves a performance plot
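
The removal step described above can be sketched as follows. This example assumes that "highly correlated" means the dimensions with the largest absolute Pearson correlation to a single target, and that percent_to_remove is a percentage of the D dimensions; both are guesses about the behavior, and drop_most_correlated is a hypothetical helper rather than a function exported by the library.

```python
import numpy as np


def drop_most_correlated(representations, labels, percent_to_remove):
    """Drop the given percentage of dimensions most correlated with the labels.

    Returns the reduced (N, D') array and the indices of the kept dimensions.
    """
    x = np.asarray(representations, dtype=float)
    y = np.asarray(labels, dtype=float).ravel()

    # Absolute Pearson correlation of every dimension with the target.
    x_centered = x - x.mean(axis=0)
    y_centered = y - y.mean()
    denom = np.sqrt((x_centered ** 2).sum(axis=0) * (y_centered ** 2).sum())
    # Constant dimensions get correlation 0 instead of a division by zero.
    corr = np.abs(x_centered.T @ y_centered) / np.where(denom > 0, denom, np.inf)

    n_remove = int(round(x.shape[1] * percent_to_remove / 100))
    order = np.argsort(-corr)  # most correlated dimension first
    kept = np.sort(order[n_remove:])
    return x[:, kept], kept


rng = np.random.default_rng(0)
labels = rng.normal(size=200)
representations = rng.normal(size=(200, 10))
# Make dimension 3 an almost exact copy of the target.
representations[:, 3] = labels + 0.01 * rng.normal(size=200)

reduced, kept = drop_most_correlated(representations, labels, percent_to_remove=10)
```

Refitting a predictor on the reduced array for each percentage and comparing scores shows how much of the label information is concentrated in a few dimensions.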

4. Robustness Test

This test examines how well representations withstand perturbations by introducing Gaussian noise and measuring clustering or probing performance.

Example usage:

from ml4h_latentverse.tests.robustness import run_robustness

data = ...  # Load or generate representations, shape (N, D)
labels = ...  # Corresponding target labels, shape (N,)
results = run_robustness(data, labels, noise_levels=[0.1, 0.5, 1.0, 1.5], metric="clustering", plots=True)
print(results)

Expected inputs:

representations: (N, D) matrix (or DataFrame) of latent representations
labels: (N,) array (or DataFrame) of target labels
noise_levels: List of noise magnitudes to apply
metric: "clustering" or "probing"
plots: If True, generate a performance plot
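
The idea of a noise sweep can be sketched with NumPy alone. The example below adds zero-mean Gaussian noise at several levels to synthetic two-class data and scores each perturbed copy with a nearest-centroid rule. Treating noise_level as the standard deviation of the noise and using nearest-centroid accuracy as the score are assumptions for illustration, and both helper functions are hypothetical rather than part of the library.

```python
import numpy as np


def add_gaussian_noise(representations, noise_level, rng):
    """Return a copy perturbed by zero-mean Gaussian noise with std `noise_level`."""
    x = np.asarray(representations, dtype=float)
    return x + rng.normal(scale=noise_level, size=x.shape)


def nearest_centroid_accuracy(x, labels, centroids):
    """Fraction of rows whose closest centroid index equals their label."""
    distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(distances.argmin(axis=1) == labels))


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 200)
# Two Gaussian blobs in 4 dimensions, centered at 0 and at 3.
clean = rng.normal(size=(400, 4)) + 3.0 * labels[:, None]
# Centroids are estimated once, on the unperturbed representations.
centroids = np.stack([clean[labels == k].mean(axis=0) for k in (0, 1)])

noise_levels = [0.1, 0.5, 1.0, 5.0]
accuracy_by_noise = {
    level: nearest_centroid_accuracy(add_gaussian_noise(clean, level, rng), labels, centroids)
    for level in noise_levels
}
```

A representation is more robust the more slowly the score degrades as the noise level grows.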

5. Probing Test

This test measures representation quality by training classifiers or regressors of varying complexity.

Example usage:

from ml4h_latentverse.tests.probing import run_probing

data = ...  # Load or generate representations
labels = ...  # Corresponding labels for probing
results = run_probing(data, labels)
print(results)

Expected inputs:

representations (ndarray or DataFrame): Feature representations
labels (ndarray or DataFrame): Labels for probing
train_ratio (float): Ratio of train to test data
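
To illustrate what probes of varying complexity can reveal, the sketch below fits two least-squares regressors on a train split and reports R² on the held-out rows: a linear probe and one that also sees element-wise squared features. The probe_r2 helper, the polynomial feature scheme, and the default train_ratio of 0.6 are assumptions for this example; run_probing may use different models and defaults.

```python
import numpy as np


def probe_r2(representations, targets, degree=1, train_ratio=0.6, seed=0):
    """Held-out R^2 of a least-squares probe (hypothetical helper)."""
    x = np.asarray(representations, dtype=float)
    y = np.asarray(targets, dtype=float).ravel()

    # Shuffle once, then split the rows into train and test sets.
    order = np.random.default_rng(seed).permutation(len(x))
    n_train = int(train_ratio * len(x))
    train, test = order[:n_train], order[n_train:]

    # Probe complexity: a bias column plus element-wise powers up to `degree`.
    features = np.column_stack(
        [np.ones(len(x))] + [x ** power for power in range(1, degree + 1)]
    )
    weights, *_ = np.linalg.lstsq(features[train], y[train], rcond=None)

    residual = y[test] - features[test] @ weights
    total = y[test] - y[test].mean()
    return 1.0 - float(residual @ residual) / float(total @ total)


rng = np.random.default_rng(1)
representations = rng.normal(size=(500, 3))
# The target depends on the first dimension only through its square.
targets = representations[:, 0] ** 2 + 0.1 * rng.normal(size=500)

r2_linear = probe_r2(representations, targets, degree=1)
r2_quadratic = probe_r2(representations, targets, degree=2)
```

Here the linear probe scores poorly while the quadratic probe recovers the target, which indicates that the information is present in the representation but not linearly accessible.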

Testing the Evaluation Suite

To validate the evaluation suite, you can run the provided test cases:

from ml4h_latentverse.tests.test import test_clusterability, test_disentanglement, test_expressiveness, test_robustness

test_clusterability()
test_disentanglement()
test_expressiveness()
test_robustness()

Project details

Release history

0.1.2 (this version): Mar 31, 2025
0.1.1: Mar 31, 2025
0.1.0: Mar 31, 2025

Download files

Source Distribution

ml4h_latentverse-0.1.2.tar.gz (26.3 kB)

Uploaded Mar 31, 2025 Source

Built Distribution

ml4h_latentverse-0.1.2-py3-none-any.whl (34.2 kB)

Uploaded Mar 31, 2025 Python 3

File details

Details for the file ml4h_latentverse-0.1.2.tar.gz.

File metadata

Download URL: ml4h_latentverse-0.1.2.tar.gz
Upload date: Mar 31, 2025
Size: 26.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.18

File hashes

Hashes for ml4h_latentverse-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`7b1df2389684103da0664efe0b48e3753fe1c7164d566b06f47c44a9667fedd3`
MD5	`8809154216174ef8b386bd10367e44e0`
BLAKE2b-256	`c8bf1a4455f23783488ea226a1c4fd7e112d361e6f5f50f73641e3dfc80f0d97`

File details

Details for the file ml4h_latentverse-0.1.2-py3-none-any.whl.

File metadata

Download URL: ml4h_latentverse-0.1.2-py3-none-any.whl
Upload date: Mar 31, 2025
Size: 34.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.18

File hashes

Hashes for ml4h_latentverse-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2580d90d14713c0d730ee2e29b3f6c673bdc4ed1658302e448b0194ed4858818`
MD5	`cba58c42864b11dec3b7011bd703bc08`
BLAKE2b-256	`5ea837791763ec8f405e395ce3b072edba6ab58c47b38be1b4cffe3d841b836d`
