Latentverse: Evaluation library for latent representations

Latentverse is a library for evaluating the quality and reliability of latent representations. It includes a variety of evaluation tests that measure the following properties of latent representations:

  • Clusterability: How well the representations form distinct clusters
  • Predictability: The ability to use representations for downstream prediction tasks
  • Disentanglement: The extent to which latent dimensions capture independent factors of variation
  • Robustness: The resilience of representations under perturbations
  • Expressiveness: How well latent representations capture relevant information

Installation

Latentverse is available on PyPI. You can install it using:

pip install ml4h-latentverse

Alternatively, if you are developing or modifying the package, clone the repository and install it in editable mode:

git clone https://github.com/broadinstitute/ml4h-latentverse.git
cd ml4h-latentverse
pip install -e .

Setting up the environment

It is recommended to use a virtual environment:

python -m venv venv
source venv/bin/activate  # for macOS/Linux
venv\Scripts\activate  # for Windows

Then, install the required dependencies:

pip install -r requirements.txt
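
After installing, one way to confirm that the package is visible to the active environment is to query its metadata. The snippet below is a minimal sketch that uses only the Python standard library; the helper name installed_version is hypothetical and not part of Latentverse.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "ml4h-latentverse" is the distribution name used in the pip command above.
print(installed_version("ml4h-latentverse"))
```

If this prints None, the virtual environment is probably not activated or the install step did not complete.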

Evaluating representations

1. Clusterability Test

This test evaluates how well representations cluster given a specified number of clusters. The test computes metrics such as Normalized Mutual Information (NMI), Silhouette Score, Davies-Bouldin Index, and Cluster Learnability.

Example usage:

from ml4h_latentverse.tests.clustering import run_clustering

representations = ...  # Load or generate your latent representations
labels = ...  # Corresponding labels for evaluation

results = run_clustering(representations=representations, labels=labels, num_clusters=2, plots=True)
print(results)

Expected inputs:

  • representations (ndarray): Feature representations for clustering
  • num_clusters (int, optional): Number of clusters (ignored if labels are provided)
  • labels (ndarray, optional): True labels (used for evaluation)
  • plots (bool, optional): Whether to generate clustering visualization

2. Disentanglement Test

This test measures how well the latent dimensions capture independent factors of variation. It computes metrics such as DCI Disentanglement, Mutual Information Gap (MIG), Total Correlation (TC), and SAP Score.

Example usage:

from ml4h_latentverse.tests.disentanglement import run_disentanglement

representations = ...  # Load or generate latent representations, shape (N, D)
labels = ...  # Corresponding ground truth labels, shape (N,)
results = run_disentanglement(representations, labels)
print(results)

Expected inputs:

  • representations: An (N, D) array of latent representations
  • labels: A (N,) array of ground truth labels
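
As an illustration of what Total Correlation measures, the sketch below estimates it under a Gaussian approximation, where TC = 0.5 * (sum of log marginal variances - log determinant of the covariance). This closed form is an assumption chosen for brevity and may differ from the estimator Latentverse uses; the function name gaussian_total_correlation is hypothetical.

```python
import numpy as np


def gaussian_total_correlation(z):
    """Total correlation in nats, assuming the latents are jointly Gaussian.

    The value is 0 for a diagonal covariance and grows as the latent
    dimensions become more linearly dependent.
    """
    cov = np.atleast_2d(np.cov(np.asarray(z, dtype=float), rowvar=False))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix is singular; TC is unbounded")
    return float(0.5 * (np.sum(np.log(np.diag(cov))) - logdet))


rng = np.random.default_rng(0)
independent = rng.normal(size=(5000, 3))
# Mix the independent factors so that each latent depends on two of them.
mixing = np.array([[1.0, 0.9, 0.0],
                   [0.0, 1.0, 0.9],
                   [0.0, 0.0, 1.0]])
entangled = independent @ mixing

tc_independent = gaussian_total_correlation(independent)
tc_entangled = gaussian_total_correlation(entangled)
print(f"independent: {tc_independent:.4f}, entangled: {tc_entangled:.4f}")
```

Lower values suggest that the latent dimensions are closer to statistically independent, at least with respect to linear dependence.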

3. Expressiveness Test

This test evaluates how much information the representations contain about labels. It assesses the impact of removing highly correlated features on prediction performance using AUC or R² scores.

Example usage:

from ml4h_latentverse.tests.expressiveness import run_expressiveness

representations = ...  # Load or generate latent representations, shape (N, D)
labels = ...  # Target labels, shape (N, P)
results = run_expressiveness(representations, labels, percent_to_remove_list=[0, 10, 20, 50], plots=True)
print(results)

Expected inputs:

  • representations: (N, D) array of feature representations
  • labels: (N, P) array of target labels (P phenotypes)
  • folds: Number of cross-validation folds
  • train_ratio: Ratio of data used for training
  • percent_to_remove_list: List of percentages of highly correlated dimensions to remove
  • verbose: If True, prints training details
  • plots: If True, generates and saves a performance plot
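
The idea of removing highly correlated dimensions and re-measuring performance can be sketched as follows. This is a simplified illustration, not the library's code: it assumes "highly correlated" means correlated with the target, uses a single train/test split instead of cross-validation, and fits an ordinary least-squares probe. The function r2_after_removal is hypothetical.

```python
import numpy as np


def r2_after_removal(representations, target, percent_to_remove,
                     train_ratio=0.8, seed=0):
    """Drop the dimensions most correlated with the target, then score a linear probe."""
    x = np.asarray(representations, dtype=float)
    y = np.asarray(target, dtype=float)
    n, d = x.shape
    # Random train/test split.
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(n * train_ratio)
    train, test = order[:n_train], order[n_train:]
    # Absolute Pearson correlation of each dimension with the target (train only).
    corr = np.abs([np.corrcoef(x[train, j], y[train])[0, 1] for j in range(d)])
    n_remove = int(round(d * percent_to_remove / 100))
    keep = np.sort(np.argsort(corr)[: d - n_remove])  # least-correlated dims survive
    # Ordinary least squares with an intercept column.
    design = np.column_stack([x[:, keep], np.ones(n)])
    coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
    pred = design[test] @ coef
    ss_res = np.sum((y[test] - pred) ** 2)
    ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


rng = np.random.default_rng(0)
representations = rng.normal(size=(500, 10))
# Synthetic target that depends on only the first two dimensions.
target = (3.0 * representations[:, 0] + 2.0 * representations[:, 1]
          + rng.normal(scale=0.1, size=500))

scores = {p: r2_after_removal(representations, target, p) for p in [0, 10, 20, 50]}
print(scores)
```

In this synthetic setup the information about the target is concentrated in two dimensions, so R² collapses once they are removed. A representation that spreads the information across many dimensions would degrade more gradually.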

4. Robustness Test

This test examines how well representations withstand perturbations by introducing Gaussian noise and measuring clustering or probing performance.

Example usage:

from ml4h_latentverse.tests.robustness import run_robustness

representations = ...  # Load or generate latent representations, shape (N, D)
labels = ...  # Target labels, shape (N,)
results = run_robustness(representations, labels, noise_levels=[0.1, 0.5, 1.0, 1.5], metric="clustering", plots=True)
print(results)

Expected inputs:

  • representations: (N, D) matrix (or DataFrame) of latent representations
  • labels: (N,) array (or DataFrame) of target labels
  • noise_levels: List of noise magnitudes to apply
  • metric: "clustering" or "probing"
  • plots: If True, generate a performance plot
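
The perturbation procedure can be illustrated with a small NumPy sketch: add zero-mean Gaussian noise of increasing scale and track how a simple probe degrades. The nearest-centroid probe, the choice to perturb only the held-out samples, and the noise levels are assumptions made for this example and are not taken from the library.

```python
import numpy as np


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Classify each test point by its closest class centroid and return accuracy."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=-1)
    return float(np.mean(classes[dists.argmin(axis=1)] == test_y))


rng = np.random.default_rng(0)
# Two synthetic classes standing in for labelled latent representations.
x = np.vstack([
    rng.normal(-1.0, 0.3, size=(200, 8)),
    rng.normal(1.0, 0.3, size=(200, 8)),
])
y = np.repeat([0, 1], 200)
order = rng.permutation(len(x))
train, test = order[:300], order[300:]

noise_levels = [0.0, 0.5, 2.0, 8.0]
accuracy = {}
for sigma in noise_levels:
    # Perturb only the held-out representations with Gaussian noise of scale sigma.
    noisy_test = x[test] + rng.normal(0.0, sigma, size=x[test].shape)
    accuracy[sigma] = nearest_centroid_accuracy(x[train], y[train], noisy_test, y[test])
print(accuracy)
```

A robust representation keeps its accuracy close to the noise-free value over a wide range of noise levels; the level at which performance drops gives a rough sense of the margin between classes.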

5. Probing Test

This test measures representation quality by training classifiers or regressors of varying complexity.

Example usage:

from ml4h_latentverse.tests.probing import run_probing

representations = ...  # Load or generate latent representations
labels = ...  # Labels for probing
results = run_probing(representations, labels)
print(results)

Expected inputs:

  • representations (ndarray or DataFrame): Feature representations
  • labels (ndarray or DataFrame): Labels for probing
  • train_ratio (float): Ratio of train to test data
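
To show why probes of varying complexity are informative, the sketch below compares a linear regressor with a degree-2 polynomial regressor on a target that is a product of two latent dimensions. The polynomial probes and the helper names polynomial_features and probe_r2 are assumptions for illustration; the models Latentverse actually trains may differ.

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_features(x, degree):
    """All monomials of the columns of x up to `degree`, plus a bias column."""
    n, d = x.shape
    columns = [np.ones(n)]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            columns.append(np.prod(x[:, list(idx)], axis=1))
    return np.column_stack(columns)


def probe_r2(x, y, degree, train_ratio=0.8, seed=0):
    """Fit a least-squares probe of the given polynomial degree; return test R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.random.default_rng(seed).permutation(len(x))
    n_train = int(len(x) * train_ratio)
    train, test = order[:n_train], order[n_train:]
    features = polynomial_features(x, degree)
    coef, *_ = np.linalg.lstsq(features[train], y[train], rcond=None)
    pred = features[test] @ coef
    ss_res = np.sum((y[test] - pred) ** 2)
    ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


rng = np.random.default_rng(0)
representations = rng.normal(size=(400, 3))
# The target is an interaction term, so it is not linearly decodable.
labels = representations[:, 0] * representations[:, 1] + rng.normal(scale=0.05, size=400)

r2_by_degree = {degree: probe_r2(representations, labels, degree) for degree in (1, 2)}
print(r2_by_degree)
```

A large gap between the simple and the more complex probe suggests that the information is present in the representation but not linearly accessible.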

Testing the Evaluation Suite

To validate the evaluation suite, run the provided test cases:

from ml4h_latentverse.tests.test import test_clusterability, test_disentanglement, test_expressiveness, test_robustness

test_clusterability()
test_disentanglement()
test_expressiveness()
test_robustness()
